Re: Indexing related record with TS2 (WAS: Re: [ts] TS 2 Indexing Linked Model "skipping non-plain index")

Pat Allan Tue, 01 Apr 2014 17:55:29 -0700

Yeah, I really don't think that line is doing anything at all. Also: if the 
model where the index is defined is Title, then what you've specified with that 
line is "Please include the association 'content' from the model returned by 
the association 'title', from the model where this index is defined. Thus, it's 
actually looking for an association 'title' in the Title model.


On 2 Apr 2014, at 11:49 am, Walter Lee Davis <[email protected]> wrote:

> It's a little confusing, I know. The titles table also has a field named 
> title in it. Here's a bit of the schema:
> 
>  create_table "contents", :force => true do |t|
>    t.integer  "title_id"
>    t.text     "plain",       :limit => 2147483647
>    t.datetime "created_at"
>    t.datetime "updated_at"
>  end
> 
>  create_table "titles", :force => true do |t|
>    t.string   "title"
>    t.string   "list_title"
>    t.string   "sort_title"
>    t.boolean  "publish"
>    t.integer  "file_id"
>    t.integer  "copyright_id"
>    t.text     "teaser"
>    t.text     "bibliography"
>    t.integer  "publication_date"
>    t.integer  "edition_date"
>    t.string   "publication_place"
>    t.string   "publisher"
>    t.integer  "lf_store_id"
>    t.string   "title_type"
>    t.integer  "set",               :default => 0
>    t.datetime "created_at"
>    t.datetime "updated_at"
>    t.string   "slug"
>    t.integer  "role_id"
>    t.string   "xml_filename"
>    t.string   "author_name"
>    t.text     "toc"
>  end
> 
> I didn't see any improvement in the indexing behavior until I added that join 
> statement, but as I said, I did several things at once, so I may just be 
> cargo-culting a little here.
> 
> Walter
> 
> On Apr 1, 2014, at 7:31 PM, Pat Allan wrote:
> 
>> Hi Walter
>> 
>> Great to know things are working now.
>> 
>> Just to clarify one thing: `join title.content` is not required - the join 
>> method is for specifying associations that are only used via custom SQL 
>> snippets (instead of the usual association/column chained methods).
>> 
>> But given you've specified that - is title an association, or a column? 
>> Because your first field indicates it's a column, but then you've used it 
>> with the join method as an association.
>> 
>> -- 
>> Pat
>> 
>> On 2 Apr 2014, at 12:54 am, Walter Lee Davis <[email protected]> wrote:
>> 
>>> Thanks, you've been great rubber ducks!
>>> 
>>> Just after I posted this, I tried "one more thing". (Actually, it was two 
>>> or more things at once -- bad practice, I know.) As I hinted, I changed the 
>>> relationship from has_many to has_one :content, and left the join in place, 
>>> and changed the name of the content.plain from :content to :plain_text.
>>> 
>>> define_index do
>>>  set_property :group_concat_max_len => 10.megabytes
>>> 
>>>  indexes :title, :sortable => true
>>>  indexes teaser
>>>  join title.content
>>>  indexes content.plain, :as => :plain_text
>>>  indexes author_name, :sortable => true
>>>  has created_at, updated_at
>>>  where sanitize_sql(["publish", true])
>>> end
>>> 
>>> 
>>> Then in my search method, I removed a dangling :include => :contents, 
>>> parameter. Somewhere between these changes, the indexing process started 
>>> making sense (about 1GB of index, as predicted) and the search started 
>>> working -- and it started working faster, too. 
>>> 
>>> As my old creative director used to say of me, "Even a blind squirrel finds 
>>> a nut or two."
>>> 
>>> Thanks for all you do,
>>> 
>>> Walter
>>> 
>>> On Apr 1, 2014, at 9:23 AM, Walter Lee Davis wrote:
>>> 
>>>> Here's the info from the production.sphinx.conf, including the SQL:
>>>> 
>>>> source title_core_0
>>>> {
>>>> type = mysql
>>>> sql_host = 10.12.29.11
>>>> sql_user = user
>>>> sql_pass = pass
>>>> sql_db = oll2
>>>> sql_query_pre = SET SESSION group_concat_max_len = 10485760
>>>> sql_query_pre = SET NAMES utf8
>>>> sql_query_pre = SET TIME_ZONE = '+0:00'
>>>> sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS 
>>>> `id` , `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, 
>>>> GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS 
>>>> `content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS 
>>>> `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`, 
>>>> IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`, 
>>>> '') AS `title_sort`, IFNULL(`titles`.`author_name`, '') AS 
>>>> `author_name_sort`, UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`, 
>>>> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT 
>>>> OUTER JOIN `contents` ON `contents`.`title_id` = `titles`.`id` WHERE 
>>>> (`titles`.`id` >= $start AND `titles`.`id` <= $end AND publish) GROUP BY 
>>>> `titles`.`id` ORDER BY NULL
>>>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM 
>>>> `titles` 
>>>> sql_attr_uint = sphinx_internal_id
>>>> sql_attr_uint = sphinx_deleted
>>>> sql_attr_uint = class_crc
>>>> sql_attr_timestamp = created_at
>>>> sql_attr_timestamp = updated_at
>>>> sql_attr_string = sphinx_internal_class
>>>> sql_attr_string = title_sort
>>>> sql_attr_string = author_name_sort
>>>> sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4)
>>>> }
>>>> 
>>>> index title_core
>>>> {
>>>> source = title_core_0
>>>> path = /data/www/db/sphinx/production/title_core
>>>> charset_type = utf-8
>>>> }
>>>> 
>>>> index title
>>>> {
>>>> type = distributed
>>>> local = title_core
>>>> }
>>>> 
>>>> This is after I added an explicit line to the define_index statement:
>>>> 
>>>> join title.contents
>>>> 
>>>> There was no difference in terms of the index size that I noted earlier, 
>>>> and I'm not astute enough to tell you if the join is correct in the 
>>>> generated SQL. Here's the entirety of the define_index method:
>>>> 
>>>> define_index do
>>>> set_property :group_concat_max_len => 10.megabytes
>>>> 
>>>> indexes :title, :sortable => true
>>>> indexes teaser
>>>> join title.contents
>>>> indexes contents.plain, :as => :content
>>>> indexes author_name, :sortable => true
>>>> has created_at, updated_at
>>>> where sanitize_sql(["publish", true])
>>>> end
>>>> 
>>>> Do you see anything odd about this? Should I change the relationship to 
>>>> contents from has_many to has_one? (The has_many was a relic from an 
>>>> earlier structure where I was splitting the plain text on an arbitrary 
>>>> length. This was before I found a way to increase the column width in my 
>>>> database to a size guaranteed to hold my largest title.)
>>>> 
>>>> Thanks again for your tireless support!
>>>> 
>>>> Walter
>>>> 
>>>> On Mar 22, 2014, at 7:49 PM, Pat Allan wrote:
>>>> 
>>>>> Your points on the expected size are spot on... the syntax you're using 
>>>>> (association_name.column_name) should ensure the SQL join is made 
>>>>> automatically, but you can confirm that by looking at the sql_query value 
>>>>> for the title_core index and see whether it's joining on contents, and 
>>>>> how the plain column is being used.
>>>>> 
>>>>> Is there anything odd/wrong in that SQL statement?
>>>>> 
>>>>> -- 
>>>>> Pat
>>>>> 
>>>>> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote:
>>>>> 
>>>>>> I've tried both, and when I don't separately index the contents table, I 
>>>>>> don't get any hits on a fairly unique string that I know is in the 
>>>>>> contents.plain field when I search on titles. The indexing process makes 
>>>>>> it clear that this relation is not being touched -- when I index the 
>>>>>> contents table, the result is this:
>>>>>> 
>>>>>> collected 1099 docs, 913.5 MB
>>>>>> 
>>>>>> When I index the titles (including the related field in contents) I only 
>>>>>> get this:
>>>>>> 
>>>>>> collected 1603 docs, 24.6 MB
>>>>>> 
>>>>>> contents.plain is the only large part of contents, it's just title_id 
>>>>>> and timestamps besides the plain column. If contents.plain was being 
>>>>>> accessed as a related column (and thus added to the metadata from the 
>>>>>> titles table), I would expect the result from the title indexing process 
>>>>>> to be in the 1GB neighborhood. That would make sense to me, given the 
>>>>>> amount of data being indexed.
>>>>>> 
>>>>>> What I have done as a fallback for now is added an index to the contents 
>>>>>> table, and rigged up my search results page to display those results as 
>>>>>> if they were to the title. But I'd still like to be searching once 
>>>>>> across all titles, and finding hits whether they were against the 
>>>>>> metadata (in the titles table) or the content (in the contents table). 
>>>>>> Is there some join syntax I need to use here? Remember -- this is Sphinx 
>>>>>> 2, not 3.
>>>>>> 
>>>>>> Thanks so much for your excellent support!
>>>>>> 
>>>>>> Walter
>>>>>> 
>>>>>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote:
>>>>>> 
>>>>>>> * The indexes line you're using will work fine - also, it'll work with 
>>>>>>> the arguments being shifted to method calls:
>>>>>>> 
>>>>>>> indexes contents.plain, :as => :content
>>>>>>> 
>>>>>>> If there's still issues, do get in touch.
>>>>>> 
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Thinking Sphinx" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>> an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>> 
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "Thinking Sphinx" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "Thinking Sphinx" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Thinking Sphinx" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Thinking Sphinx" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/d/optout.

Re: Indexing related record with TS2 (WAS: Re: [ts] TS 2 Indexing Linked Model "skipping non-plain index")

Reply via email to