Re: Indexing related record with TS2 (WAS: Re: [ts] TS 2 Indexing Linked Model "skipping non-plain index")

Pat Allan Tue, 01 Apr 2014 16:32:21 -0700

Hi Walter

Great to know things are working now.


Just to clarify one thing: `join title.content` is not required - the join 
method is for specifying associations that are only used via custom SQL 
snippets (instead of the usual association/column chained methods).

But given you've specified that - is title an association, or a column? Because 
your first field indicates it's a column, but then you've used it with the join 
method as an association.

-- 
Pat

On 2 Apr 2014, at 12:54 am, Walter Lee Davis <[email protected]> wrote:

> Thanks, you've been great rubber ducks!
> 
> Just after I posted this, I tried "one more thing". (Actually, it was two or 
> more things at once -- bad practice, I know.) As I hinted, I changed the 
> relationship from has_many to has_one :content, and left the join in place, 
> and changed the name of the content.plain from :content to :plain_text.
> 
>  define_index do
>    set_property :group_concat_max_len => 10.megabytes
> 
>    indexes :title, :sortable => true
>    indexes teaser
>    join title.content
>    indexes content.plain, :as => :plain_text
>    indexes author_name, :sortable => true
>    has created_at, updated_at
>    where sanitize_sql(["publish", true])
>  end
> 
> 
> Then in my search method, I removed a dangling :include => :contents, 
> parameter. Somewhere between these changes, the indexing process started 
> making sense (about 1GB of index, as predicted) and the search started 
> working -- and it started working faster, too. 
> 
> As my old creative director used to say of me, "Even a blind squirrel finds a 
> nut or two."
> 
> Thanks for all you do,
> 
> Walter
> 
> On Apr 1, 2014, at 9:23 AM, Walter Lee Davis wrote:
> 
>> Here's the info from the production.sphinx.conf, including the SQL:
>> 
>> source title_core_0
>> {
>> type = mysql
>> sql_host = 10.12.29.11
>> sql_user = user
>> sql_pass = pass
>> sql_db = oll2
>> sql_query_pre = SET SESSION group_concat_max_len = 10485760
>> sql_query_pre = SET NAMES utf8
>> sql_query_pre = SET TIME_ZONE = '+0:00'
>> sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS 
>> `id` , `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, 
>> GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS 
>> `content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS 
>> `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`, 
>> IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`, '') 
>> AS `title_sort`, IFNULL(`titles`.`author_name`, '') AS `author_name_sort`, 
>> UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`, 
>> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT 
>> OUTER JOIN `contents` ON `contents`.`title_id` = `titles`.`id` WHERE 
>> (`titles`.`id` >= $start AND `titles`.`id` <= $end AND publish) GROUP BY 
>> `titles`.`id` ORDER BY NULL
>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM 
>> `titles` 
>> sql_attr_uint = sphinx_internal_id
>> sql_attr_uint = sphinx_deleted
>> sql_attr_uint = class_crc
>> sql_attr_timestamp = created_at
>> sql_attr_timestamp = updated_at
>> sql_attr_string = sphinx_internal_class
>> sql_attr_string = title_sort
>> sql_attr_string = author_name_sort
>> sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4)
>> }
>> 
>> index title_core
>> {
>> source = title_core_0
>> path = /data/www/db/sphinx/production/title_core
>> charset_type = utf-8
>> }
>> 
>> index title
>> {
>> type = distributed
>> local = title_core
>> }
>> 
>> This is after I added an explicit line to the define_index statement:
>> 
>> join title.contents
>> 
>> There was no difference in terms of the index size that I noted earlier, and 
>> I'm not astute enough to tell you if the join is correct in the generated 
>> SQL. Here's the entirety of the define_index method:
>> 
>> define_index do
>>   set_property :group_concat_max_len => 10.megabytes
>> 
>>   indexes :title, :sortable => true
>>   indexes teaser
>>   join title.contents
>>   indexes contents.plain, :as => :content
>>   indexes author_name, :sortable => true
>>   has created_at, updated_at
>>   where sanitize_sql(["publish", true])
>> end
>> 
>> Do you see anything odd about this? Should I change the relationship to 
>> contents from has_many to has_one? (The has_many was a relic from an earlier 
>> structure where I was splitting the plain text on an arbitrary length. This 
>> was before I found a way to increase the column width in my database to a 
>> size guaranteed to hold my largest title.)
>> 
>> Thanks again for your tireless support!
>> 
>> Walter
>> 
>> On Mar 22, 2014, at 7:49 PM, Pat Allan wrote:
>> 
>>> Your points on the expected size are spot on... the syntax you're using 
>>> (association_name.column_name) should ensure the SQL join is made 
>>> automatically, but you can confirm that by looking at the sql_query value 
>>> for the title_core index and see whether it's joining on contents, and how 
>>> the plain column is being used.
>>> 
>>> Is there anything odd/wrong in that SQL statement?
>>> 
>>> -- 
>>> Pat
>>> 
>>> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote:
>>> 
>>>> I've tried both, and when I don't separately index the contents table, I 
>>>> don't get any hits on a fairly unique string that I know is in the 
>>>> contents.plain field when I search on titles. The indexing process makes 
>>>> it clear that this relation is not being touched -- when I index the 
>>>> contents table, the result is this:
>>>> 
>>>> collected 1099 docs, 913.5 MB
>>>> 
>>>> When I index the titles (including the related field in contents) I only 
>>>> get this:
>>>> 
>>>> collected 1603 docs, 24.6 MB
>>>> 
>>>> contents.plain is the only large part of contents, it's just title_id and 
>>>> timestamps besides the plain column. If contents.plain was being accessed 
>>>> as a related column (and thus added to the metadata from the titles 
>>>> table), I would expect the result from the title indexing process to be in 
>>>> the 1GB neighborhood. That would make sense to me, given the amount of 
>>>> data being indexed.
>>>> 
>>>> What I have done as a fallback for now is added an index to the contents 
>>>> table, and rigged up my search results page to display those results as if 
>>>> they were to the title. But I'd still like to be searching once across all 
>>>> titles, and finding hits whether they were against the metadata (in the 
>>>> titles table) or the content (in the contents table). Is there some join 
>>>> syntax I need to use here? Remember -- this is Sphinx 2, not 3.
>>>> 
>>>> Thanks so much for your excellent support!
>>>> 
>>>> Walter
>>>> 
>>>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote:
>>>> 
>>>>> * The indexes line you're using will work fine - also, it'll work with 
>>>>> the arguments being shifted to method calls:
>>>>> 
>>>>> indexes contents.plain, :as => :content
>>>>> 
>>>>> If there's still issues, do get in touch.
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "Thinking Sphinx" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Thinking Sphinx" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Thinking Sphinx" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>> For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/d/optout.

Re: Indexing related record with TS2 (WAS: Re: [ts] TS 2 Indexing Linked Model "skipping non-plain index")

Reply via email to