Hi Walter Great to know things are working now.
Just to clarify one thing: `join title.content` is not required - the join method is for specifying associations that are only used via custom SQL snippets (instead of the usual association/column chained methods). But given you've specified that - is title an association, or a column? Because your first field indicates it's a column, but then you've used it with the join method as an association. -- Pat On 2 Apr 2014, at 12:54 am, Walter Lee Davis <[email protected]> wrote: > Thanks, you've been great rubber ducks! > > Just after I posted this, I tried "one more thing". (Actually, it was two or > more things at once -- bad practice, I know.) As I hinted, I changed the > relationship from has_many to has_one :content, and left the join in place, > and changed the name of the content.plain from :content to :plain_text. > > define_index do > set_property :group_concat_max_len => 10.megabytes > > indexes :title, :sortable => true > indexes teaser > join title.content > indexes content.plain, :as => :plain_text > indexes author_name, :sortable => true > has created_at, updated_at > where sanitize_sql(["publish", true]) > end > > > Then in my search method, I removed a dangling :include => :contents, > parameter. Somewhere between these changes, the indexing process started > making sense (about 1GB of index, as predicted) and the search started > working -- and it started working faster, too. > > As my old creative director used to say of me, "Even a blind squirrel finds a > nut or two." > > Thanks for all you do, > > Walter > > On Apr 1, 2014, at 9:23 AM, Walter Lee Davis wrote: > >> Here's the info from the production.sphinx.conf, including the SQL: >> >> source title_core_0 >> { >> type = mysql >> sql_host = 10.12.29.11 >> sql_user = user >> sql_pass = pass >> sql_db = oll2 >> sql_query_pre = SET SESSION group_concat_max_len = 10485760 >> sql_query_pre = SET NAMES utf8 >> sql_query_pre = SET TIME_ZONE = '+0:00' >> sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS >> `id` , `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, >> GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS >> `content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS >> `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`, >> IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`, '') >> AS `title_sort`, IFNULL(`titles`.`author_name`, '') AS `author_name_sort`, >> UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`, >> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT >> OUTER JOIN `contents` ON `contents`.`title_id` = `titles`.`id` WHERE >> (`titles`.`id` >= $start AND `titles`.`id` <= $end AND publish) GROUP BY >> `titles`.`id` ORDER BY NULL >> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM >> `titles` >> sql_attr_uint = sphinx_internal_id >> sql_attr_uint = sphinx_deleted >> sql_attr_uint = class_crc >> sql_attr_timestamp = created_at >> sql_attr_timestamp = updated_at >> sql_attr_string = sphinx_internal_class >> sql_attr_string = title_sort >> sql_attr_string = author_name_sort >> sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4) >> } >> >> index title_core >> { >> source = title_core_0 >> path = /data/www/db/sphinx/production/title_core >> charset_type = utf-8 >> } >> >> index title >> { >> type = distributed >> local = title_core >> } >> >> This is after I added an explicit line to the define_index statement: >> >> join title.contents >> >> There was no difference in terms of the index size that I noted earlier, and >> I'm not astute enough to tell you if the join is correct in the generated >> SQL. Here's the entirety of the define_index method: >> >> define_index do >> set_property :group_concat_max_len => 10.megabytes >> >> indexes :title, :sortable => true >> indexes teaser >> join title.contents >> indexes contents.plain, :as => :content >> indexes author_name, :sortable => true >> has created_at, updated_at >> where sanitize_sql(["publish", true]) >> end >> >> Do you see anything odd about this? Should I change the relationship to >> contents from has_many to has_one? (The has_many was a relic from an earlier >> structure where I was splitting the plain text on an arbitrary length. This >> was before I found a way to increase the column width in my database to a >> size guaranteed to hold my largest title.) >> >> Thanks again for your tireless support! >> >> Walter >> >> On Mar 22, 2014, at 7:49 PM, Pat Allan wrote: >> >>> Your points on the expected size are spot on... the syntax you're using >>> (association_name.column_name) should ensure the SQL join is made >>> automatically, but you can confirm that by looking at the sql_query value >>> for the title_core index and see whether it's joining on contents, and how >>> the plain column is being used. >>> >>> Is there anything odd/wrong in that SQL statement? >>> >>> -- >>> Pat >>> >>> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote: >>> >>>> I've tried both, and when I don't separately index the contents table, I >>>> don't get any hits on a fairly unique string that I know is in the >>>> contents.plain field when I search on titles. The indexing process makes >>>> it clear that this relation is not being touched -- when I index the >>>> contents table, the result is this: >>>> >>>> collected 1099 docs, 913.5 MB >>>> >>>> When I index the titles (including the related field in contents) I only >>>> get this: >>>> >>>> collected 1603 docs, 24.6 MB >>>> >>>> contents.plain is the only large part of contents, it's just title_id and >>>> timestamps besides the plain column. If contents.plain was being accessed >>>> as a related column (and thus added to the metadata from the titles >>>> table), I would expect the result from the title indexing process to be in >>>> the 1GB neighborhood. That would make sense to me, given the amount of >>>> data being indexed. >>>> >>>> What I have done as a fallback for now is added an index to the contents >>>> table, and rigged up my search results page to display those results as if >>>> they were to the title. But I'd still like to be searching once across all >>>> titles, and finding hits whether they were against the metadata (in the >>>> titles table) or the content (in the contents table). Is there some join >>>> syntax I need to use here? Remember -- this is Sphinx 2, not 3. >>>> >>>> Thanks so much for your excellent support! >>>> >>>> Walter >>>> >>>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote: >>>> >>>>> * The indexes line you're using will work fine - also, it'll work with >>>>> the arguments being shifted to method calls: >>>>> >>>>> indexes contents.plain, :as => :content >>>>> >>>>> If there's still issues, do get in touch. >>>> >>>> -- >>>> You received this message because you are subscribed to the Google Groups >>>> "Thinking Sphinx" group. >>>> To unsubscribe from this group and stop receiving emails from it, send an >>>> email to [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>> For more options, visit https://groups.google.com/d/optout. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "Thinking Sphinx" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to [email protected]. >>> To post to this group, send email to [email protected]. >>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>> For more options, visit https://groups.google.com/d/optout. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Thinking Sphinx" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at http://groups.google.com/group/thinking-sphinx. >> For more options, visit https://groups.google.com/d/optout. > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/thinking-sphinx. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/thinking-sphinx. For more options, visit https://groups.google.com/d/optout.
