Okay, well I'm taking it out, then, and let's see how the indexing goes this time.
... and, it's working fine. I wonder what I did to make it work again? The only other significant change was to specify has_one instead of has_many on the content association. Walter On Apr 1, 2014, at 8:54 PM, Pat Allan wrote: > Yeah, I really don't think that line is doing anything at all. Also: if the > model where the index is defined is Title, then what you've specified with > that line is "Please include the association 'content' from the model > returned by the association 'title', from the model where this index is > defined. Thus, it's actually looking for an association 'title' in the Title > model. > > On 2 Apr 2014, at 11:49 am, Walter Lee Davis <[email protected]> wrote: > >> It's a little confusing, I know. The titles table also has a field named >> title in it. Here's a bit of the schema: >> >> create_table "contents", :force => true do |t| >> t.integer "title_id" >> t.text "plain", :limit => 2147483647 >> t.datetime "created_at" >> t.datetime "updated_at" >> end >> >> create_table "titles", :force => true do |t| >> t.string "title" >> t.string "list_title" >> t.string "sort_title" >> t.boolean "publish" >> t.integer "file_id" >> t.integer "copyright_id" >> t.text "teaser" >> t.text "bibliography" >> t.integer "publication_date" >> t.integer "edition_date" >> t.string "publication_place" >> t.string "publisher" >> t.integer "lf_store_id" >> t.string "title_type" >> t.integer "set", :default => 0 >> t.datetime "created_at" >> t.datetime "updated_at" >> t.string "slug" >> t.integer "role_id" >> t.string "xml_filename" >> t.string "author_name" >> t.text "toc" >> end >> >> I didn't see any improvement in the indexing behavior until I added that >> join statement, but as I said, I did several things at once, so I may just >> be cargo-culting a little here. >> >> Walter >> >> On Apr 1, 2014, at 7:31 PM, Pat Allan wrote: >> >>> Hi Walter >>> >>> Great to know things are working now. >>> >>> Just to clarify one thing: `join title.content` is not required - the join >>> method is for specifying associations that are only used via custom SQL >>> snippets (instead of the usual association/column chained methods). >>> >>> But given you've specified that - is title an association, or a column? >>> Because your first field indicates it's a column, but then you've used it >>> with the join method as an association. >>> >>> -- >>> Pat >>> >>> On 2 Apr 2014, at 12:54 am, Walter Lee Davis <[email protected]> wrote: >>> >>>> Thanks, you've been great rubber ducks! >>>> >>>> Just after I posted this, I tried "one more thing". (Actually, it was two >>>> or more things at once -- bad practice, I know.) As I hinted, I changed >>>> the relationship from has_many to has_one :content, and left the join in >>>> place, and changed the name of the content.plain from :content to >>>> :plain_text. >>>> >>>> define_index do >>>> set_property :group_concat_max_len => 10.megabytes >>>> >>>> indexes :title, :sortable => true >>>> indexes teaser >>>> join title.content >>>> indexes content.plain, :as => :plain_text >>>> indexes author_name, :sortable => true >>>> has created_at, updated_at >>>> where sanitize_sql(["publish", true]) >>>> end >>>> >>>> >>>> Then in my search method, I removed a dangling :include => :contents, >>>> parameter. Somewhere between these changes, the indexing process started >>>> making sense (about 1GB of index, as predicted) and the search started >>>> working -- and it started working faster, too. >>>> >>>> As my old creative director used to say of me, "Even a blind squirrel >>>> finds a nut or two." >>>> >>>> Thanks for all you do, >>>> >>>> Walter >>>> >>>> On Apr 1, 2014, at 9:23 AM, Walter Lee Davis wrote: >>>> >>>>> Here's the info from the production.sphinx.conf, including the SQL: >>>>> >>>>> source title_core_0 >>>>> { >>>>> type = mysql >>>>> sql_host = 10.12.29.11 >>>>> sql_user = user >>>>> sql_pass = pass >>>>> sql_db = oll2 >>>>> sql_query_pre = SET SESSION group_concat_max_len = 10485760 >>>>> sql_query_pre = SET NAMES utf8 >>>>> sql_query_pre = SET TIME_ZONE = '+0:00' >>>>> sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS >>>>> `id` , `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, >>>>> GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS >>>>> `content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS >>>>> `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`, >>>>> IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`, >>>>> '') AS `title_sort`, IFNULL(`titles`.`author_name`, '') AS >>>>> `author_name_sort`, UNIX_TIMESTAMP(`titles`.`created_at`) AS >>>>> `created_at`, UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM >>>>> `titles` LEFT OUTER JOIN `contents` ON `contents`.`title_id` = >>>>> `titles`.`id` WHERE (`titles`.`id` >= $start AND `titles`.`id` <= $end >>>>> AND publish) GROUP BY `titles`.`id` ORDER BY NULL >>>>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM >>>>> `titles` >>>>> sql_attr_uint = sphinx_internal_id >>>>> sql_attr_uint = sphinx_deleted >>>>> sql_attr_uint = class_crc >>>>> sql_attr_timestamp = created_at >>>>> sql_attr_timestamp = updated_at >>>>> sql_attr_string = sphinx_internal_class >>>>> sql_attr_string = title_sort >>>>> sql_attr_string = author_name_sort >>>>> sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4) >>>>> } >>>>> >>>>> index title_core >>>>> { >>>>> source = title_core_0 >>>>> path = /data/www/db/sphinx/production/title_core >>>>> charset_type = utf-8 >>>>> } >>>>> >>>>> index title >>>>> { >>>>> type = distributed >>>>> local = title_core >>>>> } >>>>> >>>>> This is after I added an explicit line to the define_index statement: >>>>> >>>>> join title.contents >>>>> >>>>> There was no difference in terms of the index size that I noted earlier, >>>>> and I'm not astute enough to tell you if the join is correct in the >>>>> generated SQL. Here's the entirety of the define_index method: >>>>> >>>>> define_index do >>>>> set_property :group_concat_max_len => 10.megabytes >>>>> >>>>> indexes :title, :sortable => true >>>>> indexes teaser >>>>> join title.contents >>>>> indexes contents.plain, :as => :content >>>>> indexes author_name, :sortable => true >>>>> has created_at, updated_at >>>>> where sanitize_sql(["publish", true]) >>>>> end >>>>> >>>>> Do you see anything odd about this? Should I change the relationship to >>>>> contents from has_many to has_one? (The has_many was a relic from an >>>>> earlier structure where I was splitting the plain text on an arbitrary >>>>> length. This was before I found a way to increase the column width in my >>>>> database to a size guaranteed to hold my largest title.) >>>>> >>>>> Thanks again for your tireless support! >>>>> >>>>> Walter >>>>> >>>>> On Mar 22, 2014, at 7:49 PM, Pat Allan wrote: >>>>> >>>>>> Your points on the expected size are spot on... the syntax you're using >>>>>> (association_name.column_name) should ensure the SQL join is made >>>>>> automatically, but you can confirm that by looking at the sql_query >>>>>> value for the title_core index and see whether it's joining on contents, >>>>>> and how the plain column is being used. >>>>>> >>>>>> Is there anything odd/wrong in that SQL statement? >>>>>> >>>>>> -- >>>>>> Pat >>>>>> >>>>>> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote: >>>>>> >>>>>>> I've tried both, and when I don't separately index the contents table, >>>>>>> I don't get any hits on a fairly unique string that I know is in the >>>>>>> contents.plain field when I search on titles. The indexing process >>>>>>> makes it clear that this relation is not being touched -- when I index >>>>>>> the contents table, the result is this: >>>>>>> >>>>>>> collected 1099 docs, 913.5 MB >>>>>>> >>>>>>> When I index the titles (including the related field in contents) I >>>>>>> only get this: >>>>>>> >>>>>>> collected 1603 docs, 24.6 MB >>>>>>> >>>>>>> contents.plain is the only large part of contents, it's just title_id >>>>>>> and timestamps besides the plain column. If contents.plain was being >>>>>>> accessed as a related column (and thus added to the metadata from the >>>>>>> titles table), I would expect the result from the title indexing >>>>>>> process to be in the 1GB neighborhood. That would make sense to me, >>>>>>> given the amount of data being indexed. >>>>>>> >>>>>>> What I have done as a fallback for now is added an index to the >>>>>>> contents table, and rigged up my search results page to display those >>>>>>> results as if they were to the title. But I'd still like to be >>>>>>> searching once across all titles, and finding hits whether they were >>>>>>> against the metadata (in the titles table) or the content (in the >>>>>>> contents table). Is there some join syntax I need to use here? Remember >>>>>>> -- this is Sphinx 2, not 3. >>>>>>> >>>>>>> Thanks so much for your excellent support! >>>>>>> >>>>>>> Walter >>>>>>> >>>>>>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote: >>>>>>> >>>>>>>> * The indexes line you're using will work fine - also, it'll work with >>>>>>>> the arguments being shifted to method calls: >>>>>>>> >>>>>>>> indexes contents.plain, :as => :content >>>>>>>> >>>>>>>> If there's still issues, do get in touch. >>>>>>> >>>>>>> -- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "Thinking Sphinx" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>>> an email to [email protected]. >>>>>>> To post to this group, send email to [email protected]. >>>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "Thinking Sphinx" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>> an email to [email protected]. >>>>>> To post to this group, send email to [email protected]. >>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google Groups >>>>> "Thinking Sphinx" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send an >>>>> email to [email protected]. >>>>> To post to this group, send email to [email protected]. >>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>> For more options, visit https://groups.google.com/d/optout. >>>> >>>> -- >>>> You received this message because you are subscribed to the Google Groups >>>> "Thinking Sphinx" group. >>>> To unsubscribe from this group and stop receiving emails from it, send an >>>> email to [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>> For more options, visit https://groups.google.com/d/optout. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "Thinking Sphinx" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to [email protected]. >>> To post to this group, send email to [email protected]. >>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>> For more options, visit https://groups.google.com/d/optout. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Thinking Sphinx" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at http://groups.google.com/group/thinking-sphinx. >> For more options, visit https://groups.google.com/d/optout. > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/thinking-sphinx. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/thinking-sphinx. For more options, visit https://groups.google.com/d/optout.
