I would expect it to work with both has_one and has_many associations. To be honest, I expected it to work back at the start of this thread, so I'm not really sure what's changed to get things in the right state :)
On 2 Apr 2014, at 12:00 pm, Walter Lee Davis <[email protected]> wrote: > Okay, well I'm taking it out, then, and let's see how the indexing goes this > time. > > ... and, it's working fine. I wonder what I did to make it work again? The > only other significant change was to specify has_one instead of has_many on > the content association. > > Walter > > On Apr 1, 2014, at 8:54 PM, Pat Allan wrote: > >> Yeah, I really don't think that line is doing anything at all. Also: if the >> model where the index is defined is Title, then what you've specified with >> that line is "Please include the association 'content' from the model >> returned by the association 'title', from the model where this index is >> defined. Thus, it's actually looking for an association 'title' in the Title >> model. >> >> On 2 Apr 2014, at 11:49 am, Walter Lee Davis <[email protected]> wrote: >> >>> It's a little confusing, I know. The titles table also has a field named >>> title in it. Here's a bit of the schema: >>> >>> create_table "contents", :force => true do |t| >>> t.integer "title_id" >>> t.text "plain", :limit => 2147483647 >>> t.datetime "created_at" >>> t.datetime "updated_at" >>> end >>> >>> create_table "titles", :force => true do |t| >>> t.string "title" >>> t.string "list_title" >>> t.string "sort_title" >>> t.boolean "publish" >>> t.integer "file_id" >>> t.integer "copyright_id" >>> t.text "teaser" >>> t.text "bibliography" >>> t.integer "publication_date" >>> t.integer "edition_date" >>> t.string "publication_place" >>> t.string "publisher" >>> t.integer "lf_store_id" >>> t.string "title_type" >>> t.integer "set", :default => 0 >>> t.datetime "created_at" >>> t.datetime "updated_at" >>> t.string "slug" >>> t.integer "role_id" >>> t.string "xml_filename" >>> t.string "author_name" >>> t.text "toc" >>> end >>> >>> I didn't see any improvement in the indexing behavior until I added that >>> join statement, but as I said, I did several things at once, so I may just >>> be cargo-culting a little here. >>> >>> Walter >>> >>> On Apr 1, 2014, at 7:31 PM, Pat Allan wrote: >>> >>>> Hi Walter >>>> >>>> Great to know things are working now. >>>> >>>> Just to clarify one thing: `join title.content` is not required - the join >>>> method is for specifying associations that are only used via custom SQL >>>> snippets (instead of the usual association/column chained methods). >>>> >>>> But given you've specified that - is title an association, or a column? >>>> Because your first field indicates it's a column, but then you've used it >>>> with the join method as an association. >>>> >>>> -- >>>> Pat >>>> >>>> On 2 Apr 2014, at 12:54 am, Walter Lee Davis <[email protected]> wrote: >>>> >>>>> Thanks, you've been great rubber ducks! >>>>> >>>>> Just after I posted this, I tried "one more thing". (Actually, it was two >>>>> or more things at once -- bad practice, I know.) As I hinted, I changed >>>>> the relationship from has_many to has_one :content, and left the join in >>>>> place, and changed the name of the content.plain from :content to >>>>> :plain_text. >>>>> >>>>> define_index do >>>>> set_property :group_concat_max_len => 10.megabytes >>>>> >>>>> indexes :title, :sortable => true >>>>> indexes teaser >>>>> join title.content >>>>> indexes content.plain, :as => :plain_text >>>>> indexes author_name, :sortable => true >>>>> has created_at, updated_at >>>>> where sanitize_sql(["publish", true]) >>>>> end >>>>> >>>>> >>>>> Then in my search method, I removed a dangling :include => :contents, >>>>> parameter. Somewhere between these changes, the indexing process started >>>>> making sense (about 1GB of index, as predicted) and the search started >>>>> working -- and it started working faster, too. >>>>> >>>>> As my old creative director used to say of me, "Even a blind squirrel >>>>> finds a nut or two." >>>>> >>>>> Thanks for all you do, >>>>> >>>>> Walter >>>>> >>>>> On Apr 1, 2014, at 9:23 AM, Walter Lee Davis wrote: >>>>> >>>>>> Here's the info from the production.sphinx.conf, including the SQL: >>>>>> >>>>>> source title_core_0 >>>>>> { >>>>>> type = mysql >>>>>> sql_host = 10.12.29.11 >>>>>> sql_user = user >>>>>> sql_pass = pass >>>>>> sql_db = oll2 >>>>>> sql_query_pre = SET SESSION group_concat_max_len = 10485760 >>>>>> sql_query_pre = SET NAMES utf8 >>>>>> sql_query_pre = SET TIME_ZONE = '+0:00' >>>>>> sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS >>>>>> `id` , `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, >>>>>> GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS >>>>>> `content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS >>>>>> `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`, >>>>>> IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`, >>>>>> '') AS `title_sort`, IFNULL(`titles`.`author_name`, '') AS >>>>>> `author_name_sort`, UNIX_TIMESTAMP(`titles`.`created_at`) AS >>>>>> `created_at`, UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM >>>>>> `titles` LEFT OUTER JOIN `contents` ON `contents`.`title_id` = >>>>>> `titles`.`id` WHERE (`titles`.`id` >= $start AND `titles`.`id` <= $end >>>>>> AND publish) GROUP BY `titles`.`id` ORDER BY NULL >>>>>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM >>>>>> `titles` >>>>>> sql_attr_uint = sphinx_internal_id >>>>>> sql_attr_uint = sphinx_deleted >>>>>> sql_attr_uint = class_crc >>>>>> sql_attr_timestamp = created_at >>>>>> sql_attr_timestamp = updated_at >>>>>> sql_attr_string = sphinx_internal_class >>>>>> sql_attr_string = title_sort >>>>>> sql_attr_string = author_name_sort >>>>>> sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4) >>>>>> } >>>>>> >>>>>> index title_core >>>>>> { >>>>>> source = title_core_0 >>>>>> path = /data/www/db/sphinx/production/title_core >>>>>> charset_type = utf-8 >>>>>> } >>>>>> >>>>>> index title >>>>>> { >>>>>> type = distributed >>>>>> local = title_core >>>>>> } >>>>>> >>>>>> This is after I added an explicit line to the define_index statement: >>>>>> >>>>>> join title.contents >>>>>> >>>>>> There was no difference in terms of the index size that I noted earlier, >>>>>> and I'm not astute enough to tell you if the join is correct in the >>>>>> generated SQL. Here's the entirety of the define_index method: >>>>>> >>>>>> define_index do >>>>>> set_property :group_concat_max_len => 10.megabytes >>>>>> >>>>>> indexes :title, :sortable => true >>>>>> indexes teaser >>>>>> join title.contents >>>>>> indexes contents.plain, :as => :content >>>>>> indexes author_name, :sortable => true >>>>>> has created_at, updated_at >>>>>> where sanitize_sql(["publish", true]) >>>>>> end >>>>>> >>>>>> Do you see anything odd about this? Should I change the relationship to >>>>>> contents from has_many to has_one? (The has_many was a relic from an >>>>>> earlier structure where I was splitting the plain text on an arbitrary >>>>>> length. This was before I found a way to increase the column width in my >>>>>> database to a size guaranteed to hold my largest title.) >>>>>> >>>>>> Thanks again for your tireless support! >>>>>> >>>>>> Walter >>>>>> >>>>>> On Mar 22, 2014, at 7:49 PM, Pat Allan wrote: >>>>>> >>>>>>> Your points on the expected size are spot on... the syntax you're using >>>>>>> (association_name.column_name) should ensure the SQL join is made >>>>>>> automatically, but you can confirm that by looking at the sql_query >>>>>>> value for the title_core index and see whether it's joining on >>>>>>> contents, and how the plain column is being used. >>>>>>> >>>>>>> Is there anything odd/wrong in that SQL statement? >>>>>>> >>>>>>> -- >>>>>>> Pat >>>>>>> >>>>>>> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote: >>>>>>> >>>>>>>> I've tried both, and when I don't separately index the contents table, >>>>>>>> I don't get any hits on a fairly unique string that I know is in the >>>>>>>> contents.plain field when I search on titles. The indexing process >>>>>>>> makes it clear that this relation is not being touched -- when I index >>>>>>>> the contents table, the result is this: >>>>>>>> >>>>>>>> collected 1099 docs, 913.5 MB >>>>>>>> >>>>>>>> When I index the titles (including the related field in contents) I >>>>>>>> only get this: >>>>>>>> >>>>>>>> collected 1603 docs, 24.6 MB >>>>>>>> >>>>>>>> contents.plain is the only large part of contents, it's just title_id >>>>>>>> and timestamps besides the plain column. If contents.plain was being >>>>>>>> accessed as a related column (and thus added to the metadata from the >>>>>>>> titles table), I would expect the result from the title indexing >>>>>>>> process to be in the 1GB neighborhood. That would make sense to me, >>>>>>>> given the amount of data being indexed. >>>>>>>> >>>>>>>> What I have done as a fallback for now is added an index to the >>>>>>>> contents table, and rigged up my search results page to display those >>>>>>>> results as if they were to the title. But I'd still like to be >>>>>>>> searching once across all titles, and finding hits whether they were >>>>>>>> against the metadata (in the titles table) or the content (in the >>>>>>>> contents table). Is there some join syntax I need to use here? >>>>>>>> Remember -- this is Sphinx 2, not 3. >>>>>>>> >>>>>>>> Thanks so much for your excellent support! >>>>>>>> >>>>>>>> Walter >>>>>>>> >>>>>>>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote: >>>>>>>> >>>>>>>>> * The indexes line you're using will work fine - also, it'll work >>>>>>>>> with the arguments being shifted to method calls: >>>>>>>>> >>>>>>>>> indexes contents.plain, :as => :content >>>>>>>>> >>>>>>>>> If there's still issues, do get in touch. >>>>>>>> >>>>>>>> -- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "Thinking Sphinx" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>>>> an email to [email protected]. >>>>>>>> To post to this group, send email to [email protected]. >>>>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>> >>>>>>> -- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "Thinking Sphinx" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>>> an email to [email protected]. >>>>>>> To post to this group, send email to [email protected]. >>>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "Thinking Sphinx" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>> an email to [email protected]. >>>>>> To post to this group, send email to [email protected]. >>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google Groups >>>>> "Thinking Sphinx" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send an >>>>> email to [email protected]. >>>>> To post to this group, send email to [email protected]. >>>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>>> For more options, visit https://groups.google.com/d/optout. >>>> >>>> -- >>>> You received this message because you are subscribed to the Google Groups >>>> "Thinking Sphinx" group. >>>> To unsubscribe from this group and stop receiving emails from it, send an >>>> email to [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>>> For more options, visit https://groups.google.com/d/optout. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "Thinking Sphinx" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to [email protected]. >>> To post to this group, send email to [email protected]. >>> Visit this group at http://groups.google.com/group/thinking-sphinx. >>> For more options, visit https://groups.google.com/d/optout. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Thinking Sphinx" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at http://groups.google.com/group/thinking-sphinx. >> For more options, visit https://groups.google.com/d/optout. > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/thinking-sphinx. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/thinking-sphinx. For more options, visit https://groups.google.com/d/optout.
