It's a little confusing, I know. The titles table also has a field named title
in it. Here's a bit of the schema:
create_table "contents", :force => true do |t|
t.integer "title_id"
t.text "plain", :limit => 2147483647
t.datetime "created_at"
t.datetime "updated_at"
end
create_table "titles", :force => true do |t|
t.string "title"
t.string "list_title"
t.string "sort_title"
t.boolean "publish"
t.integer "file_id"
t.integer "copyright_id"
t.text "teaser"
t.text "bibliography"
t.integer "publication_date"
t.integer "edition_date"
t.string "publication_place"
t.string "publisher"
t.integer "lf_store_id"
t.string "title_type"
t.integer "set", :default => 0
t.datetime "created_at"
t.datetime "updated_at"
t.string "slug"
t.integer "role_id"
t.string "xml_filename"
t.string "author_name"
t.text "toc"
end
I didn't see any improvement in the indexing behavior until I added that join
statement, but as I said, I did several things at once, so I may just be
cargo-culting a little here.
Walter
On Apr 1, 2014, at 7:31 PM, Pat Allan wrote:
> Hi Walter
>
> Great to know things are working now.
>
> Just to clarify one thing: `join title.content` is not required - the join
> method is for specifying associations that are only used via custom SQL
> snippets (instead of the usual association/column chained methods).
>
> But given you've specified that - is title an association, or a column?
> Because your first field indicates it's a column, but then you've used it
> with the join method as an association.
>
> --
> Pat
>
> On 2 Apr 2014, at 12:54 am, Walter Lee Davis <[email protected]> wrote:
>
>> Thanks, you've been great rubber ducks!
>>
>> Just after I posted this, I tried "one more thing". (Actually, it was two or
>> more things at once -- bad practice, I know.) As I hinted, I changed the
>> relationship from has_many to has_one :content, and left the join in place,
>> and changed the name of the content.plain from :content to :plain_text.
>>
>> define_index do
>> set_property :group_concat_max_len => 10.megabytes
>>
>> indexes :title, :sortable => true
>> indexes teaser
>> join title.content
>> indexes content.plain, :as => :plain_text
>> indexes author_name, :sortable => true
>> has created_at, updated_at
>> where sanitize_sql(["publish", true])
>> end
>>
>>
>> Then in my search method, I removed a dangling :include => :contents,
>> parameter. Somewhere between these changes, the indexing process started
>> making sense (about 1GB of index, as predicted) and the search started
>> working -- and it started working faster, too.
>>
>> As my old creative director used to say of me, "Even a blind squirrel finds
>> a nut or two."
>>
>> Thanks for all you do,
>>
>> Walter
>>
>> On Apr 1, 2014, at 9:23 AM, Walter Lee Davis wrote:
>>
>>> Here's the info from the production.sphinx.conf, including the SQL:
>>>
>>> source title_core_0
>>> {
>>> type = mysql
>>> sql_host = 10.12.29.11
>>> sql_user = user
>>> sql_pass = pass
>>> sql_db = oll2
>>> sql_query_pre = SET SESSION group_concat_max_len = 10485760
>>> sql_query_pre = SET NAMES utf8
>>> sql_query_pre = SET TIME_ZONE = '+0:00'
>>> sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS
>>> `id` , `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`,
>>> GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS
>>> `content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS
>>> `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`,
>>> IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`,
>>> '') AS `title_sort`, IFNULL(`titles`.`author_name`, '') AS
>>> `author_name_sort`, UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`,
>>> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT
>>> OUTER JOIN `contents` ON `contents`.`title_id` = `titles`.`id` WHERE
>>> (`titles`.`id` >= $start AND `titles`.`id` <= $end AND publish) GROUP BY
>>> `titles`.`id` ORDER BY NULL
>>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM
>>> `titles`
>>> sql_attr_uint = sphinx_internal_id
>>> sql_attr_uint = sphinx_deleted
>>> sql_attr_uint = class_crc
>>> sql_attr_timestamp = created_at
>>> sql_attr_timestamp = updated_at
>>> sql_attr_string = sphinx_internal_class
>>> sql_attr_string = title_sort
>>> sql_attr_string = author_name_sort
>>> sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4)
>>> }
>>>
>>> index title_core
>>> {
>>> source = title_core_0
>>> path = /data/www/db/sphinx/production/title_core
>>> charset_type = utf-8
>>> }
>>>
>>> index title
>>> {
>>> type = distributed
>>> local = title_core
>>> }
>>>
>>> This is after I added an explicit line to the define_index statement:
>>>
>>> join title.contents
>>>
>>> There was no difference in terms of the index size that I noted earlier,
>>> and I'm not astute enough to tell you if the join is correct in the
>>> generated SQL. Here's the entirety of the define_index method:
>>>
>>> define_index do
>>> set_property :group_concat_max_len => 10.megabytes
>>>
>>> indexes :title, :sortable => true
>>> indexes teaser
>>> join title.contents
>>> indexes contents.plain, :as => :content
>>> indexes author_name, :sortable => true
>>> has created_at, updated_at
>>> where sanitize_sql(["publish", true])
>>> end
>>>
>>> Do you see anything odd about this? Should I change the relationship to
>>> contents from has_many to has_one? (The has_many was a relic from an
>>> earlier structure where I was splitting the plain text on an arbitrary
>>> length. This was before I found a way to increase the column width in my
>>> database to a size guaranteed to hold my largest title.)
>>>
>>> Thanks again for your tireless support!
>>>
>>> Walter
>>>
>>> On Mar 22, 2014, at 7:49 PM, Pat Allan wrote:
>>>
>>>> Your points on the expected size are spot on... the syntax you're using
>>>> (association_name.column_name) should ensure the SQL join is made
>>>> automatically, but you can confirm that by looking at the sql_query value
>>>> for the title_core index and see whether it's joining on contents, and how
>>>> the plain column is being used.
>>>>
>>>> Is there anything odd/wrong in that SQL statement?
>>>>
>>>> --
>>>> Pat
>>>>
>>>> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote:
>>>>
>>>>> I've tried both, and when I don't separately index the contents table, I
>>>>> don't get any hits on a fairly unique string that I know is in the
>>>>> contents.plain field when I search on titles. The indexing process makes
>>>>> it clear that this relation is not being touched -- when I index the
>>>>> contents table, the result is this:
>>>>>
>>>>> collected 1099 docs, 913.5 MB
>>>>>
>>>>> When I index the titles (including the related field in contents) I only
>>>>> get this:
>>>>>
>>>>> collected 1603 docs, 24.6 MB
>>>>>
>>>>> contents.plain is the only large part of contents, it's just title_id and
>>>>> timestamps besides the plain column. If contents.plain was being accessed
>>>>> as a related column (and thus added to the metadata from the titles
>>>>> table), I would expect the result from the title indexing process to be
>>>>> in the 1GB neighborhood. That would make sense to me, given the amount of
>>>>> data being indexed.
>>>>>
>>>>> What I have done as a fallback for now is added an index to the contents
>>>>> table, and rigged up my search results page to display those results as
>>>>> if they were to the title. But I'd still like to be searching once across
>>>>> all titles, and finding hits whether they were against the metadata (in
>>>>> the titles table) or the content (in the contents table). Is there some
>>>>> join syntax I need to use here? Remember -- this is Sphinx 2, not 3.
>>>>>
>>>>> Thanks so much for your excellent support!
>>>>>
>>>>> Walter
>>>>>
>>>>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote:
>>>>>
>>>>>> * The indexes line you're using will work fine - also, it'll work with
>>>>>> the arguments being shifted to method calls:
>>>>>>
>>>>>> indexes contents.plain, :as => :content
>>>>>>
>>>>>> If there's still issues, do get in touch.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "Thinking Sphinx" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>>> email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "Thinking Sphinx" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Thinking Sphinx" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Thinking Sphinx" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/d/optout.