Here's the info from the production.sphinx.conf, including the SQL:
source title_core_0
{
type = mysql
sql_host = 10.12.29.11
sql_user = user
sql_pass = pass
sql_db = oll2
sql_query_pre = SET SESSION group_concat_max_len = 10485760
sql_query_pre = SET NAMES utf8
sql_query_pre = SET TIME_ZONE = '+0:00'
sql_query = SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id`
, `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`,
GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS
`content`, `titles`.`author_name` AS `author_name`, `titles`.`id` AS
`sphinx_internal_id`, 0 AS `sphinx_deleted`, 3942078319 AS `class_crc`,
IFNULL('Title', '') AS `sphinx_internal_class`, IFNULL(`titles`.`title`, '') AS
`title_sort`, IFNULL(`titles`.`author_name`, '') AS `author_name_sort`,
UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`,
UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT OUTER
JOIN `contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id`
>= $start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id` ORDER
BY NULL
sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM
`titles`
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_timestamp = created_at
sql_attr_timestamp = updated_at
sql_attr_string = sphinx_internal_class
sql_attr_string = title_sort
sql_attr_string = author_name_sort
sql_query_info = SELECT * FROM `titles` WHERE `id` = (($id - 3) / 4)
}
index title_core
{
source = title_core_0
path = /data/www/db/sphinx/production/title_core
charset_type = utf-8
}
index title
{
type = distributed
local = title_core
}
This is after I added an explicit line to the define_index statement:
join title.contents
There was no difference in terms of the index size that I noted earlier, and
I'm not astute enough to tell you if the join is correct in the generated SQL.
Here's the entirety of the define_index method:
define_index do
set_property :group_concat_max_len => 10.megabytes
indexes :title, :sortable => true
indexes teaser
join title.contents
indexes contents.plain, :as => :content
indexes author_name, :sortable => true
has created_at, updated_at
where sanitize_sql(["publish", true])
end
Do you see anything odd about this? Should I change the relationship to
contents from has_many to has_one? (The has_many was a relic from an earlier
structure where I was splitting the plain text on an arbitrary length. This was
before I found a way to increase the column width in my database to a size
guaranteed to hold my largest title.)
Thanks again for your tireless support!
Walter
On Mar 22, 2014, at 7:49 PM, Pat Allan wrote:
> Your points on the expected size are spot on... the syntax you're using
> (association_name.column_name) should ensure the SQL join is made
> automatically, but you can confirm that by looking at the sql_query value for
> the title_core index and see whether it's joining on contents, and how the
> plain column is being used.
>
> Is there anything odd/wrong in that SQL statement?
>
> --
> Pat
>
> On 23 Mar 2014, at 8:25 am, Walter Lee Davis <[email protected]> wrote:
>
>> I've tried both, and when I don't separately index the contents table, I
>> don't get any hits on a fairly unique string that I know is in the
>> contents.plain field when I search on titles. The indexing process makes it
>> clear that this relation is not being touched -- when I index the contents
>> table, the result is this:
>>
>> collected 1099 docs, 913.5 MB
>>
>> When I index the titles (including the related field in contents) I only get
>> this:
>>
>> collected 1603 docs, 24.6 MB
>>
>> contents.plain is the only large part of contents, it's just title_id and
>> timestamps besides the plain column. If contents.plain was being accessed as
>> a related column (and thus added to the metadata from the titles table), I
>> would expect the result from the title indexing process to be in the 1GB
>> neighborhood. That would make sense to me, given the amount of data being
>> indexed.
>>
>> What I have done as a fallback for now is added an index to the contents
>> table, and rigged up my search results page to display those results as if
>> they were to the title. But I'd still like to be searching once across all
>> titles, and finding hits whether they were against the metadata (in the
>> titles table) or the content (in the contents table). Is there some join
>> syntax I need to use here? Remember -- this is Sphinx 2, not 3.
>>
>> Thanks so much for your excellent support!
>>
>> Walter
>>
>> On Mar 21, 2014, at 11:04 PM, Pat Allan wrote:
>>
>>> * The indexes line you're using will work fine - also, it'll work with the
>>> arguments being shifted to method calls:
>>>
>>> indexes contents.plain, :as => :content
>>>
>>> If there's still issues, do get in touch.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Thinking Sphinx" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/d/optout.