Thanks, that works really well.
Walter
On Sep 14, 2012, at 8:57 AM, Pat Allan wrote:
> Hi Walter
>
> You'll need to add a method to your title model that aggregates the contents
> values - it gets sent through to Sphinx:
>
> def plain_contents
>   contents.collect(&:plain).join(' ')
> end
>
> And then you can use that via the excerpts object:
>
> title.excerpts.plain_contents
>
> Essentially, excerpts is a single-level proxy.
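
To illustrate the "single-level proxy" remark, here is a toy sketch in plain Ruby. It is not Thinking Sphinx's actual implementation: ToyExcerpter, the hard-coded keyword, and the <b> highlighting are all assumptions made up for this example (real excerpts are built by Sphinx). The proxy forwards one method call to the wrapped object and returns a highlighted String, which would explain why chaining a second call such as title.excerpts.contents.plain cannot work.

```ruby
# Hypothetical stand-ins for the models in this thread; no Rails or Sphinx needed.
Content = Struct.new(:plain)

class Title
  attr_reader :contents

  def initialize(contents)
    @contents = contents
  end

  # The aggregating method suggested above.
  def plain_contents
    contents.collect(&:plain).join(' ')
  end

  # Assumption: a fixed keyword stands in for the search query.
  def excerpts
    ToyExcerpter.new(self, 'sphinx')
  end
end

# Toy proxy: forwards ONE method call to the wrapped object, converts the
# result to a String and highlights the keyword in it.
class ToyExcerpter
  def initialize(instance, keyword)
    @instance = instance
    @keyword = keyword
  end

  def method_missing(name, *args, &block)
    return super unless @instance.respond_to?(name)

    text = @instance.public_send(name, *args, &block).to_s
    text.gsub(@keyword) { |match| "<b>#{match}</b>" }
  end

  def respond_to_missing?(name, include_private = false)
    @instance.respond_to?(name, include_private) || super
  end
end

title = Title.new([Content.new('thinking sphinx'), Content.new('is fast')])
puts title.excerpts.plain_contents
# prints: thinking <b>sphinx</b> is fast
```

Because the proxy hands back a plain String, title.excerpts.contents.plain raises NoMethodError in this sketch; the aggregation has to happen in a single method on the model.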
>
> Cheers
>
> --
> Pat
>
> On 14/09/2012, at 1:43 PM, Walter Lee Davis wrote:
>
>> My index works really well, almost instant searches over a huge collection
>> of large documents. But now I'm trying to enable excerpts, and I don't seem
>> to be getting them on my 'contents' concatenated field. Recall the structure:
>>
>> # content.rb
>> belongs_to :title
>> # has a field called plain containing the plain text of the document, 400
>> # rows per instance
>>
>> # title.rb
>> has_many :contents
>>
>> define_index do
>>   set_property :group_concat_max_len => 10.megabytes
>>
>>   indexes :title, :sortable => true
>>   indexes teaser
>>   indexes role.person(:name), :as => :author, :sortable => true
>>   indexes contents(:plain), :as => :content
>>   has created_at, updated_at
>>   where sanitize_sql(["publish", true])
>> end
>>
>> I can find anything in contents, but I'm not clear what my syntax would be
>> to show excerpts from it in a view.
>>
>> I have tried title.excerpts.contents and title.excerpts.contents.plain
>> without love.
>>
>> Walter
>>
>> On Sep 7, 2012, at 2:17 PM, Walter Lee Davis wrote:
>>
>>> THAT seems to be a very good fix. I'm watching a much different light show
>>> in the terminal now.
>>>
>>> collected 1466 docs, 820.1 MB
>>>
>>> Much better…
>>>
>>> Thanks so much!
>>>
>>> Walter
>>>
>>> On Sep 7, 2012, at 2:11 PM, Pat Allan wrote:
>>>
>>>> Ah... I really should have noticed that in the index definition. Sorry!
>>>> And reading through that again now, contents should be a field, not an
>>>> attribute - so switch it from a 'has' to an 'indexes'. That should be the
>>>> reason why it hasn't been working.
>>>>
>>>> --
>>>> Pat
>>>>
>>>> On 07/09/2012, at 8:08 PM, Walter Lee Davis wrote:
>>>>
>>>>> I did read that and added the recommended flag to my index definition. It
>>>>> doesn't seem to result in a working content index, though. Can you see
>>>>> anything I've left out of my index? Is there a need to add an indexes
>>>>> line and a has line?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Walter
>>>>>
>>>>> On Sep 7, 2012, at 2:05 PM, Pat Allan wrote:
>>>>>
>>>>>> Hi Walter
>>>>>>
>>>>>> If you're trying to match on words deep in the contents field, then you
>>>>>> may need to have a read through this part of the docs:
>>>>>> http://pat.github.com/ts/en/common_issues.html#mysql_large_fields
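
As background to the large-fields issue mentioned above: MySQL truncates GROUP_CONCAT output at group_concat_max_len, which defaults to 1024 bytes, so words deep in a long concatenated column never reach the indexer. The following is only a toy model of that behaviour in plain Ruby; the group_concat helper and the sample rows are hypothetical and do not talk to MySQL or Sphinx.

```ruby
# MySQL's default value for group_concat_max_len, in bytes.
DEFAULT_GROUP_CONCAT_MAX_LEN = 1024

# Toy stand-in for GROUP_CONCAT: join the values, then cut the result off
# at max_len bytes, dropping everything past the limit.
def group_concat(values, max_len: DEFAULT_GROUP_CONCAT_MAX_LEN, separator: ' ')
  values.join(separator).byteslice(0, max_len)
end

# Hypothetical data: one long row followed by a row holding the search term.
rows = ['alpha ' * 300, 'needle']

truncated = group_concat(rows)
raised    = group_concat(rows, max_len: 10 * 1024 * 1024) # 10 megabytes

puts truncated.include?('needle') # false: cut off before the second row
puts raised.include?('needle')    # true: the whole text survives
```

In this model, raising the limit (as set_property :group_concat_max_len does in the index definition) is what lets the later rows through.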
>>>>>>
>>>>>> If you're not seeing results returned when searching on the title field,
>>>>>> then that's probably a different matter. Either way, let me know how you
>>>>>> go.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Pat
>>>>>>
>>>>>> On 07/09/2012, at 5:44 PM, Walter Davis wrote:
>>>>>>
>>>>>>> I have the following relationships:
>>>>>>>
>>>>>>> Title has_many :contents
>>>>>>> Content belongs_to :title
>>>>>>>
>>>>>>> In title.rb, I have the following index declaration:
>>>>>>>
>>>>>>> define_index do
>>>>>>>   set_property :group_concat_max_len => 10.megabytes
>>>>>>>
>>>>>>>   indexes :title, :sortable => true
>>>>>>>   indexes teaser
>>>>>>>   indexes role.person(:name), :as => :author, :sortable => true
>>>>>>>   has contents(:plain), :as => :contents
>>>>>>>   has created_at, updated_at
>>>>>>>   where sanitize_sql(["publish", true])
>>>>>>> end
>>>>>>>
>>>>>>> When I run the index, it appears to work:
>>>>>>>
>>>>>>> indexing index 'title_core'...
>>>>>>> WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
>>>>>>> collected 1466 docs, 0.6 MB
>>>>>>> collected 3668728 attr values
>>>>>>> WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit
>>>>>>> may improve performance
>>>>>>> sorted 7.3 Mvalues, 100.0% done
>>>>>>> sorted 0.1 Mhits, 100.0% done
>>>>>>> total 1466 docs, 560458 bytes
>>>>>>> total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
>>>>>>> skipping non-plain index 'title'...
>>>>>>> total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
>>>>>>> total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
>>>>>>> Started successfully (pid 17630).
>>>>>>>
>>>>>>> But searches do not match any of the content strings.
>>>>>>>
>>>>>>> When I run the generated query directly in SQL:
>>>>>>>
>>>>>>> SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` ,
>>>>>>> `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`,
>>>>>>> `people`.`name` AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0
>>>>>>> AS `sphinx_deleted`, 3942078319 AS `class_crc`, IFNULL('Title', '') AS
>>>>>>> `sphinx_internal_class`, IFNULL(`titles`.`title`, '') AS `title_sort`,
>>>>>>> IFNULL(`people`.`name`, '') AS `author_sort`, GROUP_CONCAT(DISTINCT
>>>>>>> IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS `contents`,
>>>>>>> UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`,
>>>>>>> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles`
>>>>>>> LEFT OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT OUTER
>>>>>>> JOIN `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN
>>>>>>> `contents` ON `contents`.`title_id` = `titles`.`id` WHERE
>>>>>>> (`titles`.`id` >= $start AND `titles`.`id` <= $end AND publish) GROUP
>>>>>>> BY `titles`.`id` ORDER BY NULL
>>>>>>>
>>>>>>> I get some genuinely odd results in the contents column. This may be an
>>>>>>> artifact of Sequel Pro, but the column doesn't appear to be very large
>>>>>>> at all: at most there are a dozen lines of text, and some values are just
>>>>>>> a single 0 character. The contents table includes thousands of rows of
>>>>>>> data, and each row has up to 400 lines of text in it. When concatenated,
>>>>>>> these composite contents range from 300K to 8MB per title.
>>>>>>>
>>>>>>> Can anyone suggest a way to go here? Is there a better way to index
>>>>>>> text (XML) documents than slurping out their content into MySQL so that
>>>>>>> Sphinx can index them?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Walter
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "Thinking Sphinx" group.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ.
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> [email protected].
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/thinking-sphinx?hl=en.