Re: [ts] Indexing associated contents

Pat Allan Fri, 14 Sep 2012 05:58:04 -0700

Hi Walter

You'll need to add a method to your Title model that aggregates the contents 
values. The string it returns is what gets sent through to Sphinx:

  def plain_contents
    contents.collect(&:plain).join(' ')
  end

And then you can use that via the excerpts object:

  title.excerpts.plain_contents

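Essentially, excerpts is a single-level proxy: it calls one method on the 
title itself and excerpts whatever string comes back, so it can't chain 
through an association the way title.excerpts.contents.plain tries to.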
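
To make that concrete, here is a toy sketch in plain Ruby. ExcerptProxy and 
its highlight helper are invented for illustration and are not Thinking 
Sphinx's real classes; the real excerpts object hands the string to Sphinx 
instead of using a regex. The Title and Content classes are bare stand-ins 
for your models.

```ruby
# Hypothetical stand-in for an excerpts proxy (NOT Thinking Sphinx's code).
class ExcerptProxy
  def initialize(record, words)
    @record = record
    @words  = words
  end

  # Forward one method call to the record, then highlight the string result.
  def method_missing(name, *args, &block)
    return super unless @record.respond_to?(name)
    highlight(@record.public_send(name, *args, &block).to_s)
  end

  def respond_to_missing?(name, include_private = false)
    @record.respond_to?(name, include_private) || super
  end

  private

  # Assumption: a simple regex wrap stands in for Sphinx's excerpt building.
  def highlight(text)
    @words.reduce(text) do |out, word|
      out.gsub(/\b#{Regexp.escape(word)}\b/i) { |match| "<b>#{match}</b>" }
    end
  end
end

# Bare stand-ins for the ActiveRecord models in this thread.
Content = Struct.new(:plain)

class Title
  attr_reader :contents

  def initialize(contents)
    @contents = contents
  end

  # The aggregating method suggested above.
  def plain_contents
    contents.collect(&:plain).join(' ')
  end
end

title = Title.new([Content.new('alpha beta'), Content.new('gamma delta')])
proxy = ExcerptProxy.new(title, ['gamma'])

puts proxy.plain_contents
```

In this sketch, proxy.contents already comes back as a String, so calling 
.plain on it raises NoMethodError. That mirrors why only a method defined 
directly on the title can be reached through the proxy.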

Cheers

-- 
Pat

On 14/09/2012, at 1:43 PM, Walter Lee Davis wrote:

> My index works really well, almost instant searches over a huge collection of 
> large documents. But now I'm trying to enable excerpts, and I don't seem to 
> be getting them on my 'contents' concatenated field. Recall the structure:
> 
> #content.rb 
> belongs_to :title
> #has a field called plain containing the plain text of the document, 400 rows 
> per instance
> 
> #title.rb 
> has_many :contents
> 
> define_index do
>  set_property :group_concat_max_len => 10.megabytes
> 
>  indexes :title, :sortable => true
>  indexes teaser
>  indexes role.person(:name), :as => :author, :sortable => true
>  indexes contents(:plain), :as => :content
>  has created_at, updated_at
>  where sanitize_sql(["publish", true])
> end
> 
> I can find anything in contents, but I'm not clear what my syntax would be to 
> show excerpts from it in a view.
> 
> I have tried title.excerpts.contents and title.excerpts.contents.plain 
> without love.
> 
> Walter
> 
> On Sep 7, 2012, at 2:17 PM, Walter Lee Davis wrote:
> 
>> THAT seems to be a very good fix. I'm watching a much different light show 
>> in the terminal now. 
>> 
>>      collected 1466 docs, 820.1 MB
>> 
>> Much better…
>> 
>> Thanks so much!
>> 
>> Walter
>> 
>> On Sep 7, 2012, at 2:11 PM, Pat Allan wrote:
>> 
>>> Ah... I really should have noticed that in the index definition. Sorry! And 
>>> reading through that again now, contents should be a field, not an 
>>> attribute - so switch it from a 'has' to an 'indexes'. That should be the 
>>> reason why it hasn't been working.
>>> 
>>> -- 
>>> Pat
>>> 
>>> On 07/09/2012, at 8:08 PM, Walter Lee Davis wrote:
>>> 
>>>> I did read that and added the recommended flag to my index definition. It 
>>>> doesn't seem to result in a working content index, though. Can you see 
>>>> anything I've left out of my index? Is there a need to add an indexes line 
>>>> and a has line?
>>>> 
>>>> Thanks,
>>>> 
>>>> Walter
>>>> 
>>>> On Sep 7, 2012, at 2:05 PM, Pat Allan wrote:
>>>> 
>>>>> Hi Walter
>>>>> 
>>>>> If you're trying to match on words deep in the contents field, then you 
>>>>> may need to have a read through this part of the docs:
>>>>> http://pat.github.com/ts/en/common_issues.html#mysql_large_fields
>>>>> 
>>>>> If you're not seeing results returned when searching on the title field, 
>>>>> then that's probably a different matter. Either way, let me know how you 
>>>>> go.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> -- 
>>>>> Pat
>>>>> 
>>>>> On 07/09/2012, at 5:44 PM, Walter Davis wrote:
>>>>> 
>>>>>> I have the following relationships:
>>>>>> 
>>>>>> Title has_many :contents
>>>>>> Content belongs_to :title
>>>>>> 
>>>>>> In title.rb, I have the following index declaration:
>>>>>> 
>>>>>> define_index do
>>>>>> set_property :group_concat_max_len => 10.megabytes
>>>>>> 
>>>>>> indexes :title, :sortable => true
>>>>>> indexes teaser
>>>>>> indexes role.person(:name), :as => :author, :sortable => true
>>>>>> has contents(:plain), :as => :contents
>>>>>> has created_at, updated_at
>>>>>> where sanitize_sql(["publish", true])
>>>>>> end
>>>>>> 
>>>>>> When I run the index, it appears to work:
>>>>>> 
>>>>>> indexing index 'title_core'...
>>>>>> WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
>>>>>> collected 1466 docs, 0.6 MB
>>>>>> collected 3668728 attr values
>>>>>> WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit 
>>>>>> may improve performance
>>>>>> sorted 7.3 Mvalues, 100.0% done
>>>>>> sorted 0.1 Mhits, 100.0% done
>>>>>> total 1466 docs, 560458 bytes
>>>>>> total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
>>>>>> skipping non-plain index 'title'...
>>>>>> total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
>>>>>> total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
>>>>>> Started successfully (pid 17630).
>>>>>> 
>>>>>> But searches do not match any of the content strings.
>>>>>> 
>>>>>> When I run the generated query directly in SQL:
>>>>>> 
>>>>>> SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` , 
>>>>>> `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, 
>>>>>> `people`.`name` AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0 AS 
>>>>>> `sphinx_deleted`, 3942078319 AS `class_crc`, IFNULL('Title', '') AS 
>>>>>> `sphinx_internal_class`, IFNULL(`titles`.`title`, '') AS `title_sort`, 
>>>>>> IFNULL(`people`.`name`, '') AS `author_sort`, GROUP_CONCAT(DISTINCT 
>>>>>> IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS `contents`, 
>>>>>> UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`, 
>>>>>> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT 
>>>>>> OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT OUTER JOIN 
>>>>>> `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN 
>>>>>> `contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id` 
>>>>>> >= $start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id` 
>>>>>> ORDER BY NULL
>>>>>> 
>>>>>> I get some genuinely odd results in the contents column. This may be an 
>>>>>> artifact of Sequel Pro, but the column doesn't appear to be very large 
>>>>>> at all: at most there are a dozen lines of text, and some rows hold just 
>>>>>> a single 0 character. The contents table includes thousands of rows of 
>>>>>> data and each row has up to 400 lines of text in it. When concatenated, 
>>>>>> these composite contents range from 300K to 8MB per title.
>>>>>> 
>>>>>> Can anyone suggest a way to go here? Is there a better way to index text 
>>>>>> (XML) documents than slurping out their content into MySQL so that 
>>>>>> Sphinx can index them?
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Walter
>>>>>> 
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Thinking Sphinx" group.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to 
>>>>>> [email protected].
>>>>>> For more options, visit this group at 
>>>>>> http://groups.google.com/group/thinking-sphinx?hl=en.
