Re: [ts] Indexing associated contents

Pat Allan Fri, 07 Sep 2012 11:12:43 -0700

Ah... I really should have noticed that in the index definition. Sorry! 
Reading through it again now, contents should be a field, not an attribute, 
because Sphinx only matches search terms against fields. Switch that line from 
a 'has' to an 'indexes'. That is most likely why it hasn't been working.
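
As a rough sketch of that change, applied to the index definition quoted 
further down: in the real app only the body of the block matters, and it would 
sit inside `define_index do ... end` in title.rb. The `ColumnRef` and 
`IndexSketch` classes here are hypothetical stand-ins, not part of Thinking 
Sphinx; they only record the calls so the snippet runs without Rails or Sphinx. 
The `set_property` and `where` lines are left out and are assumed to stay 
unchanged.

```ruby
# Hypothetical stand-in for a column reference such as role.person(:name).
# Not part of Thinking Sphinx.
class ColumnRef
  attr_reader :path

  def initialize(*path)
    @path = path
  end

  # Chained calls extend the path: role.person(:name) => [:role, :person, :name]
  def method_missing(name, *args)
    ColumnRef.new(*path, name, *args)
  end
end

# Hypothetical recorder that mimics the two DSL calls discussed in the thread.
class IndexSketch
  attr_reader :fields, :attributes

  def initialize(&definition)
    @fields = []
    @attributes = []
    instance_eval(&definition)
  end

  # Fields are full-text indexed, so their words can be matched by a search.
  def indexes(*args)
    options = args.last.is_a?(Hash) ? args.pop : {}
    args.each { |column| @fields << (options[:as] || column_name(column)) }
  end

  # Attributes are kept for filtering and sorting, not for text matching.
  def has(*args)
    options = args.last.is_a?(Hash) ? args.pop : {}
    args.each { |column| @attributes << (options[:as] || column_name(column)) }
  end

  # Bare names such as teaser or contents(:plain) become column references.
  def method_missing(name, *args)
    ColumnRef.new(name, *args)
  end

  private

  def column_name(column)
    column.is_a?(Symbol) ? column : column.path.last
  end
end

title_index = IndexSketch.new do
  indexes :title, :sortable => true
  indexes teaser
  indexes role.person(:name), :as => :author, :sortable => true
  indexes contents(:plain), :as => :contents # was: has contents(:plain), ...
  has created_at, updated_at
end

p title_index.fields
p title_index.attributes
```

After making the change in the real model, the index has to be rebuilt before 
searches can match words in the contents.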


-- 
Pat

On 07/09/2012, at 8:08 PM, Walter Lee Davis wrote:

> I did read that and added the recommended flag to my index definition. It 
> doesn't seem to result in a working content index, though. Can you see 
> anything I've left out of my index? Is there a need to add an indexes line 
> and a has line?
> 
> Thanks,
> 
> Walter
> 
> On Sep 7, 2012, at 2:05 PM, Pat Allan wrote:
> 
>> Hi Walter
>> 
>> If you're trying to match on words deep in the contents field, then you may 
>> need to have a read through this part of the docs:
>> http://pat.github.com/ts/en/common_issues.html#mysql_large_fields
>> 
>> If you're not seeing results returned when searching on the title field, 
>> then that's probably a different matter. Either way, let me know how you go.
>> 
>> Cheers
>> 
>> -- 
>> Pat
>> 
>> On 07/09/2012, at 5:44 PM, Walter Davis wrote:
>> 
>>> I have the following relationships:
>>> 
>>> Title has_many :contents
>>> Content belongs_to :title
>>> 
>>> In title.rb, I have the following index declaration:
>>> 
>>> define_index do
>>>   set_property :group_concat_max_len => 10.megabytes
>>> 
>>>   indexes :title, :sortable => true
>>>   indexes teaser
>>>   indexes role.person(:name), :as => :author, :sortable => true
>>>   has contents(:plain), :as => :contents
>>>   has created_at, updated_at
>>>   where sanitize_sql(["publish", true])
>>> end
>>> 
>>> When I run the index, it appears to work:
>>> 
>>> indexing index 'title_core'...
>>> WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
>>> collected 1466 docs, 0.6 MB
>>> collected 3668728 attr values
>>> WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit may 
>>> improve performance
>>> sorted 7.3 Mvalues, 100.0% done
>>> sorted 0.1 Mhits, 100.0% done
>>> total 1466 docs, 560458 bytes
>>> total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
>>> skipping non-plain index 'title'...
>>> total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
>>> total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
>>> Started successfully (pid 17630).
>>> 
>>> But searches do not match any of the content strings.
>>> 
>>> When I run the generated query directly in SQL:
>>> 
>>> SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` , 
>>> `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, `people`.`name` 
>>> AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`, 
>>> 3942078319 AS `class_crc`, IFNULL('Title', '') AS `sphinx_internal_class`, 
>>> IFNULL(`titles`.`title`, '') AS `title_sort`, IFNULL(`people`.`name`, '') 
>>> AS `author_sort`, GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') 
>>> SEPARATOR ' ') AS `contents`, UNIX_TIMESTAMP(`titles`.`created_at`) AS 
>>> `created_at`, UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM 
>>> `titles` LEFT OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT 
>>> OUTER JOIN `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN 
>>> `contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id` >= 
>>> $start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id` ORDER 
>>> BY NULL
>>> 
>>> I get some genuinely odd results in the contents column. This may be an 
>>> artifact of Sequel Pro, but the column doesn't appear to be very large at 
>>> all: at most there are a dozen lines of text, and some rows contain just a 
>>> single 0 character. The contents table includes thousands of rows of data, 
>>> and each row has up to 400 lines of text in it. When concatenated, these 
>>> composite contents range from 300K to 8MB per title.
>>> 
>>> Can anyone suggest a way to go here? Is there a better way to index text 
>>> (XML) documents than slurping out their content into MySQL so that Sphinx 
>>> can index them?
>>> 
>>> Thanks in advance,
>>> 
>>> Walter
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Thinking Sphinx" group.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to 
>>> [email protected].
>>> For more options, visit this group at 
>>> http://groups.google.com/group/thinking-sphinx?hl=en.
