My index works really well: searches are almost instant over a huge collection of large documents. But now I'm trying to enable excerpts, and I don't seem to be getting them on my concatenated 'contents' field. Recall the structure:

# content.rb
belongs_to :title
# has a field called plain containing the plain text of the document, 400 rows per instance

# title.rb
has_many :contents
define_index do
  set_property :group_concat_max_len => 10.megabytes
  indexes :title, :sortable => true
  indexes teaser
  indexes role.person(:name), :as => :author, :sortable => true
  indexes contents(:plain), :as => :content
  has created_at, updated_at
  where sanitize_sql(["publish", true])
end

I can find anything in contents, but I'm not clear what my syntax would be to show excerpts from it in a view. I have tried title.excerpts.contents and title.excerpts.contents.plain without love.

Walter

On Sep 7, 2012, at 2:17 PM, Walter Lee Davis wrote:

> THAT seems to be a very good fix. I'm watching a much different light show in
> the terminal now.
>
> collected 1466 docs, 820.1 MB
>
> Much better…
>
> Thanks so much!
>
> Walter
>
> On Sep 7, 2012, at 2:11 PM, Pat Allan wrote:
>
>> Ah... I really should have noticed that in the index definition. Sorry! And
>> reading through that again now, contents should be a field, not an attribute
>> - so switch it from a 'has' to an 'indexes'. That should be the reason why
>> it hasn't been working.
>>
>> --
>> Pat
>>
>> On 07/09/2012, at 8:08 PM, Walter Lee Davis wrote:
>>
>>> I did read that and added the recommended flag to my index definition. It
>>> doesn't seem to result in a working content index, though. Can you see
>>> anything I've left out of my index? Is there a need to add an indexes line
>>> and a has line?
>>>
>>> Thanks,
>>>
>>> Walter
>>>
>>> On Sep 7, 2012, at 2:05 PM, Pat Allan wrote:
>>>
>>>> Hi Walter
>>>>
>>>> If you're trying to match on words deep in the contents field, then you
>>>> may need to have a read through this part of the docs:
>>>> http://pat.github.com/ts/en/common_issues.html#mysql_large_fields
>>>>
>>>> If you're not seeing results returned when searching on the title field,
>>>> then that's probably a different matter.
>>>> Either way, let me know how you go.
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Pat
>>>>
>>>> On 07/09/2012, at 5:44 PM, Walter Davis wrote:
>>>>
>>>>> I have the following relationships:
>>>>>
>>>>> Title has_many :contents
>>>>> Content belongs_to :title
>>>>>
>>>>> In title.rb, I have the following index declaration:
>>>>>
>>>>> define_index do
>>>>>   set_property :group_concat_max_len => 10.megabytes
>>>>>
>>>>>   indexes :title, :sortable => true
>>>>>   indexes teaser
>>>>>   indexes role.person(:name), :as => :author, :sortable => true
>>>>>   has contents(:plain), :as => :contents
>>>>>   has created_at, updated_at
>>>>>   where sanitize_sql(["publish", true])
>>>>> end
>>>>>
>>>>> When I run the index, it appears to work:
>>>>>
>>>>> indexing index 'title_core'...
>>>>> WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
>>>>> collected 1466 docs, 0.6 MB
>>>>> collected 3668728 attr values
>>>>> WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit
>>>>> may improve performance
>>>>> sorted 7.3 Mvalues, 100.0% done
>>>>> sorted 0.1 Mhits, 100.0% done
>>>>> total 1466 docs, 560458 bytes
>>>>> total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
>>>>> skipping non-plain index 'title'...
>>>>> total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
>>>>> total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
>>>>> Started successfully (pid 17630).
>>>>>
>>>>> But searches do not match any of the content strings.
>>>>>
>>>>> When I run the generated query directly in SQL:
>>>>>
>>>>> SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` ,
>>>>> `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`,
>>>>> `people`.`name` AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0 AS
>>>>> `sphinx_deleted`, 3942078319 AS `class_crc`, IFNULL('Title', '') AS
>>>>> `sphinx_internal_class`, IFNULL(`titles`.`title`, '') AS `title_sort`,
>>>>> IFNULL(`people`.`name`, '') AS `author_sort`, GROUP_CONCAT(DISTINCT
>>>>> IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS `contents`,
>>>>> UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`,
>>>>> UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM `titles` LEFT
>>>>> OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT OUTER JOIN
>>>>> `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN
>>>>> `contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id`
>>>>> >= $start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id`
>>>>> ORDER BY NULL
>>>>>
>>>>> I get some genuinely odd results in the contents column. This may be an
>>>>> artifact of Sequel Pro, but the column doesn't appear to be very large at
>>>>> all, at most there are a dozen lines of text, some just include a single
>>>>> 0 character. The contents table includes thousands of rows of data and
>>>>> each row has up to 400 lines of text in it. When concatenated, these
>>>>> composite contents range from 300K to 8MB per title.
>>>>>
>>>>> Can anyone suggest a way to go here? Is there a better way to index text
>>>>> (XML) documents than slurping out their content into MySQL so that Sphinx
>>>>> can index them?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Walter
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "Thinking Sphinx" group.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ.
>>>>> To post to this group, send email to [email protected].
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected].
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/thinking-sphinx?hl=en.
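
A possible direction for the excerpts question at the top of this thread, offered as a sketch rather than a confirmed answer: title.excerpts.contents probably fails because contents is an association (a collection of Content rows), not a string. One workaround may be to give Title a method that returns the joined plain text and to excerpt that instead. The method name content_text and the Struct/class stand-ins below are hypothetical; they replace the real ActiveRecord models only so the snippet runs on its own.

```ruby
# Stand-in for the Content model from the thread (hypothetical, illustration only).
Content = Struct.new(:plain)

# Stand-in for the Title model; the real one is an ActiveRecord class
# with has_many :contents.
class Title
  attr_reader :contents

  def initialize(contents)
    @contents = contents
  end

  # Hypothetical helper: joins the plain text of every associated row into
  # a single string, roughly what the indexed 'content' field holds.
  # Rows with a nil plain value are skipped.
  def content_text
    contents.map(&:plain).compact.join(' ')
  end
end

title = Title.new([
  Content.new('first page'),
  Content.new(nil),
  Content.new('second page')
])

puts title.content_text # prints "first page second page"
```

With a method like this on the real model, title.excerpts.content_text in the view may return the highlighted excerpt. That relies on the assumption that the Thinking Sphinx 2.x excerpts proxy forwards the method call to the model instance and builds the excerpt from the returned string, which is worth checking against the documentation for the installed version. For titles whose concatenated text runs to several megabytes, building this string per search result could be slow.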
