I did read that and added the recommended flag to my index definition. It doesn't seem to result in a working content index, though. Can you see anything I've left out of my index? Is there a need to add an indexes line and a has line?
Thanks, Walter On Sep 7, 2012, at 2:05 PM, Pat Allan wrote: > Hi Walter > > If you're trying to match on words deep in the contents field, then you may > need to have a read through this part of the docs: > http://pat.github.com/ts/en/common_issues.html#mysql_large_fields > > If you're not seeing results returned when searching on the title field, then > that's probably a different matter. Either way, let me know how you go. > > Cheers > > -- > Pat > > On 07/09/2012, at 5:44 PM, Walter Davis wrote: > >> I have the following relationships: >> >> Title has_many :contents >> Content belongs_to :title >> >> In title.rb, I have the following index declaration: >> >> define_index do >> set_property :group_concat_max_len => 10.megabytes >> >> indexes :title, :sortable => true >> indexes teaser >> indexes role.person(:name), :as => :author, :sortable => true >> has contents(:plain), :as => :contents >> has created_at, updated_at >> where sanitize_sql(["publish", true]) >> end >> >> When I run the index, it appears to work: >> >> indexing index 'title_core'... >> WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb >> collected 1466 docs, 0.6 MB >> collected 3668728 attr values >> WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit may >> improve performance >> sorted 7.3 Mvalues, 100.0% done >> sorted 0.1 Mhits, 100.0% done >> total 1466 docs, 560458 bytes >> total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec >> skipping non-plain index 'title'... >> total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg >> total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg >> Started successfully (pid 17630). >> >> But searches do not match any of the content strings. >> >> When I run the generated query directly in SQL: >> >> SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` , >> `titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, `people`.`name` >> AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`, >> 3942078319 AS `class_crc`, IFNULL('Title', '') AS `sphinx_internal_class`, >> IFNULL(`titles`.`title`, '') AS `title_sort`, IFNULL(`people`.`name`, '') AS >> `author_sort`, GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') >> SEPARATOR ' ') AS `contents`, UNIX_TIMESTAMP(`titles`.`created_at`) AS >> `created_at`, UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM >> `titles` LEFT OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT >> OUTER JOIN `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN >> `contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id` >= >> $start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id` ORDER >> BY NULL >> >> I get some genuinely odd results in the contents column. This may be an >> artifact of Sequel Pro, but the column doesn't appear to be very large at >> all, at most there are a dozen lines of text, some just include a single 0 >> character. The contents table includes thousands of rows of data and each >> row has up to 400 lines of text in it. When concatenated, these composite >> contents range from 300K to 8MB per title. >> >> Can anyone suggest a way to go here? Is there a better way to index text >> (XML) documents than slurping out their content into MySQL so that Sphinx >> can index them? >> >> Thanks in advance, >> >> Walter >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Thinking Sphinx" group. >> To view this discussion on the web visit >> https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ. >> To post to this group, send email to [email protected]. >> To unsubscribe from this group, send email to >> [email protected]. >> For more options, visit this group at >> http://groups.google.com/group/thinking-sphinx?hl=en. > > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/thinking-sphinx?hl=en. > -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
