I have the following relationships:

Title has_many :contents
Content belongs_to :title

In title.rb, I have the following index declaration:

  define_index do
    set_property :group_concat_max_len => 10.megabytes
    
    indexes :title, :sortable => true
    indexes teaser
    indexes role.person(:name), :as => :author, :sortable => true
    has contents(:plain), :as => :contents
    has created_at, updated_at
    where sanitize_sql(["publish", true])
  end

When I run the index, it appears to work:

indexing index 'title_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
collected 1466 docs, 0.6 MB
collected 3668728 attr values
WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit may 
improve performance
sorted 7.3 Mvalues, 100.0% done
sorted 0.1 Mhits, 100.0% done
total 1466 docs, 560458 bytes
total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
skipping non-plain index 'title'...
total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
Started successfully (pid 17630).

But searches do not match any of the content strings.

When I run the generated query directly in SQL:

SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` , 
`titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, `people`.`name` 
AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`, 
3942078319 AS `class_crc`, IFNULL('Title', '') AS `sphinx_internal_class`, 
IFNULL(`titles`.`title`, '') AS `title_sort`, IFNULL(`people`.`name`, '') 
AS `author_sort`, GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') 
SEPARATOR ' ') AS `contents`, UNIX_TIMESTAMP(`titles`.`created_at`) AS 
`created_at`, UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM 
`titles` LEFT OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT 
OUTER JOIN `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN 
`contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id` >= 
$start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id` ORDER 
BY NULL

I get some genuinely odd results in the contents column. This may be an 
artifact of Sequel Pro, but the column doesn't appear to be very large at 
all, at most there are a dozen lines of text, some just include a single 0 
character. The contents table includes thousands of rows of data and each 
row has up to 400 lines of text in it. When concatenated, these composite 
contents range from 300K to 8MB per title.

Can anyone suggest a way to go here? Is there a better way to index text 
(XML) documents than slurping out their content into MySQL so that Sphinx 
can index them?

Thanks in advance,

Walter

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.

Reply via email to