I have the following relationships:
Title has_many :contents
Content belongs_to :title
In title.rb, I have the following index declaration:
define_index do
set_property :group_concat_max_len => 10.megabytes
indexes :title, :sortable => true
indexes teaser
indexes role.person(:name), :as => :author, :sortable => true
has contents(:plain), :as => :contents
has created_at, updated_at
where sanitize_sql(["publish", true])
end
When I run the index, it appears to work:
indexing index 'title_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
collected 1466 docs, 0.6 MB
collected 3668728 attr values
WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit may
improve performance
sorted 7.3 Mvalues, 100.0% done
sorted 0.1 Mhits, 100.0% done
total 1466 docs, 560458 bytes
total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
skipping non-plain index 'title'...
total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
Started successfully (pid 17630).
But searches do not match any of the content strings.
When I run the generated query directly in SQL:
SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id` ,
`titles`.`title` AS `title`, `titles`.`teaser` AS `teaser`, `people`.`name`
AS `author`, `titles`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`,
3942078319 AS `class_crc`, IFNULL('Title', '') AS `sphinx_internal_class`,
IFNULL(`titles`.`title`, '') AS `title_sort`, IFNULL(`people`.`name`, '')
AS `author_sort`, GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0')
SEPARATOR ' ') AS `contents`, UNIX_TIMESTAMP(`titles`.`created_at`) AS
`created_at`, UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at` FROM
`titles` LEFT OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id` LEFT
OUTER JOIN `people` ON `people`.`id` = `roles`.`person_id` LEFT OUTER JOIN
`contents` ON `contents`.`title_id` = `titles`.`id` WHERE (`titles`.`id` >=
$start AND `titles`.`id` <= $end AND publish) GROUP BY `titles`.`id` ORDER
BY NULL
I get some genuinely odd results in the contents column. This may be an
artifact of Sequel Pro, but the column doesn't appear to be very large at
all, at most there are a dozen lines of text, some just include a single 0
character. The contents table includes thousands of rows of data and each
row has up to 400 lines of text in it. When concatenated, these composite
contents range from 300K to 8MB per title.
Can anyone suggest a way to go here? Is there a better way to index text
(XML) documents than slurping out their content into MySQL so that Sphinx
can index them?
Thanks in advance,
Walter
--
You received this message because you are subscribed to the Google Groups
"Thinking Sphinx" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/thinking-sphinx/-/V_uNKsdAenIJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/thinking-sphinx?hl=en.