I added the strip_html but so far it doesn't look like it's helped.

My production.sphinx.conf looks as follows:

source topic_comment_core_0
{
  type = mysql
  sql_host = localhost
  sql_user = openmind
  sql_pass = xxx
  sql_db = openmind
  sql_query_pre = UPDATE `comments` SET `delta` = 0 WHERE `delta` = 1
  sql_query_pre = SET NAMES utf8
  sql_query_pre = SET TIME_ZONE = '+0:00'
  sql_query = SELECT SQL_NO_CACHE `comments`.`id` * 6 + 4 AS `id` , `comments`.`body` AS `body`, `comments`.`id` AS `sphinx_internal_id`, CAST(IFNULL(CRC32(NULLIF(`comments`.`type`,'')), 432825427) AS UNSIGNED) AS `class_crc`, 0 AS `sphinx_deleted`, UNIX_TIMESTAMP(`comments`.`created_at`) AS `created_at`, UNIX_TIMESTAMP(`comments`.`updated_at`) AS `updated_at` FROM `comments`    WHERE `comments`.`id` >= $start AND `comments`.`id` <= $end AND `comments`.`delta` = 0 AND `comments`.`type` = 'TopicComment' GROUP BY `comments`.`id`, `comments`.`type`  ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `comments` WHERE `comments`.`delta` = 0
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = class_crc
  sql_attr_uint = sphinx_deleted
  sql_attr_timestamp = created_at
  sql_attr_timestamp = updated_at
  sql_query_info = SELECT * FROM `comments` WHERE `id` = (($id - 4) / 6)
}

index topic_comment_core
{
  source = topic_comment_core_0
  path = /home/bsturim/openmind/db/sphinx/production/topic_comment_core
  morphology = stem_en
  charset_type = utf-8
  min_infix_len = 1
  enable_star = 1
}

source topic_comment_delta_0 : topic_comment_core_0
{
  type = mysql
  sql_host = localhost
  sql_user = openmind
  sql_pass = xxx
  sql_db = openmind
  sql_query_pre =
  sql_query_pre = SET NAMES utf8
  sql_query_pre = SET TIME_ZONE = '+0:00'
  sql_query = SELECT SQL_NO_CACHE `comments`.`id` * 6 + 4 AS `id` , `comments`.`body` AS `body`, `comments`.`id` AS `sphinx_internal_id`, CAST(IFNULL(CRC32(NULLIF(`comments`.`type`,'')), 432825427) AS UNSIGNED) AS `class_crc`, 0 AS `sphinx_deleted`, UNIX_TIMESTAMP(`comments`.`created_at`) AS `created_at`, UNIX_TIMESTAMP(`comments`.`updated_at`) AS `updated_at` FROM `comments`    WHERE `comments`.`id` >= $start AND `comments`.`id` <= $end AND `comments`.`delta` = 1 AND `comments`.`type` = 'TopicComment' GROUP BY `comments`.`id`, `comments`.`type`  ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `comments` WHERE `comments`.`delta` = 1
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = class_crc
  sql_attr_uint = sphinx_deleted
  sql_attr_timestamp = created_at
  sql_attr_timestamp = updated_at
  sql_query_info = SELECT * FROM `comments` WHERE `id` = (($id - 4) / 6)
}
index topic_comment_delta : topic_comment_core
{
  source = topic_comment_delta_0
  path = /home/bsturim/openmind/db/sphinx/production/topic_comment_delta
}

index topic_comment
{
  type = distributed
  local = topic_comment_delta
  local = topic_comment_core
}
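Worth noting: none of the index blocks above contains an html_strip line, even though other sphinx.yml settings (morphology, min_infix_len, enable_star) do appear in topic_comment_core. That suggests, though it does not prove, that the setting was not picked up when this file was generated, for example if the key in sphinx.yml is not spelled html_strip or the configuration was not regenerated. Below is a minimal Ruby sketch for checking a generated file. The helper is hypothetical (not part of Thinking Sphinx) and assumes the flat "key = value" layout shown above; it is not a full sphinx.conf parser.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): collect the settings
# written into each `index` block of a generated sphinx.conf. Assumes the
# flat "key = value" layout shown above; not a full sphinx.conf parser.
def index_settings(conf_text)
  settings = {}
  current = nil # name of the index block currently being read, if any

  conf_text.each_line do |line|
    if line.include?('=')
      # Key/value line: record it only while inside an index block.
      next unless current
      key, value = line.split('=', 2).map(&:strip)
      (settings[current][key] ||= []) << value
    elsif line =~ /\A\s*index\s+(\w+)/
      current = $1
      settings[current] = {}
    elsif line =~ /\A\s*(source|indexer|searchd)\b/
      current = nil # source/indexer/searchd blocks are ignored
    end
  end

  settings
end

# Usage: ruby check_index_settings.rb path/to/production.sphinx.conf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  index_settings(File.read(ARGV[0])).each do |name, opts|
    value = opts.key?('html_strip') ? opts['html_strip'].last : 'not set'
    puts "#{name}: html_strip = #{value}"
  end
end
```

The script name in the usage comment is only illustrative. Because topic_comment_delta inherits from topic_comment_core (the `index topic_comment_delta : topic_comment_core` line), an html_strip setting on the core index should presumably cover both, and the index would need rebuilding (e.g. with rake ts:rebuild) before search results change.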

The size of a comment body varies; there's no upper bound. I'd estimate the
average at about 700 characters.

Thanks.

Bob
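Pat's earlier point (quoted below) is that SQL `LIKE '%value%'` counts substrings, while Sphinx matches whole words by default, so the 377 vs 6 comparison is not like-for-like. A rough way to compare both kinds of count on the same bodies outside the database is sketched below in Ruby. The helper names are hypothetical, `star_query` only approximates what `:star => true` is described as doing in the thread, and neither count models stem_en stemming (which may also match inflections such as "valued") or HTML stripping.

```ruby
# Rough, offline comparison of match counts; a sketch, not Sphinx itself.

# Substring match: roughly what SQL `body LIKE '%value%'` counts
# (case-insensitive here, as with MySQL's usual case-insensitive collations).
def substring_hits(bodies, term)
  bodies.count { |body| body.downcase.include?(term.downcase) }
end

# Whole-word match: closer to Sphinx's default behaviour, but with no
# stemming (stem_en) and no HTML stripping applied.
def whole_word_hits(bodies, term)
  pattern = /\b#{Regexp.escape(term)}\b/i
  bodies.count { |body| body =~ pattern }
end

# Hypothetical helper approximating :star => true as described in the
# thread: wrap each query word in asterisks ('value' -> '*value*').
def star_query(query)
  query.split.map { |word| "*#{word}*" }.join(' ')
end
```

For example, with bodies "What is the value of this?", "This idea is undervalued" and "We valued the feedback", the substring count is 3 and the whole-word count is 1. On the MySQL side, a comparable whole-word query could use `body REGEXP '[[:<:]]value[[:>:]]'`, the word-boundary markers supported by MySQL versions of that era.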


On Wed, Oct 6, 2010 at 8:27 PM, Pat Allan <[email protected]> wrote:

> Hi Bob
>
> The HTML text may make a difference... you'll probably want to set the
> html_strip setting to true in your sphinx.yml file.
> http://www.sphinxsearch.com/docs/manual-0.9.9.html#conf-html-strip
>
> Also: how large are the values in the body column? And what does the source
> look like in development.sphinx.conf? (Make sure you remove the database
> password!)
>
> Cheers
>
> --
> Pat
>
> On 07/10/2010, at 11:11 AM, Robert Sturim wrote:
>
> > Thanks for your response. You are correct that my SQL query would pick up
> > partial hits that Thinking Sphinx would not, but that's not the issue in
> > this case: first, because I have tried using wildcards in my test search,
> > and second, because I know of a number of cases in which my search term
> > appears as a full word and is still missed.
> >
> > Two details I neglected to mention in my original post: I'm using
> > version 1.3.18, and the body of the comments I am searching against is
> > HTML text. I'm not sure whether that would make a difference.
> >
> > Thanks.
> >
> > Bob
> >
> > On Wed, Oct 6, 2010 at 6:31 PM, Pat Allan <[email protected]> wrote:
> > Hi Bob
> >
> > Not entirely sure why this is happening, but your comparison with a SQL
> > query isn't quite accurate - Sphinx matches full words by default (not
> > prefixes/infixes) - so '%value%' is different to 'value'.
> >
> > If you do want partial word searching, this is covered in the docs:
> > http://freelancing-god.github.com/ts/en/common_issues.html#wildcards
> >
> > Also, you may want to use :star => true to automatically add stars to
> > each word in your search queries (so 'value' is treated as '*value*'). The
> > other way to test this would be to modify the SQL query to match on word
> > boundaries (perhaps using a regular expression?).
> >
> > Let us know if the numbers still aren't matching up then.
> >
> > Cheers
> >
> > --
> > Pat
> >
> > On 06/10/2010, at 3:42 AM, Bob Sturim wrote:
> >
> > > Thinking Sphinx appears to not be finding all the results I would
> > > expect it to find.
> > >
> > > I have a comments table. The comments table uses single table
> > > inheritance to store comments for both Ideas and Topics, so I am
> > > searching on the TopicComment model.
> > >
> > > My define_index declaration in TopicComment.rb is:
> > >
> > >  define_index do
> > >    indexes body
> > >    has created_at, updated_at
> > >    set_property :delta => true
> > >  end
> > >
> > > If I do the following:
> > >
> > > TopicComment.search('value')
> > >
> > > it finds only 6 hits, yet if I go into SQL and issue the following
> > > query:
> > >
> > > SELECT * FROM comments
> > > where type = 'TopicComment'
> > > and body like '%value%'
> > > order by topic_id
> > >
> > > it will retrieve 377 hits.
> > >
> > > There are 4141 entries in the comments table of type
> > > TopicComment. If I rebuild my index using
> > >
> > > rake ts:rebuild
> > >
> > > it shows that 4141 documents were indexed:
> > >
> > >
> > > indexing index 'topic_comment_core'...
> > > collected 4141 docs, 2.9 MB
> > > sorted 12.5 Mhits, 100.0% done
> > > total 4141 docs, 2940976 bytes
> > > total 8.233 sec, 357181 bytes/sec, 502.92 docs/sec
> > > indexing index 'topic_comment_delta'...
> > > collected 0 docs, 0.0 MB
> > > total 0 docs, 0 bytes
> > > total 0.001 sec, 0 bytes/sec, 0.00 docs/sec
> > > skipping non-plain index 'topic_comment'...
> > >
> > > My sphinx.yml is defined as follows:
> > >
> > >
> > > production:
> > >  enable_star: 1
> > >  min_infix_len: 1
> > >  max_matches: 5000
> > >  morphology: stem_en
> > >
> > >
> > > Am I missing something?
> > >
> > > Thanks very much.
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Thinking Sphinx" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/thinking-sphinx?hl=en.
> > >
