Did you use strip_html or html_strip? It needs to be the latter - can't spot it 
in the conf file.
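
In case it helps with checking, here is a rough Ruby sketch of the two things to look for: a misspelled key in sphinx.yml, and whether the setting made it into the generated conf. The helper names (`misnamed_html_strip_keys`, `html_strip_enabled?`) are made up for illustration and are not part of Thinking Sphinx. It also assumes the generated file renders the option as `html_strip = 1`, which is how Sphinx itself spells it.

```ruby
require "yaml"

# Example sphinx.yml: the first four keys are the ones posted in this thread,
# html_strip is the setting under discussion.
SPHINX_YML = <<~YAML
  production:
    enable_star: 1
    min_infix_len: 1
    max_matches: 5000
    morphology: stem_en
    html_strip: true
YAML

# Hypothetical helper: returns any keys that use the wrong name (strip_html).
def misnamed_html_strip_keys(settings)
  settings.keys.select { |key| key.to_s == "strip_html" }
end

# Hypothetical helper: true when the conf text contains a line enabling
# html_strip (assumes the "html_strip = 1" form).
def html_strip_enabled?(conf_text)
  conf_text.match?(/^\s*html_strip\s*=\s*1\s*$/)
end

settings = YAML.safe_load(SPHINX_YML).fetch("production")
puts "misnamed keys: #{misnamed_html_strip_keys(settings).inspect}"
puts "html_strip in yml: #{settings['html_strip'].inspect}"
```

To check a real file, pass `File.read` of your production.sphinx.conf to `html_strip_enabled?` after re-running the configure and index tasks.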

-- 
Pat

On 07/10/2010, at 12:25 PM, Robert Sturim wrote:

> I added the strip_html but so far it doesn't look like it's helped.
> 
> My production.sphinx.conf looks as follows:
> 
> source topic_comment_core_0
> {
>   type = mysql
>   sql_host = localhost
>   sql_user = openmind
>   sql_pass = xxx
>   sql_db = openmind
>   sql_query_pre = UPDATE `comments` SET `delta` = 0 WHERE `delta` = 1
>   sql_query_pre = SET NAMES utf8
>   sql_query_pre = SET TIME_ZONE = '+0:00'
>   sql_query = SELECT SQL_NO_CACHE `comments`.`id` * 6 + 4 AS `id` , 
> `comments`.`body` AS `body`, `comments`.`id` AS `sphinx_internal_id`, 
> CAST(IFNULL(CRC32(NULLIF(`comments`.`type`,'')), 432825427) AS UNSIGNED) AS 
> `class_crc`, 0 AS `sphinx_deleted`, UNIX_TIMESTAMP(`comments`.`created_at`) 
> AS `created_at`, UNIX_TIMESTAMP(`comments`.`updated_at`) AS `updated_at` FROM 
> `comments`    WHERE `comments`.`id` >= $start AND `comments`.`id` <= $end AND 
> `comments`.`delta` = 0 AND `comments`.`type` = 'TopicComment' GROUP BY 
> `comments`.`id`, `comments`.`type`  ORDER BY NULL
>   sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM 
> `comments` WHERE `comments`.`delta` = 0
>   sql_attr_uint = sphinx_internal_id
>   sql_attr_uint = class_crc
>   sql_attr_uint = sphinx_deleted
>   sql_attr_timestamp = created_at
>   sql_attr_timestamp = updated_at
>   sql_query_info = SELECT * FROM `comments` WHERE `id` = (($id - 4) / 6)
> }
> 
> index topic_comment_core
> {
>   source = topic_comment_core_0
>   path = /home/bsturim/openmind/db/sphinx/production/topic_comment_core
>   morphology = stem_en
>   charset_type = utf-8
>   min_infix_len = 1
>   enable_star = 1
> }
> 
> source topic_comment_delta_0 : topic_comment_core_0
> {
>   type = mysql
>   sql_host = localhost
>   sql_user = openmind
>   sql_pass = xxx
>   sql_db = openmind
>   sql_query_pre =
>   sql_query_pre = SET NAMES utf8
>   sql_query_pre = SET TIME_ZONE = '+0:00'
>   sql_query = SELECT SQL_NO_CACHE `comments`.`id` * 6 + 4 AS `id` , 
> `comments`.`body` AS `body`, `comments`.`id` AS `sphinx_internal_id`, 
> CAST(IFNULL(CRC32(NULLIF(`comments`.`type`,'')), 432825427) AS UNSIGNED) AS 
> `class_crc`, 0 AS `sphinx_deleted`, UNIX_TIMESTAMP(`comments`.`created_at`) 
> AS `created_at`, UNIX_TIMESTAMP(`comments`.`updated_at`) AS `updated_at` FROM 
> `comments`    WHERE `comments`.`id` >= $start AND `comments`.`id` <= $end AND 
> `comments`.`delta` = 1 AND `comments`.`type` = 'TopicComment' GROUP BY 
> `comments`.`id`, `comments`.`type`  ORDER BY NULL
>   sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM 
> `comments` WHERE `comments`.`delta` = 1
>   sql_attr_uint = sphinx_internal_id
>   sql_attr_uint = class_crc
>   sql_attr_uint = sphinx_deleted
>   sql_attr_timestamp = created_at
>   sql_attr_timestamp = updated_at
>   sql_query_info = SELECT * FROM `comments` WHERE `id` = (($id - 4) / 6)
> }
> index topic_comment_delta : topic_comment_core
> {
>   source = topic_comment_delta_0
>   path = /home/bsturim/openmind/db/sphinx/production/topic_comment_delta
> }
> 
> index topic_comment
> {
>   type = distributed
>   local = topic_comment_delta
>   local = topic_comment_core
> }
> 
> The size of the body of a comment can vary...it's not bounded. I would say 
> average is 700 characters...
> 
> Thanks.
> 
> Bob
> 
> 
> On Wed, Oct 6, 2010 at 8:27 PM, Pat Allan <[email protected]> wrote:
> Hi Bob
> 
> The HTML text may make a difference... you'll probably want to set the 
> html_strip setting to true in your sphinx.yml file.
> http://www.sphinxsearch.com/docs/manual-0.9.9.html#conf-html-strip
> 
> Also: how large are the values in the body column? And what does the source 
> look like in development.sphinx.conf? (Make sure you remove the database 
> password!)
> 
> Cheers
> 
> --
> Pat
> 
> On 07/10/2010, at 11:11 AM, Robert Sturim wrote:
> 
> > Thanks for your response. You are correct that my SQL query would pick up 
> > partial hits that Thinking Sphinx would not, but that's not the issue in 
> > this case. First, I have tried using wildcards in my test search. Second, 
> > I know of a number of cases in which my search term is a full word and it 
> > is still missed.
> >
> > Two pieces of detail I neglected to mention in my original post -- I'm 
> > using version 1.3.18. And, the body of the comments which I am searching 
> > against is html text. I'm not sure if that would make a difference.
> >
> > Thanks.
> >
> > Bob
> >
> > On Wed, Oct 6, 2010 at 6:31 PM, Pat Allan <[email protected]> wrote:
> > Hi Bob
> >
> > Not entirely sure why this is happening, but your comparison with a SQL 
> > query isn't quite accurate - Sphinx matches full words by default (not 
> > prefixes/infixes) - so '%value%' is different to 'value'.
> >
> > If you do want partial word searching, this is covered in the docs:
> > http://freelancing-god.github.com/ts/en/common_issues.html#wildcards
> >
> > Also, you may want to use :star => true to automatically add stars to each 
> > word in your search queries (so 'value' is treated as '*value*'). The other 
> > way to test this would be to modify the SQL query to match on word 
> > boundaries (perhaps using a regular expression?).
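
A small Ruby sketch of the difference described above, using made-up comment bodies rather than real data. `substring_hits` mimics `LIKE '%value%'`, while `whole_word_hits` only approximates whole-word matching with a word-boundary regex. Both method names are illustrative, and the sketch ignores stemming (with stem_en, Sphinx would likely also match "values").

```ruby
# Made-up comment bodies, purely for illustration.
BODIES = [
  "The value of this idea is clear.",
  "These are undervalued topics.",
  "We evaluated several values.",
  "Nothing relevant here."
].freeze

# Mimics SQL's body LIKE '%value%': a case-insensitive substring match.
def substring_hits(bodies, term)
  bodies.select { |body| body.downcase.include?(term.downcase) }
end

# Approximates whole-word matching by requiring word boundaries on both
# sides of the term.
def whole_word_hits(bodies, term)
  pattern = /\b#{Regexp.escape(term)}\b/i
  bodies.select { |body| body.match?(pattern) }
end

puts "substring: #{substring_hits(BODIES, 'value').size}"
puts "whole word: #{whole_word_hits(BODIES, 'value').size}"
```

The same idea applies on the SQL side: a word-boundary pattern gives a fairer number to compare against the search results than `'%value%'` does.
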
> >
> > Let us know if the numbers still aren't matching up then.
> >
> > Cheers
> >
> > --
> > Pat
> >
> > On 06/10/2010, at 3:42 AM, Bob Sturim wrote:
> >
> > > Thinking Sphinx appears to not be finding all the results I would
> > > expect it to find.
> > >
> > > > I have a comments table. The comments table uses single table
> > > > inheritance to store comments for both Ideas and Topics...so I am
> > > > searching on the entity TopicComment.
> > >
> > > > My define_index declaration in TopicComment.rb is:
> > >
> > >  define_index do
> > >    indexes body
> > >    has created_at, updated_at
> > >    set_property :delta => true
> > >  end
> > >
> > > If I do the following:
> > >
> > > TopicComment.search('value')
> > >
> > > it will only find 6 hits...though if I go into sql and issue the
> > > following query:
> > >
> > > SELECT * FROM comments
> > > where type = 'TopicComment'
> > > and body like '%value%'
> > > order by topic_id
> > >
> > > it will retrieve 377 hits.
> > >
> > > > There are 4141 entries in the comments table that are of type
> > > > TopicComment. If I rebuild my index using
> > >
> > > rake ts:rebuild
> > >
> > > it shows that 4141 documents were indexed:
> > >
> > >
> > > indexing index 'topic_comment_core'...
> > > collected 4141 docs, 2.9 MB
> > > sorted 12.5 Mhits, 100.0% done
> > > total 4141 docs, 2940976 bytes
> > > total 8.233 sec, 357181 bytes/sec, 502.92 docs/sec
> > > indexing index 'topic_comment_delta'...
> > > collected 0 docs, 0.0 MB
> > > total 0 docs, 0 bytes
> > > total 0.001 sec, 0 bytes/sec, 0.00 docs/sec
> > > skipping non-plain index 'topic_comment'...
> > >
> > > My sphinx.yml is defined as follows:
> > >
> > >
> > > production:
> > >  enable_star: 1
> > >  min_infix_len: 1
> > >  max_matches: 5000
> > >  morphology: stem_en
> > >
> > >
> > > Am I missing something?
> > >
> > > Thanks very much.
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups 
> > > "Thinking Sphinx" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to 
> > > [email protected].
> > > For more options, visit this group at 
> > > http://groups.google.com/group/thinking-sphinx?hl=en.
> > >
