[Rails-deploy] Re: sphinx vs ferret

Peter Vandenabeele Wed, 30 Jan 2008 13:22:50 -0800

Jeff Cc wrote:
>> Do you mean the "*" feature (prefix*  and *infix*) ? Where
>> the search term "program*" matches the database text
>> "program", "programmer", "programs" ...
>>
>> Those work for me in version sphinx-0.9.8-svn-r1065 and
>> sphinx-0.9.8-svn-r1112 ... I have done quite some testing
>> on r1065 (still testing the newest r1112) and that seems
>> to work OK for me. Set "enable_star" to 1 and set a
>> min_prefix_len or a min_infix_len.
> 
> No that's the stemming feature I believe.. it just changes prefixes
> and suffixes on words and is language dependent. Awesome feature
> however. Not sure how (or if) Ferret implements it.
> 
> What I meant was just straight wildcards as in a MySQL LIKE clause,
> example:  "[EMAIL PROTECTED]" to find all emails @gmail.com


I have the impression that enable_star really _is_ the feature that
allows a search for "[EMAIL PROTECTED]" to find all emails @gmail.com,
provided you add the '@' sign to the charset_table. That raises another
problem, since '@' also has a special meaning as the field indicator for
field-specific search.

With enable_star the user must explicitly type a '*'. Without a '*',
only exact matches are returned. I give an example at the end of my blog
post (http://www.vandenabeele.com/Ultrasphinx-performance), where I
tested with and without the enable_star feature, always without stemming
(since I had no stemmer for the Dutch language):

0.001 sec [ext/0/rel 1409 (0,20)] [complete] c
0.001 sec [ext/0/rel 1409 (0,20)] [complete] c*
0.000 sec [ext/0/rel 35 (0,20)] [complete] co
0.000 sec [ext/0/rel 35 (0,20)] [complete] co*
0.000 sec [ext/0/rel 5 (0,20)] [complete] com
0.000 sec [ext/0/rel 5 (0,20)] [complete] com*
0.000 sec [ext/0/rel 10 (0,20)] [complete] comp
0.003 sec [ext/0/rel 5343 (0,20)] [complete] comp*
0.000 sec [ext/0/rel 0 (0,20)] [complete] compl
0.000 sec [ext/0/rel 1473 (0,20)] [complete] compl*
0.000 sec [ext/0/rel 0 (0,20)] [complete] comple
0.000 sec [ext/0/rel 1214 (0,20)] [complete] comple*
0.000 sec [ext/0/rel 0 (0,20)] [complete] complet
0.000 sec [ext/0/rel 793 (0,20)] [complete] complet*
0.000 sec [ext/0/rel 458 (0,20)] [complete] complete
0.000 sec [ext/0/rel 642 (0,20)] [complete] complete*
0.000 sec [ext/0/rel 30 (0,20)] [complete] completed
0.000 sec [ext/0/rel 30 (0,20)] [complete] completed*
0.000 sec [ext/0/rel 0 (0,20)] [complete] completel
0.000 sec [ext/0/rel 130 (0,20)] [complete] completel*
0.000 sec [ext/0/rel 10 (0,20)] [complete] completely.

What happens is that with fewer than 4 characters the '*' has no effect,
but from 4 characters on, the '*' expands to all indexed words that start
with the typed prefix. That is an interesting feature that the major
public search engines do not offer. For now, with the relatively small
database I expect initially for our project (< 10 MByte or so), it
should not be a problem to keep indices with star expansion after 4
letters in memory.
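
To make the behaviour in the log above concrete, here is a minimal Python
sketch of prefix expansion with a minimum prefix length. It is not how
Sphinx is implemented; the names build_prefix_index, search and
MIN_PREFIX_LEN are hypothetical, and the value 4 is an assumption taken
from the timings above.

```python
MIN_PREFIX_LEN = 4  # assumed value, inferred from the query log above


def build_prefix_index(words, min_len=MIN_PREFIX_LEN):
    """Map every prefix of at least min_len characters to the words it starts."""
    index = {}
    for word in words:
        for end in range(min_len, len(word) + 1):
            index.setdefault(word[:end], set()).add(word)
    return index


def search(term, words, prefix_index, min_len=MIN_PREFIX_LEN):
    """Return matching words: exact match, or prefix match for a 'abcd*' term."""
    if term.endswith("*"):
        prefix = term[:-1]
        if len(prefix) >= min_len:
            return sorted(prefix_index.get(prefix, set()))
        term = prefix  # too short to expand: the '*' has no effect
    return sorted(w for w in set(words) if w == term)


# Hypothetical word list, only for illustration.
words = ["com", "comp", "company", "complete", "completed", "completely"]
prefix_index = build_prefix_index(words)

print(search("com*", words, prefix_index))      # short prefix: exact match only
print(search("comp*", words, prefix_index))     # expands to every word starting with "comp"
print(search("complet", words, prefix_index))   # no '*': exact match, nothing found
```

The index stores every prefix of every word, which is why the memory use
grows with a smaller minimum prefix length.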

One issue I still have is that the final '.' of a sentence is attached
to the indexed word, so that word is not found unless a '.' or '*' is
appended to the search term.

++++

In the meantime I solved the '.' issue with the crude solution of
removing the '.' character from the charset_table list (which causes
other problems ...).
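
The effect of having '.' in the character table can be illustrated with a
small Python sketch. This is only a toy tokenizer under my own
assumptions, not the Sphinx tokenizer; the function tokenize and its
extra_chars parameter are hypothetical.

```python
import re


def tokenize(text, extra_chars=""):
    """Split text into lowercase tokens of letters, digits and extra_chars."""
    pattern = "[a-z0-9" + re.escape(extra_chars) + "]+"
    return re.findall(pattern, text.lower())


sentence = "The job ran completely."

# With '.' treated as a word character, the final token keeps its period,
# so an exact search for "completely" does not match it.
with_dot = tokenize(sentence, extra_chars=".")

# Without '.', the period is dropped and "completely" is indexed as typed.
without_dot = tokenize(sentence)

print(with_dot)
print(without_dot)

# One possible side effect of dropping '.': dotted terms are split apart.
print(tokenize("mail.example.com"))
```

The last call shows one kind of "other problem": a host name is no longer
a single token once '.' is not a word character.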

Stemming will reduce e.g. 'companies' and 'company' to a common stem of
'compani' (both in the search term and in the database index), without
the user needing to add a special '*' to the search. So any combination
of 'company' and 'companies' will match.
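
As a rough illustration of that idea, the Python sketch below applies the
same stemming function to both the query and the indexed text. The
toy_stem function is a hypothetical stand-in that only handles the
'y'/'ies' case from the example; a real stemmer has many more rules.

```python
def toy_stem(word):
    """Very rough stand-in for a real stemmer: 'ies' and 'y' endings become 'i'."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "i"
    if word.endswith("y"):
        return word[:-1] + "i"
    return word


def matches(query, document):
    """True if every stemmed query word appears among the stemmed document words."""
    doc_stems = {toy_stem(w) for w in document.split()}
    return all(toy_stem(w) in doc_stems for w in query.split())


print(toy_stem("company"), toy_stem("companies"))
print(matches("company", "two companies merged"))
print(matches("companies", "my company grew"))
```

Because both sides are stemmed the same way, no '*' is needed in the
query, which is the difference from the enable_star behaviour.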

HTH,

Peter