Hoss,

Thanks. You've answered my question. To clarify, what I should have asked for 
instead of 'exact' was 'not fuzzy'. For some reason it didn't occur to me that 
I didn't need n-grams to use the wildcard. Your asking me to clarify what I 
meant made me realize that the n-grams are the source of all my current 
problems. :)

Thanks!

Devon Baumgarten


-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, December 29, 2011 7:00 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr, SQL Server's LIKE


: Thanks. I know I'll be able to utilize some of Solr's free text 
: searching capabilities in other search types in this project. The 
: product manager wants this particular search to exactly mimic LIKE%.
        ...
: Ex: If I search "Albatross" I want "Albert" to be excluded completely, 
: rather than having a low score.

please be specific about the types of queries you want. ie: we need more 
than one example of the type of input you want to provide, the type of 
matches you want to see for that input, and the type of matches you want 
to get back.

in your first message you said you need to match company titles "pretty 
exactly" but then seem to contradict yourself by saying that SQL's LIKE 
command fits the bill -- even though the SQL LIKE command exists 
specifically for inexact matches on field values.

Based on your one example above of Albatross, you don't need anything 
special: don't use ngrams, don't use stemming, don't use fuzzy anything -- 
just search for "Albatross" and it will match "Albatross" but not 
"Albert".  if you want "Albatross" to match "Albatross Road" use some 
basic tokenization.
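
A minimal sketch of that matching behavior in plain Python, not Solr itself. The names tokenize and term_match are hypothetical, and lowercasing plus whitespace splitting is only an assumed stand-in for whatever tokenizer the field type actually uses:

```python
def tokenize(text):
    # Assumed stand-in for "basic tokenization": lowercase, split on whitespace.
    return text.lower().split()


def term_match(query, field_value):
    # Match only when the whole query term equals one of the field's tokens.
    # No n-grams, no stemming, no fuzzy matching.
    return query.lower() in tokenize(field_value)


names = ["Albatross", "Albatross Road", "Albert"]
hits = [n for n in names if term_match("Albatross", n)]
```

Here hits contains "Albatross" and "Albatross Road"; "Albert" is excluded outright rather than returned with a low score.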

If all you really care about is prefix searching (which seems suggested by 
your "LIKE%" comment above, which i'm guessing is shorthand for something 
similar to "LIKE 'ABC%'"), so that queries like "abc" and "abcd" both 
match "abcdef" and "abcdzzzz" but neither of them match "xxxxabcdyyyy" 
then just use prefix queries (ie: "abcd*") -- they should be plenty 
efficient for your purposes.  you only need to worry about ngrams when you 
want to efficiently match in the middle of a string. (ie: "TITLE LIKE 
%ABC%")
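
To make the prefix vs. middle-of-string distinction concrete, here is a small sketch in plain Python that imitates the two SQL patterns. It is not Solr code: like_prefix and like_infix are hypothetical helper names, and case-insensitive comparison is an assumption:

```python
def like_prefix(query, value):
    # Imitates: value LIKE 'query%' (what a prefix query such as abcd* covers).
    return value.lower().startswith(query.lower())


def like_infix(query, value):
    # Imitates: value LIKE '%query%' (the case where n-grams become relevant).
    return query.lower() in value.lower()


values = ["abcdef", "abcdzzzz", "xxxxabcdyyyy"]

# Both "abc" and "abcd" match the first two values as prefixes, not the third.
prefix_hits_abc = [v for v in values if like_prefix("abc", v)]
prefix_hits_abcd = [v for v in values if like_prefix("abcd", v)]

# A middle-of-string match also picks up "xxxxabcdyyyy".
infix_hits_abcd = [v for v in values if like_infix("abcd", v)]
```

The prefix lists hold only "abcdef" and "abcdzzzz", while the infix list holds all three values.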


-Hoss
