RE: Spellchecking and suggesting part numbers

2014-09-24 Thread Dyer, James
Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your 
application pick out the suggestions that make changes on the right side.
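
As a rough sketch of the first half of that idea, the larger suggestion count 
could be set as a default on the /spell_part handler from the original message 
(it can equally be passed per request as spellcheck.count=20).  The value 20 is 
only a starting point, and the application-side filtering of the returned 
suggestions still has to be written separately.

```xml
<requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">did_you_mean_part</str>
    <str name="spellcheck">on</str>
    <!-- Ask for more candidates than needed so the application can filter them. -->
    <str name="spellcheck.count">20</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_part</str>
  </arr>
</requestHandler>
```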

Another option is to use DirectSolrSpellChecker (usually a better choice 
anyhow) and set the minPrefix parameter.  This requires the first n characters 
of a suggestion to match the query term before it is offered.  From a quick 
look at the code, it seems to me it also won't try to correct anything in this 
prefix region.  So perhaps you can set this to 2-4 (default=1).  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
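
A minimal sketch of what that could look like in solrconfig.xml, assuming the 
component and field names from the original message are kept.  The minPrefix 
value of 4 is an assumption chosen to cover the "ABCD" part of the example part 
number, not a tested recommendation.

```xml
<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- DirectSolrSpellChecker works off the main index, so no spellcheckIndexDir is set. -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">did_you_mean_part</str>
    <!-- Assumed value: the first 4 characters must match the query term. -->
    <int name="minPrefix">4</int>
  </lst>
</searchComponent>
```

With this in place, a query for ABCD1234 should only be offered suggestions 
that also start with ABCD, which rules out ABCE1234.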

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] 
Sent: Wednesday, September 24, 2014 9:06 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking and suggesting part numbers

Hello Solr Users,

We are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is:

<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">did_you_mean_part</str>
  </lst>
</searchComponent>
<requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">did_you_mean_part</str>
    <str name="spellcheck">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_part</str>
  </arr>
</requestHandler>


<fieldType name="did_you_mean_part" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
</fieldType>

Can we tweak the setup so that we get more relevant part numbers?

Thanks,
Alexander




Re: Spellchecking and suggesting part numbers

2014-09-24 Thread Jorge Luis Betancourt Gonzalez
I’ve done something similar to this using EdgeNGram rather than the 
spellchecker component. I don’t know whether this fits your requirements:

The relevant portion of my fieldType config:

<filter class="solr.WordDelimiterFilterFactory" 
        generateWordParts="1" generateNumberParts="1" 
        catenateWords="0" catenateNumbers="0" catenateAll="0" 
        splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" 
        minGramSize="1"/>

Basically, use the WordDelimiterFilterFactory to split ABCD1234 into two 
tokens (or don’t, depending on your requirements), and then use the 
EdgeNGramFilterFactory to provide partial matching on the field.
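
For context, a sketch of how those filters might sit inside a complete field 
type. Only the three filters come from the config above; the field type name 
part_number_prefix, the whitespace tokenizer, and the separate query analyzer 
without n-grams are assumptions for illustration.

```xml
<!-- Hypothetical field type name; only the three index-time filters are from the message. -->
<fieldType name="part_number_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits ABCD1234 into the tokens ABCD and 1234. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Indexes every leading prefix of each token: a, ab, abc, abcd, 1, 12, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <!-- Assumption: no n-grams at query time, so the typed prefix is matched as-is. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Leaving the n-gram filter out of the query analyzer keeps a query such as 
ABCD12 from matching on the single-character grams a and 1.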
