RE: Spellchecking and suggesting part numbers
Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, and then have your application pick out the suggestions whose changes fall on the right side of the term.

Another option is to use DirectSolrSpellChecker (usually a better choice anyhow) and set the minPrefix parameter. This requires the first n characters on the left side to match before it will make a suggestion. From a quick look at the code, it seems to me it won't try to correct anything in this prefix region either. So perhaps you can set this to 2-4 (default=1). See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29 .

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com]
Sent: Wednesday, September 24, 2014 9:06 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking and suggesting part numbers

Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.

The setup is:

<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">did_you_mean_part</str>
  </lst>
</searchComponent>

<requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">did_you_mean_part</str>
    <str name="spellcheck">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_part</str>
  </arr>
</requestHandler>

<fieldType name="did_you_mean_part" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
</fieldType>

Can we tweak the setup such that we should get more relevant part numbers?

Thanks,
Alexander
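
The first suggestion in the reply above (ask for more candidates via spellcheck.count, then let the application prefer the ones that differ only towards the right) could look roughly like the following Python sketch. The function names, the min_prefix argument, and the candidate list are hypothetical; the list stands in for whatever the spellchecker returns, and the prefix filter only imitates the effect described for minPrefix, it is not Solr's implementation.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rank_suggestions(term: str, suggestions: list[str], min_prefix: int = 0) -> list[str]:
    """Keep suggestions sharing at least min_prefix leading characters with
    term, ordered so the longest shared prefix comes first."""
    term = term.lower()
    scored = [(common_prefix_len(term, s.lower()), s) for s in suggestions]
    kept = [(n, s) for n, s in scored if n >= min_prefix]
    # sorted() is stable, so ties keep the spellchecker's original order.
    return [s for n, s in sorted(kept, key=lambda pair: -pair[0])]


# Hypothetical candidates, as if returned with spellcheck.count=20.
candidates = ["ABCE1234", "ABCD1244", "XBCD1234"]
print(rank_suggestions("ABCD1234", candidates, min_prefix=3))
# prints ['ABCD1244', 'ABCE1234']
```

With the sample list, ABCD1244 shares six leading characters with the search term and ABCE1234 only three, so the candidate that changes a character further to the right is ranked first.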
Re: Spellchecking and suggesting part numbers
I've done something similar to this using EdgeNGram rather than the spellchecker component; I don't know if this fits your requirements.

The relevant portion of my fieldType config:

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>

Basically, use the WordDelimiterFilterFactory to divide ABCD1234 into two tokens (or don't, depending on your requirement), and then use the EdgeNGramFilterFactory to provide partial matching on the field.

On Sep 24, 2014, at 10:05 AM, Lochschmied, Alexander <alexander.lochschm...@vishay.com> wrote:

Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.
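
To illustrate what the EdgeNGram approach in the reply above ends up indexing, here is a minimal Python sketch of front edge n-grams with the same minGramSize=1 and maxGramSize=20 bounds. The functions edge_ngrams and split_letters_digits are hypothetical stand-ins; the latter only approximates the letter/digit split attributed to WordDelimiterFilterFactory, and neither reproduces Solr's analysis chain exactly.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Front edge n-grams of token: its prefixes of length min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def split_letters_digits(term):
    """Rough split into runs of letters and runs of digits, lowercased."""
    return re.findall(r"[a-z]+|[0-9]+", term.lower())


# Whole part number kept as a single token.
print(edge_ngrams("abcd1234"))
# Part number split into a letter token and a digit token first.
print([edge_ngrams(t) for t in split_letters_digits("ABCD1234")])
```

A typed prefix such as abcd12 is itself one of the grams generated for abcd1234, which is what makes partial matching on the field possible. Whether to split the part number first depends on whether the digits should be matchable on their own.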