I am using Solr 4.2, and I have a few questions about the new fuzzy search implementation in
Solr 4+.
1) I have learned that Solr 4+ accepts a maximum edit distance of 2 (up to 2
insertions, deletions or substitutions). Is there any way I can configure this
maximum edit distance limit?
2) Although I set the edit distance to 1 in my query (e.g. worde~1), Solr
returns results with an edit distance of 2 (like WORDOES, WORHEE, WORKEE,
etc.).
3) The last and biggest issue: at startup I had very little data in my Solr core
(around 1K-2K records). At that time, searching with worde~1 returned many
records (around 450).
Then I ingested about 1K more records into the core. Ingestion succeeded, with
no errors or warnings in the log. After that, I ran the same fuzzy search
(worde~1) restricted to the previous records only, not the newly ingested ones.
It no longer returned the previous results (around 450); it returned only
1 record in total, with the highlight WORD!N.
It seems the problem was introduced somewhere while ingesting the last 1K
records, but I cannot find it. Solr does not report any error or warning in the
log, and I don't know how to debug this ingestion issue.
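For reference on question 1, the "edit distance" counts single-character insertions, deletions and substitutions. As far as I know, Lucene 4's FuzzyQuery also counts a swap of two adjacent characters as one edit by default, and rejects anything above 2 edits (LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE in the Lucene source). The sketch below shows that metric (optimal string alignment distance) with a hard cap of 2. It is not Solr's implementation, and the names MAX_SUPPORTED_EDITS, osa_distance and fuzzy_match are hypothetical.

```python
# Hypothetical cap, mirroring (by assumption) Lucene 4's maximum of 2 edits.
MAX_SUPPORTED_EDITS = 2


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: each insertion, deletion,
    substitution, or swap of two adjacent characters costs 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_match(query_term: str, indexed_term: str, max_edits: int) -> bool:
    """True if indexed_term is within max_edits of query_term."""
    if not 0 <= max_edits <= MAX_SUPPORTED_EDITS:
        raise ValueError(f"max_edits must be between 0 and {MAX_SUPPORTED_EDITS}")
    return osa_distance(query_term, indexed_term) <= max_edits


print(osa_distance("worde", "word"))       # 1 (one deletion)
print(osa_distance("worde", "wordoes"))    # 2 (two insertions)
print(fuzzy_match("worde", "wordoes", 1))  # False
```

By this measure the surface forms WORDOES, WORHEE and WORKEE are all 2 edits away from worde. Any value above the cap is rejected outright rather than clamped.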
Below is the configuration for my text field type, text_en_splitting.
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="false"
            />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_extra_query.txt"
            enablePositionIncrements="false"
            />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
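One possible reading of question 2, offered only as a hypothesis: the fuzzy term is compared against the terms stored in the index. For text_en_splitting, those terms have already passed through LowerCaseFilterFactory and PorterStemFilterFactory (see the index analyzer above), while the highlighter shows the original text. The sketch below does not run a stemmer. It assumes stemmed forms that I derived by hand from the Porter rules (WORDOES to wordo, WORHEE to worhe, WORKEE to worke). It also assumes the query term worde is only lowercased, not stemmed, before matching. Both assumptions should be checked on the Analysis screen of the Solr admin UI. ASSUMED_INDEXED_FORMS, levenshtein and compare are hypothetical names.

```python
# ASSUMPTION: hand-derived index-time forms after lowercasing + Porter stemming;
# not produced by Solr. Verify them on the admin Analysis screen.
ASSUMED_INDEXED_FORMS = {
    "WORDOES": "wordo",
    "WORHEE": "worhe",
    "WORKEE": "worke",
}


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert/delete/substitute), two-row version."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def compare(query_term: str, forms: dict) -> dict:
    """Map each surface form to (distance to surface, distance to assumed indexed form)."""
    return {
        surface: (levenshtein(query_term, surface.lower()),
                  levenshtein(query_term, indexed))
        for surface, indexed in forms.items()
    }


for surface, (d_surface, d_indexed) in compare("worde", ASSUMED_INDEXED_FORMS).items():
    print(f"{surface}: surface distance {d_surface}, assumed indexed distance {d_indexed}")
```

If the assumed forms are right, every hit is 2 edits from worde as displayed but only 1 edit from the term actually matched. That would make the ~1 results consistent rather than a bug.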
I also have a copy field of this text field, with field type
text_general_preserved. Below is its configuration.
<fieldType name="text_general_preserved" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_ns.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_extra_query.txt" enablePositionIncrements="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
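On the debugging question in 3): two standard Solr request parameters may help compare behaviour before and after an ingestion. The first is debugQuery=true on /select, which adds a debug section to the response showing how worde~1 was parsed. The second is the TermsComponent (terms.fl, terms.prefix, terms.limit), which lists the indexed terms starting with a prefix, provided solrconfig.xml defines a /terms handler as the Solr 4 example config does. The sketch below only builds the request URLs. The core URL (the Solr 4 example default collection1) and the field name text are assumptions, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# ASSUMPTIONS: default Solr 4 example host/core, and a hypothetical name for
# the field that uses text_en_splitting. Adjust both to your setup.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"
FIELD = "text"


def debug_query_url(term: str, max_edits: int) -> str:
    """URL for a fuzzy query with debug output enabled."""
    params = {
        "q": f"{FIELD}:{term}~{max_edits}",
        "debugQuery": "true",  # adds parsed-query/explain info to the response
        "rows": "0",           # only the hit count and debug section are needed
        "wt": "json",
    }
    return f"{SOLR_CORE_URL}/select?{urlencode(params)}"


def terms_url(prefix: str, limit: int = 100) -> str:
    """URL listing indexed terms of FIELD that start with prefix."""
    params = {
        "terms.fl": FIELD,
        "terms.prefix": prefix,
        "terms.limit": str(limit),
        "wt": "json",
    }
    return f"{SOLR_CORE_URL}/terms?{urlencode(params)}"


print(debug_query_url("worde", 1))
print(terms_url("wor"))
```

Saving both responses before and after ingesting the extra 1K records would show whether the set of candidate terms near worde, or the parsed query itself, has changed.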
I hope I have explained my questions clearly. Please help me with
this.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr4-2-Fuzzy-Search-Problems-tp4063199.html
Sent from the Solr - User mailing list archive at Nabble.com.