Solr4.2 - Fuzzy Search Problems

meghana Tue, 14 May 2013 04:07:22 -0700


I am using Solr 4.2 and have a few questions about the new fuzzy search
implementation in Solr 4+.


1) I understand that Solr 4+ accepts a maximum edit distance of 2 (2
insertions, deletions, or replacements). Is there any way I can configure
this maximum edit distance limit?

2) Although I set the edit distance to 1 in my query (e.g. worde~1), Solr
returns results at an edit distance of 2 (such as WORDOES, WORHEE, WORKEE,
etc.).

3) Last, and the major issue: at startup I had very little data in my Solr
core (around 1K - 2K records). At that time, searching with worde~1 returned
many records (around 450).

Then I ingested some more records into my Solr core (around 1K). They were
ingested successfully, with no errors or warnings in the log. After that,
when I ran the same fuzzy search (worde~1) restricted to the previous records
only, not the newly ingested ones, it no longer returned the previous results
(around 450). It returned only 1 record in total, with the highlight WORD!N.

It seems the issue arises somewhere while ingesting the last 1K records, but
I am not able to pin it down. Solr also does not report any error or warning
in the log, and I don't know how to debug this ingestion issue.
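To make the distances in question 2 concrete, here is a minimal sketch that computes the plain Levenshtein distance between the query term and the highlighted words. It is only an illustration, not Solr's implementation: the function name `levenshtein` is my own, the highlighted words are lowercased by hand, and the comparison is against the surface text. Solr matches fuzzy terms against the indexed terms, which for text_en_splitting have already been lowercased and stemmed, so the term that actually matched may differ from the highlighted text.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


query = "worde"
highlights = ["WORDOES", "WORHEE", "WORKEE"]

# Lowercase by hand here; the index-time analyzer would also stem the terms.
distances = {word: levenshtein(query, word.lower()) for word in highlights}
for word in highlights:
    print(word, distances[word])
```

All three highlighted words come out at distance 2 from worde on the surface text, which is why the results look wrong for a ~1 query.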

Below is the configuration for my text field type, text_en_splitting.

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="false"
                />
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />        
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"
                protected="protwords.txt" types="wdfftypes.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_extra_query.txt"
                enablePositionIncrements="false"
                />
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"
                protected="protwords.txt" types="wdfftypes.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>

I also have one copy field on this text field, with field type
text_general_preserved. Below is its configuration.

<fieldType name="text_general_preserved" class="solr.TextField"
           positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_ns.txt" enablePositionIncrements="false" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_extra_query.txt"
                enablePositionIncrements="false" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
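Since the two field types use different analyzer chains, one way to start debugging might be to run the same fuzzy query against each field with debugQuery=true and compare the parsed queries and explanations. The sketch below only builds the request URLs and does not contact Solr. The host, core name (`collection1`), the field names `text` and `text_preserved`, and the `id` field are all placeholders I made up, as my real names are not shown above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core name to the real setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def fuzzy_debug_url(field: str, term: str, max_edits: int = 1) -> str:
    """Build a /select URL for a fuzzy query with debug output enabled."""
    params = {
        "q": f"{field}:{term}~{max_edits}",  # e.g. text:worde~1
        "debugQuery": "true",                # adds parsed query and explain info
        "fl": "id,score",                    # "id" is an assumed unique key field
        "rows": "10",
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Placeholder field names for the stemmed field and its copy field.
for field in ("text", "text_preserved"):
    print(fuzzy_debug_url(field, "worde"))
```

Running both URLs before and after an ingestion should show whether the parsed fuzzy query or the matching indexed terms changed.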

I hope I have explained my questions clearly. Please help me with this.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-2-Fuzzy-Search-Problems-tp4063199.html
Sent from the Solr - User mailing list archive at Nabble.com.
