Re: Solr4.2 - Fuzzy Search Problems

2013-05-23 Thread meghana
Thanks Chris,

For my 2nd query (~1 returning words at an edit distance of 2), that may
well be the cause.

I am still looking into my last issue. Hopefully the JIRA ticket helps to
resolve it.


Chris Hostetter-3 wrote
 :
 : 2) Although I set the edit distance to 1 in my query (e.g. worde~1), Solr
 : returns results at an edit distance of 2 (like WORDOES, WORHEE, WORKEE,
 : etc.).

 fuzzy search works on *terms* in your index -- if you use a stemmer when
 you index your data (your schema shows that you are) then a word in your
 input like WORDOES might wind up in your index as a term within the edit
 distance you specified (e.g. wordo or word or something similar)

 : 3) Last and major issue: I had very little data in my Solr core at
 : startup (around 1K - 2K records). At that time, searching with worde~1
 : returned many records (around 450).
 :
 : Then I ingested a few more records into my Solr core (around 1K). They
 : were ingested successfully, with no errors or warnings in the log. After
 : that, when I ran the same fuzzy search (worde~1) on the previous records
 : only, not the newly ingested ones, it no longer returned the previous
 : results (around 450). It returned just 1 record in total, with the
 : highlight WORD!N.

 This sounds like the same issue as described in SOLR-4824...

 https://issues.apache.org/jira/browse/SOLR-4824


 -Hoss





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-2-Fuzzy-Search-Problems-tp4063199p4065576.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr4.2 - Fuzzy Search Problems

2013-05-21 Thread Chris Hostetter
: 
: 2) Although I set the edit distance to 1 in my query (e.g. worde~1), Solr
: returns results at an edit distance of 2 (like WORDOES, WORHEE, WORKEE,
: etc.).

fuzzy search works on *terms* in your index -- if you use a stemmer when
you index your data (your schema shows that you are) then a word in your
input like WORDOES might wind up in your index as a term within the edit
distance you specified (e.g. wordo or word or something similar)
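
As a rough illustration of the point above, here is a small Python sketch
that computes plain Levenshtein distance. The indexed terms it uses (wordo,
word) are only the guesses mentioned above, not output from the real Porter
stemmer, and Lucene's own fuzzy matching is automaton-based and may also
count a transposition as a single edit. Treat it as a sketch of the idea,
not as what Solr actually runs.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = cur
    return prev[-1]


query = "worde"
surface_word = "wordoes"                # lowercased word as shown in the highlight
hypothetical_terms = ["wordo", "word"]  # ASSUMED stemmed forms, not verified

print(surface_word, levenshtein(query, surface_word))
for term in hypothetical_terms:
    print(term, levenshtein(query, term))
```

The surface word wordoes is 2 edits away from worde, while either of the
assumed indexed terms is only 1 edit away. That would explain a ~1 query
matching a document whose highlighted text looks 2 edits away.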

: 3) Last and major issue: I had very little data in my Solr core at
: startup (around 1K - 2K records). At that time, searching with worde~1
: returned many records (around 450).
:
: Then I ingested a few more records into my Solr core (around 1K). They
: were ingested successfully, with no errors or warnings in the log. After
: that, when I ran the same fuzzy search (worde~1) on the previous records
: only, not the newly ingested ones, it no longer returned the previous
: results (around 450). It returned just 1 record in total, with the
: highlight WORD!N.

This sounds like the same issue as described in SOLR-4824...

https://issues.apache.org/jira/browse/SOLR-4824


-Hoss


Solr4.2 - Fuzzy Search Problems

2013-05-14 Thread meghana


I am using Solr4.2 and have a few questions about the new fuzzy
implementation in Solr4+.

1) I have learned that Solr4+ accepts a maximum edit distance of 2 (2
insertions, deletions, or replacements). Is there any way I can configure
this maximum edit distance limit?

2) Although I set the edit distance to 1 in my query (e.g. worde~1), Solr
returns results at an edit distance of 2 (like WORDOES, WORHEE, WORKEE,
etc.).

3) Last and major issue: I had very little data in my Solr core at startup
(around 1K - 2K records). At that time, searching with worde~1 returned
many records (around 450).

Then I ingested a few more records into my Solr core (around 1K). They were
ingested successfully, with no errors or warnings in the log. After that,
when I ran the same fuzzy search (worde~1) on the previous records only,
not the newly ingested ones, it no longer returned the previous results
(around 450). It returned just 1 record in total, with the highlight
WORD!N.

It seems the issue arises somewhere while ingesting the last 1K records,
but I have not been able to pin it down. Solr does not report any error or
warning in the log, and I don't know how to debug this ingestion issue.
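
One possible starting point for debugging, sketched in Python under several
assumptions: the host, port, core name (collection1) and field name (text)
below are placeholders, not values from my setup. The sketch only builds a
select URL for the fuzzy query with Solr's debugQuery parameter turned on.
The debug section of the response should then show how the query was
parsed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_debug_url(base_url: str, field: str, term: str, max_edits: int) -> str:
    """Build a Solr select URL for a fuzzy term query with debug output on."""
    params = {
        "q": f"{field}:{term}~{max_edits}",  # fuzzy query, e.g. text:worde~1
        "debugQuery": "true",                # ask Solr to include debug info
        "rows": "10",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# ASSUMED base URL and field name; replace with the real core and field.
url = build_debug_url("http://localhost:8983/solr/collection1", "text", "worde", 1)
print(url)

# Round-trip check that the query string decodes back to the intended query.
print(parse_qs(urlsplit(url).query)["q"])
```

Running the same URL before and after an ingestion batch, and comparing the
debug output, might help narrow down where the results diverge.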

Below is the configuration for my text field type, text_en_splitting.

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_extra_query.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
            protected="protwords.txt" types="wdfftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I also have a copy field on this text field, with field type
text_general_preserved. Below is its configuration.

<fieldType name="text_general_preserved" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_ns.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_extra_query.txt" enablePositionIncrements="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I hope I have explained my questions clearly. Please help me with this.



