Re: Re: How to properly use Levenstein distance with ~ in Java

2014-10-23 Thread karsten-solr
Hi Aleksander,
 
Fuzzy search with '~' is not supported by the DisMax query parser (defType=dismax):
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
 
You are using the spellcheck SearchComponent. It does not change the query 
results.
 

By the way, it looks like you are using the path /select with qt=dismax. Normally 
this would throw an exception.
Is there a tag
  <requestHandler name="/dismax" ...>
inside your solrconfig.xml?
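
Since the fuzzy '~' works for Aleksander with a plain title:taveranx~ query in the admin UI, one way to test this from Java is to send the same field-qualified query with the standard parser instead of dismax. The following is only a sketch that builds the request URL with the JDK; the host, port and core name are the ones from the original question, while the /select path, the defType=lucene value and the helper name buildSelectUrl are my assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FuzzySelectUrl {

    // Hypothetical helper: appends URL-encoded parameters to <coreUrl>/select.
    static String buildSelectUrl(String coreUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(coreUrl).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:taveranx~");   // field-qualified fuzzy term
        params.put("defType", "lucene");      // assumption: standard parser instead of dismax
        System.out.println(buildSelectUrl("http://myhost:8083/solr/mycore", params));
    }
}
```

Note that with the standard parser the qf parameter is not applied, so the field has to be named in the query itself.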
 
Best regards
 
  Karsten
 
P.S. in Context: 
http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-td4164793.html
 

 On 20 October 2014 11:13, Aleksander Sadecki wrote:

 Ok, thank you for your response. But why I cannot use '~'?


Re: How to properly use Levenstein distance with ~ in Java

2014-10-23 Thread Walter Underwood
We’re reimplementing fuzzy support in edismax on Solr 4.x right now. See: 
https://issues.apache.org/jira/browse/SOLR-629

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/




Re: How to properly use Levenstein distance with ~ in Java

2014-10-23 Thread Alexandre Rafalovitch
The last real update on that is 2.5 years old. Is there a more recent
update? I am interested in this topic as well.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853





RE: How to properly use Levenstein distance with ~ in Java

2014-10-23 Thread Will Martin
Regarding recent work with edit distance (specifically Levenshtein): given your 
expressed interest, you might find this paper provocative.

We measure the keyword similarity between two strings
by lemmatizing them, removing stopwords, and computing
the cosine similarity. We then include the keyword similarity
between the query and the input question, the keyword
similarity between the query and the returned evidence, and
an indicator feature for whether the query involves a join.
The evidence features compute KB-specific properties... We compute the join-key
string similarity measured using the Levenshtein distance.


http://dx.doi.org/10.1145/2623330.2623677

re
will






Re: How to properly use Levenstein distance with ~ in Java

2014-10-21 Thread Erick Erickson
When used on bare terms, ~ is indeed fuzzy matching rather than
proximity, it's an overloaded operator in that sense.

If I had to guess, I'd guess that your analysis chain for the field is
doing interesting things for taveranx and the resulting token is
far enough away (in the Levenshtein sense) that it's not found.

The admin/analysis page is very much your friend here, it'll show you
what the term taveranx becomes in your index.

You might try varying the closeness of the term by adding
taveranx~0.2 (or whatever) to your query to see if it's eventually
found.

And as a test, see if specifying fuzzy operations works on other terms,
in which case my hypothesis will get a little support.
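
To check by hand how far apart two tokens are "in the Levenshtein sense", a small stand-alone calculation can help. This is only a sketch of the textbook dynamic-programming algorithm, not the code Lucene uses internally; the class and method names are made up for illustration. Compare the query term with whatever token the admin/analysis page shows for the field.

```java
final class Levenshtein {

    // Classic two-row dynamic programming: insert, delete and substitute each cost 1.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Raw terms from the question; an analyzed (e.g. stemmed) token may differ.
        System.out.println(Levenshtein.distance("taverna", "taveranx")); // prints 2
    }
}
```

If the analysis chain changes the indexed token, run the same comparison against the analyzed form, since that is what the fuzzy query is matched against.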

Best,
Erick





Re: How to properly use Levenstein distance with ~ in Java

2014-10-20 Thread Aleksander Sadecki
OK, thank you for your response. But why can't I use '~'?





-- 
Pozdrawiam / Best regards
Aleksander Sadecki


Re: How to properly use Levenstein distance with ~ in Java

2014-10-20 Thread Ramzi Alqrainy
Because ~ is proximity matching. Lucene supports finding words that are within a
specific distance of each other.
To search for foo and bar within 4 words of each other:

"foo bar"~4

Note that for proximity searches, exact matches are proximity zero, and word
transpositions (bar foo) are proximity 1.
A query such as "foo bar"~1000 is an interesting alternative to foo AND
bar.
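
The two uses of ~ differ only in what precedes it: a quoted phrase gives a proximity search, a bare term gives a fuzzy search. A minimal sketch of building both query strings in Java follows; the helper names and the field names are hypothetical, and the 0..2 limit on fuzzy edits reflects the maximum edit distance supported by Lucene 4.x fuzzy queries.

```java
final class TildeQueries {

    // Bare term followed by ~N: fuzzy match within N edits.
    static String fuzzyTerm(String field, String term, int maxEdits) {
        if (maxEdits < 0 || maxEdits > 2) {
            throw new IllegalArgumentException("maxEdits must be 0, 1 or 2");
        }
        return field + ":" + term + "~" + maxEdits;
    }

    // Quoted phrase followed by ~N: the words may be up to N positions apart.
    static String proximityPhrase(String field, String phrase, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must not be negative");
        }
        return field + ":\"" + phrase + "\"~" + slop;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyTerm("title", "taveranx", 2));      // title:taveranx~2
        System.out.println(proximityPhrase("content", "foo bar", 4)); // content:"foo bar"~4
    }
}
```

Neither helper escapes special query characters, so this is only suitable for simple terms like the ones in this thread.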



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-tp4164793p4165079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to properly use Levenstein distance with ~ in Java

2014-10-19 Thread Ramzi Alqrainy
You can use the Levenshtein distance algorithm inside Solr without writing code by
specifying the source of terms in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">content</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

This example shows the results of a simple query that defines a query using
the spellcheck.q parameter. The query also includes a spellcheck.build=true
parameter, which needs to be passed only once in order to build the
index. spellcheck.build should not be specified with each request.

http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=hell%20ultrashar&spellcheck=true&spellcheck.build=true

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hell">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>dell</str>
      </arr>
    </lst>
    <lst name="ultrashar">
      <int name="numFound">1</int>
      <int name="startOffset">5</int>
      <int name="endOffset">14</int>
      <arr name="suggestion">
        <str>ultrasharp</str>
      </arr>
    </lst>
  </lst>
</lst>



Once the suggestions are collected, they are ranked by the configured
distance measure (Levenshtein distance by default) and then by aggregate
frequency.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-tp4164793p4164883.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to properly use Levenstein distance with ~ in Java

2014-10-18 Thread Aleksander Sadecki
Hi,

I have a Solr instance on my local machine with sample data. When I
run the query title:taverna in myhost:8083/solr/#/mycore/query, it gives me 4
results. When I make a mistake, for instance title:taveranx, it gives me 0
results, but with '~' appended it finds all 4 of the documents that were found
before.

I am trying to add Levenshtein distance matching to my Java code.

I have this piece of code:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
// ...
private SolrServer solrServer; // ...
QueryResponse originalResponse = solrServer.query(solrQuery);

Here you can see what my solrQuery variable looks like:
http://oi59.tinypic.com/izmiwk.jpg

No documents are found.

I found the queries that were executed in Solr, and they look like this:

Oct 18, 2014 7:02:11 PM org.apache.solr.core.SolrCore execute

INFO: [aos] webapp=/solr path=/select
params={spellcheck=true&spellcheck.collateExtendedResults=true&tie=0.1&spellcheck.maxCollations=4&spellcheck.maxCollationTries=1000&qf=tags^11.0+title^10.0+...&q.alt=*:*&wt=javabin&spellcheck.collate=true&version=2&rows=10&defType=dismax&fl=*&bq=type:some_field^100&start=0&q=taveranx~&bf=recip(ms(NOW,publication_date),1.15e-08,3650,3650)&spellcheck.count=20&qt=dismax&fq=-status:C&fq=-(-startDate:[NOW+TO+*]+AND+type:some_field)&fq=-type:listing}
hits=0 status=0 QTime=22

Oct 18, 2014 7:02:11 PM org.apache.solr.core.SolrCore execute

INFO: [aos] webapp=/solr path=/select
params={spellcheck=true&facet=true&spellcheck.collateExtendedResults=true&tie=0.1&spellcheck.maxCollations=4&spellcheck.maxCollationTries=1000&qf=tags^11.0+...&q.alt=*:*&wt=javabin&spellcheck.collate=true&version=2&rows=0&defType=dismax&fl=*&bq=type:some_field^100&start=0&q=taveranx~&bf=recip(ms(NOW,publication_date),1.15e-08,3650,3650)&facet.field=type&spellcheck.count=20&qt=dismax&fq=-status:C&fq=-(-startDate:[NOW+TO+*]+AND+type:some_field)}
hits=0 status=0 QTime=29

Can someone tell me where the mistake could be?

Thank you in advance
Alex