2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> Hi Alessandro,
>
> I'm using Solr 5.0.0, but it is still able to work. Actually I found this
> to be better than <query>~1 or <query>~2, as it can automatically detect
> and allow the 20% error rate that I want.

I don't think the "double" param is supported anymore, so we should take a
look at the tricky underlying formula to understand how the exact edits are
calculated.

> For this <query>~1 or <query>~2, does it mean that I'll have to manually
> detect how many characters I entered before I assign the suitable ~(tilde)
> param in order to achieve the 20% error rate?

Yes.

> I'll probably need an edit distance of 0 for words with 3 or fewer
> characters, 1 for words with 4 to 9 characters, 2 for words with 10 to 14
> characters, and 3 for words with more than 15 characters.

This would be quite easy: just check the length and assign the proper edit
distance according to your requirements.

> Yes, for the performance I'm checking if the length check will affect the
> query time. Thanks for your info on that. Currently my index is small, so
> everything seems to run quite fast and the delay is un-noticeable. But I'm
> not so sure if it will slow down until it is noticeable by the user if I
> have tens of collections with millions of records.

I think the length check will be constant time for any string (if you are
using Java; most likely it is constant in all other languages too). So I
would say it won't be a problem in comparison with the actual query time.

> Regards,
> Edwin
>
> On 8 May 2015 at 16:53, Alessandro Benedetti <benedetti.ale...@gmail.com>
> wrote:
>
> > Hi Zheng,
> > actually that version of the fuzzy search is deprecated!
> > Currently the fuzzy search syntax is:
> > <query>~1 or <query>~2
> > The ~(tilde) param is the number of edits we provide to generate all the
> > expanded queries to run.
> > Can I ask which version of Solr you are using?
> >
> > This article from 2011 shows the biggest change in fuzzy query, and I guess
> > it's still the current approach!
> > Regarding the performance, what do you mean?
> > Are you worried that the length check will affect the query time?
> > The answer is yes, but the delay will be un-noticeable, as you simply check
> > the length and apply the proper fuzzy param.
> > Regarding fuzzy queries being slower than normal queries, that is
> > true, but the FST approach guarantees really fast fuzzy queries.
> > So if you do need the fuzziness, it's something you can cope with.
> >
> > Cheers
> >
> > 2015-05-08 3:12 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> >
> > > Thank you for the information.
> > >
> > > I'm currently using the fuzzy search with the edit distance value set to
> > > ~0.79, and this has allowed a 20% error rate (i.e. for words with 5
> > > characters, it allows 1 mis-spelled character, and for words with 10
> > > characters, it allows 2 mis-spelled characters).
> > >
> > > However, for words with 4 characters, I'll need to set the value to ~0.75
> > > to allow 1 mis-spelled character, since a 4-character word requires a 25%
> > > error rate for 1 mis-spelled character. We probably will not accommodate
> > > 3-character words.
> > >
> > > I've gotten the information from here:
> > > http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
> > >
> > > Just to check, will this affect the performance of the system?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On 7 May 2015 at 20:00, Alessandro Benedetti <benedetti.ale...@gmail.com>
> > > wrote:
> > >
> > > > Hi !
> > > > Currently Solr builds FST to provide proper fuzzy search or spellcheck
> > > > suggestions based on the string distance.
> > > > The current default algorithm is the Levenshtein distance (which returns
> > > > the number of edits as the distance metric).
> > > > In your case you should calculate, client side, the edit distance you
> > > > want to apply to your search.
> > > > In your client code, it should not be difficult to process the query and
> > > > apply the proper number of edits depending on the length.
> > > >
> > > > Anyway, the max edit for the default Levenshtein distance is fixed to 2.
> > > >
> > > > Cheers
> > > >
> > > > 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > Would like to check, how do we implement character proximity searching
> > > > > that's in terms of percentage with regards to the length of the word,
> > > > > instead of a fixed number of edit distance (characters)?
> > > > >
> > > > > For example, if we have a proximity of 20%, a word with 5 characters
> > > > > will have an edit distance of 1, and a word with 10 characters will
> > > > > automatically have an edit distance of 2.
> > > > >
> > > > > Will Solr be able to do that for us?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti
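
For reference, the client-side length check discussed in this thread could be
sketched roughly as below. This is only a minimal Python sketch under stated
assumptions, not part of Solr or any Solr client: the names edits_for_term,
fuzzy_term and fuzzy_query are hypothetical, the length buckets follow Edwin's
list (a 15-character word is assumed to fall in the top bucket, which the
thread leaves open), and the result is capped at 2 because of the maximum edit
distance mentioned earlier in the thread.

```python
# Hypothetical helpers: pick a fuzzy edit distance from the word length,
# then append it to each term using the <query>~N syntax from the thread.

MAX_EDITS = 2  # cap on the edit distance, as stated in the thread


def edits_for_term(term: str) -> int:
    """Return the edit distance to request for a single term."""
    n = len(term)
    if n <= 3:
        wanted = 0      # too short for any fuzziness
    elif n <= 9:
        wanted = 1
    elif n <= 14:
        wanted = 2
    else:
        wanted = 3      # requested in the thread, but capped below
    return min(wanted, MAX_EDITS)


def fuzzy_term(term: str) -> str:
    """Append ~N to the term, or leave it exact when N is 0."""
    edits = edits_for_term(term)
    return f"{term}~{edits}" if edits else term


def fuzzy_query(text: str) -> str:
    """Apply fuzzy_term to every whitespace-separated word."""
    return " ".join(fuzzy_term(word) for word in text.split())


if __name__ == "__main__":
    print(fuzzy_query("the fuzzy percentage"))
```

The sketch does not escape query-parser special characters and splits on
whitespace only, so real input would likely need more care before being sent
to Solr. The length check itself is a handful of integer comparisons per
word, which fits the point above that its cost should be negligible next to
the query time.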