2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

> Hi Alessandro,
>
> I'm using Solr 5.0.0, but it is still able to work. Actually I found this
> to be better than <query>~1 or <query>~2, as it can automatically detect
> and allow the 20% error rate that I want.
>
I don't think the fractional ("double") parameter is supported anymore, so we
should take a look at the underlying formula to understand how the exact
number of edits is calculated.
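
As a rough sketch of what that formula might look like: assuming the legacy
fractional value is converted by truncating (1 - similarity) * term length and
capping the result at 2 edits, the numbers reported further down the thread
come out as described. The function name and the cap constant below are
illustrative, not Solr or Lucene API, and the formula itself is an assumption
to be checked against the source of the version in use.

```python
MAX_EDITS = 2  # assumed cap, taken from the "max edit is fixed to 2" remark in this thread


def legacy_similarity_to_edits(similarity: float, term_length: int) -> int:
    """Assumed conversion of a fractional similarity (e.g. 0.79) to an edit count."""
    if not 0.0 < similarity < 1.0:
        raise ValueError("this sketch only covers fractional values between 0 and 1")
    # Truncate toward zero, then cap at the assumed maximum.
    return min(int((1.0 - similarity) * term_length), MAX_EDITS)
```

Under this assumption, ~0.79 gives 1 edit for a 5-character word and 2 edits
for a 10-character word, but 0 edits for a 4-character word, while ~0.75 gives
1 edit for 4 characters. Because of truncation, ~0.8 would give 0 edits for a
5-character word, which may be why ~0.79 was needed.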


>
> For this <query>~1 or <query>~2, does it mean that I'll have to manually
> detect how many characters I entered before assigning the suitable
> ~ (tilde) parameter in order to achieve the 20% error rate?
>
Yes

> I'll probably need an edit distance of 0 for words with 3 or less
> characters, 1 for words with 4 to 9 characters, edit distance of 2 for
> words with 10 to 14 characters, and edit distance of 3 for words with more
> than 15 characters.
>
This would be quite easy: just check the length and assign the appropriate
edit distance according to your requirements.
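
A minimal client-side sketch of that length check, using the thresholds you
listed. The function names are made up for illustration. It assumes each word
is a single term with no query-syntax special characters, and it caps the
distance at 2, since the max edits mentioned earlier in this thread is fixed
at 2, so the 3 edits wanted for the longest words are not available here.

```python
MAX_EDITS = 2  # cap mentioned earlier in this thread


def edits_for_length(length: int) -> int:
    """Map a word length to an edit distance, using the thresholds from this thread."""
    if length <= 3:
        return 0  # too short: exact match only
    if length <= 9:
        return 1
    # 10 characters or more: 2 edits (3 were requested for long words, capped at 2)
    return MAX_EDITS


def fuzzy_term(word: str) -> str:
    """Append the ~N fuzzy suffix, or leave the word untouched for exact match."""
    edits = edits_for_length(len(word))
    return f"{word}~{edits}" if edits > 0 else word
```

For example, fuzzy_term("cat") stays "cat", fuzzy_term("solr") becomes
"solr~1", and fuzzy_term("levenshtein") becomes "levenshtein~2".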

>
> Yes, for the performance I'm checking whether the length check will affect
> the query time. Thanks for your info on that. Currently my index is small,
> so everything seems to run quite fast and the delay is unnoticeable. But I'm
> not sure whether it will slow down enough to be noticeable by the user once
> I have tens of collections with millions of records.
>
I think the length check will be constant time for any string (if you are
using Java, and most likely in other languages as well).
So I would say it won't be a problem compared with the actual query time.

>
>
> Regards,
> Edwin
>
>
>
> On 8 May 2015 at 16:53, Alessandro Benedetti <benedetti.ale...@gmail.com>
> wrote:
>
> > Hi Zheng,
> > actually that version of the fuzzy search is deprecated!
> > Currently the fuzzy search syntax is:
> > <query>~1 or <query>~2
> > The ~ (tilde) parameter is the number of edits used to generate all the
> > expanded queries to run.
> > Can I ask which version of Solr you are using?
> >
> > This article from 2011 shows the biggest change in fuzzy query, and I
> > guess it's still the current approach!
> > Regarding the performance, what do you mean?
> > Are you worried that the length check will affect the query time?
> > The answer is yes, but the delay will be unnoticeable, as you simply check
> > the length and apply the corresponding fuzzy parameter.
> > It is true that a fuzzy query is slower than a normal query, but the FST
> > approach keeps fuzzy queries really fast.
> > So if you do need the fuzziness, it's something you can cope with.
> >
> > Cheers
> >
> > 2015-05-08 3:12 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> >
> > > Thank you for the information.
> > >
> > > I'm currently using the fuzzy search with the edit distance value set
> > > to ~0.79, and this has allowed a 20% error rate (i.e. for words with 5
> > > characters, it allows 1 misspelled character, and for words with 10
> > > characters, it allows 2 misspelled characters).
> > >
> > > However, for words with 4 characters, I'll need to set the value to
> > > ~0.75 to allow 1 misspelled character, since a 4-character word
> > > requires a 25% error rate for 1 misspelled character. We probably
> > > will not accommodate 3-character words.
> > >
> > > I've gotten the information from here:
> > > http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
> > >
> > > Just to check, will this affect the performance of the system?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 7 May 2015 at 20:00, Alessandro Benedetti <benedetti.ale...@gmail.com>
> > > wrote:
> > >
> > > > Hi !
> > > > Currently Solr builds an FST to provide proper fuzzy search or
> > > > spellcheck suggestions based on the string distance.
> > > > The current default algorithm is the Levenshtein distance (which
> > > > returns the number of edits as the distance metric).
> > > > In your case you should calculate, client side, the number of edits
> > > > you want to apply to your search.
> > > > In your client code, it should not be difficult to process the query
> > > > and apply the proper number of edits depending on the length.
> > > >
> > > > Anyway, the max edits for the default Levenshtein distance is fixed
> > > > at 2.
> > > >
> > > > Cheers
> > > >
> > > >
> > > >
> > > > 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > Would like to check, how do we implement character proximity
> > > > > searching in terms of a percentage of the length of the word,
> > > > > instead of a fixed edit distance (number of characters)?
> > > > >
> > > > > For example, if we have a proximity of 20%, a word with 5 characters
> > > > > will have an edit distance of 1, and a word with 10 characters will
> > > > > automatically have an edit distance of 2.
> > > > >
> > > > > Will Solr be able to do that for us?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > > >
> > >
> >
> >
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti
