I am trying to do approximate search with Solr. We've tried fuzzy search and spellcheck search. They work OK, but the edit distance is limited (to 2 for DirectSolrSpellChecker in Solr 4.10.1). With the fuzzy operator we've had performance issues, and I don't think you can use an edit distance greater than 2.
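To make the limit concrete, here is a minimal plain-Java sketch of the classic Levenshtein (edit) distance. It is not Solr or Lucene code, and the class name and sample words are illustrative. "amstrdm" is two deletions away from "amsterdam", so a 2-edit limit can still reach it. "amstrd" needs three deletions and falls outside that limit.

```java
// Illustrative plain-Java sketch; not Solr/Lucene code.
public class EditDistanceDemo {

    /** Classic dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building the first j chars of b from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        int maxEdits = 2; // the ceiling discussed above (illustrative variable)
        String target = "amsterdam";
        for (String query : new String[] {"amstrdm", "amstrd"}) {
            int d = levenshtein(query, target);
            System.out.println(query + " -> " + d + (d <= maxEdits ? " (within limit)" : " (beyond limit)"));
        }
    }
}
```

Both misspellings share a long prefix with "amsterdam". This is why a gram-based approach, rather than a pure edit-distance one, can still retrieve the second one.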
What we used to do with a database was more efficient: storing trigrams with their positions, and then searching around those positions (not exactly at them, since it's approximate search). The position is there to keep a trigram like "ams" (from amsterdam) from matching words where the same trigram appears somewhere else, for instance at the end of the word. I would like answers with the same relative positions between trigrams to score higher. Maybe edismax's pf2 and pf3 are a way to do this. I don't see any other way; please tell me if you do.

From your answer, I gather that positions are stored, but I don't understand how I can preserve the relative order between trigrams, apart from using pf2 and pf3.

Best regards,
Elisabeth

2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:

> If you store the positions for your tokens (which is the default if you don't omit them), you have the relative positions in the index. [1]
> I attach a blog post of mine describing the Lucene internals in a little more detail.
>
> Apart from that, can you explain the problem you are trying to solve?
> The high-level user experience?
> What kind of search/autocompletion/relevancy tuning are you trying to achieve?
> Maybe we can help better if we start from the problem :)
>
> Cheers
>
> [1]
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
>
> On 9 March 2016 at 15:02, elisabeth benoit <elisaelisael...@gmail.com> wrote:
>
> > Hello Alessandro,
> >
> > You may be right. What would you use to keep the relative order between, for instance, the grams
> >
> > __a
> > _am
> > ams
> > mst
> > ste
> > ter
> > erd
> > rda
> > dam
> > am_
> >
> > of amsterdam? pf2 and pf3? That's all I can think of. Please let me know if you have more insights.
> >
> > Best regards,
> > Elisabeth
> >
> > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> >
> > > Elisabeth,
> > > out of curiosity, could you tell us what you are trying to solve with that complex way of tokenisation?
> > > Solr is really good at storing positions along with tokens, so I am curious to know why you are mixing things up.
> > >
> > > Cheers
> > >
> > > On 8 March 2016 at 10:08, elisabeth benoit <elisaelisael...@gmail.com> wrote:
> > >
> > > > Thanks for your answer, Emir,
> > > >
> > > > I'll check that out.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <emir.arnauto...@sematext.com>:
> > > >
> > > > > Hi Elisabeth,
> > > > > I don't think there is such a token filter, so you would have to create your own token filter that takes a token and emits ngram tokens of a specific length.
> > > > > It should not be too hard to create such a filter: you can take a look at how the ngram filter is coded, and yours should be simpler than that.
> > > > >
> > > > > Regards,
> > > > > Emir
> > > > >
> > > > >
> > > > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> I'm using Solr 4.10.1. I'd like to index words as ngrams of fixed length, with the position appended at the end.
> > > > >>
> > > > >> For instance, with a fixed length of 3, Amsterdam would be something like:
> > > > >>
> > > > >> "  a0" (two spaces added at the beginning)
> > > > >> " am1"
> > > > >> "ams2"
> > > > >> "mst3"
> > > > >> "ste4"
> > > > >> "ter5"
> > > > >> "erd6"
> > > > >> "rda7"
> > > > >> "dam8"
> > > > >> "am 9" (one more space at the end)
> > > > >>
> > > > >> The number at the end is the position.
> > > > >>
> > > > >> Does anyone have a clue how to achieve this?
> > > > >>
> > > > >> Best regards,
> > > > >> Elisabeth
> > > > >>
> > > > >
> > > > > --
> > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
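The approach discussed in this thread can be sketched in plain Java: padded fixed-length grams tagged with their position, then matching "around" that position as the old database did. This is not a Lucene TokenFilter. In Solr, the gram generation would have to live inside a custom token filter, as Emir suggests, and this sketch shows only the string logic. The class and method names, the `tolerance` parameter, and the greedy hit-count score are illustrative assumptions. They are not the database's or Solr's actual behaviour. Padding follows the original question: two leading spaces and one trailing space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch in plain Java (Java 11+ for String.repeat); not a Lucene TokenFilter.
public class PositionalTrigrams {

    static final int N = 3; // fixed gram length used in the thread

    /** Pads with N-1 leading spaces and one trailing space, then slides a window of length N.
     *  The index of each gram in the returned list is its position. */
    static List<String> grams(String word) {
        String padded = " ".repeat(N - 1) + word.toLowerCase(Locale.ROOT) + " ";
        List<String> out = new ArrayList<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            out.add(padded.substring(i, i + N));
        }
        return out;
    }

    /** Token format from the original question: the gram followed by its position, e.g. "ams2". */
    static List<String> positionalTokens(String word) {
        List<String> g = grams(word);
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos < g.size(); pos++) {
            out.add(g.get(pos) + pos);
        }
        return out;
    }

    /** Counts query grams found in the candidate within +/- tolerance positions of where
     *  they occur in the query. Greedy: each candidate gram can satisfy at most one query gram. */
    static int score(String query, String candidate, int tolerance) {
        List<String> q = grams(query);
        List<String> c = grams(candidate);
        boolean[] used = new boolean[c.size()];
        int hits = 0;
        for (int i = 0; i < q.size(); i++) {
            int from = Math.max(0, i - tolerance);
            int to = Math.min(c.size() - 1, i + tolerance);
            for (int j = from; j <= to; j++) {
                if (!used[j] && c.get(j).equals(q.get(i))) {
                    used[j] = true;
                    hits++;
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(positionalTokens("amsterdam"));
        // "amstrd" is three deletions away from "amsterdam"
        for (String candidate : new String[] {"amsterdam", "rotterdam", "programs"}) {
            System.out.println(candidate + " -> " + score("amstrd", candidate, 2));
        }
    }
}
```

A literal token such as "ams2" only matches at exactly position 2. The tolerance window in `score` is what lets a shifted trigram still count. With a tolerance of 2, the same trigram near the end of a longer word (for example "ams" in "programs") is ignored.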