I am trying to do approximate search with Solr. We've tried fuzzy search
and spellcheck search; they work OK, but the edit distance is limited (to 2
for DirectSolrSpellChecker in Solr 4.10.1). With the fuzzy operator we've had
performance issues, and I don't think you can use an edit distance greater
than 2.
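To make the edit-distance ceiling concrete, here is a minimal Java sketch of plain Levenshtein distance. The class and method names are hypothetical, and this is not how Lucene computes fuzzy matches internally (Lucene uses Levenshtein automata and may count a transposition as a single edit). It shows that dropping three letters from amsterdam already puts a query out of reach of a 2-edit cap.

```java
final class EditDistance {

    /** Plain Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if b is within maxEdits edits of a (a cap of 2 mirrors the limit in the message). */
    static boolean withinEdits(String a, String b, int maxEdits) {
        return levenshtein(a, b) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("amsterdam", "amsdam"));    // 3
        System.out.println(withinEdits("amsterdam", "amsdam", 2)); // false
        System.out.println(levenshtein("amsterdam", "amstredam")); // 2
    }
}
```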

What we used to do with a database was more efficient: storing trigrams
with their position, and then searching around that position (not exactly at
that position, since it's approximate search).
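The database approach described here can be sketched in Java roughly as follows. This is a minimal illustration under assumptions, not the actual database implementation: `_` stands in for the padding spaces, the score is simply the number of query trigrams found in the candidate within a hypothetical `tolerance` of their own position, and all names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PositionalTrigrams {

    /** A trigram and its offset inside the padded word. */
    record Gram(String text, int pos) {}

    /** Pads with two leading '_' and one trailing '_' (standing in for spaces), then cuts trigrams. */
    static List<Gram> trigrams(String word) {
        String padded = "__" + word.toLowerCase(Locale.ROOT) + "_";
        List<Gram> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(new Gram(padded.substring(i, i + 3), i));
        }
        return grams;
    }

    /** Counts query trigrams found in the candidate at a position within +/- tolerance. */
    static int score(String query, String candidate, int tolerance) {
        List<Gram> candidateGrams = trigrams(candidate);
        int hits = 0;
        for (Gram q : trigrams(query)) {
            for (Gram c : candidateGrams) {
                if (q.text().equals(c.text()) && Math.abs(q.pos() - c.pos()) <= tolerance) {
                    hits++;
                    break; // each query trigram counts at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(trigrams("amsterdam").size());    // 10
        System.out.println(score("amsdam", "amsterdam", 3)); // 5
        System.out.println(score("amsdam", "amsterdam", 0)); // 3
    }
}
```

With a tolerance of 0, the trailing `dam` and `am_` of the short query no longer line up with their counterparts in amsterdam, so only the three leading trigrams count. Widening the tolerance is what makes the search approximate rather than exact.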

The point of the position is to avoid, for a trigram like ams (amsterdam),
getting answers where the same trigram appears, for instance, at the end of
the word. I would like answers with the same relative position between
trigrams to score higher. Maybe using edismax's pf2 and pf3 is a way to do
this. I don't see any other way. Please tell me if you do.

From your answer, I get that position is stored, but I don't understand
how I can preserve the relative order between trigrams, apart from using pf2
and pf3.

Best regards,
Elisabeth

2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:

> If you store the positions for your tokens (and they are stored by default
> unless you omit them), you have the relative position in the index. [1]
> I attach a blog post of mine that describes the Lucene internals in a bit
> more detail.
>
> Apart from that, can you explain the problem you are trying to solve?
> The high-level user experience?
> What kind of search/autocompletion/relevancy tuning are you trying to
> achieve?
> Maybe we can help better if we start from the problem :)
>
> Cheers
>
> [1]
>
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
>
> On 9 March 2016 at 15:02, elisabeth benoit <elisaelisael...@gmail.com>
> wrote:
>
> > Hello Alessandro,
> >
> > You may be right. What would you use to keep the relative order between,
> > for instance, the grams
> >
> > __a
> > _am
> > ams
> > mst
> > ste
> > ter
> > erd
> > rda
> > dam
> > am_
> >
> > of amsterdam? pf2 and pf3? That's all I can think of. Please let me
> > know if you have more insights.
> >
> > Best regards,
> > Elisabeth
> >
> > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> >
> > > Elisabeth,
> > > out of curiosity, could we know what you are trying to solve with that
> > > complex way of tokenisation?
> > > Solr is really good at storing positions along with tokens, so I am
> > > curious to know why you are mixing things up.
> > >
> > > Cheers
> > >
> > > On 8 March 2016 at 10:08, elisabeth benoit <elisaelisael...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for your answer, Emir,
> > > >
> > > > I'll check that out.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <emir.arnauto...@sematext.com>:
> > > >
> > > > > Hi Elisabeth,
> > > > > I don't think there is such a token filter, so you would have to
> > > > > create your own token filter that takes a token and emits ngram
> > > > > tokens of a specific length. It should not be too hard to create
> > > > > such a filter: you can take a look at how the ngram filter is coded;
> > > > > yours should be simpler than that.
> > > > >
> > > > > Regards,
> > > > > Emir
> > > > >
> > > > >
> > > > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> I'm using Solr 4.10.1. I'd like to index words as ngrams of fixed
> > > > >> length, with the position appended at the end.
> > > > >>
> > > > >> For instance, with fixed length 3, Amsterdam would be something like
> > > > >> this (spaces written as _):
> > > > >>
> > > > >> __a0 (two spaces added at the beginning)
> > > > >> _am1
> > > > >> ams2
> > > > >> mst3
> > > > >> ste4
> > > > >> ter5
> > > > >> erd6
> > > > >> rda7
> > > > >> dam8
> > > > >> am_9 (one more space at the end)
> > > > >>
> > > > >> The number at the end is the position.
> > > > >>
> > > > >> Does anyone have a clue how to achieve this?
> > > > >>
> > > > >> Best regards,
> > > > >> Elisabeth
> > > > >>
> > > > >>
> > > > > --
> > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
>
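Emir's suggestion in the quoted thread, a custom filter that turns each token into fixed-length ngrams with their position appended, boils down to a small string transformation. Below is a minimal plain-Java sketch of just that transformation, assuming n-1 leading pad characters and one trailing pad character (generalizing the two-leading/one-trailing-space example for length 3). It is not a Lucene TokenFilter: wrapping it in one (emitting each string from `incrementToken()` through a `CharTermAttribute`) is left out, and `positionalNgrams` and its `pad` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PositionSuffixedGrams {

    /**
     * Emits fixed-length n-grams of a token, each with its position appended (e.g. "ams2").
     * The token is padded with n-1 leading pad characters and one trailing pad character.
     */
    static List<String> positionalNgrams(String token, int n, char pad) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        String padChar = String.valueOf(pad);
        String padded = padChar.repeat(n - 1) + token.toLowerCase(Locale.ROOT) + padChar;
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos + n <= padded.length(); pos++) {
            out.add(padded.substring(pos, pos + n) + pos); // gram followed by its position
        }
        return out;
    }

    public static void main(String[] args) {
        // [__a0, _am1, ams2, mst3, ste4, ter5, erd6, rda7, dam8, am_9]
        System.out.println(positionalNgrams("Amsterdam", 3, '_'));
    }
}
```

Passing `' '` as the pad character reproduces the space-padded form from the original question; `_` is used here only so the padding is visible.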
