Re: Search while typing (incremental search)

2021-10-08 Michael Wechner




On 08.10.21 at 18:49, Michael Sokolov wrote:

> Thank you for offering to add to the FAQ! Indeed it should mention the
> suggester capability. I think you have permissions to edit that wiki?


yes :-)


> Please go ahead and I think add a link to the suggest module javadocs


ok, will do!

Thanks

Michael



On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner wrote:

Thanks very much for your feedback!

I will try it :-)

As I wrote, I would like to add a summary to the Lucene FAQ
(https://cwiki.apache.org/confluence/display/lucene/lucenefaq)

Would the following questions make sense?

   - "Does Lucene support incremental search?"

   - "Does Lucene support auto completion suggestions?"

Or would other terms or another wording make more sense?

Thanks

Michael



On 07.10.21 at 01:14, Robert Muir wrote:

TL;DR: use the Lucene suggest/ package. Start by building a suggester
from your query logs (either from a file, or by indexing them).
The suggesters offer a lot of flexibility in how matches happen, for
example pure prefixes, edit-distance typos, infix matching, the analysis
chain, and now even Japanese input-method integration :)
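As a rough illustration of what a suggester built from query logs does, here is a toy, Lucene-free model. It is not the suggest module's API: QueryLogSuggester and lookup are hypothetical names, a suggestion's weight is simply how often the query occurs in the log, and only pure prefix matching is modelled. The real Lookup implementations in the suggest module (for example AnalyzingSuggester or AnalyzingInfixSuggester) add the analysis, typo tolerance and infix matching mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy, Lucene-free model of a prefix suggester built from a query log.
 * Each distinct logged query is weighted by how often it occurs; a lookup
 * returns the k heaviest logged queries that start with the typed prefix.
 */
final class QueryLogSuggester {
    // Sorted, so all queries sharing a prefix form one contiguous run of keys.
    private final TreeMap<String, Long> weights = new TreeMap<>();

    QueryLogSuggester(List<String> queryLog) {
        for (String q : queryLog) {
            weights.merge(q.toLowerCase(Locale.ROOT), 1L, Long::sum);
        }
    }

    /** Up to k suggestions for the prefix, most frequent first (ties alphabetical). */
    List<String> lookup(String prefix, int k) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<Map.Entry<String, Long>> matches = new ArrayList<>();
        // tailMap(p, true) starts at the first key >= p; stop at the first non-match.
        for (Map.Entry<String, Long> e : weights.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            matches.add(e);
        }
        matches.sort((a, b) -> {
            int byWeight = Long.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, matches.size()); i++) {
            out.add(matches.get(i).getKey());
        }
        return out;
    }
}
```

With a log where "tesla" was searched three times and "test" twice, lookup("tes", 5) puts "tesla" first, which is the query the UI would then search on.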

Run that suggester on the user input, retrieving, say, the top 5-10
relevant query suggestions.
Return those in the UI (a typical autosuggest-type field), but also run
a search on the first one.

The user gets the instant-search experience, but when they type 'tes',
you search on 'tesla' (if that's the top-suggested query, i.e. the
highlighted one in the autocomplete). If they arrow down to another
suggestion such as 'test', or type a 't', or use the mouse or whatever,
then the process runs again and they see the results for that.
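The suggest-then-search loop described here can be sketched as follows. This is a hypothetical, Lucene-free model: InstantSearch and searchTopSuggestion are made-up names, the suggestions are assumed to arrive already ranked (best first), the top suggestion is treated as a single keyword, and a plain whole-word match over in-memory strings stands in for running a real Lucene query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Toy model of "suggest, then search the top suggestion".
 * Suggestions are assumed to be ranked best first; documents are plain strings.
 */
final class InstantSearch {
    /** Documents containing the top-ranked suggestion as a whole word. */
    static List<String> searchTopSuggestion(List<String> rankedSuggestions, List<String> docs) {
        List<String> hits = new ArrayList<>();
        if (rankedSuggestions.isEmpty()) {
            return hits; // nothing to search on: the UI can show "no matches"
        }
        String query = rankedSuggestions.get(0).toLowerCase(Locale.ROOT);
        for (String doc : docs) {
            // Split on runs of non-word characters and look for an exact word match.
            for (String word : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.equals(query)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }
}
```

On every keystroke, or when the user arrows down to a different suggestion, the UI would call this again with the new ranked list.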

IMO for most cases this leads to a saner experience than trying to
rank all documents based on a prefix 'tes': the problem is there is
still too much query ambiguity, not really any "keywords" yet, so
trying to rank those documents won't be very useful. Instead you try
to "interact" with the user to present results in a useful way that
they can navigate.

On the other hand if you really want to just search on prefixes and
jumble up the results (perhaps because you are gonna just sort by some
custom document feature instead of relevance), then you can do that if
you really want. You can use the n-gram/edge-ngram/shingle filters in
the analysis package for that.
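The idea behind the edge-n-gram approach can be shown without Lucene. In this hypothetical sketch (EdgeNGramIndex is a made-up name, and the gram sizes 1..20 are arbitrary assumptions), every word is indexed under each of its prefixes, so the partial input 'tes' becomes an exact term lookup. In Lucene the same effect would typically come from an EdgeNGramTokenFilter in the index-time analyzer only, with the query side keeping the typed text as a single term. Note that the hits come back as an unranked set, which is the "jumbled" result Robert mentions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of edge-n-gram indexing: words are indexed under all of their prefixes. */
final class EdgeNGramIndex {
    // gram -> ids of documents containing a word that starts with that gram
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Prefixes of word with length minGram..maxGram (capped at the word length). */
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, word.length()); len++) {
            grams.add(word.substring(0, len));
        }
        return grams;
    }

    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            for (String gram : edgeNGrams(word, 1, 20)) {
                postings.computeIfAbsent(gram, g -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Ids of documents with a word starting with the typed text (exact gram lookup). */
    Set<Integer> search(String typed) {
        Set<Integer> result = new TreeSet<>();
        Set<Integer> hits = postings.get(typed.toLowerCase(Locale.ROOT));
        if (hits != null) {
            result.addAll(hits);
        }
        return result;
    }
}
```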

On Wed, Oct 6, 2021 at 5:37 PM Michael Wechner wrote:

Hi

I am trying to implement a search with Lucene similar to what various
"note apps" (e.g. "Google Keep" or "Samsung Notes") offer, where a new
search is executed with every new letter typed.

For example, when I type "tes", all documents containing the word "test"
or "tesla" are returned, and when I continue typing, for example "tesö",
and there are no documents containing the string "tesö", the app tells
me that there are no matches.
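At its core, the behaviour described here ('tes' finds 'test' and 'tesla', 'tesö' finds nothing) is a prefix lookup over a sorted term dictionary, which is conceptually how a prefix query expands a prefix into matching terms. A minimal, Lucene-free sketch, assuming the terms have already been lowercased and collected into a sorted set (TermPrefix and expand are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Toy model of expanding a typed prefix against a sorted term dictionary. */
final class TermPrefix {
    /** All terms in the sorted dictionary that start with prefix, in sorted order. */
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // Because the terms are sorted, all matches form one contiguous run
        // beginning at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // left the run of matching terms
            }
            matches.add(term);
        }
        return matches;
    }
}
```

An empty result, as for "tesö", is what lets the app report "no matches".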

I have found a couple of articles related to this kind of search, for
example

https://stackoverflow.com/questions/10828825/incremental-search-using-lucene

https://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene

but it would be great to know whether other possibilities exist and what
the best practice is.

I am not even sure what the right term for this kind of search is. Is it
really "incremental search" or something else?

Looking forward to your feedback and will be happy to extend the Lucene
FAQ once I understand better :-)

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





Re: Search while typing (incremental search)

2021-10-08 Michael Sokolov
Thank you for offering to add to the FAQ! Indeed it should mention the
suggester capability. I think you have permissions to edit that wiki?
Please go ahead and I think add a link to the suggest module javadocs




RE: IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2021-10-08 Uwe Schindler
Hi Alan,

this was all very helpful. Another thing about intervals and the transformation
from SpanQuery to IntervalQuery: IntervalQuery only returns a score between
0 and 1 and does not take term statistics into account. To get term-based
scoring, one should combine it with some term queries (which is perfectly fine,
as it decouples term scoring from the positions and allows more flexibility).

My question now (and maybe this should be documented in some MIGRATE.txt or the
Javadocs): how should one best combine the scores from TermQuery and IntervalQuery
to get a scoring *similar* (not identical) to the good old SpanQueries? I tried to
read the SpanQuery scoring mechanism but gave up because I could not figure out
where the final score of the terms is combined with the span score.

My first idea was to create a BooleanQuery with the IntervalQuery as a MUST
clause and all terms appearing somewhere in the (positive) intervals added as
SHOULD clauses. My problem now is that the number of terms differs from query
to query, but the IntervalQuery only adds 0..1 to the total score. So should
one wrap the IntervalQuery in a BoostQuery that boosts by the number of
terms added as sibling SHOULD clauses? Other suggestions?
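The arithmetic behind this question can be made concrete. A BooleanQuery adds up the scores of its matching clauses and a BoostQuery multiplies its inner score by the boost, so with one boosted MUST interval clause and n SHOULD term clauses the total is boost * intervalScore + sum(termScores). The sketch below only evaluates that sum; the scores are made-up inputs rather than values Lucene would produce, and ScoreMix with its methods are hypothetical names.

```java
/**
 * Evaluates the summed score for the scheme discussed: one MUST clause holding
 * the IntervalQuery (score in [0, 1]) wrapped in a boost, plus one SHOULD
 * TermQuery clause per term. Matching clause scores are added; a boost multiplies.
 */
final class ScoreMix {
    static double combined(double intervalScore, double boost, double[] termScores) {
        if (intervalScore < 0 || intervalScore > 1) {
            throw new IllegalArgumentException("interval score must be in [0, 1]");
        }
        double total = boost * intervalScore; // the boosted MUST clause
        for (double s : termScores) {
            total += s; // each matching SHOULD clause adds its own score
        }
        return total;
    }

    /** Fraction of the total score contributed by the interval clause. */
    static double intervalShare(double intervalScore, double boost, double[] termScores) {
        double total = combined(intervalScore, boost, termScores);
        return total == 0 ? 0 : boost * intervalScore / total;
    }
}
```

With an interval score of 0.5 and term scores of 2.0, boosting by the number of term clauses gives the interval clause a share of 0.2 with both 2 and 4 terms, whereas without the boost its share drops from about 0.11 to about 0.06. Real term scores vary, so this is only a rough normalisation.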

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Alan Woodward 
> Sent: Monday, September 21, 2020 7:56 PM
> To: Dawid Weiss 
> Cc: Lucene Users 
> Subject: Re: IntervalQuery replacement for SpanFirstQuery? Closest
> replacement for slops?
> 
> Your filtered query should work the same as a SpanFirst, yes. I didn’t add a
> shortcut simply because you can already do it this way, but feel free to add
> it if you think it’s useful!
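On the '<' versus '<=' question in the snippet: SpanFirstQuery(match, end) accepts a span whose end position is <= end, and span end positions are exclusive (one past the last matched position). IntervalIterator.end(), by contrast, reports the last matched position itself (for a term interval, start() and end() are both the term's position). Under those assumptions the strict it.end() < distance is the equivalent filter. The Lucene-free check below (FirstCheck and its methods are hypothetical names) just compares the two rules over a range of positions.

```java
/**
 * Compares the SpanFirstQuery acceptance rule with the interval filter from the
 * snippet. Spans use an exclusive end; intervals use an inclusive end.
 */
final class FirstCheck {
    /** SpanFirstQuery(match, end): accept if the span's exclusive end <= end. */
    static boolean spanFirstAccepts(int lastPosition, int end) {
        int spanEndExclusive = lastPosition + 1;
        return spanEndExclusive <= end;
    }

    /** Filter from the snippet: accept if the interval's inclusive end < distance. */
    static boolean intervalFilterAccepts(int lastPosition, int distance) {
        int intervalEndInclusive = lastPosition;
        return intervalEndInclusive < distance;
    }

    /** True if both rules agree for every last position and limit in the given ranges. */
    static boolean agree(int maxPos, int maxLimit) {
        for (int pos = 0; pos <= maxPos; pos++) {
            for (int limit = 0; limit <= maxLimit; limit++) {
                if (spanFirstAccepts(pos, limit) != intervalFilterAccepts(pos, limit)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For example, with a limit of 3 a term at position 2 is accepted by both rules and a term at position 3 by neither; a '<=' filter would accept the latter.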
> 
> Re sloppy phrases, this one is trickier. The closest you can get at the moment
> is an unordered near, but that’s not the same thing, as it doesn’t take
> transpositions into account when calculating the slop. I think it should be
> possible to write something that works similarly to SloppyPhraseMatcher, but
> as always the tricky part is in dealing with duplicate entries. I have some
> ideas, but they’re not ready to commit yet, unfortunately.
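The transposition point can be seen with a small model. The PhraseQuery documentation describes slop as an edit distance between the query's term positions and the document's; for distinct terms this can be modelled as the spread of (document position minus query offset). An unordered near, in contrast, only looks at the holes in the smallest window covering the terms. This is a simplified sketch, not SloppyPhraseMatcher: SlopVsGaps is a hypothetical name, repeated terms are ignored, and positions[i] is the document position of the i-th query term.

```java
import java.util.Arrays;

/** Simplified comparison of phrase slop with unordered-window gaps (distinct terms only). */
final class SlopVsGaps {
    /** Slop needed: spread of (document position - query offset) over all terms. */
    static int phraseSlopNeeded(int[] positions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < positions.length; i++) {
            int adjusted = positions[i] - i; // where the phrase would have to start
            min = Math.min(min, adjusted);
            max = Math.max(max, adjusted);
        }
        return max - min;
    }

    /** Holes inside the smallest window covering all positions, in any order. */
    static int unorderedGaps(int[] positions) {
        int[] sorted = positions.clone();
        Arrays.sort(sorted);
        int width = sorted[sorted.length - 1] - sorted[0] + 1;
        return width - sorted.length; // assumes distinct positions
    }
}
```

For a swapped adjacent pair the model needs slop 2 while the unordered window has no gaps at all, which is why an unordered near is only an approximation of a sloppy phrase.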
> 
> In terms of your suggested replacements: maxwidth will give you the
> equivalent of a SpanNearUnordered. Maxgaps gives a restriction on how many
> internal holes there are in the query, so it works better if the constituent
> intervals are not necessarily single terms.
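The difference between maxwidth and maxgaps can be illustrated with a little arithmetic. The sketch below assumes, as a simplification of the intervals API, that an interval's width is end - start + 1 and that its gaps are the positions inside it not covered by its sub-intervals; it also assumes ordered, non-overlapping sub-intervals given as {start, end} with inclusive ends. WidthVsGaps is a hypothetical name and the positions are made up.

```java
/**
 * Width and gaps of an interval built from ordered, non-overlapping
 * sub-intervals, each given as {start, end} with inclusive ends.
 */
final class WidthVsGaps {
    /** end - start + 1 of the interval covering all sub-intervals. */
    static int width(int[][] subIntervals) {
        int start = subIntervals[0][0];
        int end = subIntervals[subIntervals.length - 1][1];
        return end - start + 1;
    }

    /** Positions inside the covering interval that no sub-interval covers. */
    static int gaps(int[][] subIntervals) {
        int covered = 0;
        for (int[] sub : subIntervals) {
            covered += sub[1] - sub[0] + 1; // width of each sub-interval
        }
        return width(subIntervals) - covered;
    }
}
```

Two single terms at positions 3 and 5 give width 3 and one gap; sub-intervals {3, 4} and {6, 8} still have exactly one gap but a width of 6. A width limit therefore depends on how long the constituents are, while a gap limit does not.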
> 
> > On 21 Sep 2020, at 18:47, Dawid Weiss wrote:
> >
> >
> > For what it is worth, I would also be interested in answers to these
> > questions. ;)
> >
> > On Mon, Sep 21, 2020, 19:08 Uwe Schindler wrote:
> > Hi all, hi Alan,
> >
> > I am currently rewriting some SpanQuery code to use IntervalQuery. Most of
> > the transformations can be done quite easily, and the code is also easier to
> > read after the transformation. What I am missing a bit is a document that
> > compares the different query types and a guide on how to convert them.
> >
> > I did not find a replacement for SpanFirstQuery (or at least any query that
> > takes absolute positions). I know intervals deal more with term intervals,
> > but I was successful in replacing a SpanFirstQuery with this:
> > IntervalsSource term = Intervals.term("foo");
> > IntervalsSource filtered = new FilteredIntervalsSource("FIRST" + distance, term) {
> >   @Override
> >   protected boolean accept(IntervalIterator it) {
> >     return it.end() < distance; // or should this be <= distance???
> >   }
> > };
> > Query query = new IntervalQuery(field, filtered);
> >
> > I am not fully sure if this works under all circumstances. To me it looks
> > fine, and it also worked with more complex intervals than "term". If this is
> > OK, how about adding a "first(int n, IntervalsSource iv)" method to the
> > Intervals class?
> >
> > The second question: what's the "closest" replacement for a PhraseQuery
> > with slop? Should I use maxwidth(slop + 1), maxgaps(slop - 1) or
> > maxgaps(slop)? I know SpanQuery slops cannot be fully replaced with
> > intervals, but I don't care about those SpanQuery bugs.
> >
> > Uwe


