Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-18 Thread Dawid Weiss
Hi Chris, > Because if you can adjust your parser syntax, this literally just > becomes: ' field:"foo bar"~N ' ... where N is the positionIncrementGap > on your analyzer ... OR ... ' field:"foo bar" ' ... if you call > setPhraseSlop on your QueryParser. Yes - correct. This would be

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-17 Thread Chris Hostetter
(caveat: i don't ever really understand what Intervals at the lucene feature set stage) : Yup - similar to what Alan suggested. I'd have to rewrite the (general : text-to-query) query parser to only use intervals though. Still : thinking about possible approaches to this. ... : > You

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-14 Thread Dawid Weiss
Thanks Michael. The outcome of this discussion seems to be clear that everyone is trying to reinvent the wheel somehow. ;) I think it really should become part of core Lucene functionality. Seems like a corner case people are not aware of until they hit it (and then it's not clear what to do about

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-14 Thread Michael Gibney
This might be a little outside the spirit of this discussion (in that it's not really "off-the-shelf") -- but I implemented a proof-of-concept for a different use case that I think could be adapted here: For a given doc, for each term in your multivalued field, you could record a bitset

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-14 Thread Dawid Weiss
bq. Expanding a query over numerous fields grows combinatorically in the number of fields (if I want my query to match when all terms match in *some* field), doesn't it? I don't think it does? It grows linearly with the number of fields? In my experience the number of fields searchable "by
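To make the "linear" claim concrete, here is a minimal sketch under one reading of "all terms match in *some* field": every term must match in at least one field, so the query is an AND of per-term ORs. The helper name and the field names are hypothetical, and the output is only a query-string illustration, not the output of any Lucene parser.

```python
def expand_any_field(terms, fields):
    """Build an AND of per-term ORs: each term must match in at least one field.

    This is a plain string builder for illustration only; the name and the
    resulting syntax are assumptions, not a Lucene or Solr API.
    """
    per_term = []
    for term in terms:
        # One leaf clause per (field, term) pair.
        alternatives = " OR ".join(f"{field}:{term}" for field in fields)
        per_term.append(f"({alternatives})")
    return " AND ".join(per_term)


def leaf_clause_count(terms, fields):
    """Number of leaf clauses in the expansion above."""
    return len(terms) * len(fields)
```

Under this reading the number of leaf clauses is len(terms) * len(fields), so doubling the number of fields doubles the query size rather than squaring it.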

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-11 Thread Gus Heck
You're thinking of SurroundQuery parser for span queries I think... https://lucene.apache.org/solr/guide/8_6/other-parsers.html#surround-query-parser and the Advanced Query Parser will have a similar syntax On Thu, Sep 10, 2020 at 4:40 PM Michael Sokolov wrote: > A slightly different but

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Michael Sokolov
A slightly different but related topic is how to manage lots of fields I agree that sub-fields are a pain and that mashing everything together in an all-field is a mess, but for best performance with a large number of fields/sub-fields, it is the only workable option I can see? Expanding a query

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss
> Ok so the more general question is whether we need an interval query parser Oh, to this I'd say: yes, yes, yes. I didn't have much prior experience writing frontend apps on top of Solr/Lucene but once I did have to go that route it quickly turns out that several things that are readily

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread jim ferenczi
Ok so the more general question is whether we need an interval query parser Le jeu. 10 sept. 2020 à 17:28, Dawid Weiss a écrit : > I am fine with the boundary token suggestion, actually. What I don't > see at the moment is how I can marry it with an output of a general > query parser (which

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss
I am fine with the boundary token suggestion, actually. What I don't see at the moment is how I can marry it with an output of a general query parser (which returns any Query). I could give an attempt to process the query node tree from standard query parser (which we're using at the moment

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread jim ferenczi
Right, I misunderstood Alan's answer. The boundary option is not "impure" in my opinion. It solves this issue nicely but maybe it needs something more packaged to add the boundaries and build queries easily. Le jeu. 10 sept. 2020 à 16:16, Dawid Weiss a écrit : > Yup - similar to what Alan

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss
Yup - similar to what Alan suggested. I'd have to rewrite the (general text-to-query) query parser to only use intervals though. Still thinking about possible approaches to this. D. On Thu, Sep 10, 2020 at 3:58 PM jim ferenczi wrote: > > You could set a very high position increment gap for

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread jim ferenczi
You could set a very high position increment gap for multi-valued fields (Analyzer#getPositionIncrementGap) and perform something like Intervals.maxWidth(Intervals.unordered(...), pos_gap-1) ? Le jeu. 10 sept. 2020 à 12:32, Dawid Weiss a écrit : > Yeah... I was thinking about adding synthetic
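The effect of a large gap combined with a width limit can be sketched without Lucene. The following is a simplified plain-Python model of token positions in a multi-valued field, not the Lucene API: positions_with_gap and matches_max_width are hypothetical helpers, tokenization is a whitespace split, and width is counted as end - start + 1.

```python
from itertools import product


def positions_with_gap(values, gap):
    """Assign token positions across the values of one multi-valued field.

    Simplified model: each token advances the position by 1, and `gap`
    extra positions are skipped between consecutive values.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def min_width(positions, terms):
    """Smallest width (end - start + 1) of a span covering every term, or None."""
    per_term = [positions.get(term, []) for term in terms]
    if not all(per_term):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*per_term))


def matches_max_width(positions, terms, max_width):
    """Unordered match whose covering span is at most `max_width` wide."""
    width = min_width(positions, terms)
    return width is not None and width <= max_width
```

In this model, with values "foo baz" and "qux bar" and a gap of 100, the terms foo and bar are 104 positions wide and fail a limit of 99, while foo and baz still match. With a gap of 0 the same pair matches across values, which is the false positive under discussion. The model also shows a limitation: a single value longer than the width limit loses its own matches.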

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Dawid Weiss
Yeah... I was thinking about adding synthetic boundaries but this seems... impure. :) Another quick reflection is that I'd have to somehow translate the original query (which can be arbitrarily complex) into an interval query. Tough. D. On Thu, Sep 10, 2020 at 12:22 PM Alan Woodward wrote: > >

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Alan Woodward
I’ve solved this sort of thing in the past by indexing boundary tokens, and wrapping the queries with the equivalent of Intervals.notContaining(query, boundary-query); you could also put a very large position increment gap and use a width filter, but that’s a bit more error prone if you could
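The boundary-token idea can be illustrated with a small plain-Python model. This is a sketch under stated assumptions, not Lucene code: the marker string, the helper names, and the whitespace tokenization are all hypothetical, and the check only mimics the intent of "an interval that does not contain a boundary".

```python
from itertools import product

BOUNDARY = "__boundary__"  # hypothetical marker token, assumed never to occur in real text


def index_with_boundaries(values):
    """Flatten the values of one field into a single token stream,
    inserting a boundary token between consecutive values, and return
    a map from token to its positions in that stream."""
    stream = []
    for i, value in enumerate(values):
        if i > 0:
            stream.append(BOUNDARY)
        stream.extend(value.lower().split())
    positions = {}
    for pos, token in enumerate(stream):
        positions.setdefault(token, []).append(pos)
    return positions


def matches_within_one_value(positions, terms):
    """True if all terms co-occur in some span that contains no boundary token."""
    per_term = [positions.get(term, []) for term in terms]
    if not all(per_term):
        return False
    boundaries = positions.get(BOUNDARY, [])
    for combo in product(*per_term):
        start, end = min(combo), max(combo)
        if not any(start < b < end for b in boundaries):
            return True
    return False
```

For the values "foo baz" and "qux bar", the terms foo and bar only co-occur across the boundary and are rejected, while foo and baz match inside the first value. Unlike a width limit, this check does not depend on how long an individual value is.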