Re: Multi-valued xxValue / xxValueSource implementations?

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 8:01 PM Robert Muir wrote: > Hi Greg, I think the general issue is one of the API, the ValueSource seems really geared at returning values from single-valued fields. I think this is really the core issue. This ValueSource thing was created before the days of docvalues…

Re: Multi-valued xxValue / xxValueSource implementations?

2021-10-26 Thread Robert Muir
A little history may help... (this is based on my bad memory, so it could all be wrong; please nobody get offended): At the time, Lucene could only sort single-valued fields. But Solr and Elasticsearch would happily sort on multi-valued docs in various hacky ways. And this typically entailed large amoun…

Re: Multi-valued xxValue / xxValueSource implementations?

2021-10-26 Thread Robert Muir
Hi Greg, I think the general issue is one of the API: the ValueSource seems really geared toward returning values from single-valued fields. IMO, for the way the API is used (e.g. sorting), it makes sense to define a selector that works in O(1) time per document, and use these existing value sources: …
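Robert's selector idea can be sketched without Lucene on the classpath. The sketch assumes a hypothetical column that stores each document's values sorted ascending in one flat array, using a CSR-style `offsets`/`values` layout. `MultiValuedColumn`, `Selector` and `SelectedDoubleValues` are illustrative names, not Lucene classes. The single-valued view only borrows the `advanceExact(doc)` / `doubleValue()` shape of Lucene's `DoubleValues`. Because the values are sorted once at build time, MIN and MAX each cost a single array read per document.

```java
import java.util.Arrays;

/** Hypothetical multi-valued column: each doc's values are kept sorted in a flat CSR layout. */
final class MultiValuedColumn {
  private final int[] offsets;   // doc d owns values[offsets[d] .. offsets[d + 1])
  private final double[] values;

  MultiValuedColumn(double[][] perDoc) {
    offsets = new int[perDoc.length + 1];
    for (int d = 0; d < perDoc.length; d++) {
      offsets[d + 1] = offsets[d] + perDoc[d].length;
    }
    values = new double[offsets[perDoc.length]];
    for (int d = 0; d < perDoc.length; d++) {
      double[] sorted = perDoc[d].clone();
      Arrays.sort(sorted); // sort once at build time so selection is O(1) later
      System.arraycopy(sorted, 0, values, offsets[d], sorted.length);
    }
  }

  int count(int doc) { return offsets[doc + 1] - offsets[doc]; }

  double valueAt(int doc, int i) { return values[offsets[doc] + i]; }
}

/** How to collapse several values into one. */
enum Selector { MIN, MAX }

/** Single-valued view with the advanceExact/doubleValue shape of Lucene's DoubleValues. */
final class SelectedDoubleValues {
  private final MultiValuedColumn column;
  private final Selector selector;
  private int doc = -1;

  SelectedDoubleValues(MultiValuedColumn column, Selector selector) {
    this.column = column;
    this.selector = selector;
  }

  /** Positions on a doc; returns false if the doc has no values. */
  boolean advanceExact(int target) {
    doc = target;
    return column.count(target) > 0;
  }

  /** O(1): first sorted value for MIN, last for MAX. Valid only after advanceExact returned true. */
  double doubleValue() {
    int n = column.count(doc);
    return selector == Selector.MIN ? column.valueAt(doc, 0) : column.valueAt(doc, n - 1);
  }
}
```

The design choice here is to resolve multi-valuedness once, in the selector. Consumers such as a sort comparator then keep seeing one value per document. Documents with no values report `false` from `advanceExact`, as a missing value would.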

Multi-valued xxValue / xxValueSource implementations?

2021-10-26 Thread Greg Miller
Hi folks, out of curiosity, is there a reason Lucene doesn't have implementations for concepts like DoubleValues / DoubleValuesSource that support multiple values per document? Or maybe something like this does exist in Lucene that I'm not aware of? I can't believe this hasn't been a topic of discu…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
Well, if, as I suggest, we use MultiTermQuery + DocValuesRewriteMethod to implement this, then the choice is yours: just run it against a "slow IndexReader" and go through the ordinal map if you choose. There's nothing stopping you from doing that, and it will already do what you want. I just personal…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Joel Bernstein
There are times, particularly in e-commerce and access control, where speed really matters. So you build stuff that's really fast at query time, with a trade-off at commit time. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Oct 26, 2021 at 5:31 PM Robert Muir wrote: > Sorry, I don't thi…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
Sorry, I don't think there is a need to use any top-level ordinals. None of these docvalues-based query implementations need them. As for a query intersecting an input stream, that is a big no-go: Lucene Queries need to have correct hashCode/equals/etc. That's why current stuff around this such as…
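One way to read the hashCode/equals point is that a query object may be compared and hashed, for example when used as a cache key. Its identity therefore has to be fully determined by immutable state, and a half-consumed InputStream cannot provide that. The sketch below uses a hypothetical `InSetFilter` class, which is not a Lucene API. It copies its terms into a sorted, de-duplicated list up front, so two filters over the same set compare equal regardless of input order.

```java
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;

/** Hypothetical query-like filter whose identity is fixed at construction time. */
final class InSetFilter {
  private final String field;
  private final List<String> terms; // sorted and de-duplicated, so equal sets compare equal

  InSetFilter(String field, Collection<String> terms) {
    this.field = Objects.requireNonNull(field);
    // Copy eagerly: later changes to the caller's collection cannot change equals/hashCode.
    this.terms = List.copyOf(new TreeSet<>(terms));
  }

  List<String> terms() { return terms; }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof InSetFilter)) return false;
    InSetFilter other = (InSetFilter) o;
    return field.equals(other.field) && terms.equals(other.terms);
  }

  @Override
  public int hashCode() {
    return 31 * field.hashCode() + terms.hashCode();
  }
}
```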

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Joel Bernstein
One more wrinkle for extremely large lists is to pass the list in as an InputStream containing a presorted binary representation of the ASINs, slide a BytesRef across the stream, and merge it with the SortedDocValues. This saves all the object creation and String overhead for really long lists of…
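Here is a minimal sketch of the streaming variant Joel describes. It rests on several assumptions:

- The stream uses a hypothetical format of records. Each record is a 2-byte big-endian length followed by the key bytes, and the records are presorted in unsigned byte order.
- A sorted `byte[][]` stands in for the SortedDocValues term dictionary, with ordinal = array index.
- `java.util.BitSet` stands in for `FixedBitSet`.

One byte buffer is reused for every key, which is the point of sliding a single BytesRef instead of allocating a String per ASIN.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical helper: merges a presorted key stream with a sorted term dictionary. */
final class StreamingOrdinalMerge {
  /**
   * Assumed stream format: repeated [2-byte big-endian length][key bytes], keys presorted by
   * unsigned byte order. dictionary[ord] is the term for ordinal ord, sorted the same way.
   */
  static BitSet matchingOrdinals(InputStream in, byte[][] dictionary) {
    DataInputStream data = new DataInputStream(new BufferedInputStream(in));
    BitSet matches = new BitSet(dictionary.length);
    byte[] buf = new byte[16]; // reused for every key, like sliding one BytesRef
    int ord = 0;
    try {
      while (ord < dictionary.length) {
        int len;
        try {
          len = data.readUnsignedShort();
        } catch (EOFException e) {
          break; // no more keys
        }
        if (len > buf.length) {
          buf = new byte[Math.max(len, 2 * buf.length)];
        }
        data.readFully(buf, 0, len);
        // Skip dictionary terms that sort before the current key.
        int cmp = -1;
        while (ord < dictionary.length
            && (cmp = Arrays.compareUnsigned(
                    dictionary[ord], 0, dictionary[ord].length, buf, 0, len)) < 0) {
          ord++;
        }
        if (ord < dictionary.length && cmp == 0) {
          matches.set(ord); // key found: record its ordinal
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return matches;
  }
}
```

Both inputs are walked forward only once. Keys that are absent from the dictionary are skipped without any allocation.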

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Joel Bernstein
If the list of ASINs is presorted, you can quickly merge it with the SortedDocValues and produce a FixedBitSet of the top-level ordinals, which can be used as the post filter. This is a nice approach for things like passing in a long list of access control predicates. Joel Bernstein http://joelso…
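The merge-then-post-filter idea can be illustrated with plain JDK types. This is a sketch, not Lucene code:

- `OrdinalPostFilter` is a hypothetical helper.
- The `String[]` plays the role of the sorted term dictionary.
- `docToOrd` stands in for a single-valued SortedDocValues, where -1 means the document has no value.
- `BitSet` stands in for `FixedBitSet`.

Because both inputs are sorted, each side is walked once. `String.compareTo` agrees with byte order for ASCII keys such as ASINs; arbitrary UTF-8 terms would need an unsigned byte comparison instead.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper: sorted-merge into an ordinal bitset, then a per-document post filter. */
final class OrdinalPostFilter {
  /** Merges presorted keys against a sorted term dictionary (ordinal = array index). */
  static BitSet ordinalsFor(List<String> sortedKeys, String[] sortedTerms) {
    BitSet ords = new BitSet(sortedTerms.length);
    int i = 0;
    int ord = 0;
    while (i < sortedKeys.size() && ord < sortedTerms.length) {
      int cmp = sortedTerms[ord].compareTo(sortedKeys.get(i));
      if (cmp < 0) {
        ord++;           // dictionary term is smaller: advance the dictionary
      } else if (cmp > 0) {
        i++;             // key is not in the dictionary: skip it
      } else {
        ords.set(ord);   // match: remember the ordinal
        ord++;
        i++;
      }
    }
    return ords;
  }

  /** Keeps docs whose ordinal is in the set (or, with exclude, docs whose ordinal is not). */
  static BitSet filterDocs(int[] docToOrd, BitSet ords, boolean exclude) {
    BitSet docs = new BitSet(docToOrd.length);
    for (int doc = 0; doc < docToOrd.length; doc++) {
      int ord = docToOrd[doc];
      boolean inSet = ord >= 0 && ords.get(ord); // -1 means the doc has no value
      if (inSet != exclude) {
        docs.set(doc);
      }
    }
    return docs;
  }
}
```

The `exclude` flag mirrors the MUST_NOT use case from the original question. When excluding, documents without a value pass.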

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Adrien Grand
I opened https://issues.apache.org/jira/browse/LUCENE-10207 about these ideas. On Tue, Oct 26, 2021 at 7:52 PM Robert Muir wrote: > On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand wrote: > > > And then we could make an IndexOrDocValuesQuery with both the TermInSetQuery and this SDV.newSlowIn…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand wrote: > > And then we could make an IndexOrDocValuesQuery with both the TermInSetQuery and this SDV.newSlowInSetQuery? > Unfortunately IndexOrDocValuesQuery relies on the fact that the "index" query can evaluate its cost (ScorerSupplier#cos…

Re: [External] : RE: Thank you! JDK 18 Early Access build 20 is now available

2021-10-26 Thread Rory O'Donnell
Many thanks, Uwe, I look forward to having a beer with you! Rgds, Rory On 26/10/2021 17:50, Uwe Schindler wrote: > Hello Rory, huh, that’s good for you and bad for us 😊! I was hoping to see you one more time at FOSDEM, but it looks like this will not happen (not only because of COVID). Maybe we ca…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Adrien Grand
> And then we could make an IndexOrDocValuesQuery with both the TermInSetQuery and this SDV.newSlowInSetQuery? Unfortunately IndexOrDocValuesQuery relies on the fact that the "index" query can evaluate its cost (ScorerSupplier#cost) without doing anything costly, which isn't the case for TermInSet…

RE: Thank you! JDK 18 Early Access build 20 is now available

2021-10-26 Thread Uwe Schindler
Hello Rory, huh, that’s good for you and bad for us 😊! I was hoping to see you one more time at FOSDEM, but it looks like this will not happen (not only because of COVID). Maybe we can still have a beer together in Ireland! Unfortunately, I have not been able to visit Ireland so far, so I have a…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 11:24 AM Robert Muir wrote: > On Tue, Oct 26, 2021 at 10:58 AM Alan Woodward wrote: > > We have SortedSetDocValuesField.newSlowRangeQuery() which does something close to what you want here, I think. > See also DocValuesRewriteMethod, which might be useful,…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Robert Muir
On Tue, Oct 26, 2021 at 10:58 AM Alan Woodward wrote: > We have SortedSetDocValuesField.newSlowRangeQuery() which does something close to what you want here, I think. See also DocValuesRewriteMethod, which might be useful, at least as a start. You'd have to express the "SetQuery" as a Multi…

Re: Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Alan Woodward
We have SortedSetDocValuesField.newSlowRangeQuery() which does something close to what you want here, I think. > On 26 Oct 2021, at 15:23, Michael McCandless wrote: > Hi Team, I was discussing this problem with Greg Miller (also at Amazon Product Se…
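A doc-values range filter of the kind Alan mentions can work in two steps. First, resolve the two bounds to ordinals once, by binary search over the sorted term dictionary. Then test each document's ordinals against that ordinal interval, without comparing term bytes again.

This is a simplified, hypothetical illustration, not the actual implementation of `newSlowRangeQuery()`. `SlowOrdinalRange` is not a Lucene class. A `String[]` stands in for the dictionary, an `int[][]` holds each document's ascending ordinals, and both bounds are inclusive.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical helper: range filtering over per-document ordinals. */
final class SlowOrdinalRange {
  /** Returns the docs that have at least one term in [lower, upper], both inclusive. */
  static BitSet matchingDocs(String[] sortedTerms, int[][] docOrds, String lower, String upper) {
    // Resolve bounds to ordinals once; binarySearch returns -(insertionPoint) - 1 on a miss.
    int lo = Arrays.binarySearch(sortedTerms, lower);
    int minOrd = lo >= 0 ? lo : -lo - 1;   // first term >= lower
    int hi = Arrays.binarySearch(sortedTerms, upper);
    int maxOrd = hi >= 0 ? hi : -hi - 2;   // last term <= upper
    BitSet docs = new BitSet(docOrds.length);
    if (minOrd > maxOrd) {
      return docs; // no dictionary term falls inside the range
    }
    for (int doc = 0; doc < docOrds.length; doc++) {
      for (int ord : docOrds[doc]) {        // ordinals are ascending within a doc
        if (ord > maxOrd) {
          break;                            // every remaining ordinal is above the range
        }
        if (ord >= minOrd) {
          docs.set(doc);
          break;
        }
      }
    }
    return docs;
  }
}
```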

Slow DV equivalent of TermInSetQuery

2021-10-26 Thread Michael McCandless
Hi Team, I was discussing this problem with Greg Miller (also at Amazon Product Search): If I want to make a query that filters out a few primary keys (ASIN in our Amazon Product Search world), I can make a TermInSetQuery and add it as a MUST_NOT onto a BooleanQuery that has all the other interes…

Thank you! JDK 18 Early Access build 20 is now available

2021-10-26 Thread Rory O'Donnell
Hi Uwe & Dawid, *Thank you.* I'm retiring at the end of November 2021; it's time to spend more time with the family. We started the Quality Outreach back in October 2014. We now have 170+ projects participating. Thank you for taking the time to provide testing feedback, excellent bugs and…