[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109086#comment-17109086 ]

David Smiley commented on LUCENE-9114:
--------------------------------------

Yep, I know we tried, then avoided, making ValueSourceScorer mutable for the cost, and ultimately stopped at the way it is. I should have tried it myself as a final Q/A at that time. I can be forgiven; I was on vacation at the time :)

RE "returning a weird Float.NEGATIVE_INFINITY": I don't see how this comes into play if there's an optional interface. If there is no optional interface, then ValueSourceScorer would have to cast the weight to FunctionRangeWeight in particular, which isn't cool.

> Add FunctionValues.cost
> -----------------------
>
>                 Key: LUCENE-9114
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9114
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/query
>            Reporter: David Smiley
>            Assignee: Atri Sharma
>            Priority: Major
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The FunctionRangeQuery uses FunctionValues.getRangeScorer, which returns a
> subclass of ValueSourceScorer. VSC's TwoPhaseIterator has a matchCost impl
> that returns a constant 100. This is pretty terrible; the cost should vary
> based on the complexity of the ValueSource provided to FRQ. ValueSources
> are typically nested a number of levels, so their costs should aggregate.
> BTW there is a parallel concern for FunctionMatchQuery, which works with
> DoubleValuesSource, which doesn't have a cost either, and unsurprisingly there
> is a TPI with matchCost 100 there.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108885#comment-17108885 ]

Atri Sharma commented on LUCENE-9114:
-------------------------------------

[~dsmiley] If you recall, that was one of the approaches I had tried in an earlier iteration of this PR :)

I agree with allowing the Weight to define an internal cost that ValueSourceScorer.matchCost can delegate to. It could return a weird value (Float.NEGATIVE_INFINITY) to signal that it is not implemented, and then it would be matchCost's job to do the right thing?
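A minimal sketch of the sentinel idea, assuming hypothetical stand-in types (`SketchWeight`, `SentinelScorer`, `internalCost`) rather than Lucene's real Weight and ValueSourceScorer classes. The weight reports `Float.NEGATIVE_INFINITY` when it has no cost, and the scorer falls back to the constant 100 mentioned in the issue description:

```java
// Hypothetical stand-in for a Weight that may or may not define a match cost.
// These are illustrative names only, not Lucene's actual API.
interface SketchWeight {
  /** Per-document match cost, or Float.NEGATIVE_INFINITY if not implemented. */
  default float internalCost() {
    return Float.NEGATIVE_INFINITY; // sentinel meaning "no cost defined"
  }
}

class SentinelScorer {
  // The constant the issue description says the TwoPhaseIterator currently returns.
  static final float DEFAULT_MATCH_COST = 100f;

  private final SketchWeight weight;

  SentinelScorer(SketchWeight weight) {
    this.weight = weight;
  }

  float matchCost() {
    float cost = weight.internalCost();
    // The scorer has to recognize the sentinel and substitute a fallback itself.
    if (cost == Float.NEGATIVE_INFINITY) {
      return DEFAULT_MATCH_COST;
    }
    return cost;
  }
}
```

The catch is that every caller must know about the sentinel, which is what the optional-interface alternative in this thread avoids.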
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108878#comment-17108878 ]

David Smiley commented on LUCENE-9114:
--------------------------------------

[~atris] I noticed you forgot the "fix version" here, which apparently should be "master (9.0)" given that you didn't port to branch_8x. But why not back-port? AFAICT it's backwards compatible.

I was looking at this tonight to see how difficult it might be to supply the cost to FunctionRangeQuery (in Lucene) somehow. It's pretty difficult: it requires a delegating ValueSource and, worse, a delegating FunctionValues, which is a huge interface and would add some overhead to per-document evaluation.

I thought of another approach to customizing the cost: what if there were an interface HasMatchCost (perhaps declared within ValueSourceScorer to clearly associate it with where it's used) that could be implemented by the Weight, in this case FunctionRangeWeight? ValueSourceRangeScorer.matchCost could check whether its Weight implements this, and if so, cast it and return the result of calling matchCost on it.

I took a peek at the similar FunctionMatchQuery class to see what's different there, and I see the matchCost is defined within that file, so it should be very easy to customize the cost. Given that the ValueSource API is largely legacy and that "DoubleValuesSource" & "LongValuesSource" (used by FunctionMatchQuery) are the future, maybe I should just go this route instead.
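A minimal sketch of the optional-interface idea, assuming hypothetical stand-ins (`BaseWeight`, `OptionalCostScorer`, `RangeWeightSketch`) for Weight, ValueSourceScorer and FunctionRangeWeight. Only weights that opt in are consulted, so no sentinel value is needed:

```java
// Hypothetical stand-ins; not Lucene's actual Weight/ValueSourceScorer classes.
class BaseWeight {}

class OptionalCostScorer {
  /** Optional interface a Weight may implement to supply its own match cost. */
  interface HasMatchCost {
    float matchCost();
  }

  static final float DEFAULT_MATCH_COST = 100f;

  private final BaseWeight weight;

  OptionalCostScorer(BaseWeight weight) {
    this.weight = weight;
  }

  float matchCost() {
    // Consult the weight only if it opted in; otherwise keep the old constant.
    if (weight instanceof HasMatchCost) {
      return ((HasMatchCost) weight).matchCost();
    }
    return DEFAULT_MATCH_COST;
  }
}

// Hypothetical analogue of FunctionRangeWeight opting in to custom costing.
class RangeWeightSketch extends BaseWeight implements OptionalCostScorer.HasMatchCost {
  private final float cost;

  RangeWeightSketch(float cost) {
    this.cost = cost;
  }

  @Override
  public float matchCost() {
    return cost;
  }
}
```

The scorer depends only on the small nested interface, not on a concrete weight class, which is the cast-avoidance point raised later in the thread.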
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051790#comment-17051790 ]

ASF subversion and git services commented on LUCENE-9114:
----------------------------------------------------------

Commit d751cf626ec639d38b955d3962ae347aea00c0ac in lucene-solr's branch refs/heads/master from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d751cf6 ]

LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (#1303)

This commit makes ValueSourceScorer's costing algorithm also take the delegated FunctionValues's cost into consideration when calculating its cost. FunctionValues now exposes a cost method, which is used by ValueSourceScorer's default matchCost method. In addition, ValueSourceScorer exposes a matchCost method that can be overridden to specify a custom costing mechanism.
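A sketch of the shape the commit message describes: a values object exposing `cost()`, and a scorer whose default `matchCost()` folds that cost in but can be overridden. The types here (`ValuesSketch`, `ValueScorerSketch`) and the formula `BASE_COST + values.cost()` are assumptions for illustration. The real commit's formula and defaults may differ.

```java
// Hypothetical stand-in for FunctionValues; only the value and cost hooks are modeled.
abstract class ValuesSketch {
  abstract double doubleVal(int doc);

  /** Per-document evaluation cost; this default value is an assumption. */
  float cost() {
    return 1f;
  }
}

class ValueScorerSketch {
  // Hypothetical fixed overhead of the scorer's own range check.
  static final float BASE_COST = 5f;

  final ValuesSketch values;

  ValueScorerSketch(ValuesSketch values) {
    this.values = values;
  }

  /** Default: include the delegated values' cost. Subclasses may override. */
  float matchCost() {
    return BASE_COST + values.cost();
  }
}
```

Overriding `matchCost()` in a subclass is the escape hatch for callers who know better than the default.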
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048492#comment-17048492 ]

Atri Sharma commented on LUCENE-9114:
-------------------------------------

[~dsmiley] I have raised a PR for this. It is a minimal change that allows VSS to incorporate the delegated FunctionValues's cost into its own cost. Would that help unblock you by letting you add stacked FunctionValues with custom costing functions?
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048261#comment-17048261 ]

Atri Sharma commented on LUCENE-9114:
-------------------------------------

I strongly believe that this is the right approach and that we should pursue it. I am actively working on this and will post a patch by Monday morning.
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047928#comment-17047928 ]

David Smiley commented on LUCENE-9114:
--------------------------------------

If this is too time-consuming, we could reduce the scope to merely making the cost settable by a query parser (though this issue would not touch any query parser itself). That's the minimum I need to unblock using the costs on the Solr side (its QParsers), which I want to do after this.
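A sketch of the reduced scope, assuming a hypothetical `CostedRangeQuerySketch` (not Lucene's FunctionRangeQuery API) whose match cost is a constructor argument a query parser could supply. It keeps the query immutable: `withMatchCost` returns a copy instead of mutating state.

```java
// Hypothetical sketch: the caller (e.g. a query parser) supplies the match cost.
// Not Lucene's FunctionRangeQuery; names and fields are illustrative only.
final class CostedRangeQuerySketch {
  // Same constant the issue description says is hard-coded today.
  static final float DEFAULT_MATCH_COST = 100f;

  final double lower;
  final double upper;
  final float matchCost;

  CostedRangeQuerySketch(double lower, double upper) {
    this(lower, upper, DEFAULT_MATCH_COST);
  }

  CostedRangeQuerySketch(double lower, double upper, float matchCost) {
    // Written this way so that NaN is rejected too.
    if (!(matchCost >= 0f)) {
      throw new IllegalArgumentException("matchCost must be >= 0: " + matchCost);
    }
    this.lower = lower;
    this.upper = upper;
    this.matchCost = matchCost;
  }

  /** Returns a copy with a different cost; this instance is left unchanged. */
  CostedRangeQuerySketch withMatchCost(float cost) {
    return new CostedRangeQuerySketch(lower, upper, cost);
  }
}
```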
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026083#comment-17026083 ]

Atri Sharma commented on LUCENE-9114:
-------------------------------------

+1, I am on it. I am a bit groggy from sleep right now; I will post my thoughts tomorrow.
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026082#comment-17026082 ]

David Smiley commented on LUCENE-9114:
--------------------------------------

I'd love it if you take it [~atris]; thanks! I have no WIP.

I was thinking that the typical cost could be something like DEF_COST + the sum of the costs of the delegated FunctionValues, with DEF_COST being, say, 5, perhaps less. This way you don't have to think too much; you can mechanically add a bunch of cost impls this way. And if you or someone else writes a different, more thoughtful cost, it'll be more apparent that it's not the default.

Maybe DoubleFieldSource should have an impl defaulting to its NumericDocValues's cost(), maybe plus some constant. But no: NumericDocValues is a DocIdSetIterator, and the cost of that is the # of docs (maybe 100's of thousands), not the cost per lookup. Be mindful of the distinction.

I think the PR should be against master and make cost abstract, thus forcing implementations to choose something sensible. The 8x backport should provide a default implementation, though, for back-compat.
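A sketch of the "DEF_COST + sum of delegated costs" rule, assuming hypothetical types (`CostedValues`, `LeafValues`, `CompositeValues`) in place of real FunctionValues implementations, and taking DEF_COST = 5 from the suggestion above. Note that `cost()` here is a per-document evaluation cost, not a DocIdSetIterator-style document count.

```java
import java.util.List;

// Hypothetical stand-ins for FunctionValues implementations; only cost() is modeled.
interface CostedValues {
  /** Default per-document cost ("say 5, perhaps less"). */
  float DEF_COST = 5f;

  /** Estimated cost of evaluating ONE document, not the number of matching docs. */
  float cost();
}

/** A leaf, e.g. a single doc-values lookup: just the default cost. */
final class LeafValues implements CostedValues {
  @Override
  public float cost() {
    return DEF_COST;
  }
}

/** A function over other values, e.g. sum(a, b): its own cost plus its children's. */
final class CompositeValues implements CostedValues {
  private final List<CostedValues> children;

  CompositeValues(List<CostedValues> children) {
    this.children = List.copyOf(children);
  }

  @Override
  public float cost() {
    float total = DEF_COST;
    for (CostedValues child : children) {
      total += child.cost(); // nested sources aggregate recursively
    }
    return total;
  }
}
```

For example, a composite over two leaves costs 5 + 5 + 5 = 15, and wrapping that composite together with one more leaf gives 5 + 15 + 5 = 25, so deeper nesting is reflected in the cost mechanically.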
[jira] [Commented] (LUCENE-9114) Add FunctionValues.cost
[ https://issues.apache.org/jira/browse/LUCENE-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026048#comment-17026048 ]

Atri Sharma commented on LUCENE-9114:
-------------------------------------

[~dsmiley] This looks interesting. I am happy to hack on this one unless you are planning to. Please let me know.