[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819305#comment-16819305
 ] 

Adrien Grand commented on LUCENE-8759:
--

Maybe the test could explicitly test both normal and denormal floats all the 
time? Otherwise +1.

I'm curious whether this makes any difference when running luceneutil? 

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scorer as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be 
> competitive
> // A double that is less than 'score' might still be converted to 
> 'score'
> // when casted to a float, so we go to the previous float to avoid 
> this issue
> minScores[minScores.length - 1] = minScore > 0 ? 
> Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819179#comment-16819179
 ] 

Jim Ferenczi commented on LUCENE-8759:
--

Thanks that looks cleaner, I think I got confused because I was not removing 1 
to the floatBits. I tested your solution on all positive floats and it works so 
I'll commit soon unless you have objections.

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scorer as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be 
> competitive
> // A double that is less than 'score' might still be converted to 
> 'score'
> // when casted to a float, so we go to the previous float to avoid 
> this issue
> minScores[minScores.length - 1] = minScore > 0 ? 
> Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819154#comment-16819154
 ] 

Adrien Grand commented on LUCENE-8759:
--

I gave a try at simplifying this a bit and have the following version which 
passes your test:

{code}
if (value <= 0) {
  throw new IllegalArgumentException("Value must be > 0, got " + value);
} else if (Float.isFinite(value) == false) { // preserve infinity or NaN
  return value;
}

final int floatBits = Float.floatToIntBits(value);
final int prevFloatBits = floatBits - 1;
final int prevFloatExp = prevFloatBits >>> 23;

// delta between the mantissa of the double representation of `value` and
// the previous float value is 2^shift
int shift = 52 - 23;
if (prevFloatExp == 0x0) {
  // we need to tune `shift` for denormal floats whose mantissa doesn't have
  // an implicit leading bit
  shift += Integer.numberOfLeadingZeros(prevFloatBits) - (31-23);
}

long doubleBits = Double.doubleToLongBits(value);
doubleBits -= (1L << (shift - 1)); // half way between the current float and 
the previous one
doubleBits += (floatBits & 0x1); // add one if necessary to compensate for the 
fact that Java rounds to even in case of tie

return Double.longBitsToDouble(doubleBits);
{code}

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scorer as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be 
> competitive
> // A double that is less than 'score' might still be converted to 
> 'score'
> // when casted to a float, so we go to the previous float to avoid 
> this issue
> minScores[minScores.length - 1] = minScore > 0 ? 
> Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819049#comment-16819049
 ] 

Jim Ferenczi commented on LUCENE-8759:
--

I tried this approach but the shift for denormalized numbers is computed from 
the float bits so I started from the same bits for normalized numbers too.

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scorer as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be 
> competitive
> // A double that is less than 'score' might still be converted to 
> 'score'
> // when casted to a float, so we go to the previous float to avoid 
> this issue
> minScores[minScores.length - 1] = minScore > 0 ? 
> Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818980#comment-16818980
 ] 

Adrien Grand commented on LUCENE-8759:
--

I'm wondering whether the logic could be simplified: the current code takes 
care of both casting to a double and rounding to the smallest double, maybe it 
could cast the float to a double and then twiddle bits of the double value 
(maybe it doesn't help, just wondering).

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scorer as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be 
> competitive
> // A double that is less than 'score' might still be converted to 
> 'score'
> // when casted to a float, so we go to the previous float to avoid 
> this issue
> minScores[minScores.length - 1] = minScore > 0 ? 
> Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org