[jira] [Commented] (LUCENE-9025) Add more efficient lookupTerm() overload to SortedSetDocValues

2019-11-04 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966651#comment-16966651
 ] 

Jason Gerlowski commented on LUCENE-9025:
-

Ok, fair enough.  I'll close this out then and wait for work to progress on 
LUCENE-8836.

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --
>
> Key: LUCENE-9025
> URL: https://issues.apache.org/jira/browse/LUCENE-9025
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the 
> entire docValues range to find the ordinal of the requested BytesRef.
> For an individual invocation, this is optimal.  Without other context, binary 
> search needs to cover the entire space.
> But there are some common uses of {{lookupTerm}} where this shouldn't be 
> necessary.  For example: making multiple {{lookupTerm}} calls to fetch the 
> ordinals for each value in a sorted list of terms.  {{lookupTerm}} will 
> binary-search the whole space on each invocation, even though the caller 
> knows that there's no point searching anything before the ordinal that came 
> back from the previous {{lookupTerm}} call.
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a 
> lower-bound to start the binary search at: {{public long lookupTerm(BytesRef 
> key, long lowerSearchBound) throws IOException}}  This saves each 
> binary-search a few iterations in usage scenarios like the one described 
> above, which can conceivably add up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9025) Add more efficient lookupTerm() overload to SortedSetDocValues

2019-11-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964871#comment-16964871
 ] 

David Smiley commented on LUCENE-9025:
--

+1 to close as duplicate and we work on LUCENE-8836 instead.  No new API, just 
make it faster/smarter.

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --
>
> Key: LUCENE-9025
> URL: https://issues.apache.org/jira/browse/LUCENE-9025
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the 
> entire docValues range to find the ordinal of the requested BytesRef.
> For an individual invocation, this is optimal.  Without other context, binary 
> search needs to cover the entire space.
> But there are some common uses of {{lookupTerm}} where this shouldn't be 
> necessary.  For example: making multiple {{lookupTerm}} calls to fetch the 
> ordinals for each value in a sorted list of terms.  {{lookupTerm}} will 
> binary-search the whole space on each invocation, even though the caller 
> knows that there's no point searching anything before the ordinal that came 
> back from the previous {{lookupTerm}} call.
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a 
> lower-bound to start the binary search at: {{public long lookupTerm(BytesRef 
> key, long lowerSearchBound) throws IOException}}  This saves each 
> binary-search a few iterations in usage scenarios like the one described 
> above, which can conceivably add up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9025) Add more efficient lookupTerm() overload to SortedSetDocValues

2019-10-25 Thread juan camilo rodriguez duran (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959818#comment-16959818
 ] 

juan camilo rodriguez duran commented on LUCENE-9025:
-

This issue is related with https://issues.apache.org/jira/browse/LUCENE-8836

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --
>
> Key: LUCENE-9025
> URL: https://issues.apache.org/jira/browse/LUCENE-9025
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the 
> entire docValues range to find the ordinal of the requested BytesRef.
> For an individual invocation, this is optimal.  Without other context, binary 
> search needs to cover the entire space.
> But there are some common uses of {{lookupTerm}} where this shouldn't be 
> necessary.  For example: making multiple {{lookupTerm}} calls to fetch the 
> ordinals for each value in a sorted list of terms.  {{lookupTerm}} will 
> binary-search the whole space on each invocation, even though the caller 
> knows that there's no point searching anything before the ordinal that came 
> back from the previous {{lookupTerm}} call.
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a 
> lower-bound to start the binary search at: {{public long lookupTerm(BytesRef 
> key, long lowerSearchBound) throws IOException}}  This saves each 
> binary-search a few iterations in usage scenarios like the one described 
> above, which can conceivably add up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9025) Add more efficient lookupTerm() overload to SortedSetDocValues

2019-10-24 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958870#comment-16958870
 ] 

Adrien Grand commented on LUCENE-9025:
--

Adding this overload has a downside as well: it complicates the API by adding 
another method, and we care about keeping API surface area to a minimum.

I think your proposal also has the problem that it doesn't help significantly 
alone: if the lookup logic keeps doing a regular binary search, this will only 
save one lookup by ord + comparison on average, which doesn't sound very 
compelling. Instead I think that a more efficient solution would be to keep 
state about the last lookup in Lucene80DocValuesProducer's terms dict, there 
are several interesting things we could do:
 - move from a regular binary search to exponential search when looking up 
blocks,
 - optimize for the case that two consecutive looked up terms belong to the 
same block, we could save many iterations in the final linear scan across terms 
that belong to the same block

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --
>
> Key: LUCENE-9025
> URL: https://issues.apache.org/jira/browse/LUCENE-9025
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the 
> entire docValues range to find the ordinal of the requested BytesRef.
> For an individual invocation, this is optimal.  Without other context, binary 
> search needs to cover the entire space.
> But there are some common uses of {{lookupTerm}} where this shouldn't be 
> necessary.  For example: making multiple {{lookupTerm}} calls to fetch the 
> ordinals for each value in a sorted list of terms.  {{lookupTerm}} will 
> binary-search the whole space on each invocation, even though the caller 
> knows that there's no point searching anything before the ordinal that came 
> back from the previous {{lookupTerm}} call.
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a 
> lower-bound to start the binary search at: {{public long lookupTerm(BytesRef 
> key, long lowerSearchBound) throws IOException}}  This saves each 
> binary-search a few iterations in usage scenarios like the one described 
> above, which can conceivably add up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9025) Add more efficient lookupTerm() overload to SortedSetDocValues

2019-10-24 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958835#comment-16958835
 ] 

Jason Gerlowski commented on LUCENE-9025:
-

I think an improvement at the TermsEnum level would be helpful, but at the end 
of the day that's only covering 1 of 10+ concrete implementations of the 
SortedSetDocValues class.  Granted, it covers the main implementation, but it'd 
be better to cover them all.  It also (depending on the implementation, I'm not 
sure what specific seekCeil optimization you had in mind) could hurt uses where 
the looked up term is _less_ than the current one.

Adding lower/upper bound params to a {{lookupTerm}} overload avoids both of 
these drawbacks since all implementations get the benefit, and callers have 
control over when the optimization (smaller search range) is triggered.

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --
>
> Key: LUCENE-9025
> URL: https://issues.apache.org/jira/browse/LUCENE-9025
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the 
> entire docValues range to find the ordinal of the requested BytesRef.
> For an individual invocation, this is optimal.  Without other context, binary 
> search needs to cover the entire space.
> But there are some common uses of {{lookupTerm}} where this shouldn't be 
> necessary.  For example: making multiple {{lookupTerm}} calls to fetch the 
> ordinals for each value in a sorted list of terms.  {{lookupTerm}} will 
> binary-search the whole space on each invocation, even though the caller 
> knows that there's no point searching anything before the ordinal that came 
> back from the previous {{lookupTerm}} call.
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a 
> lower-bound to start the binary search at: {{public long lookupTerm(BytesRef 
> key, long lowerSearchBound) throws IOException}}  This saves each 
> binary-search a few iterations in usage scenarios like the one described 
> above, which can conceivably add up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9025) Add more efficient lookupTerm() overload to SortedSetDocValues

2019-10-23 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958265#comment-16958265
 ] 

Adrien Grand commented on LUCENE-9025:
--

We don't need a new method for this, we just need to optimize seekCeil for the 
case that the looked up term is greater than the current one? 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java#L1059

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --
>
> Key: LUCENE-9025
> URL: https://issues.apache.org/jira/browse/LUCENE-9025
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Priority: Minor
> Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the 
> entire docValues range to find the ordinal of the requested BytesRef.
> For an individual invocation, this is optimal.  Without other context, binary 
> search needs to cover the entire space.
> But there are some common uses of {{lookupTerm}} where this shouldn't be 
> necessary.  For example: making multiple {{lookupTerm}} calls to fetch the 
> ordinals for each value in a sorted list of terms.  {{lookupTerm}} will 
> binary-search the whole space on each invocation, even though the caller 
> knows that there's no point searching anything before the ordinal that came 
> back from the previous {{lookupTerm}} call.
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a 
> lower-bound to start the binary search at: {{public long lookupTerm(BytesRef 
> key, long lowerSearchBound) throws IOException}}  This saves each 
> binary-search a few iterations in usage scenarios like the one described 
> above, which can conceivably add up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org