[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-04 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501358#comment-17501358
 ] 

Greg Miller commented on LUCENE-10302:
--

As I worked on this a little more, it occurs to me that I think we might want 
to implement two separate abstractions here. It feels a bit like overloading 
the idea of a PQ builder to provide sorted results.

I started sketching out the idea of a "top-N collector" (naming isn't great...) 
that abstracts the idea of just "collecting" the top-n provided elements, 
either sorted or unsorted. It encapsulates details about whether-or-not it ever 
uses a heap, and just exposes methods to get the top-n, either sorted or 
unsorted.

I think it's separately still very useful to have an explicit PQ builder for 
cases where the user knows they need a PQ but can build it in bulk up-front.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-04 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501354#comment-17501354
 ] 

Greg Miller commented on LUCENE-10302:
--

[~vigyas] ah, it looks like the URL pulled in a trailing dot ('.'). If you 
remove that, it should work.  Here it is properly:

https://github.com/gsmiller/lucene/tree/LUCENE-10302-pq-builder-sketch

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-03 Thread Vigya Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501101#comment-17501101
 ] 

Vigya Sharma commented on LUCENE-10302:
---

[~gsmiller] Is the link you pasted working? I get a 404 when I try to open it.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-03 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500898#comment-17500898
 ] 

Greg Miller commented on LUCENE-10302:
--

Thanks [~dsmiley]. I'd sketched something out as well and have it sitting over 
here on a branch: 
[https://github.com/gsmiller/lucene/tree/LUCENE-10302-pq-builder-sketch.] I'll 
have a look at your patch file and see where ideas are similar/different.

 

[~vigyas] sounds good!

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500803#comment-17500803
 ] 

David Smiley commented on LUCENE-10302:
---

I attached my WIP as a patch file.  Looking back, I started with defining the 
Builder code.  I didn't yet implement "heapify".   Nobody calls any of it.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
> Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-02 Thread Vigya Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500565#comment-17500565
 ] 

Vigya Sharma commented on LUCENE-10302:
---

+1. I'd also like to look at it and maybe pick up a piece.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2022-03-02 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500239#comment-17500239
 ] 

Greg Miller commented on LUCENE-10302:
--

{quote}I have the start of some code in progress I can share to anyone 
interested. I'm not sure when I'll get back to this.
{quote}
[~dsmiley] I'd be interested in exploring this a little bit as I have some 
spare time. I'm not going to assign it to myself since I'm not sure how much 
time I'll have for it, but I'd be curious to take a look at the progress you've 
made if you don't mind posting.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456448#comment-17456448
 ] 

David Smiley commented on LUCENE-10302:
---

This builder could provide a buildList method in addition to buildQueue since 
some consumers don't really need a PriorityQueue specifically -- PQ is an 
implementation detail to get a top-N, typically want sorted top-N, and this 
list can be sorted quickly & easily (be it the unsorted input array or the heap 
array).  I'm not sure if it's actually faster to pop() the heap array 
one-by-one; I suspect not.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456443#comment-17456443
 ] 

David Smiley commented on LUCENE-10302:
---

Fun fact: there are 56 usages of Lucene's PriorityQueue in Lucene.  I counted 
via adding the find-usages of both constructors (55 + 1).

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10302) PriorityQueue: optimize where we collect then iterate by using O(N) heapify

2021-12-09 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456441#comment-17456441
 ] 

David Smiley commented on LUCENE-10302:
---

As an aside; it's weird/annoying to have to subclass this PriorityQueue just to 
supply the comparator.  Can't we do a Comparator?

I have the start of some code in progress I can share to anyone interested.  
I'm not sure when I'll get back to this.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---
>
> Key: LUCENE-10302
> URL: https://issues.apache.org/jira/browse/LUCENE-10302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org