Re: Computing weight.count() cheaply in the face of deletes?

2024-02-06 Thread Uwe Schindler
Hi, my response was a bit unclear. Before Lucene 4.0 we saved *deletions* in a bitset (1 = doc deleted), so you were able to use the provided DocIdSetIterator directly. At that time there was no sparse implementation. My idea was more about this: "Because we marked *deleted* docs (not live

Re: Computing weight.count() cheaply in the face of deletes?

2024-02-06 Thread Uwe Schindler
Hi, A SparseBitset impl for DELETES would be fine if the model in Lucene encoded deleted docs (it did that in earlier times). As deletes are sparse (deletes are in most cases <40%), this would help to make the iterator cheaper. Uwe On 06.02.2024 at 09:01, Adrien Grand wrote: Hey

Re: Computing weight.count() cheaply in the face of deletes?

2024-02-06 Thread Adrien Grand
Good point, I opened an issue to discuss this: https://github.com/apache/lucene/issues/13084. Did we actually use a sparse bit set to encode deleted docs before? I don't recall that. On Tue, Feb 6, 2024 at 2:42 PM Uwe Schindler wrote: > Hi, > > A SparseBitset impl for DELETES would be fine if

Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-06 Thread Jason Gerlowski
Here's my +1 (binding) SUCCESS! [0:56:16.591754] On Mon, Feb 5, 2024 at 5:23 PM Houston Putman wrote: > Please vote for release candidate 1 for Lucene/Solr 8.11.3 > > The artifacts can be downloaded from: > >

Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-06 Thread Kevin Risden
I'm running the smoke tester on branch_8_11 python3 -u dev-tools/scripts/smokeTestRelease.py \ https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d and getting File >

Re: Computing weight.count() cheaply in the face of deletes?

2024-02-06 Thread Adrien Grand
Hey Michael, You are right, iterating all deletes with nextClearBit() would run in O(maxDoc). I am coming from the other direction, where I'm expecting the number of deletes to be more on the order of 1%-5% of the doc ID space, so a separate int[] would use lots of heap and probably not help that
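
For readers following the thread, here is a minimal sketch of the two encodings being compared. It uses the JDK's java.util.BitSet rather than any Lucene class, so it is a dense stand-in and not the sparse structure discussed above; the class and method names are hypothetical. It assumes a query that matches every doc in [from, to), so the live count is the range size minus the deletes found in that range.

```java
import java.util.BitSet;

// Hypothetical illustration only; not Lucene code.
final class DeleteIterationSketch {

    // Live-docs model: a set bit means the doc is live, a clear bit means deleted.
    // Deletes are found with nextClearBit(), which has to scan over the live bits.
    static int countDeletedInLiveDocs(BitSet liveDocs, int from, int to) {
        int deleted = 0;
        for (int doc = liveDocs.nextClearBit(from); doc < to; doc = liveDocs.nextClearBit(doc + 1)) {
            deleted++;
        }
        return deleted;
    }

    // Deleted-docs model: a set bit means the doc is deleted.
    // Deletes are found with nextSetBit(), which returns -1 when none remain.
    static int countDeletedInDeletedDocs(BitSet deletedDocs, int from, int to) {
        int deleted = 0;
        for (int doc = deletedDocs.nextSetBit(from); doc >= 0 && doc < to; doc = deletedDocs.nextSetBit(doc + 1)) {
            deleted++;
        }
        return deleted;
    }

    // Count of live docs in [from, to), assuming every doc in the range matches.
    static int liveCountInRange(int from, int to, int deletedInRange) {
        return (to - from) - deletedInRange;
    }
}
```

Both loops visit only the deleted docs in the range, but with a dense bit set the underlying scan still touches every word between from and to; the gain from storing deletes directly would come from pairing that model with a sparse representation.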