Hi,
my previous response was a bit unclear. Before Lucene 4.0 we stored
*deletions* in a bitset (1 = doc deleted), so you could use the provided
DocIdSetIterator directly. At that time there was no sparse implementation.
My idea was more about this: "Because we marked *deleted* docs (not live
Hi,
A SparseBitset impl for DELETES would be fine if the model in Lucene
encoded deleted docs (as it did in earlier versions). As deletes are
sparse (in most cases <40% of docs are deleted), this would help to make
the iterator cheaper.
Uwe
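
To illustrate the difference between the two models discussed above, here is
a minimal sketch. It uses java.util.BitSet as a dense stand-in, not Lucene's
own bit set classes, and the class and method names (DeletedDocsSketch,
deletedFromLiveDocs, deletedFromDeletedDocs) are hypothetical. With a
live-docs bitset (1 = doc live) the deleted docs are the clear bits, so
finding them means scanning the whole doc ID space up to maxDoc. With a
deleted-docs bitset (1 = doc deleted) the set bits can be iterated directly.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class DeletedDocsSketch {

    // Live-docs model (1 = doc live): deleted docs are the clear bits
    // below maxDoc, so the loop walks the whole doc ID space.
    public static List<Integer> deletedFromLiveDocs(BitSet liveDocs, int maxDoc) {
        List<Integer> deleted = new ArrayList<>();
        for (int doc = liveDocs.nextClearBit(0); doc < maxDoc;
                doc = liveDocs.nextClearBit(doc + 1)) {
            deleted.add(doc);
        }
        return deleted;
    }

    // Deleted-docs model (1 = doc deleted): iterate the set bits directly.
    public static List<Integer> deletedFromDeletedDocs(BitSet deletedDocs) {
        List<Integer> deleted = new ArrayList<>();
        for (int doc = deletedDocs.nextSetBit(0); doc >= 0;
                doc = deletedDocs.nextSetBit(doc + 1)) {
            deleted.add(doc);
        }
        return deleted;
    }

    public static void main(String[] args) {
        int maxDoc = 10;

        // Docs 2 and 7 are deleted, expressed in both models.
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);
        liveDocs.clear(2);
        liveDocs.clear(7);

        BitSet deletedDocs = new BitSet(maxDoc);
        deletedDocs.set(2);
        deletedDocs.set(7);

        System.out.println(deletedFromLiveDocs(liveDocs, maxDoc)); // prints [2, 7]
        System.out.println(deletedFromDeletedDocs(deletedDocs));   // prints [2, 7]
    }
}
```

Both methods return the same doc IDs. Only the second one could benefit from
a sparse encoding of the set bits when deletes are rare.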
On 06.02.2024 at 09:01, Adrien Grand wrote:
Hey
Good point, I opened an issue to discuss this:
https://github.com/apache/lucene/issues/13084.
Did we actually use a sparse bit set to encode deleted docs before? I don't
recall that.
On Tue, Feb 6, 2024 at 2:42 PM Uwe Schindler wrote:
> Hi,
>
> A SparseBitset impl for DELETES would be fine if
Here's my +1 (binding)
SUCCESS! [0:56:16.591754]
On Mon, Feb 5, 2024 at 5:23 PM Houston Putman wrote:
> Please vote for release candidate 1 for Lucene/Solr 8.11.3
>
> The artifacts can be downloaded from:
>
>
I'm running the smoke tester on branch_8_11
python3 -u dev-tools/scripts/smokeTestRelease.py \
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
and getting
File
>
Hey Michael,
You are right, iterating all deletes with nextClearBit() would run in
O(maxDoc). I am coming from the other direction, where I'm expecting the
number of deletes to be more on the order of 1%-5% of the doc ID space, so
a separate int[] would use lots of heap and probably not help that