[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
Github user asfgit closed the pull request at: https://github.com/apache/lucene-solr/pull/502 --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
Github user jpountz commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/502#discussion_r235312499 --- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java --- @@ -4398,42 +4429,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, MergePolicy mergePolicy) thr // Let the merge wrap readers List mergeReaders = new ArrayList<>(); - int softDeleteCount = 0; + Counter softDeleteCount = Counter.newCounter(false); for (int r = 0; r < merge.readers.size(); r++) { SegmentReader reader = merge.readers.get(r); CodecReader wrappedReader = merge.wrapForMerge(reader); validateMergeReader(wrappedReader); if (softDeletesEnabled) { + if (reader != wrappedReader) { // if we don't have a wrapped reader we won't preserve any soft-deletes Bits hardLiveDocs = merge.hardLiveDocs.get(r); -Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); -int hardDeleteCount = 0; -DocIdSetIterator softDeletedDocs = DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(), wrappedReader); -if (softDeletedDocs != null) { - int docId; - while ((docId = softDeletedDocs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { -if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) { - if (hardLiveDocs == null || hardLiveDocs.get(docId)) { -softDeleteCount++; - } else { -hardDeleteCount++; +if (hardLiveDocs != null) { // we only need to do this accounting if we have mixed deletes + Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); + Counter hardDeleteCounter = Counter.newCounter(false); + countSoftDeletes(wrappedReader, wrappedLiveDocs, hardLiveDocs, softDeleteCount, hardDeleteCounter); + int hardDeleteCount = Math.toIntExact(hardDeleteCounter.get()); + // Wrap the wrapped reader again if we have excluded some hard-deleted docs + if (hardLiveDocs != null && hardDeleteCount > 0) { --- End diff -- hardLiveDocs is always non-null at this stage? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
Github user s1monw commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/502#discussion_r235273221 --- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java --- @@ -4398,42 +4426,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, MergePolicy mergePolicy) thr // Let the merge wrap readers List mergeReaders = new ArrayList<>(); - int softDeleteCount = 0; + Counter softDeleteCount = Counter.newCounter(false); for (int r = 0; r < merge.readers.size(); r++) { SegmentReader reader = merge.readers.get(r); CodecReader wrappedReader = merge.wrapForMerge(reader); validateMergeReader(wrappedReader); if (softDeletesEnabled) { + if (reader != wrappedReader) { // if we don't have a wrapped reader we won't preserve any soft-deletes Bits hardLiveDocs = merge.hardLiveDocs.get(r); -Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); -int hardDeleteCount = 0; -DocIdSetIterator softDeletedDocs = DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(), wrappedReader); -if (softDeletedDocs != null) { - int docId; - while ((docId = softDeletedDocs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { -if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) { - if (hardLiveDocs == null || hardLiveDocs.get(docId)) { -softDeleteCount++; - } else { -hardDeleteCount++; +if (hardLiveDocs != null) { // we only need to do this accounting if we have mixed deletes + Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); + int hardDeleteCount = countSoftDeletes(wrappedReader, wrappedLiveDocs, hardLiveDocs, softDeleteCount); --- End diff -- done --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
Github user mikemccand commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/502#discussion_r235148814 --- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java --- @@ -4398,42 +4426,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, MergePolicy mergePolicy) thr // Let the merge wrap readers List mergeReaders = new ArrayList<>(); - int softDeleteCount = 0; + Counter softDeleteCount = Counter.newCounter(false); for (int r = 0; r < merge.readers.size(); r++) { SegmentReader reader = merge.readers.get(r); CodecReader wrappedReader = merge.wrapForMerge(reader); validateMergeReader(wrappedReader); if (softDeletesEnabled) { + if (reader != wrappedReader) { // if we don't have a wrapped reader we won't preserve any soft-deletes Bits hardLiveDocs = merge.hardLiveDocs.get(r); -Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); -int hardDeleteCount = 0; -DocIdSetIterator softDeletedDocs = DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(), wrappedReader); -if (softDeletedDocs != null) { - int docId; - while ((docId = softDeletedDocs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { -if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) { - if (hardLiveDocs == null || hardLiveDocs.get(docId)) { -softDeleteCount++; - } else { -hardDeleteCount++; +if (hardLiveDocs != null) { // we only need to do this accounting if we have mixed deletes + Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); + int hardDeleteCount = countSoftDeletes(wrappedReader, wrappedLiveDocs, hardLiveDocs, softDeleteCount); + // Wrap the wrapped reader again if we have excluded some hard-deleted docs + if (hardLiveDocs != null && hardDeleteCount > 0) { +Bits liveDocs = wrappedLiveDocs == null ? hardLiveDocs : new Bits() { + @Override + public boolean get(int index) { +return hardLiveDocs.get(index) && wrappedLiveDocs.get(index); } -} + + @Override + public int length() { +return hardLiveDocs.length(); + } +}; +wrappedReader = FilterCodecReader.wrapLiveDocs(wrappedReader, liveDocs, wrappedReader.numDocs() - hardDeleteCount); } -} -// Wrap the wrapped reader again if we have excluded some hard-deleted docs -if (hardLiveDocs != null && hardDeleteCount > 0) { - Bits liveDocs = wrappedLiveDocs == null ? hardLiveDocs : new Bits() { -@Override -public boolean get(int index) { - return hardLiveDocs.get(index) && wrappedLiveDocs.get(index); -} -@Override -public int length() { - return hardLiveDocs.length(); -} - }; - wrappedReader = FilterCodecReader.wrapLiveDocs(wrappedReader, liveDocs, wrappedReader.numDocs() - hardDeleteCount); +} else { + int carryOverSoftDeletes = reader.getSegmentInfo().getSoftDelCount() - wrappedReader.numDeletedDocs(); + if (carryOverSoftDeletes < 0) { +throw new IllegalStateException("carry-over soft-deletes must be positive"); --- End diff -- `AssertionError` instead? This means we have a bug right? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
Github user dweiss commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/502#discussion_r235063127 --- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java --- @@ -4398,42 +4426,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, MergePolicy mergePolicy) thr // Let the merge wrap readers List mergeReaders = new ArrayList<>(); - int softDeleteCount = 0; + Counter softDeleteCount = Counter.newCounter(false); for (int r = 0; r < merge.readers.size(); r++) { SegmentReader reader = merge.readers.get(r); CodecReader wrappedReader = merge.wrapForMerge(reader); validateMergeReader(wrappedReader); if (softDeletesEnabled) { + if (reader != wrappedReader) { // if we don't have a wrapped reader we won't preserve any soft-deletes Bits hardLiveDocs = merge.hardLiveDocs.get(r); -Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); -int hardDeleteCount = 0; -DocIdSetIterator softDeletedDocs = DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(), wrappedReader); -if (softDeletedDocs != null) { - int docId; - while ((docId = softDeletedDocs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { -if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) { - if (hardLiveDocs == null || hardLiveDocs.get(docId)) { -softDeleteCount++; - } else { -hardDeleteCount++; +if (hardLiveDocs != null) { // we only need to do this accounting if we have mixed deletes + Bits wrappedLiveDocs = wrappedReader.getLiveDocs(); + int hardDeleteCount = countSoftDeletes(wrappedReader, wrappedLiveDocs, hardLiveDocs, softDeleteCount); --- End diff -- hardDeleteCount var, yet countSoftDeletes() method looks odd. Maybe you could just pass two args (two counters) instead. It's not as pretty, but at least more intuitive? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
Github user dweiss commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/502#discussion_r235062399 --- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java --- @@ -4350,6 +4351,33 @@ private synchronized void closeMergeReaders(MergePolicy.OneMerge merge, boolean } } + private int countSoftDeletes(CodecReader reader, Bits wrappedLiveDocs, Bits hardLiveDocs, Counter softDeleteCounter) throws IOException { +int hardDeleteCount = 0; +int softDeletesCount = 0; +DocIdSetIterator softDeletedDocs = DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(), reader); +if (softDeletedDocs != null) { + int docId; + while ((docId = softDeletedDocs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { +if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) { + if (hardLiveDocs == null || hardLiveDocs.get(docId)) { +softDeletesCount++; + } else { +hardDeleteCount++; + } +} + } +} +softDeleteCounter.addAndGet(softDeletesCount); +return hardDeleteCount; + } + + private boolean assertSoftDeletesCount(CodecReader reader, int expectedCount) throws IOException { +Counter count = Counter.newCounter(false); +countSoftDeletes(reader, reader.getLiveDocs(), null, count); +assert count.get() == expectedCount : "soft-deletes cound missmatch expected: " + expectedCount + " but actual: " + count.get() ; --- End diff -- missmatch -> mismatch --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...
GitHub user s1monw opened a pull request: https://github.com/apache/lucene-solr/pull/502 LUCENE-8569: Never count soft-deletes if reader has no hard-deletes Today we count the actual soft-deletes during a merge which is unnecessary if there are no hard-deletes present. In this case, which is considered to be the common case we can get accurate counts by substracting the number of deleted docs in the wrapped reader from the number of soft-deletes in that reader. You can merge this pull request into a Git repository by running: $ git pull https://github.com/s1monw/lucene-solr only_count_if_hard_deletes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/502.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #502 commit 87f571610c2f6cfdaa226c6121191c2e656bd13e Author: Simon Willnauer Date: 2018-11-20T11:56:20Z LUCENE-8569: Never count soft-deletes if reader has no hard-deletes Today we count the actual soft-deletes during a merge which is unnecessary if there are no hard-deletes present. In this case, which is considered to be the common case we can get accurate counts by substracting the number of deleted docs in the wrapped reader from the number of soft-deletes in that reader. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org