[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/502


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-21 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/502#discussion_r235312499
  
--- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ---
@@ -4398,42 +4429,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, 
MergePolicy mergePolicy) thr
 
   // Let the merge wrap readers
   List mergeReaders = new ArrayList<>();
-  int softDeleteCount = 0;
+  Counter softDeleteCount = Counter.newCounter(false);
   for (int r = 0; r < merge.readers.size(); r++) {
 SegmentReader reader = merge.readers.get(r);
 CodecReader wrappedReader = merge.wrapForMerge(reader);
 validateMergeReader(wrappedReader);
 if (softDeletesEnabled) {
+
   if (reader != wrappedReader) { // if we don't have a wrapped 
reader we won't preserve any soft-deletes
 Bits hardLiveDocs = merge.hardLiveDocs.get(r);
-Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
-int hardDeleteCount = 0;
-DocIdSetIterator softDeletedDocs = 
DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(),
 wrappedReader);
-if (softDeletedDocs != null) {
-  int docId;
-  while ((docId = softDeletedDocs.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
-if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) 
{
-  if (hardLiveDocs == null || hardLiveDocs.get(docId)) {
-softDeleteCount++;
-  } else {
-hardDeleteCount++;
+if (hardLiveDocs != null) { // we only need to do this 
accounting if we have mixed deletes
+  Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
+  Counter hardDeleteCounter = Counter.newCounter(false);
+  countSoftDeletes(wrappedReader, wrappedLiveDocs, 
hardLiveDocs, softDeleteCount, hardDeleteCounter);
+  int hardDeleteCount = 
Math.toIntExact(hardDeleteCounter.get());
+  // Wrap the wrapped reader again if we have excluded some 
hard-deleted docs
+  if (hardLiveDocs != null && hardDeleteCount > 0) {
--- End diff --

hardLiveDocs is always non-null at this stage?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-20 Thread s1monw
Github user s1monw commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/502#discussion_r235273221
  
--- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ---
@@ -4398,42 +4426,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, 
MergePolicy mergePolicy) thr
 
   // Let the merge wrap readers
   List mergeReaders = new ArrayList<>();
-  int softDeleteCount = 0;
+  Counter softDeleteCount = Counter.newCounter(false);
   for (int r = 0; r < merge.readers.size(); r++) {
 SegmentReader reader = merge.readers.get(r);
 CodecReader wrappedReader = merge.wrapForMerge(reader);
 validateMergeReader(wrappedReader);
 if (softDeletesEnabled) {
+
   if (reader != wrappedReader) { // if we don't have a wrapped 
reader we won't preserve any soft-deletes
 Bits hardLiveDocs = merge.hardLiveDocs.get(r);
-Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
-int hardDeleteCount = 0;
-DocIdSetIterator softDeletedDocs = 
DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(),
 wrappedReader);
-if (softDeletedDocs != null) {
-  int docId;
-  while ((docId = softDeletedDocs.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
-if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) 
{
-  if (hardLiveDocs == null || hardLiveDocs.get(docId)) {
-softDeleteCount++;
-  } else {
-hardDeleteCount++;
+if (hardLiveDocs != null) { // we only need to do this 
accounting if we have mixed deletes
+  Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
+  int hardDeleteCount = countSoftDeletes(wrappedReader, 
wrappedLiveDocs, hardLiveDocs, softDeleteCount);
--- End diff --

done


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-20 Thread mikemccand
Github user mikemccand commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/502#discussion_r235148814
  
--- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ---
@@ -4398,42 +4426,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, 
MergePolicy mergePolicy) thr
 
   // Let the merge wrap readers
   List mergeReaders = new ArrayList<>();
-  int softDeleteCount = 0;
+  Counter softDeleteCount = Counter.newCounter(false);
   for (int r = 0; r < merge.readers.size(); r++) {
 SegmentReader reader = merge.readers.get(r);
 CodecReader wrappedReader = merge.wrapForMerge(reader);
 validateMergeReader(wrappedReader);
 if (softDeletesEnabled) {
+
   if (reader != wrappedReader) { // if we don't have a wrapped 
reader we won't preserve any soft-deletes
 Bits hardLiveDocs = merge.hardLiveDocs.get(r);
-Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
-int hardDeleteCount = 0;
-DocIdSetIterator softDeletedDocs = 
DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(),
 wrappedReader);
-if (softDeletedDocs != null) {
-  int docId;
-  while ((docId = softDeletedDocs.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
-if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) 
{
-  if (hardLiveDocs == null || hardLiveDocs.get(docId)) {
-softDeleteCount++;
-  } else {
-hardDeleteCount++;
+if (hardLiveDocs != null) { // we only need to do this 
accounting if we have mixed deletes
+  Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
+  int hardDeleteCount = countSoftDeletes(wrappedReader, 
wrappedLiveDocs, hardLiveDocs, softDeleteCount);
+  // Wrap the wrapped reader again if we have excluded some 
hard-deleted docs
+  if (hardLiveDocs != null && hardDeleteCount > 0) {
+Bits liveDocs = wrappedLiveDocs == null ? hardLiveDocs : 
new Bits() {
+  @Override
+  public boolean get(int index) {
+return hardLiveDocs.get(index) && 
wrappedLiveDocs.get(index);
   }
-}
+
+  @Override
+  public int length() {
+return hardLiveDocs.length();
+  }
+};
+wrappedReader = 
FilterCodecReader.wrapLiveDocs(wrappedReader, liveDocs, wrappedReader.numDocs() 
- hardDeleteCount);
   }
-}
-// Wrap the wrapped reader again if we have excluded some 
hard-deleted docs
-if (hardLiveDocs != null && hardDeleteCount > 0) {
-  Bits liveDocs = wrappedLiveDocs == null ? hardLiveDocs : new 
Bits() {
-@Override
-public boolean get(int index) {
-  return hardLiveDocs.get(index) && 
wrappedLiveDocs.get(index);
-}
-@Override
-public int length() {
-  return hardLiveDocs.length();
-}
-  };
-  wrappedReader = 
FilterCodecReader.wrapLiveDocs(wrappedReader, liveDocs, wrappedReader.numDocs() 
- hardDeleteCount);
+} else {
+  int carryOverSoftDeletes = 
reader.getSegmentInfo().getSoftDelCount() - wrappedReader.numDeletedDocs();
+  if (carryOverSoftDeletes < 0) {
+throw new IllegalStateException("carry-over soft-deletes 
must be positive");
--- End diff --

`AssertionError` instead?  This means we have a bug right?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-20 Thread dweiss
Github user dweiss commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/502#discussion_r235063127
  
--- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ---
@@ -4398,42 +4426,41 @@ private int mergeMiddle(MergePolicy.OneMerge merge, 
MergePolicy mergePolicy) thr
 
   // Let the merge wrap readers
   List mergeReaders = new ArrayList<>();
-  int softDeleteCount = 0;
+  Counter softDeleteCount = Counter.newCounter(false);
   for (int r = 0; r < merge.readers.size(); r++) {
 SegmentReader reader = merge.readers.get(r);
 CodecReader wrappedReader = merge.wrapForMerge(reader);
 validateMergeReader(wrappedReader);
 if (softDeletesEnabled) {
+
   if (reader != wrappedReader) { // if we don't have a wrapped 
reader we won't preserve any soft-deletes
 Bits hardLiveDocs = merge.hardLiveDocs.get(r);
-Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
-int hardDeleteCount = 0;
-DocIdSetIterator softDeletedDocs = 
DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(),
 wrappedReader);
-if (softDeletedDocs != null) {
-  int docId;
-  while ((docId = softDeletedDocs.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
-if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) 
{
-  if (hardLiveDocs == null || hardLiveDocs.get(docId)) {
-softDeleteCount++;
-  } else {
-hardDeleteCount++;
+if (hardLiveDocs != null) { // we only need to do this 
accounting if we have mixed deletes
+  Bits wrappedLiveDocs = wrappedReader.getLiveDocs();
+  int hardDeleteCount = countSoftDeletes(wrappedReader, 
wrappedLiveDocs, hardLiveDocs, softDeleteCount);
--- End diff --

hardDeleteCount var, yet countSoftDeletes() method looks odd. Maybe you 
could just pass two args (two counters) instead. It's not as pretty, but at 
least more intuitive?


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-20 Thread dweiss
Github user dweiss commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/502#discussion_r235062399
  
--- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ---
@@ -4350,6 +4351,33 @@ private synchronized void 
closeMergeReaders(MergePolicy.OneMerge merge, boolean
 }
   }
 
+  private int countSoftDeletes(CodecReader reader, Bits wrappedLiveDocs, 
Bits hardLiveDocs, Counter softDeleteCounter) throws IOException {
+int hardDeleteCount = 0;
+int softDeletesCount = 0;
+DocIdSetIterator softDeletedDocs = 
DocValuesFieldExistsQuery.getDocValuesDocIdSetIterator(config.getSoftDeletesField(),
 reader);
+if (softDeletedDocs != null) {
+  int docId;
+  while ((docId = softDeletedDocs.nextDoc()) != 
DocIdSetIterator.NO_MORE_DOCS) {
+if (wrappedLiveDocs == null || wrappedLiveDocs.get(docId)) {
+  if (hardLiveDocs == null || hardLiveDocs.get(docId)) {
+softDeletesCount++;
+  } else {
+hardDeleteCount++;
+  }
+}
+  }
+}
+softDeleteCounter.addAndGet(softDeletesCount);
+return hardDeleteCount;
+  }
+
+  private boolean assertSoftDeletesCount(CodecReader reader, int 
expectedCount) throws IOException {
+Counter count = Counter.newCounter(false);
+countSoftDeletes(reader, reader.getLiveDocs(), null, count);
+assert count.get() == expectedCount : "soft-deletes cound missmatch 
expected: " + expectedCount  + " but actual: " + count.get() ;
--- End diff --

missmatch -> mismatch


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #502: LUCENE-8569: Never count soft-deletes if read...

2018-11-20 Thread s1monw
GitHub user s1monw opened a pull request:

https://github.com/apache/lucene-solr/pull/502

LUCENE-8569: Never count soft-deletes if reader has no hard-deletes

Today we count the actual soft-deletes during a merge which is
unnecessary if there are no hard-deletes present. In this case, which
is considered to be the common case we can get accurate counts by 
substracting
the number of deleted docs in the wrapped reader from the number of 
soft-deletes
in that reader.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1monw/lucene-solr only_count_if_hard_deletes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #502


commit 87f571610c2f6cfdaa226c6121191c2e656bd13e
Author: Simon Willnauer 
Date:   2018-11-20T11:56:20Z

LUCENE-8569: Never count soft-deletes if reader has no hard-deletes

Today we count the actual soft-deletes during a merge which is
unnecessary if there are no hard-deletes present. In this case, which
is considered to be the common case we can get accurate counts by 
substracting
the number of deleted docs in the wrapped reader from the number of 
soft-deletes
in that reader.




---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org