[GitHub] lucene-solr pull request #527: LUCENE-8609: Allow getting consistent docstat...

2018-12-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/527


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #527: LUCENE-8609: Allow getting consistent docstat...

2018-12-14 Thread dnhatn
Github user dnhatn commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/527#discussion_r241832076
  
--- Diff: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java ---
@@ -3300,7 +3315,7 @@ public int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier

[GitHub] lucene-solr pull request #527: LUCENE-8609: Allow getting consistent docstat...

2018-12-14 Thread dnhatn
Github user dnhatn commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/527#discussion_r241832034
  
--- Diff: lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java ---
@@ -3147,7 +3162,7 @@ public void testSoftUpdateDocuments() throws IOException {
 for (SegmentCommitInfo info : writer.cloneSegmentInfos()) {
  numSoftDeleted += info.getSoftDelCount();
 }
-assertEquals(writer.maxDoc() - writer.numDocs(), numSoftDeleted);
+assertEquals(writer.getDocStats().maxDoc - writer.getDocStats().numDocs, numSoftDeleted);
--- End diff --

maybe use a single docStats?
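
For context on this suggestion: each call to getDocStats() returns its own snapshot, so the two calls in the assertion above could in principle straddle a concurrent update. A minimal sketch of reading both fields from one snapshot follows. `ToyWriter`, `addDocuments`, `softDelete` and `deletedDocs` are hypothetical stand-ins written for this illustration, not Lucene code; only the `getDocStats()` name and the `maxDoc`/`numDocs` fields are taken from the diff.

```java
public class SingleSnapshotExample {

    /** Immutable pair of counters, modeled on the DocStats fields used in the diff. */
    static final class DocStats {
        final int maxDoc;
        final int numDocs;

        DocStats(int maxDoc, int numDocs) {
            this.maxDoc = maxDoc;
            this.numDocs = numDocs;
        }
    }

    /** Hypothetical stand-in for the writer; NOT the Lucene IndexWriter. */
    static final class ToyWriter {
        private int maxDoc;
        private int numDocs;

        synchronized void addDocuments(int n) {
            maxDoc += n;
            numDocs += n;
        }

        /** Marks up to n live documents as deleted; maxDoc is unchanged. */
        synchronized void softDelete(int n) {
            numDocs -= Math.min(n, numDocs);
        }

        /** Both counters are read under the same lock, so they belong together. */
        synchronized DocStats getDocStats() {
            return new DocStats(maxDoc, numDocs);
        }
    }

    /** Derives the deleted-document count from ONE snapshot instead of two. */
    static int deletedDocs(ToyWriter writer) {
        DocStats stats = writer.getDocStats();
        return stats.maxDoc - stats.numDocs;
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocuments(5);
        writer.softDelete(2);
        System.out.println(deletedDocs(writer)); // prints 2
    }
}
```

In the test under review the equivalent change would be to assign the result of `writer.getDocStats()` to a local variable and use its two fields in the assertion.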





[GitHub] lucene-solr pull request #527: LUCENE-8609: Allow getting consistent docstat...

2018-12-13 Thread dnhatn
Github user dnhatn commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/527#discussion_r241504268
  
--- Diff: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ---
@@ -5289,4 +5289,48 @@ final synchronized boolean segmentCommitInfoExist(SegmentCommitInfo sci) {
   final synchronized SegmentInfos cloneSegmentInfos() {
 return segmentInfos.clone();
   }
+
+  /**
+   * Returns accurate {@link DocStats} form this writer. This is equivalent to calling {@link #numDocs()} and {@link #maxDoc()}
+   * but is not subject to race-conditions. The numDoc for instance can change after maxDoc is fetched that causes numDocs to be
+   * greater than maxDoc which makes it hard to get accurate document stats from IndexWriter.
+   */
+  public synchronized DocStats getDocStats() {
+ensureOpen();
+int numDocs = docWriter.getNumDocs();
+int maxDoc = numDocs;
+for (final SegmentCommitInfo info : segmentInfos) {
+  maxDoc = info.info.maxDoc();
--- End diff --

`=` -> `+=`.
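
To spell out the effect of this one-character fix: with plain assignment the loop keeps only the last segment's maxDoc, while `+=` adds every segment to the buffered count. A small sketch under stated assumptions: `Segment`, `totalMaxDoc` and `lastSegmentOnly` are hypothetical names for this illustration, and the numbers are made up; none of this is Lucene code.

```java
import java.util.Arrays;
import java.util.List;

public class MaxDocSumExample {

    /** Hypothetical per-segment record holding only a maxDoc value. */
    static final class Segment {
        final int maxDoc;

        Segment(int maxDoc) {
            this.maxDoc = maxDoc;
        }
    }

    /** Intended behavior: buffered documents plus the maxDoc of every segment. */
    static int totalMaxDoc(int bufferedDocs, List<Segment> segments) {
        int maxDoc = bufferedDocs;
        for (Segment segment : segments) {
            maxDoc += segment.maxDoc; // accumulate
        }
        return maxDoc;
    }

    /** Mirrors the loop as quoted in the diff: each iteration overwrites the total. */
    static int lastSegmentOnly(int bufferedDocs, List<Segment> segments) {
        int maxDoc = bufferedDocs;
        for (Segment segment : segments) {
            maxDoc = segment.maxDoc; // overwrite, earlier values are lost
        }
        return maxDoc;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(new Segment(10), new Segment(20));
        System.out.println(totalMaxDoc(3, segments));     // prints 33
        System.out.println(lastSegmentOnly(3, segments)); // prints 20
    }
}
```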





[GitHub] lucene-solr pull request #527: LUCENE-8609: Allow getting consistent docstat...

2018-12-13 Thread s1monw
GitHub user s1monw opened a pull request:

https://github.com/apache/lucene-solr/pull/527

LUCENE-8609: Allow getting consistent docstats from IndexWriter

Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
to get all stats for the current index, but the two calls are subject to
concurrency and might return numbers that are not consistent, i.e. some
cases can return maxDoc < numDocs, which is undesirable. This change adds a
getDocStats() method to IndexWriter to allow fetching consistent numbers
for these stats.
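
As an illustration of the guarantee described here, the sketch below takes snapshots from one thread while another thread keeps adding and deleting documents, and counts how often a snapshot reports numDocs > maxDoc. Because both counters are copied under the same lock, the count is expected to stay at zero. `ToyWriter`, `addDocument`, `deleteDocument` and `countViolations` are hypothetical names for this sketch; it models the idea only and is not the IndexWriter implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConsistentStatsExample {

    /** Immutable snapshot of both counters. */
    static final class DocStats {
        final int maxDoc;
        final int numDocs;

        DocStats(int maxDoc, int numDocs) {
            this.maxDoc = maxDoc;
            this.numDocs = numDocs;
        }
    }

    /** Hypothetical stand-in for the writer; NOT the Lucene IndexWriter. */
    static final class ToyWriter {
        private int maxDoc;
        private int numDocs;

        synchronized void addDocument() {
            maxDoc++;
            numDocs++;
        }

        synchronized void deleteDocument() {
            if (numDocs > 0) {
                numDocs--;
            }
        }

        /** Copies both counters while holding the lock that guards every update. */
        synchronized DocStats getDocStats() {
            return new DocStats(maxDoc, numDocs);
        }
    }

    /** Runs an indexing thread and a reader thread; returns the number of inconsistent snapshots. */
    static int countViolations(int updates) {
        ToyWriter writer = new ToyWriter();
        AtomicInteger violations = new AtomicInteger();

        Thread indexer = new Thread(() -> {
            for (int i = 0; i < updates; i++) {
                writer.addDocument();
                if (i % 3 == 0) {
                    writer.deleteDocument();
                }
            }
        });
        Thread reader = new Thread(() -> {
            while (indexer.isAlive()) {
                DocStats stats = writer.getDocStats();
                if (stats.numDocs > stats.maxDoc) {
                    violations.incrementAndGet();
                }
            }
        });

        indexer.start();
        reader.start();
        try {
            indexer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return violations.get();
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(100_000));
    }
}
```

With two separately synchronized getters the reader could instead observe maxDoc before a batch of additions and numDocs after it, which is the inconsistency this change is meant to avoid.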

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1monw/lucene-solr docstats

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #527


commit 6721c40e16c485038fd092dbaa672204e6fdb3c6
Author: Simon Willnauer 
Date:   2018-12-13T15:05:47Z

LUCENE-8609: Allow getting consistent docstats from IndexWriter

Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough
to get all stats for the current index, but the two calls are subject to
concurrency and might return numbers that are not consistent, i.e. some
cases can return maxDoc < numDocs, which is undesirable. This change adds a
getDocStats() method to IndexWriter to allow fetching consistent numbers
for these stats.



