[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277490#comment-16277490 ]

ASF subversion and git services commented on LUCENE-8043:

Commit 65a716911f35c304ae9da6d4ebb865509787548e in lucene-solr's branch refs/heads/branch_7_1 from [~simonw]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65a7169 ]

LUCENE-8043: Fix document accounting in IndexWriter

The IndexWriter check for too many documents does not always work, resulting in going over the limit. Once this happens, Lucene refuses to open the index and throws a CorruptIndexException: Too many documents.

This change also fixes document accounting if the index writer hits an aborting exception and/or the writer is rolled back. Pending document counts are now consistent with the latest SegmentInfos once the writer has been rolled back.

> Attempting to add documents past limit can corrupt index
>
> Key: LUCENE-8043
> URL: https://issues.apache.org/jira/browse/LUCENE-8043
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.10, 7.0, master (8.0)
> Reporter: Yonik Seeley
> Assignee: Simon Willnauer
> Fix For: master (8.0), 7.2, 7.1.1
> Attachments: LUCENE-8043.patch, LUCENE-8043.patch, LUCENE-8043.patch, LUCENE-8043.patch, LUCENE-8043.patch, YCS_IndexTest7a.java
>
> The IndexWriter check for too many documents does not always work, resulting in going over the limit. Once this happens, Lucene refuses to open the index and throws a CorruptIndexException: Too many documents.
> This appears to affect all versions of Lucene/Solr (the check was first implemented in LUCENE-5843 in v4.9.1/4.10 and we've seen this manifest in 4.10)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277489#comment-16277489 ]

ASF subversion and git services commented on LUCENE-8043:

Commit 0bc07bc02a2bb5253f85bbca97041c76e4509f5f in lucene-solr's branch refs/heads/branch_7x from [~simonw]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0bc07bc ]

LUCENE-8043: Fix document accounting in IndexWriter

The IndexWriter check for too many documents does not always work, resulting in going over the limit. Once this happens, Lucene refuses to open the index and throws a CorruptIndexException: Too many documents.

This change also fixes document accounting if the index writer hits an aborting exception and/or the writer is rolled back. Pending document counts are now consistent with the latest SegmentInfos once the writer has been rolled back.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277472#comment-16277472 ]

ASF subversion and git services commented on LUCENE-8043:

Commit b7d8731bbf2a9278c22efa5a7fb43285236c90ba in lucene-solr's branch refs/heads/master from [~simonw]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b7d8731 ]

LUCENE-8043: Fix document accounting in IndexWriter

The IndexWriter check for too many documents does not always work, resulting in going over the limit. Once this happens, Lucene refuses to open the index and throws a CorruptIndexException: Too many documents.

This change also fixes document accounting if the index writer hits an aborting exception and/or the writer is rolled back. Pending document counts are now consistent with the latest SegmentInfos once the writer has been rolled back.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277397#comment-16277397 ]

Michael McCandless commented on LUCENE-8043:

+1 to the patch! Phew, that was tricky; thanks @simonw.

I beasted all Lucene tests 113 times and only hit 3 failures from LUCENE-8073.

+1 to push!
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274607#comment-16274607 ]

Michael McCandless commented on LUCENE-8043:

Thanks [~simonw]; I love the new assert, and the patch looks correct to me.

I beasted all Lucene tests 33 times and hit this failure, twice:

{noformat}
ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=55197CA38E8C827B

java.lang.AssertionError: pendingNumDocs 0 != 11 totalMaxDoc
    at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1277)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1319)
    at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:902)
{noformat}

But it does not reproduce for me.

I hit two other unrelated failures; they look like Similarity issues ... I'll open separate issues for those.
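The assertion in the trace above compares the writer's running document counter with the total maxDoc of its segments at close time. As a rough illustration only, here is a toy version of that kind of check. It is not the Lucene code: the class and method names (CloseInvariantSketch, totalMaxDoc, checkOnClose) are made up for this sketch, and only the message format is copied from the trace.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a close-time accounting check; hypothetical names, not Lucene code. */
final class CloseInvariantSketch {

    /** Sums the maxDoc of every segment in the (hypothetical) segment list. */
    static long totalMaxDoc(List<Integer> segmentMaxDocs) {
        long total = 0;
        for (int maxDoc : segmentMaxDocs) {
            total += maxDoc;
        }
        return total;
    }

    /** Throws if the running counter disagrees with what the segments hold. */
    static void checkOnClose(long pendingNumDocs, List<Integer> segmentMaxDocs) {
        long total = totalMaxDoc(segmentMaxDocs);
        if (pendingNumDocs != total) {
            throw new AssertionError(
                "pendingNumDocs " + pendingNumDocs + " != " + total + " totalMaxDoc");
        }
    }

    public static void main(String[] args) {
        List<Integer> segments = Arrays.asList(4, 7); // two segments, 11 docs in total
        checkOnClose(11, segments);                   // counter agrees: passes silently
        try {
            checkOnClose(0, segments);                // counter lost track of the 11 docs
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this turns silent counter drift into an immediate test failure, which is presumably why it surfaced the interrupt case reported here.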
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274259#comment-16274259 ]

Michael McCandless commented on LUCENE-8043:

Thanks [~simonw]; I'll look and beast the patch.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273711#comment-16273711 ]

Michael McCandless commented on LUCENE-8043:

Wow, what an evil test :)

+1 to the patch; thanks @simonw and [~ysee...@gmail.com]!
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272238#comment-16272238 ]

Simon Willnauer commented on LUCENE-8043:

{quote}Turns out the test code that failed with a small amount of updates, even after my attempted fix, was for 4.10.3 / 4.10.4. I forward-ported that code to master and things no longer fail... so I think this patch is good for recent Lucene versions. Thanks!{quote}

[~yo...@apache.org] do you have a test case I can use to verify the patch going forward? Can you share it? I will also try to turn your reproduction into a testcase, but maybe we should push the fix first so it does not get in the way of a release, WDYT?
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272010#comment-16272010 ]

Yonik Seeley commented on LUCENE-8043:

Turns out the test code that failed with a small amount of updates, even after my attempted fix, was for 4.10.3 / 4.10.4. I forward-ported that code to master and things no longer fail... so I think this patch is good for recent Lucene versions. Thanks!
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271874#comment-16271874 ]

Yonik Seeley commented on LUCENE-8043:

I had worked on tracking this down for a bit before I got pulled off onto something else... I remember adding the boolean to drop() just as this patch does, but when using that I only put the conditional around the pendingNumDocs decrement (in multiple places). Perhaps that's why it didn't work to fix the issue for me...

I also exposed pendingNumDocs for testing reasons and then tested it against expected values, and was able to get tests that reliably failed after a handful of updates. I'll try digging that up and see if it passes with this patch.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271782#comment-16271782 ]

Michael McCandless commented on LUCENE-8043:

Wow, nice find [~simonw]! It is normal for drop to be called more than once, I think, so I think your fix is the right approach! Thanks.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271767#comment-16271767 ]

Simon Willnauer commented on LUCENE-8043:

[~jpountz] [~yo...@apache.org] [~mikemccand] I think I found the issue. It seems like we try to drop the same segment reader from the reader pool multiple times while applying deletes, and I am not 100% sure whether that is expected. Because of that, we also reduce the counter for that segment multiple times. With this patch I can run the test 1k times without a failure. I am happy to provide a patch for it, but I wonder if this is an expected state? [~mikemccand] can you tell?

{code}
diff --git a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
index 7f47e42d45..586a294915 100644
--- a/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
+++ b/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
@@ -551,13 +551,15 @@ public class IndexWriter implements Closeable, TwoPhaseCommit, Accountable {
       return true;
     }
 
-    public synchronized void drop(SegmentCommitInfo info) throws IOException {
+    public synchronized boolean drop(SegmentCommitInfo info) throws IOException {
       final ReadersAndUpdates rld = readerMap.get(info);
       if (rld != null) {
         assert info == rld.info;
         readerMap.remove(info);
         rld.dropReaders();
+        return true;
       }
+      return false;
     }
 
     public synchronized long ramBytesUsed() {
@@ -1616,10 +1618,9 @@ public class IndexWriter implements Closeable, TwoPhaseCommit, Accountable {
         // segment, we leave it in the readerPool; the
         // merge will skip merging it and will then drop
         // it once it's done:
-        if (mergingSegments.contains(info) == false) {
+        if (mergingSegments.contains(info) == false && readerPool.drop(info)) {
           segmentInfos.remove(info);
           pendingNumDocs.addAndGet(-info.info.maxDoc());
-          readerPool.drop(info);
         }
       }
{code}
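To make the double-drop effect concrete, here is a small self-contained sketch. It is a toy model under stated assumptions, not the IndexWriter code: the class DropAccountingSketch, the string segment names, and the dropUnguarded/dropGuarded methods are invented for illustration. Only the idea comes from the comment above: drop() reports whether it actually removed something, and the counter is decremented only in that case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the double-drop accounting problem; hypothetical names, not Lucene code. */
final class DropAccountingSketch {
    // Stand-in for the reader pool: segment name -> maxDoc.
    private final Map<String, Integer> readerMap = new HashMap<>();
    // Stand-in for the writer's pending document counter.
    final AtomicLong pendingNumDocs = new AtomicLong();

    void addSegment(String name, int maxDoc) {
        readerMap.put(name, maxDoc);
        pendingNumDocs.addAndGet(maxDoc);
    }

    /** Returns true only for the call that actually removed the entry. */
    synchronized boolean drop(String name) {
        return readerMap.remove(name) != null;
    }

    /** Unguarded: decrements on every call, so a repeated drop under-counts. */
    void dropUnguarded(String name, int maxDoc) {
        drop(name);
        pendingNumDocs.addAndGet(-maxDoc);
    }

    /** Guarded: decrements only when drop() reports that it removed the segment. */
    void dropGuarded(String name, int maxDoc) {
        if (drop(name)) {
            pendingNumDocs.addAndGet(-maxDoc);
        }
    }

    public static void main(String[] args) {
        DropAccountingSketch unguarded = new DropAccountingSketch();
        unguarded.addSegment("_0", 10);
        unguarded.addSegment("_1", 5);
        unguarded.dropUnguarded("_0", 10);
        unguarded.dropUnguarded("_0", 10); // second drop of the same segment
        System.out.println("unguarded: " + unguarded.pendingNumDocs.get()); // prints -5

        DropAccountingSketch guarded = new DropAccountingSketch();
        guarded.addSegment("_0", 10);
        guarded.addSegment("_1", 5);
        guarded.dropGuarded("_0", 10);
        guarded.dropGuarded("_0", 10);     // no-op: the segment is already gone
        System.out.println("guarded: " + guarded.pendingNumDocs.get());     // prints 5
    }
}
```

In the unguarded variant the counter ends below the number of documents still held, which is the kind of drift that would later let a limit check pass when it should not.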
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270441#comment-16270441 ]

Simon Willnauer commented on LUCENE-8043:

[~jpountz] I can look later at this and try to reproduce it.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269209#comment-16269209 ]

Adrien Grand commented on LUCENE-8043:

I can reproduce this but I'm not familiar enough with IndexWriter to understand what causes this. At first I thought that maybe this was due to the fact that we were giving back documents too early after merges, but actually we do that after updating the list of segment infos, so that looks ok to me. Yet this doesn't prevent the list of segment infos from reaching more than MAX_DOCS documents in {{IndexWriter.publishFlushedSegment}} during the test. [~simonwillnauer] or [~mikemccand] Do you know why this may occur?

I wanted to look at the IW info stream to better understand what is happening, but unfortunately this probably slows things down enough to prevent the issue from reproducing. It reproduces with assertions enabled ({{-ea}}), but no assertion breaks.
[jira] [Commented] (LUCENE-8043) Attempting to add documents past limit can corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242531#comment-16242531 ]

Yonik Seeley commented on LUCENE-8043:

At first I thought it might be more of a transient issue with reopen using the IW and seeing intermediate state that could be over the limit. It was often the case that one could get exceptions about too many docs, but then after merges were finished and the IW was closed, we would be back under the limit.

But not always. Sometimes we are still over the limit after all threads have been stopped and we've called commit and close on the IndexWriter. Below is a stack trace of that case:

{code}
DONE: time in sec:6 Docs indexed:2 ramBytesUsed: sizeInBytes:220160
FAIL: unexpected exception: org.apache.lucene.index.CorruptIndexException: Too many documents: an index cannot exceed 1 but readers have total maxDoc=10010 (resource=BufferedChecksumIndexInput(RAMInputStream(name=segments_4)))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:399)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:59)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:667)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:79)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
    at YCS_IndexTest7.main(YCS_IndexTest7.java:262)
{code}
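For readers wondering how a limit check can be bypassed at all, the following sketch shows one way it can happen when the limit is enforced against a running counter rather than against the segments themselves. This is a toy model, not the IndexWriter implementation: DocLimitSketch, reserve, and the limit of 10 are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a reserve-then-check document limit; hypothetical names, not Lucene code. */
final class DocLimitSketch {
    private final long maxDocs;
    // Running count of documents the writer believes it holds.
    final AtomicLong pendingNumDocs = new AtomicLong();

    DocLimitSketch(long maxDocs) {
        this.maxDocs = maxDocs;
    }

    /** Reserves room for numDocs, undoing the reservation if it would exceed the limit. */
    void reserve(long numDocs) {
        if (pendingNumDocs.addAndGet(numDocs) > maxDocs) {
            pendingNumDocs.addAndGet(-numDocs);
            throw new IllegalArgumentException("number of documents cannot exceed " + maxDocs);
        }
    }

    public static void main(String[] args) {
        DocLimitSketch writer = new DocLimitSketch(10);
        writer.reserve(10);   // exactly at the limit
        long actualDocs = 10; // what the segments really hold in this scenario

        // Simulate an accounting error that lets the counter drift low.
        writer.pendingNumDocs.addAndGet(-5);

        writer.reserve(3);    // accepted, because the counter reads 5 + 3 = 8
        actualDocs += 3;
        System.out.println("counter=" + writer.pendingNumDocs.get() + " actual=" + actualDocs);
    }
}
```

The check itself is sound as long as the counter is accurate; once the counter under-counts, the writer happily accepts documents that push the real total past the limit, and only the stricter check at open time notices.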