[ 
https://issues.apache.org/jira/browse/LUCENE-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-9477.
-------------------------------------
    Fix Version/s: 8.7
                   master (9.0)
       Resolution: Fixed

> IndexWriter might leave broken segments file behind on exception during rollback
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-9477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9477
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Simon Willnauer
>            Priority: Major
>             Fix For: master (9.0), 8.7
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Mike ran some beasty tests while I was working on LUCENE-8962. The following 
> test caused some headaches because it also fails on master, though only rarely:
> {noformat}
> org.apache.lucene.index.TestIndexWriterOnVMError > testUnknownError FAILED
>     org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((clone of) ByteBuffersIndexInput (file=pending_segments_2, buffers=258 bytes, block size: 1, blocks: 1, position: 0))))
>         at __randomizedtesting.SeedInfo.seed([587A104EFE0C57E1:B32CCFCEFC8BC1D1]:0)
>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:300)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:521)
>         at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:301)
>         at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:836)
>         at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:89)
>         at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:251)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
>         at java.base/java.lang.Thread.run(Thread.java:834)
>         Caused by:
>         java.io.FileNotFoundException: _0.si in dir=ByteBuffersDirectory@1bae3fe1 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@38275f41
>             at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:748)
>             at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
>             at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1044)
>             at org.apache.lucene.codecs.lucene86.Lucene86SegmentInfoFormat.read(Lucene86SegmentInfoFormat.java:91)
>             at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:364)
>             at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:298)
>             ... 41 more
>         ....
>   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/l/simon/lucene/test-framework/src/resources/org/apache/lucene/util/2000mb.txt.gz -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>   2> NOTE: leaving temporary files on disk at: /l/simon/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnVMError_587A104EFE0C57E1-003
>   2> NOTE: test params are: codec=Asserting(Lucene86): {text_payloads=BlockTreeOrds(blocksize=128), text_vectors=PostingsFormat(name=Asserting), text1=PostingsFormat(name=Asserting), id=BlockTreeOrds(blocksize=128)}, docValues:{dv3=DocValuesFormat(name=Lucene80), dv2=DocValuesFormat(name=Asserting), dv5=DocValuesFormat(name=Lucene80), dv=DocValuesFormat(name=Asserting), dv4=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=696, maxMBSortInHeap=6.040673619645681, sim=Asserting(RandomSimilarity(queryNorm=false): {text_payloads=IB SPL-DZ(0.3), text_vectors=DFR I(ne)L3(800.0), text1=org.apache.lucene.search.similarities.BooleanSimilarity@6f4329a1}), locale=zh-CN, timezone=SystemV/MST7MDT
>   2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6 (64-bit)/cpus=128,threads=1,free=241525696,total=268435456
>   2> NOTE: All tests run in this JVM: [TestIndexWriterOnVMError]
> {noformat}
> The test also reproduces on master without the huge line docs file, using this command:
> {noformat}
> ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {noformat}
> The reason is that we fail to delete the already renamed pending segments 
> file when the metadata sync on the directory fails. The subsequent rollback 
> also crashes while it is trying to delete unreferenced files. As a result, 
> subsequent CheckIndex calls fail with FileNotFoundException (FNF), because 
> the commit was written but not fully removed.
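
The failure mode described above can be sketched roughly as follows. This is
an illustration only, not the actual IndexWriter or SegmentInfos code and not
the committed fix: it uses plain java.nio.file calls, and the names
CommitFinisher, MetadataSync and finishCommit are hypothetical. The idea it
shows is that once the pending file has been renamed, a failing metadata sync
has to remove the renamed file again, otherwise a commit point is left behind
that later readers will try to open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, for illustration only; not part of Lucene. */
final class CommitFinisher {

  /** Hypothetical hook standing in for the directory metadata sync. */
  @FunctionalInterface
  interface MetadataSync {
    void sync(Path dir) throws IOException;
  }

  /**
   * Renames pending_segments_N to segments_N and then syncs the directory metadata.
   * If the sync fails, the renamed file is deleted again so that no half-finished
   * commit point stays visible.
   */
  static Path finishCommit(Path dir, long generation, MetadataSync metadataSync)
      throws IOException {
    // Generation suffix in base 36 (assumption: mirrors Lucene's file naming).
    String suffix = Long.toString(generation, Character.MAX_RADIX);
    Path pending = dir.resolve("pending_segments_" + suffix);
    Path committed = dir.resolve("segments_" + suffix);

    Files.move(pending, committed, StandardCopyOption.ATOMIC_MOVE);
    try {
      metadataSync.sync(dir);
    } catch (Throwable t) {
      // The rename already happened: clean up the renamed file, not the pending name.
      try {
        Files.deleteIfExists(committed);
      } catch (Throwable cleanupFailure) {
        // Keep the original failure as the primary exception.
        t.addSuppressed(cleanupFailure);
      }
      throw t;
    }
    return committed;
  }
}
```

Catching Throwable rather than IOException is deliberate in this sketch, since
the failing test injects VM errors; whether the real code path does the same is
not stated in this issue.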


