[ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-8118:
--------------------------------
    Attachment: LUCENE-8118_test.patch

Here's a really bad test, but it works (takes about 2 minutes).

lucene/core$ ant test -Dtestcase=TestIndexWriter -Dtestmethod=testAddDocumentsMassive -Dtests.heapsize=4G

{noformat}
   [junit4] <JUnit4> says مرحبا! Master seed: 1655BF16A8843A6A
   [junit4] Executing 1 suite with 1 JVM.
   [junit4]
   [junit4] Started J0 PID(22813@localhost).
   [junit4] Suite: org.apache.lucene.index.TestIndexWriter
   [junit4] HEARTBEAT J0 PID(22813@localhost): 2018-01-05T11:27:48, stalled for 71.2s at: TestIndexWriter.testAddDocumentsMassive
   [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtests.method=testAddDocumentsMassive -Dtests.seed=1655BF16A8843A6A -Dtests.locale=fr-FR -Dtests.timezone=Asia/Oral -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR    121s | TestIndexWriter.testAddDocumentsMassive <<<
   [junit4]    > Throwable #1: java.lang.ArrayIndexOutOfBoundsException: -65536
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([1655BF16A8843A6A:2B0B86082D338FEA]:0)
   [junit4]    >        at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
   [junit4]    >        at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
   [junit4]    >        at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:80)
   [junit4]    >        at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:171)
   [junit4]    >        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:452)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1530)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1506)
   [junit4]    >        at org.apache.lucene.index.TestIndexWriter.testAddDocumentsMassive(TestIndexWriter.java:2994)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: leaving temporary files on disk at: /home/rmuir/workspace/lucene-solr/lucene/build/core/test/J0/temp/lucene.index.TestIndexWriter_1655BF16A8843A6A-001
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@2a213e8b), locale=fr-FR, timezone=Asia/Oral
   [junit4]   2> NOTE: Linux 4.4.0-104-generic amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=8,threads=1,free=543468648,total=2733637632
   [junit4]   2> NOTE: All tests run in this JVM: [TestIndexWriter]
   [junit4] Completed [1/1 (1!)] in 121.55s, 1 test, 1 error <<< FAILURES!
   [junit4]
   [junit4]
   [junit4] Tests with failures [seed: 1655BF16A8843A6A]:
   [junit4]   - org.apache.lucene.index.TestIndexWriter.testAddDocumentsMassive
   [junit4]
   [junit4]
   [junit4] JVM J0:     0.38 ..   122.56 =   122.18s
   [junit4] Execution time total: 2 minutes 2 seconds
   [junit4] Tests summary: 1 suite, 1 test, 1 error

BUILD FAILED
/home/rmuir/workspace/lucene-solr/lucene/common-build.xml:1512: The following error occurred while executing this line:
/home/rmuir/workspace/lucene-solr/lucene/common-build.xml:1038: There were test failures: 1 suite, 1 test, 1 error [seed: 1655BF16A8843A6A]

Total time: 2 minutes 5 seconds
{noformat}

> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 7.2
>         Environment: Debian/Stretch
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>            Reporter: Laura Dietz
>         Attachments: LUCENE-8118_test.patch
>
>
> Indexing a large collection of about 20 million paragraph-sized documents
> results in an ArrayIndexOutOfBoundsException in
> org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace
> below).
> The bug is possibly related to issues described in
> [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
> and [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936] -- but I
> am not using SOLR, I am directly using Lucene Core.
> The issue can be reproduced using code from [GitHub
> trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]
>
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
>
> Where paragraphCorpus.cbor is contained in this
> [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz]
>
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>       at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>       at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>       at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>       at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>       at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>       at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>       at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>       at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>       at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>       at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>       at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>       at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)
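
A note on the number in the exception: -65536 is exactly what Java produces when `Integer.MIN_VALUE` is shifted right by 15 bits. The sketch below only illustrates that arithmetic. It assumes, without confirmation from this message, that the failing array index is derived from a 32-bit byte offset by a 15-bit shift (blocks of 32768 bytes). The class name, the constant `ASSUMED_BLOCK_SHIFT`, and the method `blockIndex` are hypothetical and are not taken from the Lucene sources.

```java
// Sketch only: shows which block index a wrapped 32-bit offset would yield.
// ASSUMPTION: the index is computed as offset >> 15; this message does not
// show the code at TermsHashPerField.java:198.
public class NegativeIndexSketch {
    // Hypothetical constant: blocks of 2^15 = 32768 bytes.
    static final int ASSUMED_BLOCK_SHIFT = 15;

    // Block index for a 32-bit offset into a pooled byte[][] layout.
    static int blockIndex(int offset) {
        return offset >> ASSUMED_BLOCK_SHIFT; // arithmetic shift keeps the sign
    }

    public static void main(String[] args) {
        int lastValid = Integer.MAX_VALUE; // 2^31 - 1, the largest int offset
        int wrapped = lastValid + 1;       // int overflow wraps to Integer.MIN_VALUE

        System.out.println(blockIndex(lastValid)); // 65535
        System.out.println(blockIndex(wrapped));   // -65536
    }
}
```

If that assumption holds, the failure would first appear once an in-memory offset passes 2^31 - 1, which would be consistent with the test needing `-Dtests.heapsize=4G` to reproduce it.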