[ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-8118:
--------------------------------
    Attachment: LUCENE-8118_test.patch

Here's a really bad test, but it reproduces the failure (takes about 2 minutes).

lucene/core$ ant test -Dtestcase=TestIndexWriter -Dtestmethod=testAddDocumentsMassive -Dtests.heapsize=4G

{noformat}
   [junit4] <JUnit4> says مرحبا! Master seed: 1655BF16A8843A6A
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(22813@localhost).
   [junit4] Suite: org.apache.lucene.index.TestIndexWriter
   [junit4] HEARTBEAT J0 PID(22813@localhost): 2018-01-05T11:27:48, stalled for 71.2s at: TestIndexWriter.testAddDocumentsMassive
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriter -Dtests.method=testAddDocumentsMassive -Dtests.seed=1655BF16A8843A6A -Dtests.locale=fr-FR -Dtests.timezone=Asia/Oral -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR    121s | TestIndexWriter.testAddDocumentsMassive <<<
   [junit4]    > Throwable #1: java.lang.ArrayIndexOutOfBoundsException: -65536
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([1655BF16A8843A6A:2B0B86082D338FEA]:0)
   [junit4]    >        at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
   [junit4]    >        at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
   [junit4]    >        at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:80)
   [junit4]    >        at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:171)
   [junit4]    >        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:452)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1530)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1506)
   [junit4]    >        at org.apache.lucene.index.TestIndexWriter.testAddDocumentsMassive(TestIndexWriter.java:2994)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: leaving temporary files on disk at: /home/rmuir/workspace/lucene-solr/lucene/build/core/test/J0/temp/lucene.index.TestIndexWriter_1655BF16A8843A6A-001
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@2a213e8b), locale=fr-FR, timezone=Asia/Oral
   [junit4]   2> NOTE: Linux 4.4.0-104-generic amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=8,threads=1,free=543468648,total=2733637632
   [junit4]   2> NOTE: All tests run in this JVM: [TestIndexWriter]
   [junit4] Completed [1/1 (1!)] in 121.55s, 1 test, 1 error <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures [seed: 1655BF16A8843A6A]:
   [junit4]   - org.apache.lucene.index.TestIndexWriter.testAddDocumentsMassive
   [junit4] 
   [junit4] 
   [junit4] JVM J0:     0.38 ..   122.56 =   122.18s
   [junit4] Execution time total: 2 minutes 2 seconds
   [junit4] Tests summary: 1 suite, 1 test, 1 error

BUILD FAILED
/home/rmuir/workspace/lucene-solr/lucene/common-build.xml:1512: The following error occurred while executing this line:
/home/rmuir/workspace/lucene-solr/lucene/common-build.xml:1038: There were test failures: 1 suite, 1 test, 1 error [seed: 1655BF16A8843A6A]

Total time: 2 minutes 5 seconds
{noformat}
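
A possible reading of the index -65536, offered as an assumption and not confirmed in this issue: if the pool address is an int, blocks are 32 KiB (shift of 15, as in Lucene's ByteBlockPool), and the block is looked up as address >> shift, then an address that wraps past 2 GiB lands at Integer.MIN_VALUE and maps to block -65536. The class and method names below are hypothetical and only illustrate the arithmetic; this is not Lucene code.

```java
public class PoolIndexOverflow {
    // Assumed block geometry: 32 KiB blocks (shift of 15).
    static final int BLOCK_SHIFT = 15;
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

    // Maps a global int address to the index of the block that would hold it.
    static int blockIndex(int address) {
        return address >> BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        // Start address of the last block that still fits below 2 GiB.
        int lastBlockStart = Integer.MAX_VALUE - BLOCK_SIZE + 1;
        // Advancing by one more block overflows the int and wraps around.
        int wrapped = lastBlockStart + BLOCK_SIZE;

        System.out.println(blockIndex(lastBlockStart)); // 65535
        System.out.println(wrapped);                    // -2147483648
        System.out.println(blockIndex(wrapped));        // -65536
    }
}
```

If that reading is right, the exception would point to more than 2 GiB being buffered in a single pool before a flush, which fits a test that needs a 4G heap.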

> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 7.2
>         Environment: Debian/Stretch
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>            Reporter: Laura Dietz
>         Attachments: LUCENE-8118_test.patch
>
>
> Indexing a large collection of about 20 million paragraph-sized documents 
> results in an ArrayIndexOutOfBoundsException in 
> org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).
> The bug is possibly related to the issues described 
> [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
> and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I 
> am not using Solr; I am using Lucene Core directly.
> The issue can be reproduced using the code from [GitHub 
> trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]:
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> where paragraphCorpus.cbor is contained in this 
> [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz].
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>         at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>         at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>         at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>         at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>         at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>         at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>         at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>         at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
