[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-06 Thread Laura Dietz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314939#comment-16314939
 ] 

Laura Dietz commented on LUCENE-8118:
-

+1

> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -
>
>                 Key: LUCENE-8118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 7.2
>         Environment: Debian/Stretch
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>            Reporter: Laura Dietz
>         Attachments: LUCENE-8118_test.patch
>
>
> Indexing a large collection of about 20 million paragraph-sized documents 
> results in an ArrayIndexOutOfBoundsException in 
> org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).
> The bug is possibly related to the issues described 
> [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
> and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I 
> am not using Solr; I am using Lucene Core directly.
> The issue can be reproduced using code from [GitHub 
> trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]:
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> where paragraphCorpus.cbor is contained in this 
> [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz].
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>     at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>     at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>     at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>     at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-06 Thread Laura Dietz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314853#comment-16314853
 ] 

Laura Dietz edited comment on LUCENE-8118 at 1/6/18 7:24 PM:
-

Dawid, Michael, my computer has plenty of RAM, which is why I never see an OOM 
exception and always get the AIOOBE. 



was (Author: laura-dietz):
Dawid, my computer has plenty of RAM, which is why I never see an OOM exception 
and always get the AIOOBE. 



[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-06 Thread Laura Dietz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314853#comment-16314853
 ] 

Laura Dietz commented on LUCENE-8118:
-

Dawid, my computer has plenty of RAM, which is why I never see an OOM exception 
and always get the AIOOBE. 




[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-05 Thread Laura Dietz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313323#comment-16313323
 ] 

Laura Dietz commented on LUCENE-8118:
-

I think my mistake was to abuse addDocuments(iterator).

I switched to addDocument(doc) with a commit every so often (see the master branch).
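
The switch described above, adding documents one at a time and committing at a fixed interval, can be sketched as follows. This is only an illustration, not the code from the master branch: `DocWriter` is a hypothetical stand-in for the `addDocument` and `commit` calls on the index writer, and `commitEvery` is an assumed tuning parameter. An adapter around the real writer would also need to handle `IOException`, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Iterator;

final class BatchedIndexing {
    /** Hypothetical stand-in for the two writer calls used here. */
    interface DocWriter<D> {
        void addDocument(D doc);
        void commit();
    }

    /**
     * Adds documents one at a time and commits after every commitEvery documents.
     * Returns the number of commits issued, including the one for a final partial batch.
     */
    static <D> int indexAll(Iterator<D> docs, DocWriter<D> writer, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int pending = 0;
        int commits = 0;
        while (docs.hasNext()) {
            writer.addDocument(docs.next());
            if (++pending == commitEvery) {
                writer.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // commit the last partial batch
            writer.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // In-memory fake that only logs the calls it receives.
        DocWriter<String> fake = new DocWriter<String>() {
            public void addDocument(String doc) { System.out.println("add " + doc); }
            public void commit() { System.out.println("commit"); }
        };
        indexAll(Arrays.asList("a", "b", "c", "d", "e").iterator(), fake, 2);
    }
}
```

The interval is the part that needs hand tuning; the sketch only shows where the commit calls fall.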



[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-05 Thread Laura Dietz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313292#comment-16313292
 ] 

Laura Dietz commented on LUCENE-8118:
-

Robert, that would be even better!

It is difficult to guess the right interval for issuing commits. I 
understand that some hand tuning might be necessary to get the highest 
performance for given resource constraints. If the issue is a buffer that is 
filling up, it would be helpful to have some form of emergency auto-commit.
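
One way to picture such an emergency auto-commit is a small guard that tracks how much has been buffered since the last commit and commits before a limit is crossed. This is a sketch under stated assumptions, not an existing Lucene feature: `AutoCommitGuard`, its byte estimate, and the `commit` hook are all hypothetical names introduced for illustration.

```java
final class AutoCommitGuard {
    private final long limitBytes;
    private final Runnable commit;   // hypothetical hook, e.g. a call to the writer's commit
    private long bufferedBytes;

    AutoCommitGuard(long limitBytes, Runnable commit) {
        if (limitBytes <= 0) {
            throw new IllegalArgumentException("limitBytes must be positive");
        }
        this.limitBytes = limitBytes;
        this.commit = commit;
    }

    /** Call before buffering a document; commits first if the document would cross the limit. */
    void beforeAdd(long docBytes) {
        if (bufferedBytes > 0 && bufferedBytes + docBytes > limitBytes) {
            commit.run();
            bufferedBytes = 0;
        }
        bufferedBytes += docBytes;
    }

    /** Estimated bytes buffered since the last commit. */
    long bufferedBytes() {
        return bufferedBytes;
    }
}
```

With a limit of 100, adding documents of 60 and 60 bytes triggers one commit before the second document, leaving 60 bytes buffered.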


[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-05 Thread Laura Dietz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313256#comment-16313256
 ] 

Laura Dietz commented on LUCENE-8118:
-

Yes, that works. Thanks, Diego!

An exception message along the lines of "Buffer full, call index.commit!" 
would have helped me here.

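A descriptive failure of that kind could look roughly like the sketch below. It assumes the underlying problem is an int offset overflowing, which the thread does not confirm; `CheckedOffset` and its message text are hypothetical and not part of Lucene. The only library call relied on is `Math.addExact`, which throws `ArithmeticException` on int overflow.

```java
final class CheckedOffset {
    /** Advances an int buffer offset, failing with a clear message instead of wrapping negative. */
    static int advance(int offset, int length) {
        try {
            return Math.addExact(offset, length);
        } catch (ArithmeticException e) {
            throw new IllegalStateException(
                "Buffer full at offset " + offset
                    + "; call commit() before adding more documents", e);
        }
    }
}
```

The check costs one comparison per write, so whether it is acceptable on a hot indexing path is a separate question.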




[jira] [Created] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2018-01-04 Thread Laura Dietz (JIRA)
Laura Dietz created LUCENE-8118:
---

             Summary: ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
                 Key: LUCENE-8118
                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 7.2
         Environment: Debian/Stretch
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
            Reporter: Laura Dietz


Indexing a large collection of about 20 million paragraph-sized documents 
results in an ArrayIndexOutOfBoundsException in 
org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).

The bug is possibly related to the issues described 
[here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I am 
not using Solr; I am using Lucene Core directly.

The issue can be reproduced using code from [GitHub 
trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]:

- compile with `mvn compile assembly:single`
- run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`

where paragraphCorpus.cbor is contained in this 
[archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz].



Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
    at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
    at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
    at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
    at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)
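
The index in the exception message, -65536, is what a wrapped 32-bit buffer offset would produce. The sketch below assumes that the byte pool finds its block by shifting an int offset right by 15 bits (32 KB blocks); that block size is an assumption about the pool's internals, not something the trace shows. Under that assumption, an offset that grows one past Integer.MAX_VALUE wraps to Integer.MIN_VALUE, and the resulting block index is exactly -65536.

```java
final class OffsetOverflowDemo {
    // Assumed block geometry: 2^15 = 32768 bytes per block (hypothetical value).
    static final int BLOCK_SHIFT = 15;

    /** Block index for a global int offset, as a shift-based pool would compute it. */
    static int blockIndex(int offset) {
        return offset >> BLOCK_SHIFT; // arithmetic shift keeps the sign
    }

    public static void main(String[] args) {
        int lastValid = Integer.MAX_VALUE; // largest offset an int can hold
        int wrapped = lastValid + 1;       // silently wraps to Integer.MIN_VALUE
        System.out.println(blockIndex(lastValid)); // prints 65535
        System.out.println(blockIndex(wrapped));   // prints -65536
    }
}
```

If the assumption holds, the failure point corresponds to roughly 2 GB buffered in a single pool.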



