[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-28 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702264#comment-16702264
 ] 

ASF subversion and git services commented on LUCENE-8517:
-

Commit 54907903e8d1a5da0c65328f24a1018c5e393afc in lucene-solr's branch 
refs/heads/jira/http2 from Michael Sokolov
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5490790 ]

LUCENE-8517: do not wrap FixedShingleFilter with conditional in TestRandomChains


> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> From 
> [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/2828/consoleText], 
> reproduces for me on Java8:
> {noformat}
> Checking out Revision 216f10026b86627750e133fe24ce6a750c470695 
> (refs/remotes/origin/branch_7x)
> [...]
> [java-info] java version "10.0.1"
> [java-info] OpenJDK Runtime Environment (10.0.1+10, Oracle Corporation)
> [java-info] OpenJDK 64-Bit Server VM (10.0.1+10, Oracle Corporation)
> [java-info] Test args: [-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC]
> [...]
>[junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
>[junit4]   2> Exception from random analyzer: 
>[junit4]   2> charfilters=
>[junit4]   2>   
> org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@3ef95503,
>  java.io.StringReader@70dde633)
>[junit4]   2>   
> org.apache.lucene.analysis.fa.PersianCharFilter(org.apache.lucene.analysis.charfilter.MappingCharFilter@12423b20)
>[junit4]   2> tokenizer=
>[junit4]   2>   org.apache.lucene.analysis.th.ThaiTokenizer()
>[junit4]   2> filters=
>[junit4]   2>   
> org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter(ValidatingTokenFilter@7914bba7
>  
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
>  org.apache.lucene.analysis.compound.hyphenation.HyphenationTree@abd7bca)
>[junit4]   2>   
> Conditional:org.apache.lucene.analysis.MockGraphTokenFilter(java.util.Random@56348091,
>  OneTimeWrapper@aa1c073 
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
>[junit4]   2>   
> Conditional:org.apache.lucene.analysis.shingle.FixedShingleFilter(OneTimeWrapper@4cf58fce
>  
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
>  4, , )
>[junit4]   2>   
> org.apache.lucene.analysis.pt.PortugueseLightStemFilter(ValidatingTokenFilter@3a915324
>  
> term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false)
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRandomChains 
> -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=92344C536D4E00F4 
> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en-ZW 
> -Dtests.timezone=Atlantic/Faroe -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>[junit4] ERROR   0.46s J2 | 
> TestRandomChains.testRandomChainsWithLargeStrings <<<
>[junit4]> Throwable #1: java.lang.IllegalStateException: stage 3: 
> inconsistent startOffset at pos=0: 0 vs 5; token=effort
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([92344C536D4E00F4:F86FF34234002007]:0)
>[junit4]>  at 
> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:109)
>[junit4]>  at 
> org.apache.lucene.analysis.pt.PortugueseLightStemFilter.incrementToken(PortugueseLightStemFilter.java:48)
>[junit4]>  at 
> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
>[junit4]>  at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:441)
>[junit4]>  at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
>[junit4]>  at 
> org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:897)
>[junit4]>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>[junit4]>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>[junit4]>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>[junit4]>  at 
> 

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700260#comment-16700260
 ] 

ASF subversion and git services commented on LUCENE-8517:
-

Commit e5ab0d41effbad5d65bc3e5e0a5133459317fa14 in lucene-solr's branch 
refs/heads/branch_7x from Michael Sokolov
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e5ab0d4 ]

LUCENE-8517: do not wrap FixedShingleFilter with conditional in TestRandomChains


> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700231#comment-16700231
 ] 

ASF subversion and git services commented on LUCENE-8517:
-

Commit 54907903e8d1a5da0c65328f24a1018c5e393afc in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5490790 ]

LUCENE-8517: do not wrap FixedShingleFilter with conditional in TestRandomChains


> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-19 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691682#comment-16691682
 ] 

Mike Sokolov commented on LUCENE-8517:
--

Hmm, sadly it still reproduces for me even with LUCENE-8564 applied. I haven't 
had a chance to look closely this morning. In the meantime I'll post an update 
to the PR addressing your comments, and for now I'll leave in the exclusion for 
{{FixedShingleFilter}}, although I hope we'll get to the bottom of it.

> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-19 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691426#comment-16691426
 ] 

Alan Woodward commented on LUCENE-8517:
---

Thanks for looking into this [~sokolov]!  FixedShingleFilter does indeed not 
deal properly with input graphs - yet.  Can you try applying the patch on 
LUCENE-8564 to see if that fixes this seed?

The extra debugging tools on ValidatingTokenFilter are separately useful in any 
case.  I left a couple of comments on the github PR.

> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-17 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690562#comment-16690562
 ] 

Mike Sokolov commented on LUCENE-8517:
--

With a bigger heap, I was able to get that one to run consistently. It looks 
like the same root cause - conditional {{FixedShingleFilter}}, and the token 
dump has a similar flavor too (some tokens removed, causing a token overlap 
with different offsets). I opened a PR 
[https://github.com/apache/lucene-solr/pull/500] that adds 
{{FixedShingleFilter}} to the list of filters not to conditionalize, and stores 
generated tokens in {{ValidatingTokenFilter}}, dumping them when there is a 
validation failure. I don't have all the background on how that was forked from 
{{ShingleFilter}} - is there any reason to think that it handles token-graphs 
in some reasonable way?
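
To illustrate the exclusion idea, here is a minimal sketch with no Lucene dependency. The class name {{ConditionalWrapSketch}}, the set {{NEVER_CONDITIONAL}}, and the method {{maybeWrap}} are hypothetical names chosen for illustration; the real test works on filter classes and constructors rather than strings, so treat this only as the shape of the check.

```java
import java.util.Collections;
import java.util.Random;
import java.util.Set;

public class ConditionalWrapSketch {

    // Hypothetical exclusion list, keyed by simple class name for illustration.
    static final Set<String> NEVER_CONDITIONAL =
        Collections.singleton("FixedShingleFilter");

    // Randomly marks a filter as conditional unless it is on the exclusion list.
    static String maybeWrap(String filterName, Random random) {
        if (NEVER_CONDITIONAL.contains(filterName)) {
            return filterName; // excluded filters are always applied directly
        }
        return random.nextBoolean() ? "Conditional:" + filterName : filterName;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (String name : new String[] {"MockGraphTokenFilter", "FixedShingleFilter"}) {
            System.out.println(maybeWrap(name, random));
        }
    }
}
```

Whatever the seed, the excluded filter is never prefixed with "Conditional:", while other filters may or may not be.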

 

> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690145#comment-16690145
 ] 

Mike Sokolov commented on LUCENE-8517:
--

{quote}Another reproducing seed, though it only fails for me if I run the whole 
suite, i.e. remove {{-Dtests.method=testRandomChainsWithLargeStrings}} from the 
cmdline
{quote}
This failed for me a few times with OutOfMemoryError when I ran it from the command 
line, including {{-Dtests.method=...}}. But once I did see it fail with the 
exception above, again running only that single method.

> TestRandomChains.testRandomChainsWithLargeStrings failure
> -
>
> Key: LUCENE-8517
> URL: https://issues.apache.org/jira/browse/LUCENE-8517
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Steve Rowe
>Priority: Major
>

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690125#comment-16690125
 ] 

Mike Sokolov commented on LUCENE-8517:
--

Well, I can at least add a bit more color. I added some debugging support to 
ValidatingTokenFilter that captures the tokens seen so they can be dumped out 
afterwards. Below is the complete history at the moment the exception was 
thrown, showing which tokens each stage had emitted so far. The numbers in 
square brackets are offsets, and the number following "+" is the position 
increment. The issue is in stage 3 (the conditional FixedShingleFilter): the 
overlapping tokens, which share a position, have different start offsets. The 
FixedShingleFilter has shingle size=4 in this case.

stage 0: best<[0-4] +1> effort<[5-11] +1>
stage 1: best<[0-4] +1> effort<[5-11] +1> ef<[5-11] +0> effort<[5-11] +0>
stage 2: best<[0-4] +1> effort<[5-11] +1> ef<[5-11] +0> effort<[5-11] +0>
stage 3: best<[0-4] +1> effort<[5-11] +0>
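
The stage 3 failure can be replayed outside Lucene. What follows is a minimal 
standalone sketch, not the actual ValidatingTokenFilter code: it assumes the 
check amounts to remembering the first startOffset seen at each position and 
rejecting any later token at that position that disagrees. The names 
StartOffsetCheck, Token and firstViolation are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a standalone re-creation of a per-position startOffset
// check. This is NOT Lucene's ValidatingTokenFilter; all names are hypothetical.
class StartOffsetCheck {

  /** One emitted token: term text, [startOffset-endOffset], position increment. */
  static final class Token {
    final String term;
    final int startOffset;
    final int endOffset;
    final int posInc;

    Token(String term, int startOffset, int endOffset, int posInc) {
      this.term = term;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.posInc = posInc;
    }
  }

  /**
   * Returns a message for the first token whose startOffset differs from that of
   * an earlier token at the same position, or null if no such token exists.
   */
  static String firstViolation(List<Token> tokens) {
    Map<Integer, Integer> posToStartOffset = new HashMap<>();
    int pos = -1; // a first increment of 1 puts the first token at position 0
    for (Token t : tokens) {
      pos += t.posInc;
      Integer seen = posToStartOffset.putIfAbsent(pos, t.startOffset);
      if (seen != null && seen != t.startOffset) {
        return "inconsistent startOffset at pos=" + pos + ": " + seen
            + " vs " + t.startOffset + "; token=" + t.term;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Token histories copied from the stage 2 and stage 3 dumps.
    List<Token> stage2 = List.of(
        new Token("best", 0, 4, 1),
        new Token("effort", 5, 11, 1),
        new Token("ef", 5, 11, 0),
        new Token("effort", 5, 11, 0));
    List<Token> stage3 = List.of(
        new Token("best", 0, 4, 1),
        new Token("effort", 5, 11, 0));

    System.out.println("stage 2: " + firstViolation(stage2));
    System.out.println("stage 3: " + firstViolation(stage3));
  }
}
```

With the stage 2 history the sketch finds nothing to report, because every 
token stacked on position 1 starts at offset 5. With the stage 3 history, 
"effort" arrives with increment 0 and so lands on position 0, where "best" 
already recorded start offset 0; that is the "0 vs 5" in the exception message.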

Anyway, the problem seems to arise in a conditional FixedShingleFilter that 
follows a conditional MockGraphTokenFilter. I see a comment in TestRandomChains 
to the effect that we don't test conditionals with ShingleFilter due to it not 
handling input graphs properly (LUCENE-4170). I wonder if the same logic ought 
to be applied to FixedShingleFilter.
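
Applying the same logic would amount to an exclusion list that is consulted 
before a randomly chosen filter is wrapped in a conditional. The following is 
only a sketch of that idea under stated assumptions: ConditionalWrapPolicy, 
AVOID_CONDITIONALS and mayWrapInConditional are hypothetical names, filters are 
identified by class name so the example runs without Lucene on the classpath, 
and none of this is taken from TestRandomChains itself.

```java
import java.util.Set;

// Hypothetical sketch of an exclusion list for conditional wrapping.
// Not the TestRandomChains implementation; names are invented for illustration.
class ConditionalWrapPolicy {

  // Filters assumed not to handle graph input properly, so they are never
  // wrapped in a conditional in this sketch.
  static final Set<String> AVOID_CONDITIONALS = Set.of(
      "org.apache.lucene.analysis.shingle.ShingleFilter",
      "org.apache.lucene.analysis.shingle.FixedShingleFilter");

  /** Returns true if the given filter class may be wrapped in a conditional. */
  static boolean mayWrapInConditional(String filterClassName) {
    return !AVOID_CONDITIONALS.contains(filterClassName);
  }

  public static void main(String[] args) {
    String[] candidates = {
        "org.apache.lucene.analysis.shingle.FixedShingleFilter",
        "org.apache.lucene.analysis.pt.PortugueseLightStemFilter"
    };
    for (String name : candidates) {
      System.out.println(name + " -> " + mayWrapInConditional(name));
    }
  }
}
```

Under this policy the failing chain could not be generated as reported, since 
FixedShingleFilter would only ever appear unwrapped.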

Also, I could post a patch with the debug output if it seems useful. It was at 
least somewhat helpful in getting a picture of this for me.


[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-16 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689623#comment-16689623
 ] 

Steve Rowe commented on LUCENE-8517:


Another reproducing seed, though it only fails for me if I run the whole suite, 
i.e. if I remove {{-Dtests.method=testRandomChainsWithLargeStrings}} from the 
command line. Maybe this test method is somehow affected by other methods? From 
[https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.x/377]:

{noformat}
   [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
   [junit4]   2> Exception from random analyzer: 
   [junit4]   2> charfilters=
   [junit4]   2> tokenizer=
   [junit4]   2>   
org.apache.lucene.analysis.MockTokenizer(org.apache.lucene.util.AttributeFactory$1@9c912349,
 initial state: 0
   [junit4]   2> state 0 [reject]:
   [junit4]   2>  a -> 1
   [junit4]   2>  b -> 2
   [junit4]   2>  f -> 3
   [junit4]   2>  i -> 4
   [junit4]   2>  n -> 5
   [junit4]   2>  o -> 6
   [junit4]   2>  s -> 7
   [junit4]   2>  t -> 8
   [junit4]   2>  w -> 9
   [junit4]   2> state 1 [accept]:
   [junit4]   2>  n -> 10
   [junit4]   2>  r -> 11
   [junit4]   2>  s -> 12
   [junit4]   2>  t -> 13
   [junit4]   2> state 2 [reject]:
   [junit4]   2>  e -> 14
   [junit4]   2>  u -> 15
   [junit4]   2>  y -> 16
   [junit4]   2> state 3 [reject]:
   [junit4]   2>  o -> 17
   [junit4]   2> state 4 [reject]:
   [junit4]   2>  f -> 18
   [junit4]   2>  n -> 19
   [junit4]   2>  s -> 20
   [junit4]   2>  t -> 21
   [junit4]   2> state 5 [reject]:
   [junit4]   2>  o -> 22
   [junit4]   2> state 6 [reject]:
   [junit4]   2>  f -> 23
   [junit4]   2>  n -> 24
   [junit4]   2>  r -> 25
   [junit4]   2> state 7 [reject]:
   [junit4]   2>  u -> 26
   [junit4]   2> state 8 [reject]:
   [junit4]   2>  h -> 27
   [junit4]   2>  o -> 28
   [junit4]   2> state 9 [reject]:
   [junit4]   2>  a -> 29
   [junit4]   2>  i -> 30
   [junit4]   2> state 10 [accept]:
   [junit4]   2>  d -> 31
   [junit4]   2> state 11 [reject]:
   [junit4]   2>  e -> 32
   [junit4]   2> state 12 [accept]:
   [junit4]   2> state 13 [accept]:
   [junit4]   2> state 14 [accept]:
   [junit4]   2> state 15 [reject]:
   [junit4]   2>  t -> 33
   [junit4]   2> state 16 [accept]:
   [junit4]   2> state 17 [reject]:
   [junit4]   2>  r -> 34
   [junit4]   2> state 18 [accept]:
   [junit4]   2> state 19 [accept]:
   [junit4]   2>  t -> 35
   [junit4]   2> state 20 [accept]:
   [junit4]   2> state 21 [accept]:
   [junit4]   2> state 22 [accept]:
   [junit4]   2>  t -> 36
   [junit4]   2> state 23 [accept]:
   [junit4]   2> state 24 [accept]:
   [junit4]   2> state 25 [accept]:
   [junit4]   2> state 26 [reject]:
   [junit4]   2>  c -> 37
   [junit4]   2> state 27 [reject]:
   [junit4]   2>  a -> 38
   [junit4]   2>  e -> 39
   [junit4]   2>  i -> 40
   [junit4]   2> state 28 [accept]:
   [junit4]   2> state 29 [reject]:
   [junit4]   2>  s -> 41
   [junit4]   2> state 30 [reject]:
   [junit4]   2>  l -> 42
   [junit4]   2>  t -> 43
   [junit4]   2> state 31 [accept]:
   [junit4]   2> state 32 [accept]:
   [junit4]   2> state 33 [accept]:
   [junit4]   2> state 34 [accept]:
   [junit4]   2> state 35 [reject]:
   [junit4]   2>  o -> 44
   [junit4]   2> state 36 [accept]:
   [junit4]   2> state 37 [reject]:
   [junit4]   2>  h -> 45
   [junit4]   2> state 38 [reject]:
   [junit4]   2>  t -> 46
   [junit4]   2> state 39 [accept]:
   [junit4]   2>  i -> 47
   [junit4]   2>  n -> 48
   [junit4]   2>  r -> 49
   [junit4]   2>  s -> 50
   [junit4]   2>  y -> 51
   [junit4]   2> state 40 [reject]:
   [junit4]   2>  s -> 52
   [junit4]   2> state 41 [accept]:
   [junit4]   2> state 42 [reject]:
   [junit4]   2>  l -> 53
   [junit4]   2> state 43 [reject]:
   [junit4]   2>  h -> 54
   [junit4]   2> state 44 [accept]:
   [junit4]   2> state 45 [accept]:
   [junit4]   2> state 46 [accept]:
   [junit4]   2> state 47 [reject]:
   [junit4]   2>  r -> 55
   [junit4]   2> state 48 [accept]:
   [junit4]   2> state 49 [reject]:
   [junit4]   2>  e -> 56
   [junit4]   2> state 50 [reject]:
   [junit4]   2>  e -> 57
   [junit4]   2> state 51 [accept]:
   [junit4]   2> state 52 [accept]:
   [junit4]   2> state 53 [accept]:
   [junit4]   2> state 54 [accept]:
   [junit4]   2> state 55 [accept]:
   [junit4]   2> state 56 [accept]:
   [junit4]   2> state 57 [accept]:
   [junit4]   2> , true)
   [junit4]   2> filters=
   [junit4]   2>   
org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@13de14e 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
   [junit4]   2>   
Conditional:org.apache.lucene.analysis.shingle.FixedShingleFilter(OneTimeWrapper@2c0047b9
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,
 2)
   [junit4]   2>   
org.apache.lucene.analysis.miscellaneous.DateRecognizerFilter(ValidatingTokenFilter@7846ee89
 

[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-16 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689602#comment-16689602
 ] 

Steve Rowe commented on LUCENE-8517:


Thanks [~sokolov], have at it!


[jira] [Commented] (LUCENE-8517) TestRandomChains.testRandomChainsWithLargeStrings failure

2018-11-16 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689599#comment-16689599
 ] 

Mike Sokolov commented on LUCENE-8517:
--

[~steve_rowe] I can take a look at this one
