Jim Ferenczi created LUCENE-8676:
------------------------------------

             Summary: TestKoreanTokenizer#testRandomHugeStrings failure
                 Key: LUCENE-8676
                 URL: https://issues.apache.org/jira/browse/LUCENE-8676
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Jim Ferenczi


TestKoreanTokenizer#testRandomHugeStrings failed in CI with the following exception:

{noformat}
   [junit4]    > Throwable #1: java.lang.AssertionError
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([8C5E2BE10F581CB:90E6857D4E833D83]:0)
   [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.add(KoreanTokenizer.java:334)
   [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.parse(KoreanTokenizer.java:707)
   [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.incrementToken(KoreanTokenizer.java:377)
   [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:748)
   [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:659)
   [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:561)
   [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:474)
   [junit4]    >        at org.apache.lucene.analysis.ko.TestKoreanTokenizer.testRandomHugeStrings(TestKoreanTokenizer.java:313)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
   [junit4]   2> NOTE: leaving temporary files
{noformat}

I am able to reproduce locally with:

{noformat}
ant test -Dtestcase=TestKoreanTokenizer -Dtests.method=testRandomHugeStrings -Dtests.seed=8C5E2BE10F581CB -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.7/test-data/enwiki.random.lines.txt -Dtests.locale=uk-UA -Dtests.timezone=Europe/Istanbul -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
{noformat}

After some investigation, I found that the position of the buffer is not
updated when the maximum backtrace size (1024) is reached.
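
To illustrate this class of bug, here is a minimal, self-contained sketch. It is not the KoreanTokenizer code: the class BacktraceSketch, the method chunks, the updatePos flag, and the space-delimited scanning are all hypothetical and only model the general pattern. Only the 1024 limit comes from the report. When a chunk is forced out because the window is full, the read position has to move up to the end of that chunk; if it does not, the next iteration starts behind text that was already emitted and the invariant check fails with an AssertionError.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and logic are illustrative, not taken from KoreanTokenizer.
class BacktraceSketch {
  static final int MAX_BACKTRACE_GAP = 1024; // the limit quoted in the report

  // Splits input on spaces, but forces a chunk out once MAX_BACKTRACE_GAP
  // characters are pending. updatePos=false omits the position update
  // to model the behavior described in the report.
  static List<String> chunks(String input, boolean updatePos) {
    List<String> out = new ArrayList<>();
    int n = input.length();
    int lastBackTracePos = 0; // everything before this index was already emitted
    int pos = 0;              // where the next scan starts
    while (pos < n) {
      // Invariant: never start scanning inside text that was already emitted.
      if (pos < lastBackTracePos) {
        throw new AssertionError("pos=" + pos + " < lastBackTracePos=" + lastBackTracePos);
      }
      int end = pos;
      boolean forced = false;
      while (end < n && input.charAt(end) != ' ') {
        end++;
        if (end - lastBackTracePos >= MAX_BACKTRACE_GAP) {
          forced = true; // window is full: emit without waiting for a space
          break;
        }
      }
      if (end > lastBackTracePos) {
        out.add(input.substring(lastBackTracePos, end));
      }
      lastBackTracePos = end;
      if (forced) {
        if (updatePos) {
          pos = end; // the step that must not be skipped after a forced emit
        }
      } else {
        pos = end < n ? end + 1 : end; // skip the delimiter, if any
        lastBackTracePos = pos;
      }
    }
    return out;
  }
}
```

With updatePos set to true, an unbroken run longer than 1024 characters comes back as consecutive chunks that concatenate to the original input. With updatePos set to false, the same input trips the invariant on the iteration that follows the first forced emit, while short inputs that never fill the window pass in both modes.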



