Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
singh264 commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1865016211

> If any of you manages to reproduce, I'm interested in the command that you used

I was able to reproduce the failure in `TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom` in `branch_9x` on an x86_64 Linux machine:

```
$JAVA_HOME/bin/java -version
java version "17.0.9" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 17.0.9+11-LTS-201)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.9+11-LTS-201, mixed mode, sharing)

$RUNTIME_JAVA_HOME/bin/java -version
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_00 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c70
OMR - 5eee6ad9d
JCL - 3699725139c based on jdk-17.0.9+9)
```

```
./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000
...
> Task :altJvmWarning
NOTE: Alternative java toolchain will be used for compilation and tests:
  Project will use 17 (Eclipse OpenJ9 JDK 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17, home at: /root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk)
  Gradle runs with 17 (Oracle JDK 17.0.9+11-LTS-201, home at: /root/openj9_issues_18400/jdk-17.0.9)
...

> Task :lucene:core:test
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by java.lang.System
WARNING: Please consider reporting this to the maintainers of java.lang.System
WARNING: System::setSecurityManager will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:59A28958CC8D8396]} FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=126, name=Thread-97, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
        Caused by: java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
            Caused by: java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=127, name=Thread-98, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
        Caused by: java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
            Caused by: java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=130, name=Thread-101, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
        Caused by: java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
            Caused by: java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at
```
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1865001473 I made the test fail on my AMD Ryzen 3700 (the Policeman Jenkins Server):

```sh
$ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC beast --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000 -Ptests.dups=100
```

It failed on the third beasting run:

```
> Task :lucene:core:test_1
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_1 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_10
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_10 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_100
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:A0DDCC17DE66DB34]} FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1006, name=Thread-981, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
        Caused by: java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
            Caused by: java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1004, name=Thread-979, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
        Caused by: java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
            Caused by: java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1002, name=Thread-977, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
        Caused by: java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
            Caused by: java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=999, name=Thread-974, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
```
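Reading the trace above, `CheckSegmentCount.run` executes inside `CyclicBarrier.dowait`, i.e. as the barrier action, so the `AssertionError` is likely the primary failure and the `BrokenBarrierException`s in the other threads a consequence of it. The standalone sketch below (plain JDK, not the Lucene test; `BarrierActionFailure` and its always-failing action are hypothetical) shows that mechanism: the thread that trips the barrier sees the action's exception, and every other waiting party sees `BrokenBarrierException`.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: a barrier action that fails, standing in for a failing
// segment-count check. Not the Lucene test code.
final class BarrierActionFailure {

  /** Runs two parties through the barrier and returns the sorted exception names they saw. */
  static String run() {
    CyclicBarrier barrier =
        new CyclicBarrier(
            2,
            () -> {
              // The barrier action runs in whichever thread arrives last.
              throw new AssertionError("hypothetical segment-count check failed");
            });
    AtomicReference<Throwable> workerError = new AtomicReference<>();
    Thread worker =
        new Thread(
            () -> {
              try {
                barrier.await();
              } catch (Throwable t) {
                workerError.set(t);
              }
            });
    worker.start();

    Throwable mainError = null;
    try {
      barrier.await();
    } catch (Throwable t) {
      mainError = t;
    }
    try {
      worker.join(); // join() also makes workerError's value visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }

    String a = name(mainError);
    String b = name(workerError.get());
    // Sort so the result does not depend on which thread tripped the barrier.
    return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
  }

  private static String name(Throwable t) {
    return t == null ? "none" : t.getClass().getSimpleName();
  }
}
```

Whichever thread arrives last runs the action, so the result is the same regardless of scheduling: one `AssertionError` and one `BrokenBarrierException`.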
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864923856 Thanks. Maybe the OpenJ9 people can explain how they reproduced it. You can use `gradlew beast` instead of `gradlew test` to run forked copies inside Gradle. Looking more into that code, I think the whole code here (and IndexWriter in general) should be freed of ancient `synchronized` blocks and should instead use more modern Java synchronization patterns, including volatile/opaque reads and barriers. The problematic concurrency could surely be solved with a better algorithm that allows lock-free reads and only takes locks for writes (e.g., a ReadWriteLock instead of `synchronized`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
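One way to read the `ReadWriteLock` suggestion above, as a minimal sketch in plain JDK (`ReadMostlyMap` is a hypothetical class, not Lucene code): readers share a read lock and only writers take an exclusive one, whereas `synchronized` methods serialize every caller, readers included.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example: a map guarded by a ReentrantReadWriteLock instead of
// synchronized methods. Not Lucene code.
final class ReadMostlyMap<K, V> {
  private final Map<K, V> map = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Readers share the read lock, so they do not block each other. */
  V get(K key) {
    lock.readLock().lock();
    try {
      return map.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Writers take the exclusive write lock, blocking readers and other writers. */
  V put(K key, V value) {
    lock.writeLock().lock();
    try {
      return map.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int size() {
    lock.readLock().lock();
    try {
      return map.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Read-locked reads are concurrent but not lock-free. For truly lock-free reads, writers can instead publish an immutable snapshot through a `volatile` field, which is the copy-on-write approach behind `java.util.concurrent.CopyOnWriteArrayList`.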
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864917501 Thanks. I'm trying to reproduce failures locally with the following command, without luck so far with JDK 17 and JDK 21. I'll dig more tomorrow. If any of you manages to reproduce, I'm interested in the command that you used.

```
$ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000
```
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864883276 See this comment: https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1834577142
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864882629 Thanks, I had missed that. I'll look more into it.
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864881002 They gathered statistics in the linked issue: HotSpot also fails, so they rejected it as an OpenJ9 issue.
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864879720 This is a real bug, not an OpenJ9 one. You can reproduce it on HotSpot too, given enough tries.
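Why "enough tries" matters can be made concrete with a small calculation. Assuming each run fails independently with probability p (a hypothetical input; no failure rate is reported in this thread), the chance of at least one failure in n runs is 1 - (1 - p)^n. The sketch below computes that and the smallest n reaching a target confidence; `ReproOdds` is a hypothetical helper.

```java
// Hypothetical back-of-the-envelope helper for flaky-test reproduction.
// Assumes runs fail independently with a fixed per-run probability p.
final class ReproOdds {

  /** P(at least one failure in n independent runs) = 1 - (1 - p)^n. */
  static double atLeastOneFailure(double p, long n) {
    // -expm1(n * log1p(-p)) equals 1 - (1 - p)^n, computed accurately for tiny p.
    return -Math.expm1(n * Math.log1p(-p));
  }

  /** Smallest n with atLeastOneFailure(p, n) >= confidence, for 0 < p < 1 and 0 < confidence < 1. */
  static long runsNeeded(double p, double confidence) {
    return (long) Math.ceil(Math.log1p(-confidence) / Math.log1p(-p));
  }
}
```

For example, with a hypothetical p = 1e-5, the 100 × 1000 = 100,000 iterations of the `beast` invocation in this thread (`-Ptests.iters=1000 -Ptests.dups=100`) give roughly a 63% chance of at least one failure, while about 300,000 runs are needed for 95%.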
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864843912 Does anyone understand whether adding synchronization fixes a real bug or just helps hide a J9 bug? This method is subject to contention, and #12199 was about avoiding locking on this method, which proved to help significantly when indexing cheap documents on several threads. I'm looking at the test and the code and can't see what sort of race condition may cause this test to fail.
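To illustrate the general idea of avoiding a pool-wide lock (not the actual #12199 change, whose details are not in this thread), here is a hypothetical, simplified pool in which each pooled slot is its own `ReentrantLock` and `getAndLock()` grabs the first free slot with `tryLock()`, creating a new slot only when every slot is busy. `TryLockPool` and `Slot` are made-up names.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, heavily simplified pool; NOT Lucene's DocumentsWriterPerThreadPool.
// Threads grab any free slot via tryLock() instead of serializing on a pool-wide lock.
final class TryLockPool {

  /** A pooled resource that doubles as its own lock. */
  static final class Slot extends ReentrantLock {
    final int id;

    Slot(int id) {
      this.id = id;
    }
  }

  private final CopyOnWriteArrayList<Slot> slots = new CopyOnWriteArrayList<>();
  private final AtomicInteger nextId = new AtomicInteger();

  /** Returns a slot locked by the calling thread, creating one if all are busy. */
  Slot getAndLock() {
    for (Slot slot : slots) { // iterates a snapshot, no pool-wide lock
      // ReentrantLock.tryLock() succeeds again for the owning thread,
      // so skip slots this thread already holds.
      if (!slot.isHeldByCurrentThread() && slot.tryLock()) {
        return slot;
      }
    }
    Slot fresh = new Slot(nextId.getAndIncrement());
    fresh.lock(); // lock before publishing so no other thread can grab it first
    slots.add(fresh);
    return fresh;
  }

  /** Makes the slot available again. */
  void release(Slot slot) {
    slot.unlock();
  }

  int size() {
    return slots.size();
  }
}
```

The window between "every slot looked busy" and "create a new slot" is racy by design: another thread may release a slot in the meantime, so the pool can grow a bit more than strictly needed. That is harmless in this sketch, but it is the kind of check-then-act window that needs careful reasoning in real code.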
Re: [PR] Remove remaining sources of contention on indexing. [lucene]
jpountz closed pull request #12205: Remove remaining sources of contention on indexing. URL: https://github.com/apache/lucene/pull/12205
Re: [PR] Remove remaining sources of contention on indexing. [lucene]
jpountz commented on PR #12205: URL: https://github.com/apache/lucene/pull/12205#issuecomment-1864768401 This test failure is sneaky; I extracted some bits from this PR into #12958.
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
jpountz commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864765767

> so maybe we can consider an other approach: try to avoid the for-loop in reset() if the instance can be reused

+1 this sounds like a good idea!
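A minimal sketch of the suggestion quoted above: skip the `freqBuffer` fill loop in `reset()` when a reused instance's buffer already holds 1s. `FreqBufferHolder`, `filledWithOnes`, `fills` and `decodeFreqs` are hypothetical names, not the PR's code; the idea is only to track whether anything has overwritten the 1s since the last fill.

```java
import java.util.Arrays;

// Hypothetical sketch, not the PR's code: avoid re-filling freqBuffer with 1s in
// reset() when a reused instance's buffer still holds 1s.
final class FreqBufferHolder {
  static final int BLOCK_SIZE = 128; // the PR mentions a 128-entry freqBuffer

  final int[] freqBuffer = new int[BLOCK_SIZE];
  private boolean filledWithOnes = false; // true while every entry is known to be 1
  int fills = 0; // counts how often the fill actually ran (for illustration)

  void reset(boolean indexHasFreq, boolean needsFreq) {
    if (indexHasFreq == false || needsFreq == false) {
      if (filledWithOnes == false) {
        Arrays.fill(freqBuffer, 1);
        filledWithOnes = true;
        fills++;
      }
    }
  }

  /** Called when real frequencies are decoded into the buffer. */
  void decodeFreqs(int[] decoded) {
    System.arraycopy(decoded, 0, freqBuffer, 0, Math.min(decoded.length, BLOCK_SIZE));
    filledWithOnes = false; // conservatively assume the 1s are gone
  }
}
```

The flag costs one branch per `reset()`, which is the trade-off being proposed: a cheap check instead of a 128-iteration loop whenever the instance is reused without real frequencies having been decoded.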
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864749851 I spent several hours confirming the results. The benchmark shows it became faster, which exceeded my expectations. We think the speedup comes from removing the loop that initializes `freqBuffer` to 1 in `reset()`:

```
if (indexHasFreq == false || needsFreq == false) {
  for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
    freqBuffer[i] = 1;
  }
}
```

Even if we always allocate the 128-entry `freqBuffer` in this PR, the benchmark still shows a speedup, so the performance improvement is unrelated to reducing memory allocation. So maybe we can consider the other approach: try to avoid the for-loop in `reset()` if the instance can be reused. Thanks to @gf2121 for the suggestions while I was investigating the cause of the speedup.

Benchmark output for the PR (using `wikimediumall`):

```
Task                   QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                  p-value
HighSloppyPhrase               0.38  (5.3%)                     0.37  (4.6%)     -1.6%  ( -10% -   8%)  0.323
MedTerm                      226.03  (4.8%)                   223.08  (5.2%)     -1.3%  ( -10% -   9%)  0.409
HighIntervalsOrdered           2.19  (6.4%)                     2.17  (6.5%)     -0.9%  ( -12% -  12%)  0.676
MedSloppyPhrase               18.52  (2.7%)                    18.39  (2.5%)     -0.7%  (  -5% -   4%)  0.402
Fuzzy2                        36.10  (1.8%)                    35.86  (1.6%)     -0.7%  (  -3% -   2%)  0.219
Fuzzy1                        43.40  (1.6%)                    43.16  (1.7%)     -0.6%  (  -3% -   2%)  0.276
Respell                       21.69  (1.8%)                    21.58  (1.8%)     -0.5%  (  -4% -   3%)  0.375
LowTerm                      232.03  (3.0%)                   231.08  (2.9%)     -0.4%  (  -6% -   5%)  0.659
LowSloppyPhrase               18.26  (2.0%)                    18.20  (2.1%)     -0.3%  (  -4% -   3%)  0.660
HighTerm                     267.11  (5.2%)                   266.50  (5.6%)     -0.2%  ( -10% -  11%)  0.893
HighSpanNear                   1.85  (5.7%)                     1.84  (6.7%)     -0.2%  ( -11% -  12%)  0.935
OrHighNotLow                 167.52  (5.7%)                   167.26  (5.6%)     -0.2%  ( -10% -  11%)  0.931
HighTermTitleBDVSort           1.90  (3.7%)                     1.90  (4.5%)     -0.1%  (  -7% -   8%)  0.915
MedIntervalsOrdered            7.07  (3.4%)                     7.06  (3.8%)     -0.1%  (  -7% -   7%)  0.910
MedSpanNear                   24.97  (2.1%)                    24.94  (2.7%)     -0.1%  (  -4% -   4%)  0.874
HighPhrase                    10.67  (6.0%)                    10.66  (5.7%)     -0.1%  ( -11% -  12%)  0.950
LowPhrase                      4.70  (4.0%)                     4.70  (3.8%)     -0.0%  (  -7% -   8%)  0.979
OrHighNotMed                 130.98  (6.2%)                   131.01  (6.1%)      0.0%  ( -11% -  13%)  0.989
OrNotHighHigh                171.61  (5.4%)                   171.67  (5.3%)      0.0%  ( -10% -  11%)  0.984
LowIntervalsOrdered           28.65  (4.3%)                    28.68  (4.3%)      0.1%  (  -8% -   9%)  0.947
OrHighHigh                    18.94  (2.9%)                    19.00  (3.6%)      0.3%  (  -5% -   6%)  0.766
OrHighNotHigh                125.97  (5.5%)                   126.41  (6.0%)      0.3%  ( -10% -  12%)  0.848
OrNotHighMed                 181.48  (4.0%)                   182.38  (3.6%)      0.5%  (  -6% -   8%)  0.679
LowSpanNear                    6.89  (2.5%)                     6.93  (3.2%)      0.6%  (  -5% -   6%)  0.516
MedPhrase                    110.79  (2.8%)                   111.45  (2.9%)      0.6%  (  -5% -   6%)  0.515
OrHighMed                     38.51  (2.4%)                    38.79  (2.1%)      0.7%  (  -3% -   5%)  0.311
AndHighMed                    40.73  (2.4%)                    41.06  (2.5%)      0.8%  (  -4% -   5%)  0.304
TermDTSort                    74.72  (4.1%)                    75.32  (2.6%)      0.8%  (  -5% -   7%)  0.460
AndHighHigh                   10.24  (5.6%)                    10.33  (3.9%)      0.8%  (  -8% -  11%)  0.600
HighTermMonthSort           1071.18  (2.9%)                  1079.84  (4.5%)      0.8%  (  -6% -   8%)  0.499
AndHighLow                   167.91  (5.2%)                   170.10  (5.6%)      1.3%  (  -9% -  12%)  0.446
IntNRQ                        13.84  (4.0%)                    14.05  (3.3%)      1.5%  (  -5% -   9%)  0.208
OrNotHighLow                 241.06  (4.5%)                   244.91  (4.8%)      1.6%  (  -7% -  11%)  0.276
OrHighLow                    175.82
```
Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]
dungba88 commented on code in PR #12831: URL: https://github.com/apache/lucene/pull/12831#discussion_r1421299753

## lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/NormalizeCharMap.java:
## @@ -111,7 +111,7 @@ public NormalizeCharMap build() {
       for (Map.Entry<String, String> ent : pendingPairs.entrySet()) {
         fstCompiler.add(Util.toUTF16(ent.getKey(), scratch), new CharsRef(ent.getValue()));
       }
-      map = fstCompiler.compile();
+      map = FST.fromFSTReader(fstCompiler.compile(), fstCompiler.getFSTReader());

Review Comment: This `fromFSTReader` is there to avoid the boilerplate null-check that each consumer must now do. Open to method name suggestions.
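A hypothetical sketch of what such a factory buys callers, assuming the null case means "no metadata, so no FST": the helper absorbs the null check once instead of every consumer repeating `metadata == null ? null : ...`. The generic `M`/`R`/`F` parameters stand in for the metadata, reader and FST types; this is not Lucene's actual `FST.fromFSTReader` signature.

```java
import java.util.function.BiFunction;

// Hypothetical sketch of the "factory absorbs the null check" idea.
// M, R and F stand in for the metadata, reader and FST types; not Lucene's API.
final class NullCheckingFactory {

  /** Builds a value from metadata and reader, or returns null when metadata is null. */
  static <M, R, F> F fromReader(M metadata, R reader, BiFunction<M, R, F> constructor) {
    if (metadata == null) {
      return null; // the check each caller would otherwise repeat
    }
    return constructor.apply(metadata, reader);
  }
}
```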
Re: [PR] Remove remaining sources of contention on indexing. [lucene]
jpountz commented on PR #12205: URL: https://github.com/apache/lucene/pull/12205#issuecomment-1864651149 The above failure is a bit scary; I'll try to split this PR.
Re: [PR] Replace usage of deprecated size() with length() in ByteBuffersDataInput [lucene]
dungba88 commented on PR #12948: URL: https://github.com/apache/lucene/pull/12948#issuecomment-1864650439 I think we can remove the `size()` method in 10.0?
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
jpountz commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864480721 I just pushed the change. Thanks @mikemccand for putting me on the right track.
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
jpountz closed issue #12957: Reproducible test failure with Terms#intersect on the default codec URL: https://github.com/apache/lucene/issues/12957
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
mikemccand commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864450920 OK the `DirectPostingsFormat` failure is also happy with this fix. +1 to merge. Thanks @jpountz!
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
mikemccand commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864448812

> > Terms.intersect(Automaton a, BytesRef startTerm) requires that startTerm is accepted by the incoming automaton, yet the way CheckIndex is calling it can clearly violate that.
>
> I wondered about that, but the automaton is `Automata.makeAnyBinary()`, shouldn't it accept any term?

Oh, you're right! I missed that `Automata.makeAnyBinary()` there!

> Oh I see, I created binary automata, but the API implicitly treats automata as UTF32 automata, so you need to tell it explicitly that it's a binary automaton. And something like that should fix the problem?

Oh, you are also right! Specifically, `CompiledAutomaton` assumes it's UTF32 and needs conversion to UTF8, unless you pass `isBinary=true`. OK, I like your fix! I'll confirm it fixes the `DirectPostingsFormat` failure too.
Re: [PR] Improve Javadoc for DocValuesConsumer [lucene]
jpountz merged PR #12952: URL: https://github.com/apache/lucene/pull/12952
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
jpountz commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864407118 Oh I see, I created binary automata, but the API implicitly treats automata as UTF32 automata, so you need to tell it explicitly that it's a binary automaton. And something like that should fix the problem?

```diff
diff --git a/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java b/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
index a555ce40001..f899b331b92 100644
--- a/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
+++ b/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
@@ -2318,7 +2318,7 @@ public final class CheckIndex implements Closeable {
         startTerm = new BytesRef();
         checkTermsIntersect(terms, automaton, startTerm);
-        automaton = Automata.makeAnyBinary();
+        automaton = Automata.makeNonEmptyBinary();
         startTerm = new BytesRef(new byte[] {'l'});
         checkTermsIntersect(terms, automaton, startTerm);
@@ -2369,8 +2369,8 @@ public final class CheckIndex implements Closeable {
       throws IOException {
     TermsEnum allTerms = terms.iterator();
     automaton = Operations.determinize(automaton, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
-    CompiledAutomaton compiledAutomaton = new CompiledAutomaton(automaton);
-    ByteRunAutomaton runAutomaton = new ByteRunAutomaton(automaton);
+    CompiledAutomaton compiledAutomaton = new CompiledAutomaton(automaton, false, true, true);
+    ByteRunAutomaton runAutomaton = new ByteRunAutomaton(automaton, true);
     TermsEnum filteredTerms = terms.intersect(compiledAutomaton, startTerm);
     BytesRef term;
     if (startTerm != null) {
```

(I had to change the automaton so that it's still considered of type "normal" and not "all".)
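Why the `isBinary` flag matters, in a standalone sketch that uses only the JDK (no Lucene classes): the same transition label means different bytes depending on whether it is read as a raw byte or as a Unicode code point that must be encoded to UTF-8, which is the conversion `CompiledAutomaton` is described as doing by default in this thread. `BinaryVsUtf32` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (plain JDK, no Lucene): what a transition label
// matches depending on whether it is read as a raw byte or as a code point.
final class BinaryVsUtf32 {

  /** Binary automaton: the label is the byte itself. */
  static byte[] asByteLabel(int label) {
    return new byte[] {(byte) label};
  }

  /** UTF-32 automaton: the label is a Unicode code point, matched via its UTF-8 bytes. */
  static byte[] asCodePointLabel(int codePoint) {
    return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
  }
}
```

For an ASCII label such as `'l'` (the `startTerm` byte used in `CheckIndex`) both readings agree. They diverge for labels 0x80 and above: the label 0xFF is the single byte `FF` in a binary automaton but the two-byte UTF-8 sequence `C3 BF` when read as code point U+00FF.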
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
jpountz commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864395031

> Terms.intersect(Automaton a, BytesRef startTerm) requires that startTerm is accepted by the incoming automaton, yet the way CheckIndex is calling it can clearly violate that.

I wondered about that, but the automaton is `Automata.makeAnyBinary()`, shouldn't it accept any term?
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
mikemccand commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864386831 I'll try to fix `CheckIndex` so that it only uses `startTerm` that is accepted by the automaton.
Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]
mikemccand commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864385415 OK I think the issue here may be that `Terms.intersect(Automaton a, BytesRef startTerm)` requires that `startTerm` is accepted by the incoming automaton, yet the way `CheckIndex` is calling it can clearly violate that. And the codecs (default and Direct) clearly don't do a good job throwing a clear exception when that is violated :) In addition to the default Codec, `DirectPostingsFormat` is also angry, using this repro:

```
./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestTerms.testTermMinMaxRandom" -Ptests.jvms=4 -Ptests.jvmargs= -Ptests.seed=C8D1EBB5035DA9F -Ptests.multiplier=2 -Ptests.badapples=false -Ptests.gui=true -Ptests.file.encoding=US-ASCII -Ptests.vectorsize=128
```
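A caller-side, fail-fast version of the precondition described above might look like the hypothetical sketch below: run the start term through the automaton and throw a descriptive `IllegalArgumentException` if it is rejected, instead of letting a codec trip an internal assertion later. `StartTermGuard` is a toy byte-level DFA written for illustration, not Lucene's `ByteRunAutomaton`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy byte-level DFA (not Lucene's ByteRunAutomaton), used to show a
// fail-fast precondition check on startTerm before an intersect-style call.
final class StartTermGuard {
  private final Map<Integer, Integer> transitions = new HashMap<>();
  private final boolean[] accepting;

  StartTermGuard(int numStates) {
    accepting = new boolean[numStates];
  }

  void addTransition(int from, int label, int to) {
    transitions.put(key(from, label), to);
  }

  void setAccepting(int state) {
    accepting[state] = true;
  }

  private static int key(int state, int label) {
    return (state << 8) | (label & 0xFF); // one key per (state, byte) pair
  }

  /** Runs the DFA from state 0; a missing transition rejects. */
  boolean accepts(byte[] term) {
    int state = 0;
    for (byte b : term) {
      Integer next = transitions.get(key(state, b));
      if (next == null) {
        return false;
      }
      state = next;
    }
    return accepting[state];
  }

  /** Throws a clear exception instead of letting a codec trip an assertion later. */
  void checkStartTerm(byte[] startTerm) {
    if (startTerm != null && !accepts(startTerm)) {
      throw new IllegalArgumentException("startTerm is not accepted by the automaton");
    }
  }
}
```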
Re: [PR] An improved check for ignoring the c2-crash test if running on a client compiler. [lucene]
ChrisHegarty commented on PR #12953: URL: https://github.com/apache/lucene/pull/12953#issuecomment-1864374811 I'm way too slow here, sorry. Belated LGTM. And thanks for following up on this.
[I] Reproducible test failure with Terms#intersect on the default codec [lucene]
jpountz opened a new issue, #12957: URL: https://github.com/apache/lucene/issues/12957

### Description

The new CheckIndex checks are causing some test failures with the default codec, which are reproducible and look like real bugs? I started looking but I'm not familiar enough with BlockTree to understand what it's doing wrong.

https://jenkins.thetaphi.de/job/Lucene-main-Linux/45856/consoleFull

```
org.apache.lucene.index.TestTerms > testTermMinMaxRandom FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([CBF65306049672F4:8785DC72680AA991]:0)
        at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.getState(IntersectTermsEnum.java:245)
        at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.seekToStartTerm(IntersectTermsEnum.java:288)
        at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:126)
        at org.apache.lucene.codecs.lucene90.blocktree.FieldReader.intersect(FieldReader.java:223)
        at org.apache.lucene.index.CheckIndex.checkTermsIntersect(CheckIndex.java:2374)
        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:2327)
        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2529)
        at org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1067)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:783)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:550)
        at org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:340)
        at org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
        at org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:85)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at