Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


singh264 commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1865016211

   >  If any of you manages to reproduce, I'm interested in the command that 
you used
   
   I was able to reproduce the failure in 
`TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom` in `branch_9x` 
on an x86_64 Linux machine:
   
   ```
   $JAVA_HOME/bin/java -version
   java version "17.0.9" 2023-10-17 LTS
   Java(TM) SE Runtime Environment (build 17.0.9+11-LTS-201)
   Java HotSpot(TM) 64-Bit Server VM (build 17.0.9+11-LTS-201, mixed mode, 
sharing)
   
   $RUNTIME_JAVA_HOME/bin/java -version
   openjdk version "17.0.9-internal" 2023-10-17
   OpenJDK Runtime Environment (build 
17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
   Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed 
References 20231124_00 (JIT enabled, AOT enabled)
   OpenJ9   - 461bf3c70
   OMR  - 5eee6ad9d
   JCL  - 3699725139c based on jdk-17.0.9+9)
   ```
   
   ``` 
   ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests 
TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom 
-Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000
   
   ...
   
   > Task :altJvmWarning
   NOTE: Alternative java toolchain will be used for compilation and tests:
 Project will use 17 (Eclipse OpenJ9 JDK 
17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17, home at: 
/root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk)
 Gradle runs with 17 (Oracle JDK 17.0.9+11-LTS-201, home at: 
/root/openj9_issues_18400/jdk-17.0.9)
   
   ...
   
   > Task :lucene:core:test
   WARNING: A command line option has enabled the Security Manager
   WARNING: The Security Manager is deprecated and will be removed in a future 
release
   WARNING: A terminally deprecated method in java.lang.System has been called
   WARNING: System::setSecurityManager has been called by java.lang.System
   WARNING: Please consider reporting this to the maintainers of 
java.lang.System
   WARNING: System::setSecurityManager will be removed in a future release
   
   org.apache.lucene.index.TestIndexWriterThreadsToSegments > 
testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:59A28958CC8D8396]} FAILED
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=126, name=Thread-97, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
   Caused by:
   java.lang.RuntimeException: 
java.util.concurrent.BrokenBarrierException
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at 
app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   
   Caused by:
   java.util.concurrent.BrokenBarrierException
   at 
java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   at 
java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   at 
app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=127, name=Thread-98, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
   Caused by:
   java.lang.RuntimeException: 
java.util.concurrent.BrokenBarrierException
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at 
app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   
   Caused by:
   java.util.concurrent.BrokenBarrierException
   at 
java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   at 
java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   at 
app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=130, name=Thread-101, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
   Caused by:
   java.lang.RuntimeException: 
java.util.concurrent.BrokenBarrierException
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at 
app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   
   Caused by:
   java.util.concurrent.BrokenBarrierException
   at 
java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   at 

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


uschindler commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1865001473

   I made the test fail on my AMD Ryzen 3700 (the Policeman Jenkins server):
   
   ```sh
   $ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC beast --tests 
TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom 
-Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000 -Ptests.dups=100
   ```
   
   It failed on the 3rd beasting:
   
   ```
   > Task :lucene:core:test_1
   WARNING: A command line option has enabled the Security Manager
   WARNING: The Security Manager is deprecated and will be removed in a future 
release
   :lucene:core:test_1 (SUCCESS): 1000 test(s)
   
   > Task :lucene:core:test_10
   WARNING: A command line option has enabled the Security Manager
   WARNING: The Security Manager is deprecated and will be removed in a future 
release
   :lucene:core:test_10 (SUCCESS): 1000 test(s)
   
   > Task :lucene:core:test_100
   WARNING: A command line option has enabled the Security Manager
   WARNING: The Security Manager is deprecated and will be removed in a future 
release
   
   org.apache.lucene.index.TestIndexWriterThreadsToSegments > 
testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:A0DDCC17DE66DB34]} FAILED
   java.lang.AssertionError
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at org.junit.Assert.fail(Assert.java:87)
   at org.junit.Assert.assertTrue(Assert.java:42)
   at org.junit.Assert.assertTrue(Assert.java:53)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
   at 
java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
   at 
java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=1006, name=Thread-981, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
   Caused by:
   java.lang.RuntimeException: 
java.util.concurrent.BrokenBarrierException
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   
   Caused by:
   java.util.concurrent.BrokenBarrierException
   at 
java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   at 
java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=1004, name=Thread-979, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
   Caused by:
   java.lang.RuntimeException: 
java.util.concurrent.BrokenBarrierException
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   
   Caused by:
   java.util.concurrent.BrokenBarrierException
   at 
java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   at 
java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=1002, name=Thread-977, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
   Caused by:
   java.lang.RuntimeException: 
java.util.concurrent.BrokenBarrierException
   at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   
   Caused by:
   java.util.concurrent.BrokenBarrierException
   at 
java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   at 
java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   at 
org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   
   com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
uncaught exception in thread: Thread[id=999, name=Thread-974, state=RUNNABLE, 
group=TGRP-TestIndexWriterThreadsToSegments]
   
  

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


uschindler commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864923856

   Thanks. Maybe the OpenJ9 people can explain how they reproduced it. You can use 
`gradlew beast` instead of `gradlew test` to run the forked copies inside 
Gradle. 
   
   Looking more into that code: I think the whole code here (and IndexWriter in 
general) should be freed of ancient `synchronized` blocks and should instead 
use more modern Java synchronization patterns, including volatile / opaque 
reads and barriers. The problematic concurrency could surely be solved with a 
better algorithm that allows lock-free reads and takes locks only for writes 
(e.g., ReadWriteLock instead of synchronized).
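
To make the suggestion concrete, here is a minimal sketch of the read/write-lock pattern. It is not Lucene's code: `WriterRegistry` and its methods are hypothetical names chosen for illustration, and a real change to `DocumentsWriterPerThreadPool` may look quite different. Note also that a `ReentrantReadWriteLock` lets readers run concurrently with each other but is not strictly lock-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not Lucene's DocumentsWriterPerThreadPool.
final class WriterRegistry {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> writers = new ArrayList<>();

  // Mutations take the exclusive write lock, so they never interleave
  // with each other or with readers.
  void add(String writer) {
    lock.writeLock().lock();
    try {
      writers.add(writer);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reads take the shared read lock: many threads may read at once,
  // but never while a writer holds the write lock.
  int size() {
    lock.readLock().lock();
    try {
      return writers.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The trade-off is the usual one: read-mostly workloads benefit, while write-heavy paths still serialize on the write lock.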


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


jpountz commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864917501

   Thanks. I'm trying to reproduce failures locally with the following command, 
without luck so far with JDK 17 and JDK 21. I'll dig more tomorrow. If any of 
you manages to reproduce, I'm interested in the command that you used.
   
   ```
   $ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests 
TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom 
-Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000
   ```





Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


uschindler commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864883276

   See this comment: 
https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1834577142





Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


jpountz commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864882629

   Thanks I had missed that, I'll look more into it.





Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


uschindler commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864881002

   They collected statistics in the linked issue. HotSpot also fails, so they 
rejected it as an OpenJ9 issue.





Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


uschindler commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864879720

   This is a real bug and not one of openj9. You can reproduce this bug with 
enough tries on hotspot, too.





Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-20 Thread via GitHub


jpountz commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1864843912

   Does someone understand whether adding synchronization fixes a real bug or 
just helps hide a J9 bug? This method is subject to contention and #12199 
was about avoiding locking on this method, which proved to help significantly 
when indexing cheap documents on several threads. I'm looking at the test and 
the code and can't see what sort of race condition may cause this test to fail.





Re: [PR] Remove remaining sources of contention on indexing. [lucene]

2023-12-20 Thread via GitHub


jpountz closed pull request #12205: Remove remaining sources of contention on 
indexing.
URL: https://github.com/apache/lucene/pull/12205





Re: [PR] Remove remaining sources of contention on indexing. [lucene]

2023-12-20 Thread via GitHub


jpountz commented on PR #12205:
URL: https://github.com/apache/lucene/pull/12205#issuecomment-1864768401

   This test failure is sneaky, I extracted some bits from this PR into #12958.





Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-20 Thread via GitHub


jpountz commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864765767

   > so maybe we can consider an other approach: try to avoid the for-loop in 
reset() if the instance can be reused
   
   +1 this sounds like a good idea!





Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-20 Thread via GitHub


easyice commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864749851

   
   I took several hours to confirm the results. The benchmark shows it became 
faster, which exceeded my expectations. We think the speedup comes from removing 
the loop that initializes the `freqBuffer` to 1 in `reset()`, shown below:
   
   ```
   if (indexHasFreq == false || needsFreq == false) {
     for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
       freqBuffer[i] = 1;
     }
   }
   ```
   
   If we always allocate the 128-entry `freqBuffer` in this PR, the 
benchmark still shows a speedup. Therefore, the performance improvement is 
unrelated to reducing memory allocation. So maybe we can consider another 
approach: try to avoid the for-loop in `reset()` if the instance can be reused. 
Thanks to @gf2121 for the suggestions while I was investigating the cause of 
the speedup.
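
One way to skip the fill on reuse is to remember whether the buffer already holds only 1s. The following is a minimal sketch under that assumption; `FreqBufferHolder`, `filledWithOnes`, and `fills` are hypothetical names, not Lucene's, and this is not necessarily what the PR ends up doing.

```java
import java.util.Arrays;

// Hypothetical stand-in for a reusable postings enum; names are illustrative.
final class FreqBufferHolder {
  static final int BLOCK_SIZE = 128;

  private final int[] freqBuffer = new int[BLOCK_SIZE];
  // True while the buffer is known to contain only 1s.
  private boolean filledWithOnes = false;
  // Counts how often the fill actually ran (for demonstration only).
  int fills = 0;

  void reset(boolean indexHasFreq, boolean needsFreq) {
    if (indexHasFreq == false || needsFreq == false) {
      // Frequencies are not decoded, so every freq must read as 1.
      if (filledWithOnes == false) {
        Arrays.fill(freqBuffer, 1);
        filledWithOnes = true;
        fills++;
      }
    } else {
      // Real frequencies will be decoded into the buffer, so it can no
      // longer be assumed to hold only 1s.
      filledWithOnes = false;
    }
  }

  int freq(int i) {
    return freqBuffer[i];
  }
}
```

With this flag, two consecutive `reset(false, false)` calls on the same instance fill the buffer only once; the fill runs again only after a reset that needed real frequencies.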
   
   
   Benchmark output for the PR (using `wikimediumall`):
   
   ```
                   Task  QPS baseline  StdDev  QPS my_modified_version  StdDev               Pct diff  p-value
       HighSloppyPhrase          0.38  (5.3%)                     0.37  (4.6%)  -1.6% ( -10% -    8%)    0.323
                MedTerm        226.03  (4.8%)                   223.08  (5.2%)  -1.3% ( -10% -    9%)    0.409
   HighIntervalsOrdered          2.19  (6.4%)                     2.17  (6.5%)  -0.9% ( -12% -   12%)    0.676
        MedSloppyPhrase         18.52  (2.7%)                    18.39  (2.5%)  -0.7% (  -5% -    4%)    0.402
                 Fuzzy2         36.10  (1.8%)                    35.86  (1.6%)  -0.7% (  -3% -    2%)    0.219
                 Fuzzy1         43.40  (1.6%)                    43.16  (1.7%)  -0.6% (  -3% -    2%)    0.276
                Respell         21.69  (1.8%)                    21.58  (1.8%)  -0.5% (  -4% -    3%)    0.375
                LowTerm        232.03  (3.0%)                   231.08  (2.9%)  -0.4% (  -6% -    5%)    0.659
        LowSloppyPhrase         18.26  (2.0%)                    18.20  (2.1%)  -0.3% (  -4% -    3%)    0.660
               HighTerm        267.11  (5.2%)                   266.50  (5.6%)  -0.2% ( -10% -   11%)    0.893
           HighSpanNear          1.85  (5.7%)                     1.84  (6.7%)  -0.2% ( -11% -   12%)    0.935
           OrHighNotLow        167.52  (5.7%)                   167.26  (5.6%)  -0.2% ( -10% -   11%)    0.931
   HighTermTitleBDVSort          1.90  (3.7%)                     1.90  (4.5%)  -0.1% (  -7% -    8%)    0.915
    MedIntervalsOrdered          7.07  (3.4%)                     7.06  (3.8%)  -0.1% (  -7% -    7%)    0.910
            MedSpanNear         24.97  (2.1%)                    24.94  (2.7%)  -0.1% (  -4% -    4%)    0.874
             HighPhrase         10.67  (6.0%)                    10.66  (5.7%)  -0.1% ( -11% -   12%)    0.950
              LowPhrase          4.70  (4.0%)                     4.70  (3.8%)  -0.0% (  -7% -    8%)    0.979
           OrHighNotMed        130.98  (6.2%)                   131.01  (6.1%)   0.0% ( -11% -   13%)    0.989
          OrNotHighHigh        171.61  (5.4%)                   171.67  (5.3%)   0.0% ( -10% -   11%)    0.984
    LowIntervalsOrdered         28.65  (4.3%)                    28.68  (4.3%)   0.1% (  -8% -    9%)    0.947
             OrHighHigh         18.94  (2.9%)                    19.00  (3.6%)   0.3% (  -5% -    6%)    0.766
          OrHighNotHigh        125.97  (5.5%)                   126.41  (6.0%)   0.3% ( -10% -   12%)    0.848
           OrNotHighMed        181.48  (4.0%)                   182.38  (3.6%)   0.5% (  -6% -    8%)    0.679
            LowSpanNear          6.89  (2.5%)                     6.93  (3.2%)   0.6% (  -5% -    6%)    0.516
              MedPhrase        110.79  (2.8%)                   111.45  (2.9%)   0.6% (  -5% -    6%)    0.515
              OrHighMed         38.51  (2.4%)                    38.79  (2.1%)   0.7% (  -3% -    5%)    0.311
             AndHighMed         40.73  (2.4%)                    41.06  (2.5%)   0.8% (  -4% -    5%)    0.304
             TermDTSort         74.72  (4.1%)                    75.32  (2.6%)   0.8% (  -5% -    7%)    0.460
            AndHighHigh         10.24  (5.6%)                    10.33  (3.9%)   0.8% (  -8% -   11%)    0.600
      HighTermMonthSort       1071.18  (2.9%)                  1079.84  (4.5%)   0.8% (  -6% -    8%)    0.499
             AndHighLow        167.91  (5.2%)                   170.10  (5.6%)   1.3% (  -9% -   12%)    0.446
                 IntNRQ         13.84  (4.0%)                    14.05  (3.3%)   1.5% (  -5% -    9%)    0.208
           OrNotHighLow        241.06  (4.5%)                   244.91  (4.8%)   1.6% (  -7% -   11%)    0.276
              OrHighLow        175.82  

Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2023-12-20 Thread via GitHub


dungba88 commented on code in PR #12831:
URL: https://github.com/apache/lucene/pull/12831#discussion_r1421299753


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/NormalizeCharMap.java:
##
@@ -111,7 +111,7 @@ public NormalizeCharMap build() {
 for (Map.Entry<String, String> ent : pendingPairs.entrySet()) {
   fstCompiler.add(Util.toUTF16(ent.getKey(), scratch), new CharsRef(ent.getValue()));
 }
-map = fstCompiler.compile();
+map = FST.fromFSTReader(fstCompiler.compile(), fstCompiler.getFSTReader());

Review Comment:
   This `fromFSTReader` is there to avoid the boilerplate null-check that each 
consumer must now do. Open to suggestions for the method name.






Re: [PR] Remove remaining sources of contention on indexing. [lucene]

2023-12-20 Thread via GitHub


jpountz commented on PR #12205:
URL: https://github.com/apache/lucene/pull/12205#issuecomment-1864651149

   The above failure is a bit scary, I'll try to split this PR.





Re: [PR] Replace usage of deprecated size() with length() in ByteBuffersDataInput [lucene]

2023-12-20 Thread via GitHub


dungba88 commented on PR #12948:
URL: https://github.com/apache/lucene/pull/12948#issuecomment-1864650439

   I think we can remove the `size()` method in 10.0?





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


jpountz commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864480721

   I just pushed the change, thanks @mikemccand for putting me on the right 
track.





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


jpountz closed issue #12957: Reproducible test failure with Terms#intersect on 
the default codec
URL: https://github.com/apache/lucene/issues/12957





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


mikemccand commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864450920

   OK the `DirectPostingsFormat` failure is also happy with this fix.  +1 to 
merge.  Thanks @jpountz!





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


mikemccand commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864448812

   > > Terms.intersect(Automaton a, BytesRef startTerm) requires that startTerm 
is accepted by the incoming automaton, yet the way CheckIndex is calling it can 
clearly violate that.
   > 
   > I wondered about that, but the automaton is `Automata.makeAnyBinary()`, 
shouldn't it accept any term?
   
   Oh, you're right!  I missed that `Automata.makeAnyBinary()` there!
   
   > Oh I see, I created binary automata, but the API implicitly treats 
automata as UTF32 automata, so you need to tell it explicitly that it's a 
binary automaton. And something like that should fix the problem?
   
   Oh, you are also right!  Specifically `CompiledAutomaton` assumes it's UTF32 
and needs conversion to UTF8, unless you pass `isBinary=true`.  OK I like your 
fix!  I'll confirm it fixes the `DirectPostingsFormat` failure too.





Re: [PR] Improve Javadoc for DocValuesConsumer [lucene]

2023-12-20 Thread via GitHub


jpountz merged PR #12952:
URL: https://github.com/apache/lucene/pull/12952





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


jpountz commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864407118

   Oh I see, I created binary automata, but the API implicitly treats automata 
as UTF32 automata, so you need to tell it explicitly that it's a binary 
automaton. And something like that should fix the problem?
   
   ```diff
   diff --git a/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java b/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
   index a555ce40001..f899b331b92 100644
   --- a/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
   +++ b/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
   @@ -2318,7 +2318,7 @@ public final class CheckIndex implements Closeable {
    startTerm = new BytesRef();
    checkTermsIntersect(terms, automaton, startTerm);
    
   -automaton = Automata.makeAnyBinary();
   +automaton = Automata.makeNonEmptyBinary();
    startTerm = new BytesRef(new byte[] {'l'});
    checkTermsIntersect(terms, automaton, startTerm);
    
   @@ -2369,8 +2369,8 @@ public final class CheckIndex implements Closeable {
      throws IOException {
    TermsEnum allTerms = terms.iterator();
    automaton = Operations.determinize(automaton, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
   -CompiledAutomaton compiledAutomaton = new CompiledAutomaton(automaton);
   -ByteRunAutomaton runAutomaton = new ByteRunAutomaton(automaton);
   +CompiledAutomaton compiledAutomaton = new CompiledAutomaton(automaton, false, true, true);
   +ByteRunAutomaton runAutomaton = new ByteRunAutomaton(automaton, true);
    TermsEnum filteredTerms = terms.intersect(compiledAutomaton, startTerm);
    BytesRef term;
    if (startTerm != null) {
   ```
   
   (I had to change the automaton so that it's still considered of type 
"normal" and not "all")
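
The UTF-32 versus binary distinction can be illustrated without Lucene. The sketch below only shows why the two interpretations of a transition label disagree once the label exceeds 0x7F; it does not reproduce what `CompiledAutomaton` does internally, and `LabelInterpretation` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: two ways to turn an automaton label into term bytes.
final class LabelInterpretation {
  // Treat the label as a Unicode code point and encode it as UTF-8.
  static byte[] asCodePoint(int label) {
    return new String(Character.toChars(label)).getBytes(StandardCharsets.UTF_8);
  }

  // Treat the label as a raw byte value in the range 0..255.
  static byte[] asBinary(int label) {
    return new byte[] {(byte) label};
  }
}
```

For the label `'l'` (0x6C) both methods yield the single byte 0x6C, so ASCII-only tests cannot tell the interpretations apart. For the label 0xE9, `asCodePoint` yields the two bytes 0xC3 0xA9 while `asBinary` yields the single byte 0xE9, so a term containing a lone 0xE9 byte matches only under the binary interpretation.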





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


jpountz commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864395031

   > Terms.intersect(Automaton a, BytesRef startTerm) requires that startTerm 
is accepted by the incoming automaton, yet the way CheckIndex is calling it can 
clearly violate that.
   
   I wondered about that, but the automaton is `Automata.makeAnyBinary()`, 
shouldn't it accept any term?





Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


mikemccand commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864386831

   I'll try to fix `CheckIndex` so that it only uses `startTerm` that is accepted by the automaton.
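
To make the contract concrete, here is a minimal, self-contained Java sketch of "only use a `startTerm` the automaton accepts". It is a toy model and not Lucene code or the actual `CheckIndex` change: `ByteAcceptor`, `NON_EMPTY`, `prefix`, `usableStartTerm` and `requireAccepted` are hypothetical names made up for illustration, and the acceptor is a plain predicate standing in for a byte-level run automaton. Falling back to no start term is just one possible policy, assumed here.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of the "startTerm must be accepted by the automaton" contract. */
public class StartTermGuard {

  /** Hypothetical stand-in for a byte-level run automaton. */
  public interface ByteAcceptor {
    boolean accepts(byte[] term);
  }

  /** Accepts every non-empty byte string. */
  public static final ByteAcceptor NON_EMPTY = term -> term.length > 0;

  /** Accepts only terms that start with the given prefix. */
  public static ByteAcceptor prefix(byte[] prefix) {
    return term -> {
      if (term.length < prefix.length) {
        return false;
      }
      for (int i = 0; i < prefix.length; i++) {
        if (term[i] != prefix[i]) {
          return false;
        }
      }
      return true;
    };
  }

  /**
   * Caller-side guard: returns startTerm if it is null or accepted,
   * otherwise null so the caller intersects without a start term.
   */
  public static byte[] usableStartTerm(ByteAcceptor automaton, byte[] startTerm) {
    if (startTerm == null || automaton.accepts(startTerm)) {
      return startTerm;
    }
    return null;
  }

  /** Callee-side guard: fail fast with a clear message instead of a bare assertion. */
  public static void requireAccepted(ByteAcceptor automaton, byte[] startTerm) {
    if (startTerm != null && !automaton.accepts(startTerm)) {
      throw new IllegalArgumentException(
          "startTerm is not accepted by the automaton: "
              + new String(startTerm, StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) {
    byte[] l = "l".getBytes(StandardCharsets.UTF_8);
    ByteAcceptor onlyAb = prefix("ab".getBytes(StandardCharsets.UTF_8));

    // "l" is non-empty, so it is kept as the start term.
    System.out.println(usableStartTerm(NON_EMPTY, l) != null);
    // "l" does not start with "ab", so the guard drops it.
    System.out.println(usableStartTerm(onlyAb, l) != null);

    try {
      requireAccepted(onlyAb, l);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The caller-side guard corresponds to the idea above of fixing the caller; the callee-side check is the kind of explicit exception a codec could throw when the precondition is violated.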




Re: [I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


mikemccand commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864385415

   OK I think the issue here may be that `Terms.intersect(Automaton a, BytesRef startTerm)` requires that `startTerm` is accepted by the incoming automaton, yet the way `CheckIndex` is calling it can clearly violate that.
   
   And the codecs (default and Direct) clearly don't do a good job throwing a clear exception when that is violated :)
   
   In addition to the default Codec, `DirectPostingsFormat` is also angry, using this repro:
   
   ```
   ./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestTerms.testTermMinMaxRandom" -Ptests.jvms=4 -Ptests.jvmargs= -Ptests.seed=C8D1EBB5035DA9F -Ptests.multiplier=2 -Ptests.badapples=false -Ptests.gui=true -Ptests.file.encoding=US-ASCII -Ptests.vectorsize=128
   ```




Re: [PR] An improved check for ignoring the c2-crash test if running on a client compiler. [lucene]

2023-12-20 Thread via GitHub


ChrisHegarty commented on PR #12953:
URL: https://github.com/apache/lucene/pull/12953#issuecomment-1864374811

   I'm way too slow here, sorry. Belated LGTM. And thanks for following up on this.




[I] Reproducible test failure with Terms#intersect on the default codec [lucene]

2023-12-20 Thread via GitHub


jpountz opened a new issue, #12957:
URL: https://github.com/apache/lucene/issues/12957

   ### Description
   
   The new CheckIndex checks are causing some test failures with the default codec, which are reproducible and look like real bugs? I started looking but I'm not familiar enough with BlockTree to understand what it's doing wrong.
   
   https://jenkins.thetaphi.de/job/Lucene-main-Linux/45856/consoleFull
   
   ```
   org.apache.lucene.index.TestTerms > testTermMinMaxRandom FAILED
   java.lang.AssertionError
   at __randomizedtesting.SeedInfo.seed([CBF65306049672F4:8785DC72680AA991]:0)
   at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.getState(IntersectTermsEnum.java:245)
   at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.seekToStartTerm(IntersectTermsEnum.java:288)
   at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:126)
   at org.apache.lucene.codecs.lucene90.blocktree.FieldReader.intersect(FieldReader.java:223)
   at org.apache.lucene.index.CheckIndex.checkTermsIntersect(CheckIndex.java:2374)
   at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:2327)
   at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:2529)
   at org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1067)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:783)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:550)
   at org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:340)
   at org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
   at org.apache.lucene.index.TestTerms.testTermMinMaxRandom(TestTerms.java:85)
   at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
   at java.base/java.lang.reflect.Method.invoke(Method.java:578)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   at