Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866406188 OK, it is running now with above command line (beasting) on OpenJDK Temurin 17.0.9. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-21 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1434180897 ## lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94FieldInfosFormat.java: ## @@ -157,6 +158,8 @@ public FieldInfos read( boolean omitNorms =

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-21 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1434195239 ## lucene/core/src/java/org/apache/lucene/index/IndexingChain.java: ## @@ -219,15 +224,41 @@ private Sorter.DocMap maybeSortSegment(SegmentWriteState state) throws

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866601937 Hi, it still fails, this time in `org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)`. It is a

Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2023-12-21 Thread via GitHub
uschindler commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1866729268 It looks like this does not fix the issue in #12916. The bug is there, but it does not look like the cause of #12916.

Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-21 Thread via GitHub
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1866446008 Here is the benchmark for the new approach (avoid the for-loop in `reset()`): the `PKLookup` task still has a speedup, but the speedup for the `Wildcard` task has disappeared. I checked the memory

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866708686 Nah, it is exactly the same error. I verified I am on the right branch: ``` thetaphi@serv1:~/repro/lucene$ git status On branch

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-21 Thread via GitHub
s1monw commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1866428159 > If my understanding is correct, we will require a parent field when using blocks as of 10.0. One concern I have about this is that we currently don't require users to know up-front
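To illustrate why a parent field helps when sorting an index that contains blocks, here is a minimal sketch under stated assumptions, not Lucene's implementation: each block is a run of child docs that ends with its parent doc, a hypothetical `isParent` flag marks that last doc, and only parents carry the sort key. With those flags the flat doc sequence can be cut back into blocks, and whole blocks can be reordered by their parent's key without separating children from their parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BlockSortSketch {

  /**
   * Returns the new doc order (as old doc IDs) after sorting whole blocks by
   * their parent's key. isParent[d] marks the last doc of each block and
   * sortKey[d] is only read for parent docs (hypothetical layout).
   */
  static int[] sortBlocks(boolean[] isParent, long[] sortKey) {
    // Collect [start, end] ranges; every parent doc closes a block.
    List<int[]> blocks = new ArrayList<>();
    int start = 0;
    for (int doc = 0; doc < isParent.length; doc++) {
      if (isParent[doc]) {
        blocks.add(new int[] {start, doc});
        start = doc + 1;
      }
    }
    if (start != isParent.length) {
      throw new IllegalArgumentException("trailing docs without a parent");
    }
    // Stable sort of blocks by the key stored on the parent (the block's last doc).
    blocks.sort(Comparator.comparingLong(b -> sortKey[b[1]]));
    int[] newOrder = new int[isParent.length];
    int upto = 0;
    for (int[] b : blocks) {
      for (int doc = b[0]; doc <= b[1]; doc++) {
        newOrder[upto++] = doc; // children stay contiguous and ahead of their parent
      }
    }
    return newOrder;
  }

  public static void main(String[] args) {
    // Two blocks: docs 0-2 (parent 2, key 20) and docs 3-4 (parent 4, key 10).
    boolean[] isParent = {false, false, true, false, true};
    long[] sortKey = {0, 0, 20, 0, 10};
    System.out.println(Arrays.toString(sortBlocks(isParent, sortKey))); // [3, 4, 0, 1, 2]
  }
}
```

Without some marker of where each block ends, the sorter would only see a flat list of docs and could not tell which children travel with which parent; that is the gap a required parent field would close.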

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-21 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1434189939 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java: ## @@ -262,6 +294,35 @@ long updateDocuments( } } + private Iterable

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-21 Thread via GitHub
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1434193511 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -1180,33 +1180,43 @@ public static Status.IndexSortStatus testSort( comparators[i] =

[PR] Fix typo in help/formatting.txt [lucene]

2023-12-21 Thread via GitHub
sabi0 opened a new pull request, #12960: URL: https://github.com/apache/lucene/pull/12960 (no comment)

Re: [PR] Fix typo in help/formatting.txt [lucene]

2023-12-21 Thread via GitHub
dweiss merged PR #12960: URL: https://github.com/apache/lucene/pull/12960

Re: [PR] Fix typo in help/formatting.txt [lucene]

2023-12-21 Thread via GitHub
dweiss commented on PR #12960: URL: https://github.com/apache/lucene/pull/12960#issuecomment-1866794519 Thank you!

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
dweiss commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866928747 > What harm do you see in adding those .gitattributes rules? I don't like these conversions; that's about it. I like to get out what I put in. Consider this example if you

Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2023-12-21 Thread via GitHub
uschindler commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1867040618 I get it to fail with both OpenJ9 and Hotspot on this branch, see #12916

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866743958 Also try to keep the machine busy at the same time (Jenkins jobs are running at the same time). This may increase the chance of hitting this.

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866743326 Did you run my "beasting" command above? It runs everything 100 times with 1000 iterations each. For me it fails in most cases after the third or fourth try.

[PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2023-12-21 Thread via GitHub
mayya-sharipova opened a new pull request, #12962: URL: https://github.com/apache/lucene/pull/12962 A second implementation of #12794 using Queue instead of MaxScoreAccumulator. Speedup concurrent multi-segment HNSW graph search by exchanging the global top scores collected so far
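The idea of exchanging global top scores between concurrent per-segment searches can be illustrated with a minimal sketch. It assumes a hypothetical `GlobalTopScores` class (not the PR's code, and not a Lucene API): every segment's search thread offers the scores it collects into one shared, bounded min-heap of size k, and reads back the k-th best score seen so far across all segments as a minimum competitive score.

```java
import java.util.PriorityQueue;

/** Hypothetical shared collector of the global top-k scores across segments. */
public class GlobalTopScores {
  private final int k;
  // Min-heap: the head is the weakest of the current global top-k scores.
  private final PriorityQueue<Float> heap = new PriorityQueue<>();

  public GlobalTopScores(int k) {
    if (k <= 0) throw new IllegalArgumentException("k must be positive");
    this.k = k;
  }

  /** Called by any segment's search thread when it finds a candidate score. */
  public synchronized void offer(float score) {
    if (heap.size() < k) {
      heap.add(score);
    } else if (score > heap.peek()) {
      heap.poll();
      heap.add(score);
    }
  }

  /**
   * Scores at or below this value cannot enter the global top-k, so a
   * segment search may treat such candidates as non-competitive.
   */
  public synchronized float minCompetitiveScore() {
    return heap.size() < k ? Float.NEGATIVE_INFINITY : heap.peek();
  }

  public static void main(String[] args) throws InterruptedException {
    GlobalTopScores top = new GlobalTopScores(3);
    // Two "segments" searched concurrently, each contributing its scores.
    Thread seg1 = new Thread(() -> { for (float s : new float[] {0.9f, 0.2f, 0.5f}) top.offer(s); });
    Thread seg2 = new Thread(() -> { for (float s : new float[] {0.7f, 0.1f, 0.8f}) top.offer(s); });
    seg1.start();
    seg2.start();
    seg1.join();
    seg2.join();
    System.out.println(top.minCompetitiveScore()); // 0.7, since the global top-3 is {0.9, 0.8, 0.7}
  }
}
```

The synchronized methods only keep the sketch obviously correct; a real implementation would probably try to avoid taking a global lock on every offered score.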

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2023-12-21 Thread via GitHub
mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1866829988 ### 1M vectors of 100 dims k=10, fanout=90 | |Avg visited nodes | QPS| Recall| | :--- |

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866885528 To work with this repository the files _must_ have LF line breaks. Why not make that part of the repository itself? And forget about describing this quirk in the contributing

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
dweiss commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866925375 > Apparently, gradlew check also requires Perl, which contributing guide fails to mention. Perl and python3 (in many cases, not sure whether it's required for tasks check

Re: [PR] Remove unnecessary fields loop from extractWeightedSpanTerms() [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on PR #12965: URL: https://github.com/apache/lucene/pull/12965#issuecomment-1867009056 It seems the method could be skipped completely if the query does not contain `fieldName` or `defaultField`: ``` protected void extractWeightedSpanTerms(Map terms, SpanQuery
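The suggested early exit amounts to a guard clause. A minimal sketch, assuming a hypothetical helper that already receives the set of field names the span query touches (how those names are collected is not shown in this message) and a non-null `fieldName`; how the real method handles a `null` field name is not covered here:

```java
import java.util.Objects;
import java.util.Set;

public class SpanExtractionGuard {

  /**
   * Hypothetical guard: returns true when extraction can be skipped because
   * the span query touches neither the highlighted field nor the default field.
   */
  static boolean canSkipExtraction(Set<String> queryFields, String fieldName, String defaultField) {
    Objects.requireNonNull(fieldName, "fieldName");
    boolean touchesField = queryFields.contains(fieldName);
    // Null check first: immutable sets such as Set.of(...) reject contains(null).
    boolean touchesDefault = defaultField != null && queryFields.contains(defaultField);
    return !touchesField && !touchesDefault;
  }

  public static void main(String[] args) {
    Set<String> fields = Set.of("title", "body");
    System.out.println(canSkipExtraction(fields, "author", null));   // true
    System.out.println(canSkipExtraction(fields, "author", "body")); // false
  }
}
```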

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866807745 In addition to the checksum problem, `gradlew tidy` in a fresh clone on Windows results in 5147 modified files! I guess forcing LF line breaks is the best fix for both these
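Assuming, as the comments here suggest, that the failures come from LF being converted to CR LF on checkout, the effect on any checksum is easy to see in a small sketch: the same text hashed with LF and with CR LF line breaks gives different digests, and normalizing back to LF restores the match. SHA-256 is used purely for illustration; which hash and which files the Lucene build actually checks is not stated in this thread.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LineEndingChecksum {

  static String sha256Hex(String text) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
      return String.format("%064x", new BigInteger(1, digest));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
  }

  /** Undo a CR LF checkout conversion by mapping every CR LF back to LF. */
  static String normalizeToLf(String text) {
    return text.replace("\r\n", "\n");
  }

  public static void main(String[] args) {
    String committed = "line one\nline two\n";           // as stored in the repository
    String checkedOut = committed.replace("\n", "\r\n"); // roughly what core.autocrlf=true does on checkout
    System.out.println(sha256Hex(committed).equals(sha256Hex(checkedOut)));                // false
    System.out.println(sha256Hex(committed).equals(sha256Hex(normalizeToLf(checkedOut)))); // true
  }
}
```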

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866907700 > Did you run my above "beasting" command? It runs everything 100 times with each 1000 iterations. For me it fails in most cases after 3rd or fourth try. I tried it this

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866912391 Apparently, `gradlew check` also requires Perl, which the contributing guide fails to mention. ``` * What went wrong: Execution failed for task

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
dweiss commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866918536 I won't stand in the way if others want those .gitattributes - I just expressed my opinion. Do yourself a favor and switch it off globally, really. This should be the default on

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866740276 Thanks I'll keep digging.

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866894383 At the very least the contributing guide needs to mention that one should set the config while cloning the repo: `git clone --config core.autocrlf=false

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
dweiss commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866936687 > I'm with you on not liking this behavior, actually. I think any programmer on Windows has been through this. I don't know why git defaults are such on Windows - maybe git

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
dweiss commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866853682 I'm sorry, but I disagree with this. I use Windows myself all the time, but the automatic conversion of line ends is the weirdest and most annoying feature ever conceived. It can be

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866890549 With `core.autocrlf` set to `false`, line-break normalization when committing new files is disabled too. Then contributors on Windows will be pushing files with CR LF line breaks
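The `core.autocrlf` modes discussed in this thread can be summarized with a simplified model. This is a sketch that assumes text files only and ignores `.gitattributes`, `core.eol`, and git's other safety checks: `true` converts LF to CR LF on checkout and CR LF to LF on commit, `input` only converts CR LF to LF on commit, and `false` converts nothing, so a CR LF file created on Windows is committed with its CR LF line breaks intact.

```java
public class AutoCrlfModel {

  enum AutoCrlf { TRUE, INPUT, FALSE }

  /** Working tree to repository (simplified model of committing a text file). */
  static String toRepository(String workingCopy, AutoCrlf mode) {
    return mode == AutoCrlf.FALSE ? workingCopy : workingCopy.replace("\r\n", "\n");
  }

  /** Repository to working tree (simplified model of checking out a text file). */
  static String toWorkingTree(String stored, AutoCrlf mode) {
    if (mode != AutoCrlf.TRUE) {
      return stored;
    }
    // Normalize first so only bare LFs are expanded and no CR CR LF appears.
    return stored.replace("\r\n", "\n").replace("\n", "\r\n");
  }

  public static void main(String[] args) {
    String windowsFile = "a\r\nb\r\n"; // a new file written by a Windows editor
    for (AutoCrlf mode : AutoCrlf.values()) {
      String committed = toRepository(windowsFile, mode);
      System.out.println(mode + " commits CR LF: " + committed.contains("\r\n"));
    }
    // TRUE commits CR LF: false
    // INPUT commits CR LF: false
    // FALSE commits CR LF: true
  }
}
```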

Re: [I] `gradlew check` fails checksum validation in a fresh clone on Windows [lucene]

2023-12-21 Thread via GitHub
sabi0 commented on issue #12961: URL: https://github.com/apache/lucene/issues/12961#issuecomment-1866932938 I'm with you on not liking this behavior, actually. Having a bunch of repositories converted from Subversion, with a mix of line breaks from different contributors, it is always a

Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2023-12-21 Thread via GitHub
dweiss commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1866971433 For what it's worth, I've tried reproducing this on Adrien's branch with: ``` gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC beast --tests

[I] `TestStressLockFactories` fails on Windows in a freshly cloned repository [lucene]

2023-12-21 Thread via GitHub
sabi0 opened a new issue, #12964: URL: https://github.com/apache/lucene/issues/12964 ### Description > NOTE: test params are: codec=Asserting(Lucene99): {}, docValues:{}, maxPointsInLeafNode=189, maxMBSortInHeap=5.965018891538125, sim=Asserting(RandomSimilarity(queryNorm=true): {}),

[PR] Remove unnecessary fields loop from extractWeightedSpanTerms() [lucene]

2023-12-21 Thread via GitHub
sabi0 opened a new pull request, #12965: URL: https://github.com/apache/lucene/pull/12965 (no comment)

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-21 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1867190473 Hi @uschindler, can we merge this PR now (without optimizing ByteBufferIndexInput), or is there anything else that needs to be changed?
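For readers unfamiliar with group-varint, here is a minimal sketch of the technique, assuming one common layout: a flag byte holding four 2-bit length codes (byte count minus one), followed by the four values in little-endian order, each using only as many bytes as it needs. It works on plain `byte[]` buffers with hypothetical method names; it is not the `DataOutput`/`DataInput` code from the PR, whose bit order and tail handling may differ, and it only encodes full groups of four.

```java
import java.util.Arrays;

public class GroupVarIntSketch {

  /** Number of bytes (1 to 4) needed to store v as an unsigned 32-bit value. */
  static int numBytes(int v) {
    int bits = 32 - Integer.numberOfLeadingZeros(v | 1); // v | 1 so that 0 still takes 1 byte
    return (bits + 7) >>> 3;
  }

  /** Encodes exactly four ints into out starting at pos; returns the new position. */
  static int encodeGroup(int[] values, int offset, byte[] out, int pos) {
    int flagPos = pos++;
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[offset + i];
      int n = numBytes(v);
      flag |= (n - 1) << (i * 2); // 2 bits per value
      for (int b = 0; b < n; b++) {
        out[pos++] = (byte) (v >>> (8 * b)); // little-endian
      }
    }
    out[flagPos] = (byte) flag;
    return pos;
  }

  /** Decodes four ints from in starting at pos into dest; returns the new position. */
  static int decodeGroup(byte[] in, int pos, int[] dest, int offset) {
    int flag = in[pos++] & 0xFF;
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (i * 2)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      dest[offset + i] = v;
    }
    return pos;
  }

  public static void main(String[] args) {
    int[] values = {1, 300, 70000, -1};
    byte[] buf = new byte[17]; // worst case: 1 flag byte + 4 * 4 value bytes
    int written = encodeGroup(values, 0, buf, 0);
    int[] decoded = new int[4];
    decodeGroup(buf, 0, decoded, 0);
    System.out.println(written + " bytes, " + Arrays.toString(decoded)); // 11 bytes, [1, 300, 70000, -1]
  }
}
```

Because all four lengths sit in one flag byte, a decoder can learn the whole group's layout from a single read instead of testing a continuation bit after every byte, which is the usual motivation for the format.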

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2023-12-21 Thread via GitHub
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1433990660 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() {

Re: [PR] Optimize FST on-heap BytesReader [lucene]

2023-12-21 Thread via GitHub
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1433995406 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() {

[PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]

2023-12-21 Thread via GitHub
jpountz opened a new pull request, #12959: URL: https://github.com/apache/lucene/pull/12959 Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up
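The failure mode described here, `poll` returning `null` although the queue was never empty, can be illustrated with a minimal sketch. It assumes a hypothetical queue sharded into lock-protected slots whose `poll` only uses `tryLock` (this is not the PR's code or Lucene's class): if every non-empty slot is momentarily locked by another thread, a `tryLock`-only scan skips them all and returns `null`. One possible remedy, shown in `poll()`, is a second pass with blocking `lock()` calls, so that `null` is only returned after every slot has been seen empty while holding its lock.

```java
import java.util.PriorityQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical queue sharded into lock-protected slots (illustration only). */
public class ShardedApproxQueue {
  private final ReentrantLock[] locks;
  private final PriorityQueue<Long>[] slots;

  @SuppressWarnings("unchecked")
  public ShardedApproxQueue(int numSlots) {
    locks = new ReentrantLock[numSlots];
    slots = new PriorityQueue[numSlots];
    for (int i = 0; i < numSlots; i++) {
      locks[i] = new ReentrantLock();
      slots[i] = new PriorityQueue<>();
    }
  }

  public void add(long value) {
    int slot = Math.floorMod(Thread.currentThread().hashCode(), locks.length);
    locks[slot].lock();
    try {
      slots[slot].add(value);
    } finally {
      locks[slot].unlock();
    }
  }

  /** Racy variant: may return null while other threads hold the locks of non-empty slots. */
  public Long pollTryLockOnly() {
    for (int i = 0; i < locks.length; i++) {
      if (locks[i].tryLock()) {
        try {
          Long head = slots[i].poll();
          if (head != null) return head;
        } finally {
          locks[i].unlock();
        }
      }
    }
    return null; // busy slots were skipped, not checked
  }

  /** tryLock first for throughput, then a blocking pass so that no slot is skipped. */
  public Long poll() {
    Long head = pollTryLockOnly();
    if (head != null) return head;
    for (int i = 0; i < locks.length; i++) {
      locks[i].lock();
      try {
        head = slots[i].poll();
        if (head != null) return head;
      } finally {
        locks[i].unlock();
      }
    }
    return null; // every slot was seen empty while holding its lock
  }

  public static void main(String[] args) throws InterruptedException {
    ShardedApproxQueue q = new ShardedApproxQueue(1);
    q.add(42L);
    q.locks[0].lock(); // simulate another thread holding the only slot's lock
    Thread other = new Thread(() -> System.out.println("tryLock-only poll: " + q.pollTryLockOnly()));
    other.start();
    other.join();
    q.locks[0].unlock();
    System.out.println("poll: " + q.poll());
    // tryLock-only poll: null
    // poll: 42
  }
}
```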

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866112795 Is there an explanation why it works better when making the method synchronized as suggested by the OpenJ9 people?

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866059760 I was not able to reproduce the failure, but could create an isolated test on `ConcurrentApproximatePriorityQueue` that consistently reproduces what I think is the same problem. I

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866108892 Ok, maybe the issue happens more often with OpenJ9 or AMD Ryzen CPUs. Should I take your PR and check with beasting on Policeman's Ryzen to see if I can reproduce with the above

Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-21 Thread via GitHub
jpountz commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866215329 > Should I take your PR and check with beasting on Policeman's Ryzen to see if I can reproduce with the above command? That would be great. I have a Ryzen too, though