Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2024-02-22 Thread via GitHub
easyice commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1958968291 > @easyice Hi, I suspect that data encoded with group-varint differs from data written the old way, so is this approach compatible with the old index format?

Re: [PR] LUCENE-8739: Add ZSTD support. [lucene]

2024-02-22 Thread via GitHub
believezzd commented on PR #174: URL: https://github.com/apache/lucene/pull/174#issuecomment-1959010131 @jpountz ## Description * There are two places in the code that I can't understand in this commit; could you please help me? ## First one * To

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-02-22 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1959856815 > For instance the merge scheduler could return a custom Executor that dynamically decides to run a new task in the current thread or to fork to a separate thread depending on how

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-02-22 Thread via GitHub
jpountz commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1959736838 > So, we would have to update the MergeScheduler to have some methods to return the executor for us to use and pass to MergeState (which is only created via the SegmentMerger object).

Re: [PR] FieldInfosFormat translation should be independent of VectorSimilartyFunction enum [lucene]

2024-02-22 Thread via GitHub
ChrisHegarty commented on PR #13119: URL: https://github.com/apache/lucene/pull/13119#issuecomment-1959890082 Thanks for the reviews. All comments have been addressed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] FieldInfosFormat translation should be independent of VectorSimilartyFunction enum [lucene]

2024-02-22 Thread via GitHub
ChrisHegarty merged PR #13119: URL: https://github.com/apache/lucene/pull/13119

Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-02-22 Thread via GitHub
azagniotov commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-1960013536 @uschindler yes: I have rebased from the latest `main` branch and ran the `./gradlew clean regenerate`. The following is the (partial) output: ``` ... ... > Task

Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-02-22 Thread via GitHub
azagniotov commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-1960013241 @uschindler yes: I have rebased from the latest `main` branch and ran the `./gradlew clean regenerate`. The following is the (partial) output: ``` ... ... > Task

Re: [PR] Adding benwtrent (me) to committers & PMC [lucene-site]

2024-02-22 Thread via GitHub
benwtrent merged PR #76: URL: https://github.com/apache/lucene-site/pull/76

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-02-22 Thread via GitHub
benwtrent commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1499916055 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -152,7 +153,25 @@ public Lucene99HnswVectorsFormat() { * @param

Re: [PR] Bump release to Java 21 [lucene]

2024-02-22 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1960300191 @ChrisHegarty @uschindler @mikemccand i think we still need to address these python scripts, sorry I haven't gotten to it! been busy unfortunately: > still need to update

Re: [PR] Bump release to Java 21 [lucene]

2024-02-22 Thread via GitHub
rmuir commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1960301699 I mean, they could be a followup ticket. But it is nice to keep the `main` branch releasable. Until we fix those python scripts, we can't release because they want to use java 17

Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-02-22 Thread via GitHub
mocobeta commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-1960500994 @azagniotov Sorry, I've not been available for a while. Let me take a look; I will try to find time next week...

Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2024-02-22 Thread via GitHub
github-actions[bot] commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1960551634 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-02-22 Thread via GitHub
github-actions[bot] commented on PR #12872: URL: https://github.com/apache/lucene/pull/12872#issuecomment-1960551768 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]

2024-02-22 Thread via GitHub
github-actions[bot] commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1960551613 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] LUCENE-4056: Japanese Tokenizer (Kuromoji) cannot build UniDic dictionary [lucene]

2024-02-22 Thread via GitHub
azagniotov commented on PR #12517: URL: https://github.com/apache/lucene/pull/12517#issuecomment-1960506371 Thank you, @mocobeta

[PR] Adding benwtrent (me) to committers & PMC [lucene-site]

2024-02-22 Thread via GitHub
benwtrent opened a new pull request, #76: URL: https://github.com/apache/lucene-site/pull/76 (no comment)

Re: [PR] Bump release to Java 21 [lucene]

2024-02-22 Thread via GitHub
ChrisHegarty commented on PR #12753: URL: https://github.com/apache/lucene/pull/12753#issuecomment-1960191939 This is ready to merge, right? @rmuir @uschindler @mikemccand

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2024-02-22 Thread via GitHub
kaivalnp commented on PR #12820: URL: https://github.com/apache/lucene/pull/12820#issuecomment-1959568841 Thanks for checking @benwtrent! We primarily improve cases of using a high topK + a selective filter (good rate of fallback, large number of duplicate computations). I notice

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2024-02-22 Thread via GitHub
wjp719 commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1959222579 > Have you got specific errors? could you give some detailed message? Thanks! I have no errors; I didn't realize the new format was used. Thanks.

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-02-22 Thread via GitHub
benwtrent commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1499311351 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -152,7 +153,25 @@ public Lucene99HnswVectorsFormat() { * @param

Re: [PR] FieldInfosFormat translation should be independent of VectorSimilartyFunction enum [lucene]

2024-02-22 Thread via GitHub
benwtrent commented on code in PR #13119: URL: https://github.com/apache/lucene/pull/13119#discussion_r1499170841 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -171,15 +172,25 @@ private void validateFieldEntry(FieldInfo info,

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-02-22 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-1959551147 > So I think it is still better to have two separate thread pools for inter-segment merge and inner segment merge, but I wonder whether we can have a ThreadPoolManager which