Re: [PR] Add a stored fields test that indexes LineFileDocs. [lucene]

2023-12-19 Thread via GitHub
jpountz merged PR #12927: URL: https://github.com/apache/lucene/pull/12927 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[I] Improve backward-compatibility testing [lucene]

2023-12-19 Thread via GitHub
jpountz opened a new issue, #12956: URL: https://github.com/apache/lucene/issues/12956 ### Description #12895 highlighted that our backward compatibility tests could use some love. I reviewed what we have and tried to collect a list of things we should improve: - [ ] Add

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-19 Thread via GitHub
s1monw commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1862391122 @mikemccand @jpountz I updated this PR and moved everything to FieldInfo / IWC. I think it's ready. The only thing that I'd like to discuss is that we are currently recording the number

Re: [PR] Beef up `Terms#intersect` checks in `CheckIndex`. [lucene]

2023-12-19 Thread via GitHub
jpountz merged PR #12926: URL: https://github.com/apache/lucene/pull/12926

Re: [PR] Modernize LineFileDocs. [lucene]

2023-12-19 Thread via GitHub
jpountz merged PR #12929: URL: https://github.com/apache/lucene/pull/12929

Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-19 Thread via GitHub
jpountz commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1862482539 I wonder if there is a performance impact, since this is moving a condition from something that runs once per block of 128 docs to something that may run on every doc.

Re: [PR] Modernize LineFileDocs. [lucene]

2023-12-19 Thread via GitHub
jpountz commented on PR #12929: URL: https://github.com/apache/lucene/pull/12929#issuecomment-1862488472 From a quick check, the luceneutil version has much more complexity around indexing facets, doc values, etc. It's not entirely obvious to me if sharing the code would help or hurt.

Re: [PR] Remove patching for doc blocks. [lucene]

2023-12-19 Thread via GitHub
s1monw commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1862531327 I wanted to give my $0.02 on this. I am not convinced that a 2% change on a benchmark warrants a 6.2k SLoC addition to such an important codebase. I think the differences in terms of

Re: [PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]

2023-12-19 Thread via GitHub
jpountz commented on code in PR #12897: URL: https://github.com/apache/lucene/pull/12897#discussion_r1431673997 ## lucene/codecs/src/test/org/apache/lucene/codecs/simpletext/TestSimpleTextSegmentInfoFormat.java: ## @@ -33,4 +43,39 @@ protected Version[] getVersions() {

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-19 Thread via GitHub
gokaai commented on PR #12872: URL: https://github.com/apache/lucene/pull/12872#issuecomment-1863127752 Decided to tackle this in smaller steps - - Made a new (simpler) revision where we throw a more specific error on encountering missing segment info. Created a new exception

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-19 Thread via GitHub
gokaai commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1431602197 ## lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java: ## @@ -389,13 +389,39 @@ private static void parseSegmentInfos( } long totalDocs = 0; +

Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-19 Thread via GitHub
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1862859903 Ohh, what you said makes sense, I will check it. Thank you Adrien!

Re: [PR] Add support for index sorting with document blocks [lucene]

2023-12-19 Thread via GitHub
s1monw commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1862667309 @mikemccand I don't think we have any impact on performance here at all if this feature is not in use. If you look at DWPT there is a final field that decides if there is further