Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2023-12-08 Thread via GitHub
dungba88 commented on code in PR #12831: URL: https://github.com/apache/lucene/pull/12831#discussion_r1420866991 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -503,9 +518,7 @@ public FSTMetadata getMetadata() { } /** - * Save the FST to

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
mikemccand commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1848198927 I am travelling this weekend and unlikely to make much progress on this until early next week. Maybe we just revert and release 9.9.1 now? -- This is an automated

Re: [PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]

2023-12-08 Thread via GitHub
msfroh commented on PR #12897: URL: https://github.com/apache/lucene/pull/12897#issuecomment-1848071583 If needed, I'm happy to add versions of `testFileIsUTF8()` for the other SimpleTextCodec format unit tests.

[PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]

2023-12-08 Thread via GitHub
msfroh opened a new pull request, #12897: URL: https://github.com/apache/lucene/pull/12897 ### Description The SimpleTextSegmentInfoFormat was writing the random byte array used as a segment's ID directly -- not converting to a simple text representation of the byte array. As a

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1848037016 I think if a fix for this isn't found early next week, we should consider reverting it. No user should upgrade to Lucene 9.9.0 with this bug.

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-08 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1421180794 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,30 @@ public

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-08 Thread via GitHub
kuramitsu commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1421172943 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,30 @@ public

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on issue #12418: URL: https://github.com/apache/lucene/issues/12418#issuecomment-1847976644 OK, merged #12853, which I think fixes the root cause of these randomized test failures. I'm going to resolve this issue and will keep an eye on nightlies for any new failures.

Re: [I] IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on issue #12558: URL: https://github.com/apache/lucene/issues/12558#issuecomment-1847976273 Fixed the root cause of this in #12853

Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on PR #12853: URL: https://github.com/apache/lucene/pull/12853#issuecomment-1847976009 Thanks @gautamworah96 for taking a look!

Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on code in PR #12853: URL: https://github.com/apache/lucene/pull/12853#discussion_r1421103992 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java: ## @@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws

Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-12-08 Thread via GitHub
gsmiller merged PR #12853: URL: https://github.com/apache/lucene/pull/12853

Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-12-08 Thread via GitHub
gautamworah96 commented on code in PR #12853: URL: https://github.com/apache/lucene/pull/12853#discussion_r1421069686 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java: ## @@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context)

Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-12-08 Thread via GitHub
gautamworah96 commented on code in PR #12853: URL: https://github.com/apache/lucene/pull/12853#discussion_r1421077089 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java: ## @@ -193,42 +204,29 @@ public BulkScorer bulkScorer(LeafReaderContext context)

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847858897 @mikemccand I have to use at a minimum: `wikibig1m` for it to replicate. Couple of weird things I noticed in that optimization PR: -

Re: [I] Reproducible failure of TestParentBlockJoinByteKnnVectorQuery.testScoringWithMultipleChildren [lucene]

2023-12-08 Thread via GitHub
zhaih commented on issue #12896: URL: https://github.com/apache/lucene/issues/12896#issuecomment-1847830605 Oh probably not, the test is just using the default merge policy (TMP)

Re: [I] Reproducible failure of TestParentBlockJoinByteKnnVectorQuery.testScoringWithMultipleChildren [lucene]

2023-12-08 Thread via GitHub
zhaih commented on issue #12896: URL: https://github.com/apache/lucene/issues/12896#issuecomment-1847828807 I think it might be due to the same problem as: https://github.com/apache/lucene/pull/12889 e.g. a doc reorder merge policy reordered the parent-child block. I haven't checked it

Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-08 Thread via GitHub
zhaih commented on code in PR #12885: URL: https://github.com/apache/lucene/pull/12885#discussion_r1421011978 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseReadingFormFilter.java: ## @@ -43,10 +43,30 @@ public JapaneseReadingFormFilter(TokenStream

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
mikemccand commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847786318 It's also curious that it's not happening w/ 9.9 created indices. #12699 is about optimizing how we accumulate the long output while traversing (reading) the FST block tree

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
mikemccand commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847782193 Ugh -- I'll try to look at this later today. Disappointing that our back compat test specifically for reading 9.8 indices failed to catch this.

[I] Reproducible failure of TestParentBlockJoinByteKnnVectorQuery.testScoringWithMultipleChildren [lucene]

2023-12-08 Thread via GitHub
gsmiller opened a new issue, #12896: URL: https://github.com/apache/lucene/issues/12896 ### Description Saw this test fail a couple times in automated builds (e.g.,

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
jpountz commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847604281 I have a 9.8 index that reproduces the bug and ran a `git bisect` to figure out the first commit that fails; it pointed to #12699.

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847604189 Git bisect has confirmed the read corruption occurs with: https://github.com/apache/lucene/pull/12699

Re: [PR] Fix NPE on off-heap test and FST is null [lucene]

2023-12-08 Thread via GitHub
gsmiller merged PR #12894: URL: https://github.com/apache/lucene/pull/12894

Re: [PR] Fix NPE on off-heap test and FST is null [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on code in PR #12894: URL: https://github.com/apache/lucene/pull/12894#discussion_r1420835887 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/fst/FSTTester.java: ## @@ -283,14 +283,17 @@ public FST doTest() throws IOException { } }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420843242 ## lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java: ## @@ -114,4 +115,31 @@ public void testNullParamsIndexInput() throws Exception { }

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847590622 Possibly related: https://github.com/apache/lucene/pull/12631 NOTE: the read corruption doesn't occur when reading from an index created in 9.9.

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
gsmiller merged PR #12812: URL: https://github.com/apache/lucene/pull/12812

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420830252 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated [lucene]

2023-12-08 Thread via GitHub
gsmiller merged PR #12854: URL: https://github.com/apache/lucene/pull/12854

Re: [PR] Remove DrillSideways#createDrillDownFacetsCollector in favor of the manager-based hook [lucene]

2023-12-08 Thread via GitHub
gsmiller merged PR #12855: URL: https://github.com/apache/lucene/pull/12855

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847568572 Here are some exceptions I ran into when trying to do multi-term queries with Lucene 9.9 against an index created in 9.8 or before: ``` Caused by:

Re: [I] Corruption read on term dictionaries in Lucene 9.9 [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on issue #12895: URL: https://github.com/apache/lucene/issues/12895#issuecomment-1847562169 //cc @gf2121 && @mikemccand

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420812950 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -324,24 +324,9 @@ private void readGroupVInt(long[] dst, int offset) throws

[PR] Fix NPE on off-heap test and FST is null [lucene]

2023-12-08 Thread via GitHub
dungba88 opened a new pull request, #12894: URL: https://github.com/apache/lucene/pull/12894 ### Description The test can throw an NPE when it's using off-heap mode and no nodes are accepted

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-08 Thread via GitHub
epotyom commented on PR #12862: URL: https://github.com/apache/lucene/pull/12862#issuecomment-1847540870 Thank you for reviewing, @mikemccand! Resolved your comments in the 2nd commit.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420760926 ## lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java: ## @@ -114,4 +115,31 @@ public void testNullParamsIndexInput() throws Exception { }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420793976 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -324,24 +324,9 @@ private void readGroupVInt(long[] dst, int offset) throws

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-08 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1420792687 ## lucene/facet/src/java/org/apache/lucene/facet/MultiFacets.java: ## @@ -77,6 +80,39 @@ public Number getSpecificValue(String dim, String... path) throws

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
stefanvodita commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420778257 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-08 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1420764333 ## lucene/facet/src/java/org/apache/lucene/facet/MultiFacets.java: ## @@ -77,6 +80,39 @@ public Number getSpecificValue(String dim, String... path) throws

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-08 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1420764008 ## lucene/facet/src/java/org/apache/lucene/facet/MultiFacets.java: ## @@ -77,6 +80,39 @@ public Number getSpecificValue(String dim, String... path) throws

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-08 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1420763643 ## lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java: ## @@ -568,6 +568,12 @@ public Number getSpecificValue(String dim, String... path) {

Re: [PR] Add Facets#getBulkSpecificValues method (#12180) [lucene]

2023-12-08 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1420763238 ## lucene/CHANGES.txt: ## @@ -67,6 +67,8 @@ API Changes * GITHUB#11023: Adding -level param to CheckIndex, making the old -fast param the default behaviour.

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420739649 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420565417 ## lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java: ## @@ -1438,4 +1440,68 @@ public void testListAllIsSorted() throws

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-12-08 Thread via GitHub
benwtrent commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r1420733099 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -1190,4 +1176,63 @@ public void seekExact(long ord) { public

Re: [PR] [WIP] LUCENE-10002: Deprecate FacetsCollector#search helper methods as they internally use IndexSearcher#search(Query, Collector) API [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on PR #12890: URL: https://github.com/apache/lucene/pull/12890#issuecomment-1847461557 IMO we should deprecate these without replacement. I agree that users should be able to implement this logic pretty easily in their application layer, and would probably be better

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
stefanvodita commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420723711 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420707913 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
stefanvodita commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420699163 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] [Minor] Quick exit for non-zero slice buffers [lucene]

2023-12-08 Thread via GitHub
gsmiller commented on code in PR #12812: URL: https://github.com/apache/lucene/pull/12812#discussion_r1420685472 ## lucene/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java: ## @@ -179,6 +179,19 @@ static class SlicedIntBlockPool extends IntBlockPool {

Re: [PR] Fix position increment in (Reverse)PathHierarchyTokenizer [lucene]

2023-12-08 Thread via GitHub
lukas-vlcek commented on PR #12875: URL: https://github.com/apache/lucene/pull/12875#issuecomment-1847241112 @mikemccand Do you think you can give me some hint about this? > (e.g. `UnifiedHighlighter`, in certain modes) I am looking at `TestUnifiedHighlighter*` tests. Does it mean

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420450756 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420451580 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420447123 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420418454 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420387856 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,34 @@ public byte readByte(long pos) throws IOException { }

[PR] Removing @lucene.experimental tags in testXXX methods in CheckIndex [lucene]

2023-12-08 Thread via GitHub
slow-J opened a new pull request, #12893: URL: https://github.com/apache/lucene/pull/12893 Following up on @mikemccand's comment in the previous CheckIndex PR: https://github.com/apache/lucene/pull/12876. > I do think some of these tags in CheckIndex.java could be removed, e.g. on each

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1847102122 It looks good on `byteBuffers` and `MMapDirectory`: the benchmark result is pretty close to the previous commit, but there is a bit of a slowdown on `NIOFSDirectory`; I will dig into it. *

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420382794 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,48 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420379683 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,34 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420377148 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -324,24 +324,9 @@ private void readGroupVInt(long[] dst, int offset) throws

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420375351 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -62,4 +62,42 @@ private static long readLongInGroup(DataInput in, int numBytesMinus1)

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420373977 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -62,4 +62,42 @@ private static long readLongInGroup(DataInput in, int numBytesMinus1)

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1420364348 ## lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java: ## @@ -1438,4 +1440,68 @@ public void testListAllIsSorted() throws

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1847060669 I'm measuring the performance difference against the previous commit; it will take a moment.

Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2023-12-08 Thread via GitHub
gokaai commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1420277534 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -957,6 +974,9 @@ private Status.SegmentInfoStatus testSegment( SegmentReader reader = null;

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846907147 > Can we do the same for all other inputs? I think so; I will do this if @jpountz doesn't mind. > I will nag Maurizio again about the problem with slice(). Thank you

Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-08 Thread via GitHub
shaikhu commented on PR #12519: URL: https://github.com/apache/lucene/pull/12519#issuecomment-1846884628 Oops, I completely forgot about this. Restored the forked repo and reopening.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846883907 I will nag Maurizio again about the problem with slice(). The reason for this was some strange problem with Hotspot. I thought they fixed it.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846881254 I would still be safe and initialize the IntReader on construction of the IndexInput. It can strongly bind to the current segment. Can we do the same for all other inputs?

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846784038 +1 for GC overhead; here is the GC output (`-prof gc`): ``` Benchmark (size) Mode Cnt Score

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846779641 I confirmed there's GC activity happening with the slice approach by using `-prof gc`: ``` Benchmark

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-08 Thread via GitHub
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1846774383 I'll check if there is GC activity during the benchmark. In the meantime, I looked into using lambdas instead, and it seems like it would work well:

Re: [PR] [Draft] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]

2023-12-08 Thread via GitHub
kuramitsu commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1846729364 The modification within the getRomanization function has been dropped. Instead, in the incrementToken function, I added a process to treat the hiragana OOV term converted to kataka