Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1574272996

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
Done in https://github.com/apache/lucene/pull/13279.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2033469573

Glad to know that. Thanks @mikemccand .
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2032280982

Oooh this change gave a nice pop (~5.4%, ~915 -> 964 K lookups/sec) to the primary key lookup nightly benchy: https://home.apache.org/~mikemccand/lucenebench/PKLookup.html

I'll add an annotation, exciting!
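As a quick check of the quoted figure, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the two inputs are the rounded throughput values from the comment above, not raw benchmark data.

```java
import java.util.Locale;

public class PkLookupGain {
  // Relative change from `before` to `after`, expressed as a percentage.
  static double percentGain(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    // Rounded figures quoted in the comment, in K lookups/sec.
    double gain = percentGain(915.0, 964.0);
    System.out.printf(Locale.ROOT, "%.1f%%%n", gain); // prints 5.4%
  }
}
```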
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1546339137

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
Yes, I will do it.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand merged PR #11888:
URL: https://github.com/apache/lucene/pull/11888
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2029720633

Actually I can just re-merge your prior `CHANGES.txt` entry from [here](https://github.com/apache/lucene/pull/11888/commits/a695c07da8ccdb348c87f98e6b4be6d778d919c3), so no need to push another rev here. Thanks @vsop-479 !
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1546290149

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
> Should we do these same changes to `scanToTermLeaf` (maybe in a new PR)?

+1, separate PR?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1546289740

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
> By the way, should I add a CHANGES entry for this change?

Oh yes please!
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1543957681

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
Done.
> we set ste.termExists above so we could just remove this comment and the assert instead?
> entries check -> entries to check?

Should we do these same changes to `scanToTermLeaf` (maybe in a new PR)?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1543070445

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
Thanks @mikemccand , I will fix this.
By the way, should I add a CHANGES entry for this change?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542769731

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.

Review Comment:
`entries check` -> `entries to check`?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2024550150

Thanks for your comments @mikemccand . I have fixed them, and removed the stale change entry about this change. Please take a look when you get a chance.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542387588

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.
+  public SeekStatus binarySearchTermLeaf(BytesRef target, boolean exactOnly) throws IOException {
+    // if (DEBUG) System.out.println("binarySearchTermLeaf: block fp=" + fp + " prefix=" +
+    // prefix + "
+    // nextEnt=" + nextEnt + " (of " + entCount + ") target=" + brToString(target) + " term=" +
+    // brToString(term));
+
+    assert nextEnt != -1;
+
+    ste.termExists = true;
+    subCode = 0;
+
+    if (nextEnt == entCount) {
+      if (exactOnly) {
+        fillTerm();
+      }
+      return SeekStatus.END;
+    }
+
+    assert prefixMatches(target);
+
+    suffix = suffixLengthsReader.readVInt();
+    // TODO early terminate when target length unequals suffix + prefix.
+    // But we need to keep the same status with scanToTermLeaf.
+    int start = nextEnt;
+    int end = entCount - 1;
+    // Binary search the entries (terms) in this leaf block:
+    int cmp = 0;
+    while (start <= end) {
+      int mid = (start + end) / 2;
+      nextEnt = mid + 1;
+      startBytePos = mid * suffix;
+
+      // Binary search bytes in the suffix, comparing to the target
+      cmp =
+          Arrays.compareUnsigned(
+              suffixBytes,
+              startBytePos,
+              startBytePos + suffix,
+              target.bytes,
+              target.offset + prefix,
+              target.offset + target.length);
+      if (cmp < 0) {
+        start = mid + 1;
+      } else if (cmp > 0) {
+        end = mid - 1;
+      } else {
+        // Exact match!
+        suffixesReader.setPosition(startBytePos + suffix);
+        // This cannot be a sub-block because we
+        // would have followed the index to this
+        // sub-block from the start:
+        assert ste.termExists;
+        fillTerm();
+        // if (DEBUG) System.out.println("found!");
+        return SeekStatus.FOUND;
+      }
+    }
+
+    // It is possible (and OK) that terms index pointed us
+    // at this block, but, we searched the entire block and
+    // did not find the term to position to. This happens
+    // when the target is after the last term in the block
+    // (but, before the next term in the index). EG
+    // target could be foozzz, and terms index pointed us
+    // to the foo* block, but the last term in this block
+    // was fooz (and, eg, first term in the next block will
+    // bee fop).
+    // if (DEBUG) System.out.println(" block end");
+    SeekStatus seekStatus = end < entCount - 1 ? SeekStatus.NOT_FOUND : SeekStatus.END;
+    if (seekStatus == SeekStatus.NOT_FOUND) {

Review Comment:
Thanks @mikemccand . This makes the code clearer.
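For readers without the Lucene sources at hand, the core idea of the quoted method can be sketched in isolation: when every entry in a block has the same byte length, entry `i` starts at offset `i * length`, so the block can be binary searched without decoding per-entry lengths. The sketch below is a simplified illustration and not the PR's code. The class `FixedLengthSuffixSearch` and its `search` method are hypothetical, it has none of the frame state (`nextEnt`, `fillTerm`, `SeekStatus`), and it returns an index in the style of `java.util.Arrays.binarySearch`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FixedLengthSuffixSearch {
  /**
   * Searches entCount sorted entries of suffixLen bytes each, packed back to back
   * in suffixBytes. Returns the entry index if target is present, otherwise
   * -(insertionPoint) - 1.
   */
  static int search(byte[] suffixBytes, int entCount, int suffixLen, byte[] target) {
    int start = 0;
    int end = entCount - 1;
    while (start <= end) {
      int mid = (start + end) >>> 1;
      int pos = mid * suffixLen; // fixed length, so the offset is a multiplication
      int cmp =
          Arrays.compareUnsigned(suffixBytes, pos, pos + suffixLen, target, 0, target.length);
      if (cmp < 0) {
        start = mid + 1;
      } else if (cmp > 0) {
        end = mid - 1;
      } else {
        return mid;
      }
    }
    return -(start + 1);
  }

  public static void main(String[] args) {
    // Four 2-byte entries: "aa", "bb", "cc", "dd".
    byte[] block = "aabbccdd".getBytes(StandardCharsets.US_ASCII);
    System.out.println(search(block, 4, 2, "cc".getBytes(StandardCharsets.US_ASCII))); // 2
    System.out.println(search(block, 4, 2, "bz".getBytes(StandardCharsets.US_ASCII))); // -3
  }
}
```

A negative result encodes where the target would be inserted, which loosely corresponds to the NOT_FOUND and END cases the quoted code has to distinguish after the loop.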
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542363416

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.
+  public SeekStatus binarySearchTermLeaf(BytesRef target, boolean exactOnly) throws IOException {
+    // if (DEBUG) System.out.println("binarySearchTermLeaf: block fp=" + fp + " prefix=" +
+    // prefix + "
+    // nextEnt=" + nextEnt + " (of " + entCount + ") target=" + brToString(target) + " term=" +
+    // brToString(term));
+
+    assert nextEnt != -1;
+
+    ste.termExists = true;
+    subCode = 0;
+
+    if (nextEnt == entCount) {
+      if (exactOnly) {
+        fillTerm();
+      }
+      return SeekStatus.END;
+    }
+
+    assert prefixMatches(target);
+
+    suffix = suffixLengthsReader.readVInt();
+    // TODO early terminate when target length unequals suffix + prefix.
+    // But we need to keep the same status with scanToTermLeaf.
+    int start = nextEnt;
+    int end = entCount - 1;
+    // Binary search the entries (terms) in this leaf block:
+    int cmp = 0;
+    while (start <= end) {
+      int mid = (start + end) / 2;
+      nextEnt = mid + 1;
+      startBytePos = mid * suffix;
+
+      // Binary search bytes in the suffix, comparing to the target
+      cmp =
+          Arrays.compareUnsigned(
+              suffixBytes,
+              startBytePos,
+              startBytePos + suffix,
+              target.bytes,
+              target.offset + prefix,
+              target.offset + target.length);
+      if (cmp < 0) {
+        start = mid + 1;
+      } else if (cmp > 0) {
+        end = mid - 1;
+      } else {
+        // Exact match!
+        suffixesReader.setPosition(startBytePos + suffix);
+        // This cannot be a sub-block because we
+        // would have followed the index to this
+        // sub-block from the start:
+        assert ste.termExists;
+        fillTerm();
+        // if (DEBUG) System.out.println("found!");
+        return SeekStatus.FOUND;
+      }
+    }
+
+    // It is possible (and OK) that terms index pointed us
+    // at this block, but, we searched the entire block and
+    // did not find the term to position to. This happens
+    // when the target is after the last term in the block
+    // (but, before the next term in the index). EG
+    // target could be foozzz, and terms index pointed us
+    // to the foo* block, but the last term in this block
+    // was fooz (and, eg, first term in the next block will
+    // bee fop).
+    // if (DEBUG) System.out.println(" block end");
+    SeekStatus seekStatus = end < entCount - 1 ? SeekStatus.NOT_FOUND : SeekStatus.END;
+    if (seekStatus == SeekStatus.NOT_FOUND) {

Review Comment:
Done.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542357624

## lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99PostingsFormat.java: ##
@@ -143,4 +141,13 @@ private void doTestImpactSerialization(List impacts) throws IOException
       }
     }
   }
+
+  @Override
+  protected void subCheckBinarySearch(TermsEnum termsEnum) throws Exception {
+    // 10004a matched block's entries: [11, 13, ..., 100049].
+    // if target greater than the last entry of the matched block,
+    // termsEnum.term should be the last entry.
+    assertFalse(termsEnum.seekExact(new BytesRef("10004a")));
+    assertEquals(termsEnum.term(), new BytesRef("100049"));

Review Comment:
> Is there a seekCeil based test case we can make?

Yes, `seekCeil` also triggers an `AssertionError` without the fix in 7084596c1c3a62dec2614aaeb37d0954f5fbd4e2, so I used it to replace `seekExact`. Thanks @mikemccand .
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542233210

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.
+  public SeekStatus binarySearchTermLeaf(BytesRef target, boolean exactOnly) throws IOException {
+    // if (DEBUG) System.out.println("binarySearchTermLeaf: block fp=" + fp + " prefix=" +
+    // prefix + "
+    // nextEnt=" + nextEnt + " (of " + entCount + ") target=" + brToString(target) + " term=" +
+    // brToString(term));
+
+    assert nextEnt != -1;
+
+    ste.termExists = true;
+    subCode = 0;
+
+    if (nextEnt == entCount) {
+      if (exactOnly) {
+        fillTerm();
+      }
+      return SeekStatus.END;
+    }
+
+    assert prefixMatches(target);
+
+    suffix = suffixLengthsReader.readVInt();
+    // TODO early terminate when target length unequals suffix + prefix.
+    // But we need to keep the same status with scanToTermLeaf.
+    int start = nextEnt;
+    int end = entCount - 1;
+    // Binary search the entries (terms) in this leaf block:
+    int cmp = 0;
+    while (start <= end) {
+      int mid = (start + end) / 2;

Review Comment:
Good catch. I will do it.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1542231368

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.
+  public SeekStatus binarySearchTermLeaf(BytesRef target, boolean exactOnly) throws IOException {
+    // if (DEBUG) System.out.println("binarySearchTermLeaf: block fp=" + fp + " prefix=" +
+    // prefix + "
+    // nextEnt=" + nextEnt + " (of " + entCount + ") target=" + brToString(target) + " term=" +
+    // brToString(term));
+
+    assert nextEnt != -1;
+
+    ste.termExists = true;
+    subCode = 0;
+
+    if (nextEnt == entCount) {
+      if (exactOnly) {
+        fillTerm();
+      }
+      return SeekStatus.END;
+    }
+
+    assert prefixMatches(target);
+
+    suffix = suffixLengthsReader.readVInt();
+    // TODO early terminate when target length unequals suffix + prefix.
+    // But we need to keep the same status with scanToTermLeaf.
+    int start = nextEnt;
+    int end = entCount - 1;
+    // Binary search the entries (terms) in this leaf block:
+    int cmp = 0;
+    while (start <= end) {
+      int mid = (start + end) / 2;
+      nextEnt = mid + 1;
+      startBytePos = mid * suffix;
+
+      // Binary search bytes in the suffix, comparing to the target
+      cmp =
+          Arrays.compareUnsigned(
+              suffixBytes,
+              startBytePos,
+              startBytePos + suffix,
+              target.bytes,
+              target.offset + prefix,
+              target.offset + target.length);
+      if (cmp < 0) {
+        start = mid + 1;
+      } else if (cmp > 0) {
+        end = mid - 1;
+      } else {
+        // Exact match!
+        suffixesReader.setPosition(startBytePos + suffix);
+        // This cannot be a sub-block because we
+        // would have followed the index to this
+        // sub-block from the start:
+        assert ste.termExists;

Review Comment:
I will remove it.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1541971598

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.
+  public SeekStatus binarySearchTermLeaf(BytesRef target, boolean exactOnly) throws IOException {
+    // if (DEBUG) System.out.println("binarySearchTermLeaf: block fp=" + fp + " prefix=" +
+    // prefix + "
+    // nextEnt=" + nextEnt + " (of " + entCount + ") target=" + brToString(target) + " term=" +
+    // brToString(term));
+
+    assert nextEnt != -1;
+
+    ste.termExists = true;
+    subCode = 0;
+
+    if (nextEnt == entCount) {
+      if (exactOnly) {
+        fillTerm();
+      }
+      return SeekStatus.END;
+    }
+
+    assert prefixMatches(target);
+
+    suffix = suffixLengthsReader.readVInt();
+    // TODO early terminate when target length unequals suffix + prefix.
+    // But we need to keep the same status with scanToTermLeaf.
+    int start = nextEnt;
+    int end = entCount - 1;
+    // Binary search the entries (terms) in this leaf block:
+    int cmp = 0;
+    while (start <= end) {
+      int mid = (start + end) / 2;

Review Comment:
It surely won't matter for this particular binary search but can we replace the division by 2 with logical right shift `>>> 1` instead, to avoid even the appearance of the [classic binary search overflow bug](https://thebittheories.com/the-curious-case-of-binary-search-the-famous-bug-that-remained-undetected-for-20-years-973e89fc212)?
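To make the overflow concern concrete, here is a minimal standalone sketch. The class `MidpointDemo` and its two helpers are hypothetical and are not part of the PR. It compares the two midpoint formulas on bounds whose sum exceeds `Integer.MAX_VALUE`, which, as the reviewer notes, cannot happen for the small entry counts in a leaf block.

```java
public class MidpointDemo {
  // Naive midpoint: the int sum wraps around to a negative value on overflow.
  static int naiveMid(int start, int end) {
    return (start + end) / 2;
  }

  // The unsigned shift reinterprets the wrapped sum as an unsigned 32-bit value,
  // so the midpoint is correct for any non-negative start <= end.
  static int shiftMid(int start, int end) {
    return (start + end) >>> 1;
  }

  public static void main(String[] args) {
    int start = Integer.MAX_VALUE - 1;
    int end = Integer.MAX_VALUE;
    System.out.println(naiveMid(start, end)); // -1, not a valid index
    System.out.println(shiftMid(start, end)); // 2147483646
  }
}
```

For small bounds the two forms give the same result, so the change is purely defensive here.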

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ##
@@ -642,6 +651,99 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     return SeekStatus.END;
   }
+  // Target's prefix matches this block's prefix;
+  // And all suffixes have the same length in this block,
+  // we binary search the entries check if the suffix matches.
+  public SeekStatus binarySearchTermLeaf(BytesRef target, boolean exactOnly) throws IOException {
+    // if (DEBUG) System.out.println("binarySearchTermLeaf: block fp=" + fp + " prefix=" +
+    // prefix + "
+    // nextEnt=" + nextEnt + " (of " + entCount + ") target=" + brToString(target) + " term=" +
+    // brToString(term));
+
+    assert nextEnt != -1;
+
+    ste.termExists = true;
+    subCode = 0;
+
+    if (nextEnt == entCount) {
+      if (exactOnly) {
+        fillTerm();
+      }
+      return SeekStatus.END;
+    }
+
+    assert prefixMatches(target);
+
+    suffix = suffixLengthsReader.readVInt();
+    // TODO early terminate when target length unequals suffix + prefix.
+    // But we need to keep the same status with scanToTermLeaf.
+    int start = nextEnt;
+    int end = entCount - 1;
+    // Binary search the entries (terms) in this leaf block:
+    int cmp = 0;
+    while (start <= end) {
+      int mid = (start + end) / 2;
+      nextEnt = mid + 1;
+      startBytePos = mid * suffix;
+
+      // Binary search bytes in the suffix, comparing to the target
+      cmp =
+          Arrays.compareUnsigned(
+              suffixBytes,
+              startBytePos,
+              startBytePos + suffix,
+              target.bytes,
+              target.offset + prefix,
+              target.offset + target.length);
+      if (cmp < 0) {
+        start = mid + 1;
+      } else if (cmp > 0) {
+        end = mid - 1;
+      } else {
+        // Exact match!
+        suffixesReader.setPosition(startBytePos + suffix);
+        // This cannot be a sub-block because we
+        // would have followed the index to this
+        // sub-block from the start:
+        assert ste.termExists;
+        fillTerm();
+        // if (DEBUG) System.out.println("found!");
+        return SeekStatus.FOUND;
+      }
+    }
+
+    // It is possible (and OK) that terms index pointed us
+    // at this block, but, we searched the entire block and
+    // did not find the term to position to. This happens
+    // when the target is after the last term in the block
+    // (but, before the next term in the index). EG
+    // target could be foozzz, and terms index pointed us
+    // to the foo* block, but the last term in this block
+    // was fooz (and, eg, first term in the next block will
+    // bee fop).
+    // if (DEBUG) System.out.println(" block end");
+    SeekStatus seekStatus = end < entCount - 1 ? SeekStatus.NOT_FOUND : SeekStatus.END;
+    if (seekStatus == SeekStatus.NOT_FOUND) {
+      // If binary search ended at the less term, and greater term exists.
+      // We need to advance to the greater term.
+      if (cmp < 0)
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1541892891

## lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99PostingsFormat.java: ##
@@ -143,4 +141,13 @@ private void doTestImpactSerialization(List impacts) throws IOException
       }
     }
   }
+
+  @Override
+  protected void subCheckBinarySearch(TermsEnum termsEnum) throws Exception {
+    // 10004a matched block's entries: [11, 13, ..., 100049].
+    // if target greater than the last entry of the matched block,
+    // termsEnum.term should be the last entry.
+    assertFalse(termsEnum.seekExact(new BytesRef("10004a")));
+    assertEquals(termsEnum.term(), new BytesRef("100049"));

Review Comment:
Well, I think we need to find a way to test this bug w/o abusing the API. Our tests should not violate our APIs ...

Is there a `seekCeil` based test case we can make?
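The distinction behind this request can be illustrated without Lucene on the classpath. After a failed `seekExact`, the enum's position is not something callers may rely on, whereas `seekCeil` has a defined result in every case: FOUND on an exact match, NOT_FOUND when positioned on the next greater term, END when no such term exists. The sketch below is a toy model of that contract, assuming a `TreeSet<String>` as a stand-in for the terms dictionary. The class `SeekCeilSketch` and its local `SeekStatus` enum are hypothetical, and the term values are made up for illustration, not taken from the PR's test index.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SeekCeilSketch {
  // Local stand-in mirroring the three outcomes of a ceiling seek.
  enum SeekStatus { FOUND, NOT_FOUND, END }

  // String order equals unsigned byte order for the ASCII terms used here.
  private final TreeSet<String> terms = new TreeSet<>();
  private String current; // null while unpositioned

  SeekCeilSketch(String... allTerms) {
    terms.addAll(Arrays.asList(allTerms));
  }

  // Positions on the smallest term >= target and reports how the seek ended.
  SeekStatus seekCeil(String target) {
    current = terms.ceiling(target);
    if (current == null) {
      return SeekStatus.END;
    }
    return current.equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
  }

  // Only meaningful after a seek that did not return END.
  String term() {
    if (current == null) {
      throw new IllegalStateException("enum is unpositioned");
    }
    return current;
  }

  public static void main(String[] args) {
    SeekCeilSketch te = new SeekCeilSketch("100048", "100049", "10005");
    System.out.println(te.seekCeil("10004a") + " -> " + te.term()); // NOT_FOUND -> 10005
  }
}
```

A test written against this contract can assert on both the returned status and the current term, which is what makes a ceiling-seek case a legitimate way to exercise the leaf-block search.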
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2022219035

@mikemccand Thanks for your review. I measured performance on `wikimediumall`:

# iter1

Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseMonthTaxoFacets          11.02  (26.9%)     10.62  (28.4%)   -3.7%  ( -46% -  70%)  0.727
BrowseRandomLabelSSDVFacets     6.22   (9.1%)      6.04   (6.5%)   -2.9%  ( -16% -  13%)  0.335
HighTermTitleSort             184.11   (4.0%)    180.43   (4.0%)   -2.0%  (  -9% -   6%)  0.187
TermDTSort                    214.19   (4.6%)    210.41   (4.2%)   -1.8%  ( -10% -   7%)  0.290
HighTermMonthSort            4049.75   (3.8%)   4009.22   (6.2%)   -1.0%  ( -10% -   9%)  0.606
OrHighMedDayTaxoFacets          6.41   (6.7%)      6.35   (7.5%)   -0.9%  ( -14% -  14%)  0.737
Prefix3                       562.57   (1.3%)    558.72   (1.7%)   -0.7%  (  -3% -   2%)  0.228
AndHighHighDayTaxoFacets       23.16   (1.5%)     23.02   (1.6%)   -0.6%  (  -3% -   2%)  0.288
HighPhrase                     49.14   (3.9%)     48.88   (3.1%)   -0.5%  (  -7% -   6%)  0.698
MedTermDayTaxoFacets           17.63   (3.6%)     17.55   (2.9%)   -0.4%  (  -6% -   6%)  0.718
HighSpanNear                   15.30   (1.8%)     15.24   (1.5%)   -0.4%  (  -3% -   3%)  0.544
HighTermTitleBDVSort           10.23   (2.4%)     10.19   (3.0%)   -0.4%  (  -5% -   5%)  0.717
BrowseDayOfYearSSDVFacets       6.97   (6.3%)      6.95   (6.0%)   -0.3%  ( -11% -  12%)  0.890
Respell                        73.65   (1.5%)     73.44   (2.4%)   -0.3%  (  -4% -   3%)  0.717
HighTermDayOfYearSort         525.93   (2.5%)    524.54   (3.0%)   -0.3%  (  -5% -   5%)  0.799
MedSpanNear                    75.25   (2.6%)     75.16   (1.5%)   -0.1%  (  -4% -   3%)  0.882
MedPhrase                      62.74   (4.7%)     62.73   (2.6%)   -0.0%  (  -6% -   7%)  0.987
LowSpanNear                    10.39   (2.1%)     10.39   (1.4%)    0.0%  (  -3% -   3%)  0.998
Fuzzy1                         99.01   (1.7%)     99.07   (1.6%)    0.1%  (  -3% -   3%)  0.923
OrNotHighHigh                 576.76   (4.0%)    578.77   (4.5%)    0.3%  (  -7% -   9%)  0.827
AndHighMedDayTaxoFacets        80.00   (1.3%)     80.29   (1.6%)    0.4%  (  -2% -   3%)  0.511
LowPhrase                     149.97   (2.9%)    150.57   (2.1%)    0.4%  (  -4% -   5%)  0.675
OrHighLow                     675.00   (2.6%)    678.06   (3.1%)    0.5%  (  -5% -   6%)  0.674
HighIntervalsOrdered            2.81  (14.0%)      2.83  (11.6%)    0.5%  ( -22% -  30%)  0.921
AndHighLow                   1027.38   (4.2%)   1032.64   (3.9%)    0.5%  (  -7% -   8%)  0.738
Wildcard                      100.84   (2.4%)    101.44   (2.8%)    0.6%  (  -4% -   5%)  0.547
Fuzzy2                         92.33   (1.5%)     92.98   (1.4%)    0.7%  (  -2% -   3%)  0.206
MedIntervalsOrdered            13.07  (10.5%)     13.18   (9.4%)    0.8%  ( -17% -  23%)  0.824
BrowseMonthSSDVFacets           6.91   (8.0%)      6.97   (7.1%)    0.9%  ( -13% -  17%)  0.748
OrNotHighMed                  341.39   (3.4%)    345.26   (3.2%)    1.1%  (  -5% -   7%)  0.362
AndHighMed                    155.80   (3.0%)    157.63   (3.2%)    1.2%  (  -4% -   7%)  0.316
BrowseDayOfYearTaxoFacets       7.70   (4.0%)      7.79   (4.4%)    1.2%  (  -6% -   9%)  0.450
OrHighHigh                     42.26   (3.9%)     42.77   (3.6%)    1.2%  (  -6% -   9%)  0.396
BrowseRandomLabelTaxoFacets     7.17   (5.0%)      7.26   (4.5%)    1.2%  (  -7% -  11%)  0.486
OrHighNotHigh                 464.54   (4.5%)    470.46   (5.2%)    1.3%  (  -8% -  11%)  0.490
LowTerm                       669.77   (3.4%)    678.39   (4.3%)    1.3%  (  -6% -   9%)  0.383
OrHighMed                     118.91   (3.2%)    120.47   (3.1%)    1.3%  (  -4% -   7%)  0.270
LowIntervalsOrdered            63.73   (8.3%)     64.58   (7.7%)    1.3%  ( -13% -  18%)  0.657
BrowseDateTaxoFacets            7.63   (3.8%)      7.74   (3.8%)    1.4%  (  -5% -   9%)  0.324
AndHighHigh                    30.16   (5.2%)     30.61   (2.2%)    1.5%  (  -5% -   9%)  0.323
OrNotHighLow                 1186.21   (4.7%)   1203.99
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1540528182

## lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99PostingsFormat.java:

@@ -143,4 +141,13 @@ private void doTestImpactSerialization(List impacts) throws IOException
       }
     }
   }
+
+  @Override
+  protected void subCheckBinarySearch(TermsEnum termsEnum) throws Exception {
+    // 10004a matched block's entries: [11, 13, ..., 100049].
+    // if target greater than the last entry of the matched block,
+    // termsEnum.term should be the last entry.
+    assertFalse(termsEnum.seekExact(new BytesRef("10004a")));
+    assertEquals(termsEnum.term(), new BytesRef("100049"));

Review Comment:
> why are we testing that here :)

There was a bug (fixed by 7084596c1c3a62dec2614aaeb37d0954f5fbd4e2) in the previous implementation, so I added this test to guard against it. Should I remove it?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2021791458

> Was this on wikimediumall?

No, this was on `wikimedium10k`. I will measure the performance again on `wikimediumall`.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on code in PR #11888:
URL: https://github.com/apache/lucene/pull/11888#discussion_r1539192140

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:

@@ -523,7 +526,9 @@ public void scanToSubBlock(long subFP) {
   // NOTE: sets startBytePos/suffix as a side effect
   public SeekStatus scanToTerm(BytesRef target, boolean exactOnly) throws IOException {
-    return isLeafBlock ? scanToTermLeaf(target, exactOnly) : scanToTermNonLeaf(target, exactOnly);
+    return isLeafBlock

Review Comment:
I know this was a pre-existing ternary :) But now we are embedding another confusing ternary inside the first one -- could we instead spell all of this out as verbose `if`?

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:

@@ -568,8 +573,6 @@ public SeekStatus scanToTermLeaf(BytesRef target, boolean exactOnly) throws IOEx
     assert prefixMatches(target);
-    // TODO: binary search when all terms have the same length, which is common for ID fields,

Review Comment:
Aha! Another `TODO` gone, thank you @vsop-479!

## lucene/core/src/test/org/apache/lucene/codecs/lucene99/TestLucene99PostingsFormat.java:

@@ -143,4 +141,13 @@ private void doTestImpactSerialization(List impacts) throws IOException
       }
     }
   }
+
+  @Override
+  protected void subCheckBinarySearch(TermsEnum termsEnum) throws Exception {
+    // 10004a matched block's entries: [11, 13, ..., 100049].
+    // if target greater than the last entry of the matched block,
+    // termsEnum.term should be the last entry.
+    assertFalse(termsEnum.seekExact(new BytesRef("10004a")));
+    assertEquals(termsEnum.term(), new BytesRef("100049"));

Review Comment:
Hmm, when `seekExact` returns `false`, the `TermsEnum` is unpositioned and calling `.term()` (and other methods e.g. `.postings()`) is not allowed (the behavior is undefined -- it could throw an exception or corrupt its internal state or so) ... why are we testing that here :)
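As an illustration of the first review comment in this message (spelling the nested ternary out as verbose `if`), here is a small sketch. It does not reproduce the PR's code: the class `ScanDispatch`, the `Path` enum, and the flag name `allSuffixesSameLength` are hypothetical stand-ins for the real frame state and scan methods.

```java
// Hypothetical stand-in for the dispatch in scanToTerm; not the PR's code.
final class ScanDispatch {
  enum Path { BINARY_SEARCH_LEAF, LINEAR_SCAN_LEAF, SCAN_NON_LEAF }

  // Nested-ternary shape, of the kind the review comment objects to.
  static Path chooseTernary(boolean isLeafBlock, boolean allSuffixesSameLength) {
    return isLeafBlock
        ? (allSuffixesSameLength ? Path.BINARY_SEARCH_LEAF : Path.LINEAR_SCAN_LEAF)
        : Path.SCAN_NON_LEAF;
  }

  // The same decision spelled out as plain if statements.
  static Path chooseVerbose(boolean isLeafBlock, boolean allSuffixesSameLength) {
    if (isLeafBlock) {
      if (allSuffixesSameLength) {
        // Equal-length suffixes: entries can be addressed by index.
        return Path.BINARY_SEARCH_LEAF;
      }
      return Path.LINEAR_SCAN_LEAF;
    }
    return Path.SCAN_NON_LEAF;
  }
}
```

Both forms make the same choice for every input; the verbose one simply gives each branch its own line and room for a comment.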
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
mikemccand commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-2020383140

I like this idea! It seems like it'd especially help primary key lookup against fixed length IDs like UUID? Hmm, the QPS in the `luceneutil` runs are way too high (1000s of QPS) to be trustworthy? Was this on `wikimediumall`?
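To make the idea under discussion concrete, here is a minimal sketch of binary search over equal-length suffixes packed back to back in one byte array, which is why fixed-length IDs benefit: entry `i` starts at offset `i * suffixLength`, so no per-entry length has to be decoded. This is an illustration under stated assumptions, not the code from the PR; `FixedLengthSuffixSearch`, `pack`, and `search` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not Lucene's SegmentTermsEnumFrame.
final class FixedLengthSuffixSearch {

  // Packs equal-length ASCII terms back to back, as a stand-in for a block's suffix bytes.
  static byte[] pack(String... sortedTerms) {
    int len = sortedTerms[0].length();
    byte[] out = new byte[len * sortedTerms.length];
    for (int i = 0; i < sortedTerms.length; i++) {
      byte[] b = sortedTerms[i].getBytes(StandardCharsets.US_ASCII);
      System.arraycopy(b, 0, out, i * len, len);
    }
    return out;
  }

  // Returns the entry index if found, otherwise -(insertionPoint + 1).
  static int search(byte[] suffixes, int suffixLength, int entCount, byte[] target) {
    int lo = 0;
    int hi = entCount - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int start = mid * suffixLength; // direct addressing works because lengths are equal
      int cmp =
          Arrays.compareUnsigned(
              suffixes, start, start + suffixLength, target, 0, target.length);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

A negative result encodes where the target would be inserted, which is the information a caller would need to tell "position on the next greater entry" apart from "past the end of the block".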
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1999256752

@jpountz I want to move `subCheckBinarySearch` to `BasePostingsFormatTestCase` to make this change forward compatible, by checking whether the `IndexWriterConfig` is set to the DefaultPostingsFormat, like this:

    if (TestUtil.getDefaultPostingsFormat()
        .getName()
        .equals(TestUtil.getPostingsFormat(iwc.getCodec(), "id"))) {
      // test target greater than the last entry of matched block,
    }

But it won't pass if this DefaultPostingsFormat does not use `DEFAULT_MIN_BLOCK_SIZE` and `DEFAULT_MAX_BLOCK_SIZE`, as in `TestPerFieldPostingsFormat`.

I also tried setting the DefaultCodec to test the case where the target is greater than the last entry of the matched block, like this:

    iwc.setCodec(TestUtil.getDefaultCodec());

`TestSTUniformSplitPostingFormat.checkEncoding` won't pass with this, because it must use its own `FieldsConsumer` to set states like `blocksEncoded`.

Do you have any ideas about this? Can we expose `minTermBlockSize` and `maxTermBlockSize` in `LuceneXXPostingsFormat` to a `DefaultPostingsFormat`, so that users can use them?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
github-actions[bot] commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1953304146

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1926589435

@jpountz Can we move this change forward by checking whether our test cases cover every status that `TermsEnum.seekExact` or `TermsEnum.seekCeil` may return?
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1914149013

@jpountz @mikemccand I resolved the conflicts, and moved the test case for a target greater than the last entry of the matched block from `TestLucene90PostingsFormat` to `TestLucene99PostingsFormat`. Please take a look when you get a chance!
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
github-actions[bot] commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1907130580

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
jpountz commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1881059520

@mikemccand I could use your help to review this change, it's quite deep in the guts of block tree.
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
github-actions[bot] commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1880904269

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]
vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1759050886

Appending some performance data. Note that the results vary considerably between rounds.

# round1

Task                        QPS baseline  StdDev  QPS bsearch  StdDev  Pct diff  p-value
BrowseRandomLabelTaxoFacets  2601.88   (5.0%)   2437.93  (11.1%)   -6.3%  ( -21% -  10%)  0.021
BrowseRandomLabelSSDVFacets  1796.93   (7.1%)   1696.76   (8.0%)   -5.6%  ( -19% -  10%)  0.019
MedPhrase                    4021.51   (6.7%)   3813.81   (7.6%)   -5.2%  ( -18% -   9%)  0.022
HighPhrase                    821.97   (7.3%)    788.46   (6.3%)   -4.1%  ( -16% -  10%)  0.059
HighTerm                     6221.04   (7.3%)   6032.11   (5.6%)   -3.0%  ( -14% -  10%)  0.138
Fuzzy2                        276.46   (6.0%)    268.79   (4.8%)   -2.8%  ( -12% -   8%)  0.105
Respell                       707.62   (5.9%)    692.48   (4.9%)   -2.1%  ( -12% -   9%)  0.211
BrowseDayOfYearSSDVFacets    6517.91   (5.4%)   6392.55   (6.0%)   -1.9%  ( -12% -  10%)  0.287
BrowseDateSSDVFacets         2195.69  (19.2%)   2155.70   (9.7%)   -1.8%  ( -25% -  33%)  0.705
BrowseMonthSSDVFacets        6724.35   (6.7%)   6606.24   (6.1%)   -1.8%  ( -13% -  11%)  0.386
Prefix3                      3023.99   (4.5%)   2974.60   (5.3%)   -1.6%  ( -10% -   8%)  0.293
Fuzzy1                        886.09   (5.1%)    877.56   (6.0%)   -1.0%  ( -11% -  10%)  0.587
PKLookup                      396.87  (15.8%)    393.83  (19.2%)   -0.8%  ( -30% -  40%)  0.890
OrHighLow                    3242.80   (6.0%)   3223.53   (7.2%)   -0.6%  ( -13% -  13%)  0.778
BrowseMonthTaxoFacets        5874.67   (6.1%)   5856.79   (5.1%)   -0.3%  ( -10% -  11%)  0.865
IntNRQ                       2799.54   (4.7%)   2792.69   (5.4%)   -0.2%  (  -9% -  10%)  0.878
LowIntervalsOrdered          1336.37   (8.1%)   1333.20   (5.6%)   -0.2%  ( -12% -  14%)  0.914
HighSpanNear                 2660.49   (6.7%)   2654.45   (6.1%)   -0.2%  ( -12% -  13%)  0.911
LowTerm                      9965.56   (8.1%)   9961.77  (10.6%)   -0.0%  ( -17% -  20%)  0.990
AndHighHigh                  3384.43   (7.0%)   3388.41  (11.3%)    0.1%  ( -16% -  19%)  0.968
HighSloppyPhrase             1984.76   (5.8%)   1988.83   (5.3%)    0.2%  ( -10% -  12%)  0.908
MedIntervalsOrdered          7914.54   (8.2%)   7944.02  (10.2%)    0.4%  ( -16% -  20%)  0.899
AndHighMed                   4097.29   (7.7%)   4121.75   (8.0%)    0.6%  ( -14% -  17%)  0.811
LowSpanNear                  5107.67   (9.3%)   5145.19   (7.6%)    0.7%  ( -14% -  19%)  0.785
HighTermMonthSort            3221.73   (5.2%)   3245.54   (8.5%)    0.7%  ( -12% -  15%)  0.739
HighIntervalsOrdered         1333.81   (7.6%)   1349.72   (5.3%)    1.2%  ( -10% -  15%)  0.564
LowPhrase                    5029.07   (8.3%)   5091.95  (11.1%)    1.3%  ( -16% -  22%)  0.687
Wildcard                     1327.36   (3.8%)   1346.91   (3.4%)    1.5%  (  -5% -   9%)  0.197
AndHighLow                   4382.38   (7.8%)   4447.59   (6.9%)    1.5%  ( -12% -  17%)  0.524
OrHighMed                    3121.72   (7.2%)   3169.60   (6.4%)    1.5%  ( -11% -  16%)  0.478
HighTermDayOfYearSort        3766.72   (6.2%)   3825.04   (7.2%)    1.5%  ( -11% -  15%)  0.467
MedTerm                      8666.16   (7.3%)   8841.37   (8.1%)    2.0%  ( -12% -  18%)  0.406
LowSloppyPhrase              3303.11   (6.5%)   3374.89   (7.8%)    2.2%  ( -11% -  17%)  0.341
OrHighHigh                   2458.90   (7.7%)   2512.82   (5.2%)    2.2%  (  -9% -  16%)  0.289
BrowseDateTaxoFacets         6229.11   (5.5%)   6366.43   (5.7%)    2.2%  (  -8% -  14%)  0.211
BrowseDayOfYearTaxoFacets    5695.81   (6.8%)   5830.89   (6.6%)    2.4%  ( -10% -  16%)  0.265
MedSpanNear                  3161.49   (6.8%)   3242.45   (5.4%)    2.6%  (  -8% -  15%)  0.186
MedSloppyPhrase              3363.11   (7.0%)   3456.66   (7.6%)    2.8%  ( -11% -  18%)  0.230

# round2

Task                        QPS baseline  StdDev  QPS bsearch  StdDev  Pct diff