[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523700#comment-17523700 ]

Feng Guo commented on LUCENE-10315:
---

Thanks [~ivera]! +1 to remove the int24 ForUtil implementation. I have updated the branch: https://github.com/apache/lucene/pull/797

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> ---
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Major
> Attachments: addall.svg, cpu_profile_baseline.html, cpu_profile_path.html
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Elasticsearch (which is based on Lucene) can automatically infer types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender / age / status, they often use numbers to represent the values, so ES infers these fields as {{long}}, and ES uses BKD as the index for {{long}} fields. When the data volume grows, building the result set of low-cardinality fields makes the CPU usage and load very high.
> This is a flame graph we obtained from the production environment: [^addall.svg]
> It can be seen that almost all CPU is used in addAll. When we reindexed {{long}} to {{keyword}}, the cluster load and search latency were greatly reduced (we spent weeks reindexing all indices...). I know the ES documentation recommends {{keyword}} for term/terms queries and {{long}} for range queries, but there are always users who don't realize this and keep their habit of using a SQL database, or dynamic mapping automatically selects the type for them. All in all, users won't realize that there can be such a big difference in performance between {{long}} and {{keyword}} for low-cardinality fields. So from my point of view it makes sense to make BKD work better for low/medium-cardinality fields.
> As far as I can see, for low-cardinality fields there are two advantages of {{keyword}} over {{long}}:
> 1. The {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's delta VInt, because of its batch reading (readLongs) and SIMD decode.
> 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily materialize its result set, and when another small result clause intersects with this low-cardinality condition, the low-cardinality field can avoid reading all docIds into memory.
> This issue targets the first point. The basic idea is to use a 512-int {{ForUtil}} for the BKD ids codec. I benchmarked this optimization by mocking some random {{LongPoint}} values and querying them with {{PointInSetQuery}}.
>
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
> |1|32|1|51.44|148.26|188.22%|
> |1|32|2|26.8|101.88|280.15%|
> |1|32|4|14.04|53.52|281.20%|
> |1|32|8|7.04|28.54|305.40%|
> |1|32|16|3.54|14.61|312.71%|
> |1|128|1|110.56|350.26|216.81%|
> |1|128|8|16.6|89.81|441.02%|
> |1|128|16|8.45|48.07|468.88%|
> |1|128|32|4.2|25.35|503.57%|
> |1|128|64|2.13|13.02|511.27%|
> |1|1024|1|536.19|843.88|57.38%|
> |1|1024|8|109.71|251.89|129.60%|
> |1|1024|32|33.24|104.11|213.21%|
> |1|1024|128|8.87|30.47|243.52%|
> |1|1024|512|2.24|8.3|270.54%|
> |1|8192|1|.33|5000|50.00%|
> |1|8192|32|139.47|214.59|53.86%|
> |1|8192|128|54.59|109.23|100.09%|
> |1|8192|512|15.61|36.15|131.58%|
> |1|8192|2048|4.11|11.14|171.05%|
> |1|1048576|1|2597.4|3030.3|16.67%|
> |1|1048576|32|314.96|371.75|18.03%|
> |1|1048576|128|99.7|116.28|16.63%|
> |1|1048576|512|30.5|37.15|21.80%|
> |1|1048576|2048|10.38|12.3|18.50%|
> |1|8388608|1|2564.1|3174.6|23.81%|
> |1|8388608|32|196.27|238.95|21.75%|
> |1|8388608|128|55.36|68.03|22.89%|
> |1|8388608|512|15.58|19.24|23.49%|
> |1|8388608|2048|4.56|5.71|25.22%|
> The index size is reduced for low-cardinality fields and flat for high-cardinality fields.
> {code:java}
> 113M    index_1_doc_32_cardinality_baseline
> 114M    index_1_doc_32_cardinality_candidate
> 140M    index_1_doc_128_cardinality_baseline
> 133M    index_1_doc_128_cardinality_candidate
> 193M    index_1_doc_1024_cardinality_baseline
> 174M    index_1_doc_1024_cardinality_candidate
> 241M    index_1_doc_8192_cardinality_baseline
> 233M    index_1_doc_8192_cardinality_candidate
> 314M
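The contrast drawn above between delta-VInt decoding and a FOR-style codec can be sketched as plain fixed-width bit packing. The class and method names below are illustrative, not Lucene's actual ForUtil: every value in a block is stored with the same bit width, so decoding is a tight loop of shifts and masks instead of a per-value VInt branch.

```java
// Illustrative FOR-style bit packing; NOT Lucene's actual ForUtil.
public class ForPackSketch {

    // Pack `values` (each assumed to fit in `bits` bits, bits < 64) into longs.
    static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & 0xFFFFFFFFL;
            int bitPos = i * bits;
            int word = bitPos >>> 6;
            int shift = bitPos & 63;
            out[word] |= v << shift;
            if (shift + bits > 64) { // value straddles two words
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    // Decode `count` values of `bits` bits each from the packed buffer.
    static int[] unpack(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            int word = bitPos >>> 6;
            int shift = bitPos & 63;
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 17, 4096, (1 << 24) - 1};
        int[] back = unpack(pack(docIds, 24), docIds.length, 24);
        System.out.println(java.util.Arrays.equals(back, docIds)); // prints true
    }
}
```

With 24 bits per value, a 512-int block packs into 192 longs and can be decoded without any data-dependent branches, which is what makes SIMD-friendly batch decoding possible.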
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518473#comment-17518473 ]

Feng Guo commented on LUCENE-10315:
---

Here is the benchmark result I got on my machine with [https://github.com/iverase/benchmark_forutil]:

{code:java}
Benchmark                                            Mode  Cnt   Score   Error  Units
ReadInts24Benchmark.readInts24ForUtil               thrpt   25   9.086 ± 0.089  ops/us
ReadInts24Benchmark.readInts24ForUtilVisitor        thrpt   25   0.764 ± 0.005  ops/us
ReadInts24Benchmark.readInts24Legacy                thrpt   25   2.877 ± 0.013  ops/us
ReadInts24Benchmark.readInts24Visitor               thrpt   25   0.778 ± 0.006  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong1         thrpt   25   3.329 ± 0.023  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong2         thrpt   25   3.218 ± 0.037  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong3         thrpt   25   3.755 ± 0.017  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong4         thrpt   25   3.862 ± 0.025  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor1  thrpt   25   0.710 ± 0.008  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor2  thrpt   25   0.849 ± 0.013  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor3  thrpt   25   0.804 ± 0.006  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor4  thrpt   25   0.768 ± 0.007  ops/us
ReadIntsBenchmark.readIntsForUtil                   thrpt   25  18.957 ± 0.194  ops/us
ReadIntsBenchmark.readIntsForUtilVisitor            thrpt   25   0.817 ± 0.004  ops/us
ReadIntsBenchmark.readIntsLegacy                    thrpt   25   2.456 ± 0.016  ops/us
ReadIntsBenchmark.readIntsLegacyVisitor             thrpt   25   0.608 ± 0.007  ops/us
{code}

In this result, {{readInts24ForUtil}} runs about 3 times faster than {{readInts24Legacy}}. This speedup is attractive to me, so I'm trying to find a way to resolve the regression when calling the visitor. One approach I'm considering is to introduce {{visit(int[] docs, int count)}} on {{IntersectVisitor}}. The benefits of this method:
1. It reduces the number of virtual function calls.
2. {{BufferAdder}} can directly use {{System#arraycopy}} to append doc ids.
3. {{InverseIntersectVisitor}} can count cost faster.

Based on luceneutil, I reproduced the regression on my local machine with the nightly benchmark tasks and random seed = 10:

{code:java}
Task      QPS baseline  StdDev    QPS my_modified_version  StdDev    Pct diff              p-value
IntNRQ    27.43         (1.8%)    24.12                    (1.1%)    -12.1% ( -14% - -9%)  0.000
{code}

After the optimization, I can see the speedup with the same seed:

{code:java}
Task      QPS baseline  StdDev    QPS my_modified_version  StdDev    Pct diff            p-value
IntNRQ    27.68         (1.7%)    31.89                    (2.0%)    15.2% ( 11% - 19%)  0.000
{code}

I posted the draft code here: [https://github.com/apache/lucene/pull/797]. This commit [https://github.com/apache/lucene/pull/797/commits/7fb6ac3f5901a29d87e9fa427ba429d1e1749b14] shows what was changed.
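The proposed bulk {{visit(int[] docs, int count)}} hook can be sketched as follows. The interface and class names here are illustrative, not Lucene's exact IntersectVisitor API: a default method preserves the per-doc behavior, while a buffering collector overrides it with a single arraycopy.

```java
// Sketch of a bulk visit hook; names are illustrative, not Lucene's exact API.
interface Visitor {
    void visit(int docId); // one virtual call per matching doc

    // Default bulk path: a caller that decodes a whole 512-int block can hand
    // it over in a single call instead of `count` virtual calls.
    default void visit(int[] docs, int count) {
        for (int i = 0; i < count; i++) {
            visit(docs[i]);
        }
    }
}

// A buffering collector overrides the bulk path with one arraycopy.
class BufferAdder implements Visitor {
    final int[] buffer = new int[1024];
    int size;

    @Override
    public void visit(int docId) {
        buffer[size++] = docId;
    }

    @Override
    public void visit(int[] docs, int count) {
        System.arraycopy(docs, 0, buffer, size, count);
        size += count;
    }
}

public class BulkVisitSketch {
    public static void main(String[] args) {
        BufferAdder adder = new BufferAdder();
        adder.visit(new int[] {5, 8, 13, 21}, 4); // one call for the whole block
        System.out.println(adder.size); // prints 4
    }
}
```

The default method keeps existing single-doc visitors working unchanged, while decoders that produce whole blocks get the cheaper bulk path automatically wherever it is overridden.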
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518272#comment-17518272 ]

Feng Guo commented on LUCENE-10315:
---

Thanks [~ivera], [~jpountz] for all the effort and suggestions here! FYI, here is something interesting: I tried to change

{code:java}
@Benchmark
public void readInts24ForUtilVisitor(IntDecodeState state, Blackhole bh) {
  decode24(state);
  for (int i = 0; i < state.count; i++) {
    bh.consume(state.outputInts[i]);
  }
}
{code}

to

{code:java}
@Benchmark
public void readInts24ForUtilVisitorImproved(IntDecodeState state, Blackhole bh) {
  decode24(state);
  int[] ints = state.outputInts;
  for (int i = 0; i < state.count; i++) {
    bh.consume(ints[i]);
  }
}
{code}

And here is the result:

{code:java}
Benchmark                                              Mode  Cnt  Score   Error  Units
ReadInts24Benchmark.readInts24ForUtilVisitor          thrpt   10  0.776 ± 0.012  ops/us
ReadInts24Benchmark.readInts24ForUtilVisitorImproved  thrpt   10  0.848 ± 0.012  ops/us
ReadInts24Benchmark.readInts24Visitor                 thrpt   10  0.786 ± 0.006  ops/us

$ java -version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
{code}
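The gap between the two benchmark variants plausibly comes from hoisting the {{outputInts}} field into a local: with an opaque call like {{Blackhole#consume}} inside the loop, the JIT may not be able to prove the field is unchanged across iterations and can re-read it each time. A minimal plain-Java sketch of the two loop shapes follows; the class and method names are made up, and it illustrates only the pattern, not the JMH timing effect.

```java
// Illustration of the field-hoisting pattern; names are hypothetical.
class IntDecodeStateSketch {
    int[] outputInts = new int[512];
    int count = 512;
}

public class HoistSketch {

    // Field read inside the loop: with an opaque call in the body, the JIT
    // may reload state.outputInts on every iteration.
    static long sumViaField(IntDecodeStateSketch state) {
        long sum = 0;
        for (int i = 0; i < state.count; i++) {
            sum += state.outputInts[i];
        }
        return sum;
    }

    // Field hoisted into a local once, then indexed: the shape of
    // readInts24ForUtilVisitorImproved above.
    static long sumViaLocal(IntDecodeStateSketch state) {
        int[] ints = state.outputInts;
        long sum = 0;
        for (int i = 0; i < state.count; i++) {
            sum += ints[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        IntDecodeStateSketch state = new IntDecodeStateSketch();
        for (int i = 0; i < state.count; i++) {
            state.outputInts[i] = i;
        }
        // Both shapes compute the same result; only the generated code differs.
        System.out.println(sumViaField(state) == sumViaLocal(state)); // prints true
    }
}
```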
[jira] [Assigned] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo reassigned LUCENE-10417:
-
Assignee: Feng Guo

> IntNRQ task performance decreased in nightly benchmark
> --
>
> Key: LUCENE-10417
> URL: https://issues.apache.org/jira/browse/LUCENE-10417
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Major
>
> Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
> Probably related to LUCENE-10315, I'll dig.

--
This message was sent by Atlassian Jira (v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10417:
--
Description:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-10315, I'll dig.

was:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Updated] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10417:
--
Description:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-LUCENE-10315, I'll dig.

was:
Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Created] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
Feng Guo created LUCENE-10417:
-
Summary: IntNRQ task performance decreased in nightly benchmark
Key: LUCENE-10417
URL: https://issues.apache.org/jira/browse/LUCENE-10417
Project: Lucene - Core
Issue Type: Bug
Components: core/codecs
Reporter: Feng Guo

Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Commented] (LUCENE-10409) Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
[ https://issues.apache.org/jira/browse/LUCENE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489439#comment-17489439 ]

Feng Guo commented on LUCENE-10409:
---

Hi [~jpountz]! I see this issue is marked as a {{Task}}, and I'm not exactly sure what that means. Does it mean that anyone who is interested can work on it? Feel free to ignore me if you already plan to work on this; I just want to say that I'd like to take this on if you don't have the time :)

> Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
> -
>
> Key: LUCENE-10409
> URL: https://issues.apache.org/jira/browse/LUCENE-10409
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
>
> [~gf2121] recently improved DocIdsWriter for the case when doc IDs are dense and come in the same order as values via the CONTINUOUS_IDS and BITSET_IDS encodings.
> We could do the same for the case when doc IDs come in the opposite order to values. This would be used whenever searching on a field that is used for index sorting in the descending order. This would be a frequent case for Elasticsearch users as we're planning on using index sorting more and more on time-based data with a descending sort on the timestamp as the last sort field.
[jira] [Resolved] (LUCENE-10410) Add some more tests for legacy encoding logic in DocIdsWriter
[ https://issues.apache.org/jira/browse/LUCENE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo resolved LUCENE-10410.
---
Fix Version/s: 9.1
Resolution: Fixed

> Add some more tests for legacy encoding logic in DocIdsWriter
> -
>
> Key: LUCENE-10410
> URL: https://issues.apache.org/jira/browse/LUCENE-10410
> Project: Lucene - Core
> Issue Type: Test
> Components: core/codecs
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Trivial
> Fix For: 9.1
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This is a follow-up of LUCENE-10315: add some more tests for the legacy encoding logic in DocIdsWriter.
[jira] [Comment Edited] (LUCENE-10409) Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
[ https://issues.apache.org/jira/browse/LUCENE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488607#comment-17488607 ]

Feng Guo edited comment on LUCENE-10409 at 2/8/22, 8:15 AM:
---
+1, Great idea!

was (Author: gf2121): +1, Great idea! I'd like to take on this if you agree.
[jira] [Assigned] (LUCENE-10410) Add some more tests for legacy encoding logic in DocIdsWriter
[ https://issues.apache.org/jira/browse/LUCENE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo reassigned LUCENE-10410:
-
Assignee: Feng Guo
[jira] [Commented] (LUCENE-10409) Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
[ https://issues.apache.org/jira/browse/LUCENE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488607#comment-17488607 ]

Feng Guo commented on LUCENE-10409:
---
+1, Great idea! I'd like to take on this if you agree.
[jira] [Created] (LUCENE-10410) Add some more tests for legacy encoding logic in DocIdsWriter
Feng Guo created LUCENE-10410:
-
Summary: Add some more tests for legacy encoding logic in DocIdsWriter
Key: LUCENE-10410
URL: https://issues.apache.org/jira/browse/LUCENE-10410
Project: Lucene - Core
Issue Type: Test
Components: core/codecs
Reporter: Feng Guo

This is a follow-up of LUCENE-10315: add some more tests for the legacy encoding logic in DocIdsWriter.
[jira] [Resolved] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo resolved LUCENE-10315.
---
Fix Version/s: 9.1
Resolution: Fixed
[jira] [Assigned] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo reassigned LUCENE-10315: - Assignee: Feng Guo > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Major > Attachments: addall.svg > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Elasticsearch (which based on lucene) can automatically infers types for > users with its dynamic mapping feature. When users index some low cardinality > fields, such as gender / age / status... they often use some numbers to > represent the values, while ES will infer these fields as {{{}long{}}}, and > ES uses BKD as the index of {{long}} fields. When the data volume grows, > building the result set of low-cardinality fields will make the CPU usage and > load very high. > This is a flame graph we obtained from the production environment: > [^addall.svg] > It can be seen that almost all CPU is used in addAll. When we reindex > {{long}} to {{{}keyword{}}}, the cluster load and search latency are greatly > reduced ( We spent weeks of time to reindex all indices... ). I know that ES > recommended to use {{keyword}} for term/terms query and {{long}} for range > query in the document, but there are always some users who didn't realize > this and keep their habit of using sql database, or dynamic mapping > automatically selects the type for them. All in all, users won't realize that > there would be such a big difference in performance between {{long}} and > {{keyword}} fields in low cardinality fields. So from my point of view it > will make sense if we can make BKD works better for the low/medium > cardinality fields. > As far as i can see, for low cardinality fields, there are two advantages of > {{keyword}} over {{{}long{}}}: > 1. 
{{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's > delta VInt, because its batch reading (readLongs) and SIMD decode. > 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily > materialize of its result set, and when another small result clause > intersects with this low cardinality condition, the low cardinality field can > avoid reading all docIds into memory. > This ISSUE is targeting to solve the first point. The basic idea is trying to > use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked this optimization > by mocking some random {{LongPoint}} and querying them with > {{PointInSetQuery}}. > *Benchmark Result* > |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff > percentage| > |1|32|1|51.44|148.26|188.22%| > |1|32|2|26.8|101.88|280.15%| > |1|32|4|14.04|53.52|281.20%| > |1|32|8|7.04|28.54|305.40%| > |1|32|16|3.54|14.61|312.71%| > |1|128|1|110.56|350.26|216.81%| > |1|128|8|16.6|89.81|441.02%| > |1|128|16|8.45|48.07|468.88%| > |1|128|32|4.2|25.35|503.57%| > |1|128|64|2.13|13.02|511.27%| > |1|1024|1|536.19|843.88|57.38%| > |1|1024|8|109.71|251.89|129.60%| > |1|1024|32|33.24|104.11|213.21%| > |1|1024|128|8.87|30.47|243.52%| > |1|1024|512|2.24|8.3|270.54%| > |1|8192|1|.33|5000|50.00%| > |1|8192|32|139.47|214.59|53.86%| > |1|8192|128|54.59|109.23|100.09%| > |1|8192|512|15.61|36.15|131.58%| > |1|8192|2048|4.11|11.14|171.05%| > |1|1048576|1|2597.4|3030.3|16.67%| > |1|1048576|32|314.96|371.75|18.03%| > |1|1048576|128|99.7|116.28|16.63%| > |1|1048576|512|30.5|37.15|21.80%| > |1|1048576|2048|10.38|12.3|18.50%| > |1|8388608|1|2564.1|3174.6|23.81%| > |1|8388608|32|196.27|238.95|21.75%| > |1|8388608|128|55.36|68.03|22.89%| > |1|8388608|512|15.58|19.24|23.49%| > |1|8388608|2048|4.56|5.71|25.22%| > The indices size is reduced for low cardinality fields and flat for high > cardinality fields. 
> {code:java} > 113Mindex_1_doc_32_cardinality_baseline > 114Mindex_1_doc_32_cardinality_candidate > 140Mindex_1_doc_128_cardinality_baseline > 133Mindex_1_doc_128_cardinality_candidate > 193Mindex_1_doc_1024_cardinality_baseline > 174Mindex_1_doc_1024_cardinality_candidate > 241Mindex_1_doc_8192_cardinality_baseline > 233Mindex_1_doc_8192_cardinality_candidate > 314Mindex_1_doc_1048576_cardinality_baseline > 315Mindex_1_doc_1048576_cardinality_candidate > 392Mindex_1_doc_8388608_cardinality_baseline > 391M
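The core trick behind a 512-int ForUtil codec is frame-of-reference bit packing: delta-encode the sorted docIDs of a leaf block, then store every delta with the same fixed bit width so the whole block decodes in a tight, SIMD-friendly loop. Below is a minimal sketch of that idea, not Lucene's actual ForUtil implementation; the class and method names are made up for illustration:

```java
// Hypothetical sketch of ForUtil-style packing for a block of sorted docIDs.
public class PackedBlockSketch {
  // Delta-encode a sorted docID block; the largest delta decides the bit width.
  static int[] deltas(int[] sortedDocIds) {
    int[] d = new int[sortedDocIds.length];
    int prev = 0;
    for (int i = 0; i < sortedDocIds.length; i++) {
      d[i] = sortedDocIds[i] - prev;
      prev = sortedDocIds[i];
    }
    return d;
  }

  // OR-ing all deltas yields a value whose highest set bit bounds the width.
  static int bitsRequired(int[] deltas) {
    int max = 0;
    for (int v : deltas) max |= v;
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
  }

  // Pack all deltas into a long[], bitsPerValue bits each, LSB-first.
  static long[] pack(int[] deltas, int bitsPerValue) {
    long[] out = new long[(deltas.length * bitsPerValue + 63) / 64];
    for (int i = 0; i < deltas.length; i++) {
      long bitPos = (long) i * bitsPerValue;
      int idx = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      out[idx] |= ((long) deltas[i]) << shift;
      if (shift + bitsPerValue > 64) {          // value spills into the next word
        out[idx + 1] |= ((long) deltas[i]) >>> (64 - shift);
      }
    }
    return out;
  }

  static int[] unpack(long[] packed, int count, int bitsPerValue) {
    int[] out = new int[count];
    long mask = (1L << bitsPerValue) - 1;
    for (int i = 0; i < count; i++) {
      long bitPos = (long) i * bitsPerValue;
      int idx = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long v = packed[idx] >>> shift;
      if (shift + bitsPerValue > 64) {          // pull in the spilled high bits
        v |= packed[idx + 1] << (64 - shift);
      }
      out[i] = (int) (v & mask);
    }
    return out;
  }
}
```

Per-value delta VInt needs a data-dependent branch per byte, while the packed form reads fixed-width lanes; that regularity is what lets a JIT vectorize the decode loop.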
[jira] [Updated] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
[ https://issues.apache.org/jira/browse/LUCENE-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10388: -- Fix Version/s: 9.1 Affects Version/s: 9.1 > Remove MultiLevelSkipListReader#SkipBuffer > -- > > Key: LUCENE-10388 > URL: https://issues.apache.org/jira/browse/LUCENE-10388 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 9.1 >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Minor > Fix For: 9.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Previous talk can be found in [https://github.com/apache/lucene/pull/592] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
[ https://issues.apache.org/jira/browse/LUCENE-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-10388. --- Resolution: Fixed > Remove MultiLevelSkipListReader#SkipBuffer > -- > > Key: LUCENE-10388 > URL: https://issues.apache.org/jira/browse/LUCENE-10388 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 9.1 >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Minor > Fix For: 9.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Previous talk can be found in [https://github.com/apache/lucene/pull/592]
[jira] [Assigned] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
[ https://issues.apache.org/jira/browse/LUCENE-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo reassigned LUCENE-10388: - Assignee: Feng Guo > Remove MultiLevelSkipListReader#SkipBuffer > -- > > Key: LUCENE-10388 > URL: https://issues.apache.org/jira/browse/LUCENE-10388 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Previous talk can be found in [https://github.com/apache/lucene/pull/592]
[jira] [Assigned] (LUCENE-10387) Clean unused lastPayloadByteUpto in Lucene90SkipWriter
[ https://issues.apache.org/jira/browse/LUCENE-10387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo reassigned LUCENE-10387: - Fix Version/s: 10.0 (main) Affects Version/s: 10.0 (main) Assignee: Feng Guo > Clean unused lastPayloadByteUpto in Lucene90SkipWriter > -- > > Key: LUCENE-10387 > URL: https://issues.apache.org/jira/browse/LUCENE-10387 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 10.0 (main) >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Trivial > Fix For: 10.0 (main) > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Resolved] (LUCENE-10387) Clean unused lastPayloadByteUpto in Lucene90SkipWriter
[ https://issues.apache.org/jira/browse/LUCENE-10387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-10387. --- Resolution: Fixed > Clean unused lastPayloadByteUpto in Lucene90SkipWriter > -- > > Key: LUCENE-10387 > URL: https://issues.apache.org/jira/browse/LUCENE-10387 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 10.0 (main) >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Trivial > Fix For: 10.0 (main) > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Created] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
Feng Guo created LUCENE-10388: - Summary: Remove MultiLevelSkipListReader#SkipBuffer Key: LUCENE-10388 URL: https://issues.apache.org/jira/browse/LUCENE-10388 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Previous talk can be found in [https://github.com/apache/lucene/pull/592]
[jira] [Created] (LUCENE-10387) Clean unused lastPayloadByteUpto in Lucene90SkipWriter
Feng Guo created LUCENE-10387: - Summary: Clean unused lastPayloadByteUpto in Lucene90SkipWriter Key: LUCENE-10387 URL: https://issues.apache.org/jira/browse/LUCENE-10387 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo
[jira] [Created] (LUCENE-10376) Roll up the loop in vint/vlong in DataInput
Feng Guo created LUCENE-10376: - Summary: Roll up the loop in vint/vlong in DataInput Key: LUCENE-10376 URL: https://issues.apache.org/jira/browse/LUCENE-10376 Project: Lucene - Core Issue Type: Improvement Components: core/store Reporter: Feng Guo This issue proposes to roll up the loop in {{{}DataInput#readVInt and {{DataInput#readVLong{}}}{}}}. Previous talk can be found here: [https://github.com/apache/lucene/pull/592.] Benchmark: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthTaxoFacets5.17 (15.9%)5.00 (12.1%) -3.4% ( -27% - 29%) 0.446 OrNotHighLow 1010.74 (4.0%) 978.71 (4.6%) -3.2% ( -11% -5%) 0.021 HighPhrase 171.95 (3.6%) 166.92 (4.6%) -2.9% ( -10% -5%) 0.025 AndHighLow 594.12 (4.2%) 577.24 (5.4%) -2.8% ( -11% -7%) 0.064 OrHighLow 540.46 (4.1%) 526.17 (5.4%) -2.6% ( -11% -7%) 0.083 OrHighMedDayTaxoFacets6.01 (5.3%)5.88 (3.9%) -2.2% ( -10% -7%) 0.136 AndHighMedDayTaxoFacets 14.78 (2.6%) 14.51 (2.1%) -1.8% ( -6% -2%) 0.013 MedPhrase 142.26 (2.9%) 139.67 (3.1%) -1.8% ( -7% -4%) 0.058 LowPhrase 21.22 (2.8%) 20.85 (3.1%) -1.8% ( -7% -4%) 0.061 AndHighHighDayTaxoFacets4.31 (4.5%)4.24 (3.2%) -1.7% ( -8% -6%) 0.158 BrowseDayOfYearTaxoFacets4.70 (17.3%)4.63 (12.9%) -1.3% ( -26% - 34%) 0.787 BrowseDateTaxoFacets4.65 (16.9%)4.59 (12.9%) -1.2% ( -26% - 34%) 0.803 MedSloppyPhrase 34.40 (2.9%) 34.02 (4.0%) -1.1% ( -7% -5%) 0.318 MedTermDayTaxoFacets 13.85 (6.7%) 13.70 (4.5%) -1.0% ( -11% - 10%) 0.563 BrowseRandomLabelTaxoFacets4.16 (12.7%)4.11 (9.7%) -1.0% ( -20% - 24%) 0.772 LowSloppyPhrase5.77 (2.2%)5.72 (3.3%) -0.9% ( -6% -4%) 0.307 LowSpanNear 53.67 (3.6%) 53.22 (3.9%) -0.8% ( -8% -6%) 0.481 HighSpanNear2.66 (4.8%)2.63 (5.4%) -0.8% ( -10% -9%) 0.616 MedIntervalsOrdered 25.88 (9.4%) 25.68 (9.5%) -0.8% ( -17% - 20%) 0.797 OrHighNotHigh 1043.34 (3.7%) 1037.43 (4.4%) -0.6% ( -8% -7%) 0.658 HighSloppyPhrase1.47 (3.4%)1.46 (4.2%) -0.6% ( -7% -7%) 0.645 MedSpanNear 11.52 (3.5%) 11.46 (4.3%) -0.5% ( -7% -7%) 0.685 OrNotHighHigh 
1615.92 (3.4%) 1608.09 (3.6%) -0.5% ( -7% -6%) 0.663 BrowseRandomLabelSSDVFacets3.11 (6.0%)3.10 (4.4%) -0.2% ( -10% - 10%) 0.881 LowIntervalsOrdered4.06 (8.9%)4.06 (8.9%) -0.2% ( -16% - 19%) 0.957 OrHighNotMed 1188.76 (3.8%) 1187.46 (4.4%) -0.1% ( -7% -8%) 0.933 OrNotHighMed 1220.26 (3.1%) 1219.23 (3.7%) -0.1% ( -6% -6%) 0.938 AndHighMed 115.92 (3.6%) 116.03 (3.3%)0.1% ( -6% -7%) 0.928 Fuzzy1 111.98 (3.2%) 112.15 (3.5%)0.1% ( -6% -7%) 0.889 HighIntervalsOrdered5.14 (7.5%)5.15 (7.3%)0.2% ( -13% - 16%) 0.937 OrHighNotLow 1222.80 (4.1%) 1226.76 (4.7%)0.3% ( -8% -9%) 0.817 TermDTSort 51.02 (14.1%) 51.21 (18.9%)0.4% ( -28% - 38%) 0.944 HighTerm 1570.53 (3.7%) 1578.45 (4.4%)0.5% ( -7% -8%) 0.693 BrowseDayOfYearSSDVFacets4.26 (3.9%)4.28 (9.1%)0.5% ( -12% - 14%) 0.811 AndHighHigh 40.61 (4.1%) 40.83 (4.1%)0.5% ( -7% -9%) 0.681 MedTerm 2002.17 (3.6%) 2013.12 (4.3%)0.5% ( -7% -8%) 0.659 Respell 67.74 (3.8%) 68.14 (3.3%)0.6% ( -6% -8%) 0.594 LowTerm 1633.26 (2.8%) 1643.86 (2.6%)0.6% ( -4% -6%) 0.444 OrHighMed
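The "roll up the loop" change in LUCENE-10376 is easiest to see side by side. The sketch below contrasts a generic readVInt loop with a hand-unrolled variant that returns early on each non-continuation byte; this illustrates the technique only and is not the exact code of Lucene's {{DataInput}} (the helper names are made up):

```java
import java.nio.ByteBuffer;

// Sketch: looped vs unrolled variable-length int decoding.
public class VIntSketch {
  // Generic loop form: one data-dependent branch per continuation byte.
  static int readVIntLoop(ByteBuffer in) {
    byte b = in.get();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.get();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  // Unrolled form: up to five explicit steps with early returns, giving the
  // JIT straight-line code for the common one- and two-byte values.
  static int readVIntUnrolled(ByteBuffer in) {
    byte b = in.get();
    if (b >= 0) return b;               // high bit clear: single-byte value
    int i = b & 0x7F;
    b = in.get();
    i |= (b & 0x7F) << 7;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x7F) << 14;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x7F) << 21;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x0F) << 28;              // fifth byte carries at most 4 bits
    return i;
  }

  // Standard VInt encoder: 7 payload bits per byte, high bit = continuation.
  static void writeVInt(ByteBuffer out, int v) {
    while ((v & ~0x7F) != 0) {
      out.put((byte) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.put((byte) v);
  }
}
```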
[jira] [Updated] (LUCENE-10372) Performance of TaxoFacets in Nightly benchmark decreased
[ https://issues.apache.org/jira/browse/LUCENE-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10372: -- Description: link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html {code:java} BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 {code} I'm not sure why, but it should be related to https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it. was: link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html {code:java} BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 {code} I'm not sure why, but it should be related to https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it. > Performance of TaxoFacets in Nightly benchmark decreased > > > Key: LUCENE-10372 > URL: https://issues.apache.org/jira/browse/LUCENE-10372 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > > link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html > {code:java} > BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 > BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 > BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 > BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 > {code} > I'm not sure why, but it should be related to > https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it.
[jira] [Created] (LUCENE-10372) Performance of TaxoFacets in Nightly benchmark decreased
Feng Guo created LUCENE-10372: - Summary: Performance of TaxoFacets in Nightly benchmark decreased Key: LUCENE-10372 URL: https://issues.apache.org/jira/browse/LUCENE-10372 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html {code:java} BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 {code} I'm not sure why, but it should be related to https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it.
[jira] [Created] (LUCENE-10366) Reduce the number of valid checks for ByteBufferIndexInput#readVInt
Feng Guo created LUCENE-10366: - Summary: Reduce the number of valid checks for ByteBufferIndexInput#readVInt Key: LUCENE-10366 URL: https://issues.apache.org/jira/browse/LUCENE-10366 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Today, we do not rewrite {{#readVInt}} and {{#readVLong}} for {{ByteBufferIndexInput}}. By default, the logic will call {{#readByte}} several times, and we need to check whether ByteBuffer is valid every time. This may not be necessary as we just need a final check. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseDayOfYearSSDVFacets 16.74 (17.3%) 15.91 (12.3%) -5.0% ( -29% - 29%) 0.295 MedTermDayTaxoFacets 27.01 (6.9%) 26.56 (5.9%) -1.7% ( -13% - 11%) 0.402 Wildcard 111.55 (8.1%) 109.67 (7.6%) -1.7% ( -16% - 15%) 0.499 Respell 58.06 (2.6%) 57.20 (2.6%) -1.5% ( -6% -3%) 0.074 OrHighMedDayTaxoFacets8.91 (4.7%)8.81 (7.2%) -1.1% ( -12% - 11%) 0.557 Fuzzy1 117.17 (3.8%) 116.14 (3.3%) -0.9% ( -7% -6%) 0.437 Fuzzy2 103.70 (3.2%) 102.82 (4.3%) -0.9% ( -8% -6%) 0.472 HighIntervalsOrdered 10.11 (7.9%) 10.05 (7.4%) -0.6% ( -14% - 15%) 0.797 HighTermDayOfYearSort 183.18 (8.8%) 182.92 (10.8%) -0.1% ( -18% - 21%) 0.964 AndHighHighDayTaxoFacets 11.44 (3.8%) 11.43 (3.1%) -0.1% ( -6% -7%) 0.936 Prefix3 161.90 (13.5%) 161.80 (13.3%) -0.1% ( -23% - 30%) 0.989 HighSpanNear 11.43 (4.8%) 11.45 (4.2%)0.1% ( -8% -9%) 0.928 PKLookup 220.15 (3.3%) 220.69 (6.2%)0.2% ( -8% - 10%) 0.874 MedSpanNear 92.60 (4.0%) 93.11 (3.7%)0.5% ( -6% -8%) 0.656 TermDTSort 143.26 (9.0%) 144.14 (10.9%)0.6% ( -17% - 22%) 0.847 MedIntervalsOrdered 63.74 (6.6%) 64.21 (6.1%)0.8% ( -11% - 14%) 0.707 HighTermTitleBDVSort 99.61 (9.1%) 100.49 (12.4%)0.9% ( -18% - 24%) 0.796 LowSpanNear 126.43 (3.6%) 127.61 (3.2%)0.9% ( -5% -8%) 0.383 LowIntervalsOrdered 12.45 (5.4%) 12.58 (5.2%)1.0% ( -9% - 12%) 0.535 LowTerm 1767.08 (3.7%) 1788.83 (3.1%)1.2% ( -5% -8%) 0.257 HighSloppyPhrase 11.45 (7.0%) 11.61 (7.1%)1.5% ( -11% 
- 16%) 0.515 AndHighMedDayTaxoFacets 69.41 (3.7%) 70.46 (2.8%)1.5% ( -4% -8%) 0.147 BrowseRandomLabelSSDVFacets 10.85 (6.1%) 11.04 (5.1%)1.7% ( -9% - 13%) 0.342 MedTerm 2083.04 (5.3%) 2119.48 (5.7%)1.7% ( -8% - 13%) 0.316 LowSloppyPhrase 148.79 (3.6%) 151.76 (3.2%)2.0% ( -4% -9%) 0.062 HighPhrase 98.67 (3.4%) 100.80 (3.5%)2.2% ( -4% -9%) 0.048 OrHighNotLow 1371.31 (7.1%) 1400.91 (7.9%)2.2% ( -12% - 18%) 0.365 BrowseMonthTaxoFacets 16.65 (11.6%) 17.03 (13.1%)2.2% ( -20% - 30%) 0.565 OrHighNotHigh 1267.37 (6.8%) 1297.42 (8.9%)2.4% ( -12% - 19%) 0.344 MedSloppyPhrase 39.35 (3.6%) 40.42 (4.2%)2.7% ( -4% - 10%) 0.028 OrNotHighHigh 1190.01 (6.6%) 1224.72 (7.6%)2.9% ( -10% - 18%) 0.194 OrHighHigh 37.72 (4.3%) 39.00 (3.4%)3.4% ( -4% - 11%) 0.005 AndHighHigh 92.46 (4.5%) 95.76 (4.9%)3.6% ( -5% - 13%) 0.017 OrHighNotMed 1231.31 (6.3%) 1275.65 (7.9%)3.6% ( -9% - 18%) 0.109 OrHighMed 174.32 (3.8%) 181.43 (2.9%)4.1% ( -2% - 11%) 0.000 AndHighLow 2761.91 (10.7%) 2885.28 (10.1%)4.5% ( -14% - 28%) 0.175 MedPhrase 214.87 (4.9%) 224.55 (4.8%)4.5% ( -4% - 14%) 0.003 LowPhrase 333.03
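The idea in LUCENE-10366 (one up-front validity check instead of a check on every {{#readByte}}) can be sketched as follows. This is an illustration, not Lucene's actual ByteBufferIndexInput internals: when at least five bytes remain, the decoder uses absolute gets guarded by a single `remaining()` check and updates the position once at the end, so the per-byte checks become trivially predictable.

```java
import java.nio.ByteBuffer;

// Sketch: decode a VInt with one up-front range decision instead of a
// position/limit update per byte. Names are illustrative only.
public class OneCheckVInt {
  static int readVInt(ByteBuffer in) {
    if (in.remaining() >= 5) {          // a VInt is at most 5 bytes
      int pos = in.position();
      byte b = in.get(pos++);           // absolute get: position untouched
      int i = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = in.get(pos++);
        i |= (b & 0x7F) << shift;
      }
      in.position(pos);                 // single position update at the end
      return i;
    }
    // Slow path near the end of the buffer: fall back to checked relative reads.
    byte b = in.get();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.get();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }
}
```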
[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10355: -- Description: Remove EMPTY LongValues in favor of LongValues#ZEROS > Remove EMPTY LongValues in favor of LongValues#ZERO > --- > > Key: LUCENE-10355 > URL: https://issues.apache.org/jira/browse/LUCENE-10355 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial > > Remove EMPTY LongValues in favor of LongValues#ZEROS
[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10355: -- Component/s: core/codecs > Remove EMPTY LongValues in favor of LongValues#ZERO > --- > > Key: LUCENE-10355 > URL: https://issues.apache.org/jira/browse/LUCENE-10355 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial >
[jira] [Created] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
Feng Guo created LUCENE-10355: - Summary: Remove EMPTY LongValues in favor of LongValues#ZERO Key: LUCENE-10355 URL: https://issues.apache.org/jira/browse/LUCENE-10355 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo
[jira] [Created] (LUCENE-10350) Avoid some null checking for FastTaxonomyFacetCounts#countAll()
Feng Guo created LUCENE-10350: - Summary: Avoid some null checking for FastTaxonomyFacetCounts#countAll() Key: LUCENE-10350 URL: https://issues.apache.org/jira/browse/LUCENE-10350 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo I find that {{org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()}} is using about 2% cpu of luceneutil, this could probably be replaced with {{values[doc]++}} since {{#countAll}} will never use hashTable. Two changes: # No need to check liveDocs null again and again. # Call {{values[doc]++}} instead of {{#increment}} since {{#countAll}} will never use hashTable. *Benchmark* (baseline is the newest main, including LUCENE-10346) {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value IntNRQ 128.51 (27.8%) 120.13 (27.4%) -6.5% ( -48% - 67%) 0.455 PKLookup 232.55 (5.0%) 226.26 (4.2%) -2.7% ( -11% -6%) 0.065 Wildcard 178.54 (5.5%) 175.13 (5.7%) -1.9% ( -12% -9%) 0.283 BrowseMonthSSDVFacets 16.37 (6.9%) 16.13 (4.6%) -1.5% ( -12% - 10%) 0.422 HighPhrase 211.52 (3.7%) 209.59 (3.3%) -0.9% ( -7% -6%) 0.414 MedPhrase 239.31 (3.2%) 237.14 (2.5%) -0.9% ( -6% -4%) 0.311 HighSloppyPhrase 33.08 (3.3%) 32.79 (3.5%) -0.9% ( -7% -6%) 0.407 Prefix3 171.63 (7.5%) 170.33 (8.3%) -0.8% ( -15% - 16%) 0.762 Respell 80.21 (3.3%) 79.74 (2.7%) -0.6% ( -6% -5%) 0.530 LowPhrase 26.21 (3.6%) 26.05 (2.5%) -0.6% ( -6% -5%) 0.549 LowSloppyPhrase 165.34 (2.4%) 164.47 (2.7%) -0.5% ( -5% -4%) 0.516 OrHighNotLow 1984.04 (3.9%) 1974.07 (5.2%) -0.5% ( -9% -8%) 0.730 OrHighMed 93.69 (4.2%) 93.23 (4.1%) -0.5% ( -8% -8%) 0.711 MedSpanNear 12.19 (3.6%) 12.14 (4.0%) -0.3% ( -7% -7%) 0.777 Fuzzy2 98.86 (3.0%) 98.56 (2.6%) -0.3% ( -5% -5%) 0.735 HighTerm 2284.28 (4.3%) 2277.92 (3.4%) -0.3% ( -7% -7%) 0.819 BrowseDayOfYearSSDVFacets 14.65 (4.8%) 14.61 (4.0%) -0.3% ( -8% -8%) 0.844 LowSpanNear 101.85 (1.7%) 101.58 (2.0%) -0.3% ( -3% -3%) 0.662 BrowseRandomLabelSSDVFacets 11.04 (5.4%) 11.02 (7.2%) -0.2% ( -12% - 13%) 0.902 OrHighHigh 
39.59 (4.2%) 39.49 (4.1%) -0.2% ( -8% -8%) 0.859 Fuzzy1 84.27 (3.1%) 84.11 (2.3%) -0.2% ( -5% -5%) 0.826 AndHighMed 94.85 (5.1%) 94.77 (6.9%) -0.1% ( -11% - 12%) 0.969 HighTermDayOfYearSort 179.66 (17.0%) 179.56 (12.8%) -0.1% ( -25% - 35%) 0.991 LowTerm 2016.63 (3.5%) 2015.71 (3.9%) -0.0% ( -7% -7%) 0.969 AndHighLow 1011.34 (4.1%) 1011.05 (5.3%) -0.0% ( -9% -9%) 0.985 HighTermTitleBDVSort 121.48 (14.4%) 121.49 (15.9%)0.0% ( -26% - 35%) 0.998 MedTerm 2239.73 (4.6%) 2245.65 (3.1%)0.3% ( -7% -8%) 0.830 AndHighHigh 102.09 (3.1%) 102.48 (5.3%)0.4% ( -7% -9%) 0.778 OrNotHighLow 1113.23 (2.3%) 1117.98 (2.4%)0.4% ( -4% -5%) 0.568 HighSpanNear1.92 (4.7%)1.93 (5.4%)0.5% ( -9% - 11%) 0.738 OrHighNotMed 1322.20 (4.3%) 1330.58 (3.1%)0.6% ( -6% -8%) 0.592 AndHighMedDayTaxoFacets 65.82 (1.8%) 66.30 (2.5%)0.7% ( -3% -5%) 0.295 OrNotHighMed 1262.49 (3.0%) 1272.12 (3.8%)0.8% ( -5% -7%) 0.480 MedTermDayTaxoFacets 52.07 (4.7%) 52.54 (6.9%)0.9% ( -10% - 13%) 0.628 OrNotHighHigh 944.56 (3.7%) 953.87 (3.0%)1.0% ( -5% -7%) 0.352 MedSloppyPhrase 64.28 (5.4%) 64.92 (4.7%)1.0% ( -8% - 11%) 0.531
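The two changes described in LUCENE-10350 (hoisting the liveDocs null check out of the per-document loop, and replacing the virtual {{#increment}} with a plain {{values[doc]++}}) can be sketched like this. The types below are simplified stand-ins, not Lucene's actual facet classes:

```java
// Sketch: hoisted null check + direct array increment for countAll().
public class CountAllSketch {
  interface Bits { boolean get(int index); }   // stand-in for liveDocs

  // Before: one liveDocs null check per document inside the hot loop.
  static void countAllNaive(int[] ords, Bits liveDocs, int[] counts) {
    for (int doc = 0; doc < ords.length; doc++) {
      if (liveDocs != null && !liveDocs.get(doc)) continue;
      counts[ords[doc]]++;
    }
  }

  // After: decide on liveDocs once, so the common liveDocs == null case
  // becomes a tight branch-free array walk.
  static void countAll(int[] ords, Bits liveDocs, int[] counts) {
    if (liveDocs == null) {
      for (int doc = 0; doc < ords.length; doc++) {
        counts[ords[doc]]++;                   // direct increment, no virtual call
      }
    } else {
      for (int doc = 0; doc < ords.length; doc++) {
        if (liveDocs.get(doc)) {
          counts[ords[doc]]++;
        }
      }
    }
  }
}
```

Both variants produce identical counts; the split only removes a per-document branch and a virtual dispatch from the common path.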
[jira] [Updated] (LUCENE-10346) Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll()
[ https://issues.apache.org/jira/browse/LUCENE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10346: -- Description: CPU profile often tells {{SingletonSortedNumericDocValues#nextDoc()}} is using a high percentage of CPU when running luceneutil, but the {{nextDoc()}} of dense cases should be rather simple. So I suspect that it is too many layers of abstraction (and wrap) that cause the stress of JVM. Unwraping it to {{NumericDocvalues}} shows around 30% speed up. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighTermTitleBDVSort 132.24 (20.6%) 125.67 (9.9%) -5.0% ( -29% - 32%) 0.330 LowTerm 1424.13 (3.2%) 1381.34 (4.4%) -3.0% ( -10% -4%) 0.014 OrHighNotHigh 707.82 (3.3%) 687.49 (6.0%) -2.9% ( -11% -6%) 0.062 TermDTSort 155.32 (10.9%) 151.02 (10.2%) -2.8% ( -21% - 20%) 0.406 OrNotHighMed 618.46 (3.7%) 602.65 (4.4%) -2.6% ( -10% -5%) 0.047 Fuzzy1 76.22 (5.3%) 74.71 (6.6%) -2.0% ( -13% - 10%) 0.293 HighTermMonthSort 174.89 (10.4%) 171.45 (10.6%) -2.0% ( -20% - 21%) 0.554 OrHighNotMed 776.08 (4.9%) 761.70 (7.8%) -1.9% ( -13% - 11%) 0.367 HighTermDayOfYearSort 56.23 (10.7%) 55.26 (10.9%) -1.7% ( -21% - 22%) 0.615 MedTerm 1449.48 (3.7%) 1425.87 (5.1%) -1.6% ( -10% -7%) 0.250 OrNotHighHigh 687.92 (4.9%) 677.06 (5.5%) -1.6% ( -11% -9%) 0.339 OrHighNotLow 742.99 (4.7%) 732.23 (5.9%) -1.4% ( -11% -9%) 0.390 OrNotHighLow 789.37 (2.7%) 778.80 (4.7%) -1.3% ( -8% -6%) 0.270 HighPhrase 75.84 (2.2%) 75.14 (3.0%) -0.9% ( -6% -4%) 0.269 HighSloppyPhrase 20.71 (5.9%) 20.56 (5.2%) -0.7% ( -11% - 11%) 0.678 IntNRQ 106.38 (18.4%) 105.67 (18.2%) -0.7% ( -31% - 44%) 0.908 OrHighMed 45.10 (1.5%) 44.83 (1.8%) -0.6% ( -3% -2%) 0.261 MedSpanNear 192.49 (2.5%) 191.51 (3.5%) -0.5% ( -6% -5%) 0.593 OrHighLow 489.82 (5.5%) 487.79 (5.7%) -0.4% ( -11% - 11%) 0.815 MedSloppyPhrase 27.33 (2.9%) 27.22 (2.3%) -0.4% ( -5% -5%) 0.623 MedPhrase 208.94 (2.9%) 208.09 (3.7%) -0.4% ( -6% -6%) 0.696 Respell 71.84 (2.4%) 
71.55 (2.4%) -0.4% ( -5% -4%) 0.600 OrHighHigh 36.26 (1.3%) 36.13 (1.1%) -0.4% ( -2% -2%) 0.344 BrowseMonthSSDVFacets 15.95 (2.7%) 15.90 (2.5%) -0.4% ( -5% -5%) 0.672 AndHighMed 85.83 (2.2%) 85.53 (2.7%) -0.3% ( -5% -4%) 0.658 Prefix3 123.15 (2.6%) 122.74 (2.5%) -0.3% ( -5% -4%) 0.678 Fuzzy2 76.41 (4.7%) 76.23 (4.2%) -0.2% ( -8% -9%) 0.867 BrowseDayOfYearSSDVFacets 14.52 (2.4%) 14.49 (2.2%) -0.2% ( -4% -4%) 0.747 MedIntervalsOrdered 56.39 (4.2%) 56.27 (4.1%) -0.2% ( -8% -8%) 0.871 HighIntervalsOrdered9.29 (4.7%)9.27 (4.4%) -0.2% ( -8% -9%) 0.896 AndHighMedDayTaxoFacets 119.76 (2.5%) 119.53 (2.9%) -0.2% ( -5% -5%) 0.831 HighSpanNear 20.89 (2.0%) 20.85 (2.3%) -0.2% ( -4% -4%) 0.803 LowIntervalsOrdered 45.51 (4.9%) 45.47 (4.8%) -0.1% ( -9% - 10%) 0.952 LowPhrase 64.17 (2.6%) 64.14 (2.6%) -0.1% ( -5% -5%) 0.951 LowSpanNear 104.45 (2.2%) 104.41 (1.9%) -0.0% ( -4% -4%) 0.959 Wildcard 103.83 (2.8%) 103.80 (2.8%) -0.0% ( -5% -5%) 0.970 AndHighHigh 42.33 (2.6%) 42.33 (2.4%) -0.0% ( -4% -5%) 0.991 BrowseRandomLabelSSDVFacets 10.62 (2.5%) 10.62 (1.8%)0.0% ( -4% -4%) 0.981 AndHighHighDayTaxoFacets
[jira] [Created] (LUCENE-10346) Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll()
Feng Guo created LUCENE-10346: - Summary: Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll() Key: LUCENE-10346 URL: https://issues.apache.org/jira/browse/LUCENE-10346 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Feng Guo CPU profile often tells {{SingletonSortedNumericDocValues#nextDoc()}} is using a high percentage of CPU when running luceneutil, but the {{nextDoc()}} of dense cases should be rather simple. So I suspect that it is too many layers of abstraction that cause the stress of JVM. Unwraping it to {{NumericDocvalues}} shows around 30% speed up. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighTermTitleBDVSort 132.24 (20.6%) 125.67 (9.9%) -5.0% ( -29% - 32%) 0.330 LowTerm 1424.13 (3.2%) 1381.34 (4.4%) -3.0% ( -10% -4%) 0.014 OrHighNotHigh 707.82 (3.3%) 687.49 (6.0%) -2.9% ( -11% -6%) 0.062 TermDTSort 155.32 (10.9%) 151.02 (10.2%) -2.8% ( -21% - 20%) 0.406 OrNotHighMed 618.46 (3.7%) 602.65 (4.4%) -2.6% ( -10% -5%) 0.047 Fuzzy1 76.22 (5.3%) 74.71 (6.6%) -2.0% ( -13% - 10%) 0.293 HighTermMonthSort 174.89 (10.4%) 171.45 (10.6%) -2.0% ( -20% - 21%) 0.554 OrHighNotMed 776.08 (4.9%) 761.70 (7.8%) -1.9% ( -13% - 11%) 0.367 HighTermDayOfYearSort 56.23 (10.7%) 55.26 (10.9%) -1.7% ( -21% - 22%) 0.615 MedTerm 1449.48 (3.7%) 1425.87 (5.1%) -1.6% ( -10% -7%) 0.250 OrNotHighHigh 687.92 (4.9%) 677.06 (5.5%) -1.6% ( -11% -9%) 0.339 OrHighNotLow 742.99 (4.7%) 732.23 (5.9%) -1.4% ( -11% -9%) 0.390 OrNotHighLow 789.37 (2.7%) 778.80 (4.7%) -1.3% ( -8% -6%) 0.270 HighPhrase 75.84 (2.2%) 75.14 (3.0%) -0.9% ( -6% -4%) 0.269 HighSloppyPhrase 20.71 (5.9%) 20.56 (5.2%) -0.7% ( -11% - 11%) 0.678 IntNRQ 106.38 (18.4%) 105.67 (18.2%) -0.7% ( -31% - 44%) 0.908 OrHighMed 45.10 (1.5%) 44.83 (1.8%) -0.6% ( -3% -2%) 0.261 MedSpanNear 192.49 (2.5%) 191.51 (3.5%) -0.5% ( -6% -5%) 0.593 OrHighLow 489.82 (5.5%) 487.79 (5.7%) -0.4% ( -11% - 11%) 0.815 MedSloppyPhrase 27.33 (2.9%) 27.22 
(2.3%) -0.4% ( -5% -5%) 0.623 MedPhrase 208.94 (2.9%) 208.09 (3.7%) -0.4% ( -6% -6%) 0.696 Respell 71.84 (2.4%) 71.55 (2.4%) -0.4% ( -5% -4%) 0.600 OrHighHigh 36.26 (1.3%) 36.13 (1.1%) -0.4% ( -2% -2%) 0.344 BrowseMonthSSDVFacets 15.95 (2.7%) 15.90 (2.5%) -0.4% ( -5% -5%) 0.672 AndHighMed 85.83 (2.2%) 85.53 (2.7%) -0.3% ( -5% -4%) 0.658 Prefix3 123.15 (2.6%) 122.74 (2.5%) -0.3% ( -5% -4%) 0.678 Fuzzy2 76.41 (4.7%) 76.23 (4.2%) -0.2% ( -8% -9%) 0.867 BrowseDayOfYearSSDVFacets 14.52 (2.4%) 14.49 (2.2%) -0.2% ( -4% -4%) 0.747 MedIntervalsOrdered 56.39 (4.2%) 56.27 (4.1%) -0.2% ( -8% -8%) 0.871 HighIntervalsOrdered9.29 (4.7%)9.27 (4.4%) -0.2% ( -8% -9%) 0.896 AndHighMedDayTaxoFacets 119.76 (2.5%) 119.53 (2.9%) -0.2% ( -5% -5%) 0.831 HighSpanNear 20.89 (2.0%) 20.85 (2.3%) -0.2% ( -4% -4%) 0.803 LowIntervalsOrdered 45.51 (4.9%) 45.47 (4.8%) -0.1% ( -9% - 10%) 0.952 LowPhrase 64.17 (2.6%) 64.14 (2.6%) -0.1% ( -5% -5%) 0.951 LowSpanNear 104.45 (2.2%) 104.41 (1.9%) -0.0% ( -4% -4%) 0.959 Wildcard 103.83 (2.8%) 103.80 (2.8%) -0.0% ( -5% -5%) 0.970 AndHighHigh 42.33 (2.6%)
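The unwrapping idea in LUCENE-10346 can be sketched with minimal stand-in types (these are not Lucene's real {{SortedNumericDocValues}} / {{NumericDocValues}} interfaces): when the multi-valued view is known to be a singleton wrapper, iterate the underlying single-valued view directly and skip a layer of virtual calls per document.

```java
// Sketch: unwrap a singleton multi-valued wrapper before the hot loop.
public class UnwrapSketch {
  interface NumericValues { long get(int doc); }   // single-valued view

  // Multi-valued facade that always wraps exactly one value per document.
  static class SingletonSortedNumericValues {
    final NumericValues in;
    SingletonSortedNumericValues(NumericValues in) { this.in = in; }
    int docValueCount() { return 1; }
    long nextValue(int doc) { return in.get(doc); }
  }

  // Through the wrapper: an inner loop and two virtual calls per document.
  static long sumViaWrapper(SingletonSortedNumericValues values, int maxDoc) {
    long sum = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      for (int i = 0; i < values.docValueCount(); i++) {
        sum += values.nextValue(doc);
      }
    }
    return sum;
  }

  // Unwrap once up front; the hot loop then hits the narrow interface directly.
  static long sumUnwrapped(SingletonSortedNumericValues values, int maxDoc) {
    NumericValues unwrapped = values.in;
    long sum = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      sum += unwrapped.get(doc);
    }
    return sum;
  }
}
```

Both paths return the same result; the unwrapped loop simply gives the JIT a flat, monomorphic call site to inline.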
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466863#comment-17466863 ] Feng Guo commented on LUCENE-10334: --- OK! I've prepared the [PR for the first patch|https://github.com/apache/lucene/pull/562/files], which is ready for review now; please take a look when you have time. Thanks [~rcmuir]! > Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Previous talk is here: [https://github.com/apache/lucene/pull/557] > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocValues > -*Benchmark based on wiki10m*- (Previous benchmark results were wrong, so I deleted them to avoid misleading readers; see the benchmark in the comments.)
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466709#comment-17466709 ] Feng Guo edited comment on LUCENE-10334 at 12/30/21, 10:06 AM: --- To save reading time, I deleted some previous progress comments and will try to make a final summary here. ??one idea is we could try using the new block compression just for ordinals as a start?? Thanks [~rcmuir] for the suggestion! I made some optimizations in this approach, and browse taxo tasks (Browse*TaxoFacets) are getting sped up too. So the benchmark is now showing "dense faster, sparse slower" instead of "SSDV faster, Taxo slower". I suspect we did not see an SSDV regression only because we have not added tasks that read sparse SSDV values, e.g. a {{MedTermDaySSDVFacets}}. I've got two schemes in mind so far: *ForUtil Approach* This approach makes the file format friendly to block decoding and decodes blocks with the efficient ForUtil (with SIMD optimization) on each get. As a result we get a substantial (130%) speedup in {{Browse*}} tasks. But we also get a slight (10%) regression in tasks that read facets with a query (like MedTermDayTaxoFacets), since we read sparse values there and must decompress the whole 128-value block even if we only need one value from it.
Here is the [code|https://github.com/apache/lucene/pull/562] and luceneutil benchmark: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value AndHighMedDayTaxoFacets 71.49 (2.1%) 64.72 (2.0%) -9.5% ( -13% - -5%) 0.000 MedTermDayTaxoFacets 25.79 (2.6%) 24.00 (1.8%) -6.9% ( -11% - -2%) 0.000 AndHighHighDayTaxoFacets 13.13 (3.4%) 12.63 (3.1%) -3.9% ( -10% -2%) 0.000 OrHighMedDayTaxoFacets 13.71 (4.1%) 13.41 (4.7%) -2.2% ( -10% -6%) 0.118 PKLookup 204.87 (3.9%) 203.03 (3.6%) -0.9% ( -8% -6%) 0.450 Prefix3 113.85 (3.6%) 113.32 (4.6%) -0.5% ( -8% -8%) 0.724 HighSpanNear 25.34 (2.5%) 25.26 (3.1%) -0.3% ( -5% -5%) 0.714 LowSpanNear 55.96 (2.0%) 55.80 (2.1%) -0.3% ( -4% -3%) 0.658 MedSpanNear 56.84 (2.4%) 56.90 (2.2%)0.1% ( -4% -4%) 0.895 MedSloppyPhrase 26.57 (1.8%) 26.60 (1.9%)0.1% ( -3% -3%) 0.831 HighSloppyPhrase 30.20 (3.7%) 30.24 (3.6%)0.2% ( -6% -7%) 0.890 OrHighMed 49.96 (2.1%) 50.06 (1.7%)0.2% ( -3% -4%) 0.742 AndHighMed 96.70 (2.9%) 96.95 (2.6%)0.3% ( -5% -5%) 0.772 LowIntervalsOrdered 23.32 (4.6%) 23.38 (4.5%)0.3% ( -8% -9%) 0.856 OrHighHigh 38.09 (1.9%) 38.20 (1.8%)0.3% ( -3% -4%) 0.643 TermDTSort 128.55 (14.7%) 128.94 (11.6%)0.3% ( -22% - 31%) 0.942 Fuzzy1 99.54 (7.1%) 99.86 (8.0%)0.3% ( -13% - 16%) 0.893 HighIntervalsOrdered 15.58 (2.6%) 15.65 (2.6%)0.4% ( -4% -5%) 0.636 Respell 63.96 (1.9%) 64.22 (2.3%)0.4% ( -3% -4%) 0.542 OrHighNotHigh 611.12 (5.8%) 613.85 (6.2%)0.4% ( -10% - 13%) 0.814 MedIntervalsOrdered 59.48 (5.2%) 59.75 (5.1%)0.5% ( -9% - 11%) 0.780 AndHighHigh 58.76 (3.0%) 59.16 (3.0%)0.7% ( -5% -6%) 0.478 OrNotHighHigh 619.53 (6.0%) 623.79 (7.1%)0.7% ( -11% - 14%) 0.740 HighPhrase 31.00 (2.5%) 31.26 (2.7%)0.8% ( -4% -6%) 0.307 AndHighLow 828.41 (5.9%) 835.65 (7.1%)0.9% ( -11% - 14%) 0.672 OrNotHighLow 986.46 (6.8%) 995.13 (10.5%)0.9% ( -15% - 19%) 0.752 HighTermTitleBDVSort 110.39 (12.3%) 111.38 (11.1%)0.9% ( -20% - 27%) 0.807 IntNRQ 151.29 (2.6%) 152.96 (3.5%)1.1% ( -4% -7%) 0.262 LowTerm 1876.18 (7.8%) 1897.19 (8.3%)1.1% 
( -13% - 18%) 0.660 HighTermDayOfYearSort 108.34 (18.9%) 109.87 (17.4%)
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466709#comment-17466709 ] Feng Guo edited comment on LUCENE-10334 at 12/30/21, 9:18 AM: -- In order to save reading time, I deleted some previous progress comments and try to make a final summary here. ??one idea is we could try using the new block compression just for ordinals as a start?? Thanks [~rcmuir] for the suggestion! I made some optimizations in this approach and browse taxo tasks (Browse*TaxoFacets) are getting speed up too. So the benchmark is telling "dense faster sparse slower" instead of "SSDV faster Taxos slower" now. I suspect we probably did not see a SSDV regression just because we have not added reading sparse SSDV values tasks, e.g. a {{{}MedTermDaySSDVFacets{}}}. I've got two schemes in mind so far: *ForUtil Approach* This approach tends to make file format friendly to block decoding and decode block based on the efficient ForUtil (with SIMD opt) for each get. As a result we can get a rather delicious (130%) speed up in {{Browse*}} tasks. But we also get a slight (10%) regression in tasks that reading facets with a query (like MedTermDayTaxoFacets) since we are reading sparse values there and we need to decompress the whole 128 values block even we only need one value in that block. 
Here is the [code|https://github.com/apache/lucene/pull/562] and the luceneutil benchmark:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
AndHighMedDayTaxoFacets      71.49  (2.1%)    64.72  (2.0%)   -9.5% ( -13% - -5%)   0.000
MedTermDayTaxoFacets         25.79  (2.6%)    24.00  (1.8%)   -6.9% ( -11% - -2%)   0.000
AndHighHighDayTaxoFacets     13.13  (3.4%)    12.63  (3.1%)   -3.9% ( -10% -  2%)   0.000
OrHighMedDayTaxoFacets       13.71  (4.1%)    13.41  (4.7%)   -2.2% ( -10% -  6%)   0.118
PKLookup                    204.87  (3.9%)   203.03  (3.6%)   -0.9% (  -8% -  6%)   0.450
Prefix3                     113.85  (3.6%)   113.32  (4.6%)   -0.5% (  -8% -  8%)   0.724
HighSpanNear                 25.34  (2.5%)    25.26  (3.1%)   -0.3% (  -5% -  5%)   0.714
LowSpanNear                  55.96  (2.0%)    55.80  (2.1%)   -0.3% (  -4% -  3%)   0.658
MedSpanNear                  56.84  (2.4%)    56.90  (2.2%)    0.1% (  -4% -  4%)   0.895
MedSloppyPhrase              26.57  (1.8%)    26.60  (1.9%)    0.1% (  -3% -  3%)   0.831
HighSloppyPhrase             30.20  (3.7%)    30.24  (3.6%)    0.2% (  -6% -  7%)   0.890
OrHighMed                    49.96  (2.1%)    50.06  (1.7%)    0.2% (  -3% -  4%)   0.742
AndHighMed                   96.70  (2.9%)    96.95  (2.6%)    0.3% (  -5% -  5%)   0.772
LowIntervalsOrdered          23.32  (4.6%)    23.38  (4.5%)    0.3% (  -8% -  9%)   0.856
OrHighHigh                   38.09  (1.9%)    38.20  (1.8%)    0.3% (  -3% -  4%)   0.643
TermDTSort                  128.55 (14.7%)   128.94 (11.6%)    0.3% ( -22% - 31%)   0.942
Fuzzy1                       99.54  (7.1%)    99.86  (8.0%)    0.3% ( -13% - 16%)   0.893
HighIntervalsOrdered         15.58  (2.6%)    15.65  (2.6%)    0.4% (  -4% -  5%)   0.636
Respell                      63.96  (1.9%)    64.22  (2.3%)    0.4% (  -3% -  4%)   0.542
OrHighNotHigh               611.12  (5.8%)   613.85  (6.2%)    0.4% ( -10% - 13%)   0.814
MedIntervalsOrdered          59.48  (5.2%)    59.75  (5.1%)    0.5% (  -9% - 11%)   0.780
AndHighHigh                  58.76  (3.0%)    59.16  (3.0%)    0.7% (  -5% -  6%)   0.478
OrNotHighHigh               619.53  (6.0%)   623.79  (7.1%)    0.7% ( -11% - 14%)   0.740
HighPhrase                   31.00  (2.5%)    31.26  (2.7%)    0.8% (  -4% -  6%)   0.307
AndHighLow                  828.41  (5.9%)   835.65  (7.1%)    0.9% ( -11% - 14%)   0.672
OrNotHighLow                986.46  (6.8%)   995.13 (10.5%)    0.9% ( -15% - 19%)   0.752
HighTermTitleBDVSort        110.39 (12.3%)   111.38 (11.1%)    0.9% ( -20% - 27%)   0.807
IntNRQ                      151.29  (2.6%)   152.96  (3.5%)    1.1% (  -4% -  7%)   0.262
LowTerm                    1876.18  (7.8%)  1897.19  (8.3%)    1.1% ( -13% - 18%)   0.660
HighTermDayOfYearSort       108.34 (18.9%)   109.87 (17.4%)    1.4% ( -29% - 46%)   0.805
{code}
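The block-decoding idea behind the ForUtil approach can be sketched in plain Java. This is a simplified scalar stand-in, not Lucene's actual SIMD-friendly ForUtil, and all names here are illustrative: 128 values are packed at a fixed bit width (bpv) and always decoded a whole block at a time, instead of extracting one value per #get.

```java
// Simplified sketch of 128-value block packing/decoding (illustrative, not
// Lucene's ForUtil): decoding the whole block in one tight pass is what makes
// the dense (Browse*) case fast, and is also why a sparse reader pays for 127
// values it does not need.
public class BlockPackSketch {
  static final int BLOCK_SIZE = 128;

  // Pack BLOCK_SIZE values of bpv bits each into a long[] buffer.
  static long[] pack(long[] values, int bpv) {
    long[] packed = new long[(BLOCK_SIZE * bpv + 63) / 64];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int bit = i * bpv;
      int idx = bit >>> 6, shift = bit & 63;
      packed[idx] |= values[i] << shift;
      if (shift + bpv > 64) { // value spills into the next long
        packed[idx + 1] |= values[i] >>> (64 - shift);
      }
    }
    return packed;
  }

  // Decode the whole block in one pass; callers then read values[i] directly.
  static long[] decode(long[] packed, int bpv) {
    long mask = bpv == 64 ? -1L : (1L << bpv) - 1;
    long[] values = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int bit = i * bpv;
      int idx = bit >>> 6, shift = bit & 63;
      long v = packed[idx] >>> shift;
      if (shift + bpv > 64) { // pull the spilled high bits back in
        v |= packed[idx + 1] << (64 - shift);
      }
      values[i] = v & mask;
    }
    return values;
  }

  public static void main(String[] args) {
    long[] values = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) values[i] = i % 7; // fits in bpv = 3
    long[] decoded = decode(pack(values, 3), 3);
    for (int i = 0; i < BLOCK_SIZE; i++) {
      if (decoded[i] != values[i]) throw new AssertionError("mismatch at " + i);
    }
    System.out.println("round-trip ok");
  }
}
```

The fixed block size is what lets the real implementation unroll and vectorize the decode loop for each supported bpv.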
[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334 ] Feng Guo deleted comment on LUCENE-10334: --- was (Author: gf2121): Thanks [~rcmuir] for the suggestion! I tried some optimizations on this patch: 1. I first replaced {{DirectWriter#unsignedBitsRequired}} with {{PackedInts#unsignedBitsRequired}}, since ForUtil can support all bpv; this change reduced the index size a bit. But I have now rolled back this change, since decoding bpv of 1, 2, 4, 8, 12, 16... is also a bit faster in ForUtil. 2. {{ForUtil#decode}} does a {{switch}} on every call; this can be avoided, as in {{DirectReader}}, by choosing an implementation of an interface up front. I applied this change to ForUtil. I'm not sure which is the major optimization, but the report looks better now:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
AndHighMedDayTaxoFacets      71.49  (2.1%)    64.72  (2.0%)   -9.5% ( -13% - -5%)   0.000
MedTermDayTaxoFacets         25.79  (2.6%)    24.00  (1.8%)   -6.9% ( -11% - -2%)   0.000
AndHighHighDayTaxoFacets     13.13  (3.4%)    12.63  (3.1%)   -3.9% ( -10% -  2%)   0.000
OrHighMedDayTaxoFacets       13.71  (4.1%)    13.41  (4.7%)   -2.2% ( -10% -  6%)   0.118
PKLookup                    204.87  (3.9%)   203.03  (3.6%)   -0.9% (  -8% -  6%)   0.450
Prefix3                     113.85  (3.6%)   113.32  (4.6%)   -0.5% (  -8% -  8%)   0.724
HighSpanNear                 25.34  (2.5%)    25.26  (3.1%)   -0.3% (  -5% -  5%)   0.714
LowSpanNear                  55.96  (2.0%)    55.80  (2.1%)   -0.3% (  -4% -  3%)   0.658
MedSpanNear                  56.84  (2.4%)    56.90  (2.2%)    0.1% (  -4% -  4%)   0.895
MedSloppyPhrase              26.57  (1.8%)    26.60  (1.9%)    0.1% (  -3% -  3%)   0.831
HighSloppyPhrase             30.20  (3.7%)    30.24  (3.6%)    0.2% (  -6% -  7%)   0.890
OrHighMed                    49.96  (2.1%)    50.06  (1.7%)    0.2% (  -3% -  4%)   0.742
AndHighMed                   96.70  (2.9%)    96.95  (2.6%)    0.3% (  -5% -  5%)   0.772
LowIntervalsOrdered          23.32  (4.6%)    23.38  (4.5%)    0.3% (  -8% -  9%)   0.856
OrHighHigh                   38.09  (1.9%)    38.20  (1.8%)    0.3% (  -3% -  4%)   0.643
TermDTSort                  128.55 (14.7%)   128.94 (11.6%)    0.3% ( -22% - 31%)   0.942
Fuzzy1                       99.54  (7.1%)    99.86  (8.0%)    0.3% ( -13% - 16%)   0.893
HighIntervalsOrdered         15.58  (2.6%)    15.65  (2.6%)    0.4% (  -4% -  5%)   0.636
Respell                      63.96  (1.9%)    64.22  (2.3%)    0.4% (  -3% -  4%)   0.542
OrHighNotHigh               611.12  (5.8%)   613.85  (6.2%)    0.4% ( -10% - 13%)   0.814
MedIntervalsOrdered          59.48  (5.2%)    59.75  (5.1%)    0.5% (  -9% - 11%)   0.780
AndHighHigh                  58.76  (3.0%)    59.16  (3.0%)    0.7% (  -5% -  6%)   0.478
OrNotHighHigh               619.53  (6.0%)   623.79  (7.1%)    0.7% ( -11% - 14%)   0.740
HighPhrase                   31.00  (2.5%)    31.26  (2.7%)    0.8% (  -4% -  6%)   0.307
AndHighLow                  828.41  (5.9%)   835.65  (7.1%)    0.9% ( -11% - 14%)   0.672
OrNotHighLow                986.46  (6.8%)   995.13 (10.5%)    0.9% ( -15% - 19%)   0.752
HighTermTitleBDVSort        110.39 (12.3%)   111.38 (11.1%)    0.9% ( -20% - 27%)   0.807
IntNRQ                      151.29  (2.6%)   152.96  (3.5%)    1.1% (  -4% -  7%)   0.262
LowTerm                    1876.18  (7.8%)  1897.19  (8.3%)    1.1% ( -13% - 18%)   0.660
HighTermDayOfYearSort       108.34 (18.9%)   109.87 (17.4%)    1.4% ( -29% - 46%)   0.805
HighTermMonthSort            65.84 (11.0%)    66.78 (11.7%)    1.4% ( -19% - 27%)   0.689
OrHighNotMed                770.05  (5.3%)   782.54  (8.8%)    1.6% ( -11% - 16%)   0.480
Wildcard                    182.10  (5.5%)   185.24  (7.2%)    1.7% ( -10% - 15%)   0.394
LowSloppyPhrase              33.75  (6.6%)    34.35  (8.8%)    1.8% ( -12% - 18%)   0.478
MedPhrase                   161.57  (3.8%)   164.62  (6.1%)    1.9% (  -7% - 12%)   0.242
OrHighNotLow                679.46  (7.2%)   693.59  (7.6%)    2.1% ( -11% -
{code}
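The second optimization above, replacing a per-call {{switch}} with an implementation selected once, can be illustrated with a small sketch (hypothetical names, not the actual Lucene patch): the bpv-specific decoder is chosen when the reader is created, so the hot path carries no branch on bpv.

```java
// Sketch of "dispatch once instead of switch per call", the same trick
// DirectReader uses. Only bpv 8 and 16 are shown; names are illustrative.
public class DecoderSelection {
  interface BlockDecoder {
    long get(long[] packed, int index);
  }

  // Per-call switch: every get() re-dispatches on bpv.
  static long getWithSwitch(long[] packed, int index, int bpv) {
    switch (bpv) {
      case 8:  return (packed[index >>> 3] >>> ((index & 7) << 3)) & 0xFFL;
      case 16: return (packed[index >>> 2] >>> ((index & 3) << 4)) & 0xFFFFL;
      default: throw new UnsupportedOperationException("bpv=" + bpv);
    }
  }

  // Dispatch once: pick the decoder when the reader is opened.
  static BlockDecoder forBpv(int bpv) {
    switch (bpv) {
      case 8:  return (p, i) -> (p[i >>> 3] >>> ((i & 7) << 3)) & 0xFFL;
      case 16: return (p, i) -> (p[i >>> 2] >>> ((i & 3) << 4)) & 0xFFFFL;
      default: throw new UnsupportedOperationException("bpv=" + bpv);
    }
  }

  public static void main(String[] args) {
    long[] packed = { 0x0807060504030201L }; // eight 8-bit values: 1..8
    BlockDecoder dec = forBpv(8);            // selected once, used many times
    for (int i = 0; i < 8; i++) {
      if (dec.get(packed, i) != i + 1) throw new AssertionError();
      if (getWithSwitch(packed, i, 8) != i + 1) throw new AssertionError();
    }
    System.out.println("ok");
  }
}
```

Hoisting the dispatch also gives the JIT a monomorphic (or at worst bimorphic) call site per reader instance, which it can inline.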
[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334 ] Feng Guo deleted comment on LUCENE-10334: --- was (Author: gf2121): In the 'detect warm up' approach, I unrolled the block decode code, which sped it up a bit:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
MedTermDayTaxoFacets         55.17  (6.5%)    53.38  (7.1%)   -3.2% ( -15% - 11%)   0.129
Wildcard                    309.31 (12.9%)   299.80 (12.6%)   -3.1% ( -25% - 25%)   0.446
OrNotHighLow                696.18  (8.7%)   677.46 (10.1%)   -2.7% ( -19% - 17%)   0.367
HighTerm                   1183.05  (9.5%)  1151.67  (9.2%)   -2.7% ( -19% - 17%)   0.368
OrHighMed                   120.60  (7.0%)   117.55  (7.8%)   -2.5% ( -16% - 13%)   0.279
OrHighMedDayTaxoFacets        9.46  (7.5%)     9.25  (6.7%)   -2.2% ( -15% - 12%)   0.320
Prefix3                     177.41  (7.3%)   173.96 (10.3%)   -1.9% ( -18% - 16%)   0.489
AndHighHighDayTaxoFacets     28.81  (6.7%)    28.35  (6.3%)   -1.6% ( -13% - 12%)   0.433
BrowseMonthTaxoFacets        13.50 (13.4%)    13.30  (6.6%)   -1.5% ( -18% - 21%)   0.658
AndHighMedDayTaxoFacets      46.43  (7.8%)    45.75  (7.3%)   -1.5% ( -15% - 14%)   0.540
MedPhrase                   360.70  (8.5%)   355.47  (8.3%)   -1.4% ( -16% - 16%)   0.587
AndHighMed                  233.52  (6.7%)   230.19  (7.0%)   -1.4% ( -14% - 13%)   0.510
HighTermTitleBDVSort         72.17 (16.9%)    71.14 (15.4%)   -1.4% ( -28% - 37%)   0.780
OrHighNotMed                659.68  (9.7%)   650.38 (12.6%)   -1.4% ( -21% - 23%)   0.691
HighPhrase                   73.05  (7.6%)    72.25  (9.3%)   -1.1% ( -16% - 17%)   0.685
TermDTSort                  123.29 (15.5%)   122.10 (13.8%)   -1.0% ( -26% - 33%)   0.835
IntNRQ                      167.75  (7.4%)   166.17  (8.6%)   -0.9% ( -15% - 16%)   0.710
OrHighNotHigh               890.84 (13.2%)   883.31 (11.5%)   -0.8% ( -22% - 27%)   0.828
OrHighLow                   279.24  (7.5%)   276.97  (6.6%)   -0.8% ( -13% - 14%)   0.718
PKLookup                    198.13  (6.6%)   196.54  (6.9%)   -0.8% ( -13% - 13%)   0.707
MedSloppyPhrase              94.28  (8.0%)    93.55  (6.5%)   -0.8% ( -14% - 14%)   0.737
AndHighLow                  574.70  (8.2%)   570.50  (9.6%)   -0.7% ( -17% - 18%)   0.795
OrNotHighMed                717.34 (11.5%)   712.33 (12.4%)   -0.7% ( -22% - 26%)   0.853
AndHighHigh                  61.26  (7.2%)    60.84  (6.3%)   -0.7% ( -13% - 13%)   0.753
HighSloppyPhrase              6.56  (6.5%)     6.52  (5.3%)   -0.7% ( -11% - 11%)   0.729
LowSloppyPhrase             159.12  (6.9%)   158.23  (6.5%)   -0.6% ( -13% - 13%)   0.794
LowPhrase                    88.55  (8.6%)    88.07  (8.6%)   -0.5% ( -16% - 18%)   0.844
MedSpanNear                  14.63  (6.1%)    14.55  (5.3%)   -0.5% ( -11% - 11%)   0.786
Fuzzy2                       24.31  (9.5%)    24.19  (7.7%)   -0.5% ( -16% - 18%)   0.858
MedTerm                    1440.59  (9.8%)  1433.53 (11.5%)   -0.5% ( -19% - 23%)   0.885
HighSpanNear                 23.52  (6.1%)    23.40  (5.9%)   -0.5% ( -11% - 12%)   0.797
OrHighHigh                   32.01  (8.4%)    31.96  (5.7%)   -0.1% ( -13% - 15%)   0.948
Fuzzy1                       82.31 (11.7%)    82.30 (13.5%)   -0.0% ( -22% - 28%)   0.998
OrHighNotLow                724.27  (9.5%)   724.70 (10.1%)    0.1% ( -17% - 21%)   0.985
HighTermDayOfYearSort       156.00 (14.4%)   156.22 (14.6%)    0.1% ( -25% - 33%)   0.975
Respell                      68.40  (8.2%)    68.49  (7.6%)    0.1% ( -14% - 17%)   0.955
MedIntervalsOrdered           9.22  (7.4%)     9.23  (6.9%)    0.2% ( -13% - 15%)   0.936
OrNotHighHigh               571.66  (8.1%)   572.72 (10.9%)    0.2% ( -17% - 20%)   0.951
LowSpanNear                  82.39  (7.1%)    82.57  (4.4%)    0.2% ( -10% - 12%)   0.907
LowTerm                    1355.15 (10.2%)  1358.89 (10.0%)    0.3% ( -18% - 22%)   0.931
HighTermMonthSort            58.72 (20.8%)    58.95 (20.1%)    0.4% ( -33% - 52%)   0.950
{code}
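Unrolling the block decode loop means roughly the following. This is an illustrative sketch (16-bit values, four per long), not the code from the deleted comment: each iteration emits all the values packed in one long with straight-line code, removing the per-value shift computation and most of the loop overhead.

```java
// Toy comparison of a plain per-value decode loop against an unrolled one
// that emits four 16-bit values per long. Illustrative only.
public class UnrolledDecode {
  static final int BLOCK_SIZE = 128;

  // Straightforward loop: one value per iteration.
  static void decodeSimple(long[] packed, long[] out) {
    for (int i = 0; i < BLOCK_SIZE; i++) {
      out[i] = (packed[i >>> 2] >>> ((i & 3) << 4)) & 0xFFFFL;
    }
  }

  // Unrolled: each iteration emits the four values packed in one long.
  static void decodeUnrolled(long[] packed, long[] out) {
    for (int i = 0, j = 0; i < BLOCK_SIZE / 4; i++, j += 4) {
      long w = packed[i];
      out[j]     =  w          & 0xFFFFL;
      out[j + 1] = (w >>> 16)  & 0xFFFFL;
      out[j + 2] = (w >>> 32)  & 0xFFFFL;
      out[j + 3] =  w >>> 48;
    }
  }

  public static void main(String[] args) {
    long[] packed = new long[BLOCK_SIZE / 4];
    java.util.Random r = new java.util.Random(42);
    for (int i = 0; i < packed.length; i++) packed[i] = r.nextLong();
    long[] a = new long[BLOCK_SIZE], b = new long[BLOCK_SIZE];
    decodeSimple(packed, a);
    decodeUnrolled(packed, b);
    if (!java.util.Arrays.equals(a, b)) throw new AssertionError();
    System.out.println("decoders agree");
  }
}
```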
[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334 ] Feng Guo deleted comment on LUCENE-10334: --- was (Author: gf2121): If we cannot tolerate the regression, another idea that comes to mind is introducing a 'detect warm up' phase for {{DirectReader}}. Since most usage of DirectReader in the DocValuesProducer is forward reading, we can probably judge whether hits are dense or sparse from the first 128 calls to #get: e.g. we can assume the reading is dense if more than 80% of those gets land in the first block, and choose block decoding for the following gets if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and the benchmark result:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
OrHighMedDayTaxoFacets       12.08  (5.6%)    11.85  (4.4%)   -1.9% ( -11% -  8%)   0.228
MedTermDayTaxoFacets         35.50  (2.9%)    35.09  (2.1%)   -1.2% (  -5% -  3%)   0.148
AndHighHighDayTaxoFacets     20.35  (2.5%)    20.18  (2.2%)   -0.8% (  -5% -  4%)   0.275
BrowseMonthTaxoFacets        14.09 (12.4%)    13.99  (7.2%)   -0.7% ( -18% - 21%)   0.817
AndHighMedDayTaxoFacets     100.43  (2.2%)    99.96  (2.2%)   -0.5% (  -4% -  3%)   0.501
LowIntervalsOrdered          31.96  (3.6%)    31.90  (2.7%)   -0.2% (  -6% -  6%)   0.853
HighIntervalsOrdered          9.82  (4.8%)     9.81  (3.8%)   -0.1% (  -8% -  8%)   0.925
HighTermDayOfYearSort        58.36  (8.2%)    58.29  (7.2%)   -0.1% ( -14% - 16%)   0.962
MedIntervalsOrdered          16.33  (3.3%)    16.33  (2.5%)   -0.0% (  -5% -  6%)   0.967
HighTermTitleBDVSort         82.38 (11.9%)    82.52 (13.2%)    0.2% ( -22% - 28%)   0.966
HighSpanNear                 38.08  (1.9%)    38.17  (1.5%)    0.2% (  -3% -  3%)   0.687
AndHighHigh                  73.02  (4.1%)    73.20  (4.4%)    0.2% (  -7% -  9%)   0.854
OrHighHigh                   38.67  (2.1%)    38.77  (1.9%)    0.3% (  -3% -  4%)   0.669
LowSloppyPhrase              48.05  (5.4%)    48.20  (5.5%)    0.3% ( -10% - 11%)   0.856
MedSloppyPhrase              34.55  (2.7%)    34.66  (2.6%)    0.3% (  -4% -  5%)   0.696
TermDTSort                  200.08 (11.2%)   200.74 (11.3%)    0.3% ( -19% - 25%)   0.926
HighTermMonthSort           126.69 (11.4%)   127.18 (11.7%)    0.4% ( -20% - 26%)   0.917
HighSloppyPhrase             14.03  (3.5%)    14.09  (3.7%)    0.4% (  -6% -  7%)   0.703
MedSpanNear                 103.61  (2.1%)   104.14  (1.2%)    0.5% (  -2% -  3%)   0.332
IntNRQ                      126.16  (2.3%)   126.81  (2.7%)    0.5% (  -4% -  5%)   0.508
AndHighMed                  164.27  (4.2%)   165.20  (4.4%)    0.6% (  -7% -  9%)   0.676
LowSpanNear                 167.58  (2.7%)   168.63  (2.6%)    0.6% (  -4% -  6%)   0.460
PKLookup                    201.62  (3.8%)   203.05  (4.7%)    0.7% (  -7% -  9%)   0.599
Respell                      73.56  (2.1%)    74.43  (2.7%)    1.2% (  -3% -  6%)   0.121
MedPhrase                   266.51  (5.2%)   270.42  (5.9%)    1.5% (  -9% - 13%)   0.405
OrHighMed                   116.57  (4.0%)   118.30  (3.3%)    1.5% (  -5% -  9%)   0.202
Prefix3                     136.44  (3.9%)   138.51  (3.6%)    1.5% (  -5% -  9%)   0.204
OrNotHighMed                669.05  (5.3%)   679.79  (7.7%)    1.6% ( -10% - 15%)   0.443
OrNotHighLow                907.93  (5.8%)   922.66 (10.1%)    1.6% ( -13% - 18%)   0.533
Wildcard                    146.59  (3.2%)   149.19  (4.9%)    1.8% (  -6% - 10%)   0.172
OrHighLow                   383.74  (8.5%)   390.67  (8.0%)    1.8% ( -13% - 20%)   0.489
HighPhrase                   96.06  (4.4%)    97.81  (6.8%)    1.8% (  -8% - 13%)   0.316
Fuzzy2                       65.58 (12.9%)    66.81 (11.3%)    1.9% ( -19% - 29%)   0.624
LowPhrase                   145.74  (4.0%)   148.50  (5.1%)    1.9% (  -6% - 11%)   0.192
MedTerm                    1470.64  (7.1%)  1498.96  (9.5%)    1.9% ( -13% - 19%)   0.468
OrHighNotHigh               562.56  (5.7%)   573.78  (7.3%)    2.0% ( -10% - 15%)   0.336
Fuzzy1                       95.47  (5.7%)    97.51  (7.3%)    2.1% ( -10% - 16%)   0.303
{code}
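The 'detect warm up' heuristic described above could look roughly like this (a toy sketch with hypothetical names; the real POC is in the linked PR): count how many of the first 128 gets land in the first 128-value block, and if more than 80% do, treat the access pattern as dense and switch to whole-block decoding for subsequent reads.

```java
// Toy sketch of dense/sparse detection during a warm-up window of
// BLOCK_SIZE #get calls. All names and thresholds are illustrative.
public class WarmupDetectingReader {
  static final int BLOCK_SIZE = 128;
  static final double DENSE_RATIO = 0.8;

  private int calls = 0;
  private int hitsInFirstBlock = 0;
  private boolean decided = false;
  private boolean dense = false;

  /** Observe the index of each #get; returns true once decided dense. */
  boolean observe(long index) {
    if (!decided) {
      calls++;
      if (index < BLOCK_SIZE) hitsInFirstBlock++;
      if (calls == BLOCK_SIZE) {
        dense = hitsInFirstBlock > DENSE_RATIO * BLOCK_SIZE;
        decided = true;
      }
    }
    return decided && dense; // dense => caller switches to block decoding
  }

  public static void main(String[] args) {
    // Forward, dense reading: indices 0..127 all fall in the first block.
    WarmupDetectingReader denseReader = new WarmupDetectingReader();
    boolean useBlocks = false;
    for (int i = 0; i < BLOCK_SIZE; i++) useBlocks = denseReader.observe(i);
    if (!useBlocks) throw new AssertionError("expected dense");

    // Sparse reading: indices spread far apart, almost none in the first block.
    WarmupDetectingReader sparseReader = new WarmupDetectingReader();
    for (int i = 0; i < BLOCK_SIZE; i++) useBlocks = sparseReader.observe(i * 1000L);
    if (useBlocks) throw new AssertionError("expected sparse");
    System.out.println("heuristic ok");
  }
}
```

The appeal of this scheme is that a wrong guess only costs the warm-up window; after that, dense readers get block decoding and sparse readers keep per-value access.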
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/29/21, 4:41 AM: -- If we can not tolerate the regression, another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128 #get, e.g. we can assume the reading is dense if we get more than 80% times in the first block, and choose block decoding for following gets if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 26%) 0.917 
HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0% ( -10% - 15%) 0.336
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466112#comment-17466112 ]

Feng Guo commented on LUCENE-10334:
---

In the 'detect warm-up' approach, I unrolled the block decoding code, which speeds it up a bit:

{code:java}
                    Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
    MedTermDayTaxoFacets  55.17 (6.5%)    53.38 (7.1%)    -3.2% ( -15% - 11%)  0.129
                Wildcard  309.31 (12.9%)  299.80 (12.6%)  -3.1% ( -25% - 25%)  0.446
            OrNotHighLow  696.18 (8.7%)   677.46 (10.1%)  -2.7% ( -19% - 17%)  0.367
                HighTerm  1183.05 (9.5%)  1151.67 (9.2%)  -2.7% ( -19% - 17%)  0.368
               OrHighMed  120.60 (7.0%)   117.55 (7.8%)   -2.5% ( -16% - 13%)  0.279
  OrHighMedDayTaxoFacets  9.46 (7.5%)     9.25 (6.7%)     -2.2% ( -15% - 12%)  0.320
                 Prefix3  177.41 (7.3%)   173.96 (10.3%)  -1.9% ( -18% - 16%)  0.489
AndHighHighDayTaxoFacets  28.81 (6.7%)    28.35 (6.3%)    -1.6% ( -13% - 12%)  0.433
   BrowseMonthTaxoFacets  13.50 (13.4%)   13.30 (6.6%)    -1.5% ( -18% - 21%)  0.658
 AndHighMedDayTaxoFacets  46.43 (7.8%)    45.75 (7.3%)    -1.5% ( -15% - 14%)  0.540
               MedPhrase  360.70 (8.5%)   355.47 (8.3%)   -1.4% ( -16% - 16%)  0.587
              AndHighMed  233.52 (6.7%)   230.19 (7.0%)   -1.4% ( -14% - 13%)  0.510
    HighTermTitleBDVSort  72.17 (16.9%)   71.14 (15.4%)   -1.4% ( -28% - 37%)  0.780
            OrHighNotMed  659.68 (9.7%)   650.38 (12.6%)  -1.4% ( -21% - 23%)  0.691
              HighPhrase  73.05 (7.6%)    72.25 (9.3%)    -1.1% ( -16% - 17%)  0.685
              TermDTSort  123.29 (15.5%)  122.10 (13.8%)  -1.0% ( -26% - 33%)  0.835
                  IntNRQ  167.75 (7.4%)   166.17 (8.6%)   -0.9% ( -15% - 16%)  0.710
           OrHighNotHigh  890.84 (13.2%)  883.31 (11.5%)  -0.8% ( -22% - 27%)  0.828
               OrHighLow  279.24 (7.5%)   276.97 (6.6%)   -0.8% ( -13% - 14%)  0.718
                PKLookup  198.13 (6.6%)   196.54 (6.9%)   -0.8% ( -13% - 13%)  0.707
         MedSloppyPhrase  94.28 (8.0%)    93.55 (6.5%)    -0.8% ( -14% - 14%)  0.737
              AndHighLow  574.70 (8.2%)   570.50 (9.6%)   -0.7% ( -17% - 18%)  0.795
            OrNotHighMed  717.34 (11.5%)  712.33 (12.4%)  -0.7% ( -22% - 26%)  0.853
             AndHighHigh  61.26 (7.2%)    60.84 (6.3%)    -0.7% ( -13% - 13%)  0.753
        HighSloppyPhrase  6.56 (6.5%)     6.52 (5.3%)     -0.7% ( -11% - 11%)  0.729
         LowSloppyPhrase  159.12 (6.9%)   158.23 (6.5%)   -0.6% ( -13% - 13%)  0.794
               LowPhrase  88.55 (8.6%)    88.07 (8.6%)    -0.5% ( -16% - 18%)  0.844
             MedSpanNear  14.63 (6.1%)    14.55 (5.3%)    -0.5% ( -11% - 11%)  0.786
                  Fuzzy2  24.31 (9.5%)    24.19 (7.7%)    -0.5% ( -16% - 18%)  0.858
                 MedTerm  1440.59 (9.8%)  1433.53 (11.5%) -0.5% ( -19% - 23%)  0.885
            HighSpanNear  23.52 (6.1%)    23.40 (5.9%)    -0.5% ( -11% - 12%)  0.797
              OrHighHigh  32.01 (8.4%)    31.96 (5.7%)    -0.1% ( -13% - 15%)  0.948
                  Fuzzy1  82.31 (11.7%)   82.30 (13.5%)   -0.0% ( -22% - 28%)  0.998
            OrHighNotLow  724.27 (9.5%)   724.70 (10.1%)   0.1% ( -17% - 21%)  0.985
   HighTermDayOfYearSort  156.00 (14.4%)  156.22 (14.6%)   0.1% ( -25% - 33%)  0.975
                 Respell  68.40 (8.2%)    68.49 (7.6%)     0.1% ( -14% - 17%)  0.955
     MedIntervalsOrdered  9.22 (7.4%)     9.23 (6.9%)      0.2% ( -13% - 15%)  0.936
           OrNotHighHigh  571.66 (8.1%)   572.72 (10.9%)   0.2% ( -17% - 20%)  0.951
             LowSpanNear  82.39 (7.1%)    82.57 (4.4%)     0.2% ( -10% - 12%)  0.907
                 LowTerm  1355.15 (10.2%) 1358.89 (10.0%)  0.3% ( -18% - 22%)  0.931
       HighTermMonthSort  58.72 (20.8%)
{code}
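The kind of manual loop unrolling described above can be illustrated with a toy decode routine. This is not the actual patch; `decode8` and the 4x unroll factor are assumptions chosen purely to show the shape of the change.

```java
// Illustrative only: decode 128 packed 8-bit values into longs with the inner
// loop manually unrolled 4x, the style of change described in the comment.
public class UnrolledDecode {
    public static void decode8(byte[] in, long[] out) {
        for (int i = 0; i < 128; i += 4) { // unrolled by 4: fewer loop checks
            out[i]     = in[i]     & 0xFFL;
            out[i + 1] = in[i + 1] & 0xFFL;
            out[i + 2] = in[i + 2] & 0xFFL;
            out[i + 3] = in[i + 3] & 0xFFL;
        }
    }
}
```

Unrolling reduces per-iteration branch overhead and gives the JIT more freedom to schedule the independent loads.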
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465310#comment-17465310 ]

Feng Guo edited comment on LUCENE-10334 at 12/28/21, 4:22 AM:
--

Thanks [~rcmuir] for the suggestion! I tried some optimizations on this patch:

1. At first I replaced {{DirectWriter#unsignedBitsRequired}} with {{PackedInts#unsignedBitsRequired}}, since ForUtil can support all bpv, and this change can reduce the index size a little. But I have rolled it back, since decoding bpv 1, 2, 4, 8, 12, 16... can also be a bit faster in ForUtil.

2. {{ForUtil#decode}} does a {{switch}} on every call. This can be avoided the way we do it in {{DirectReader}}: choose an implementation of an interface once at the beginning. I applied this change to ForUtil.

I'm not sure which is the major optimization, but the report looks better now:

{code:java}
                    Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
 AndHighMedDayTaxoFacets  71.49 (2.1%)    64.72 (2.0%)    -9.5% ( -13% - -5%)  0.000
    MedTermDayTaxoFacets  25.79 (2.6%)    24.00 (1.8%)    -6.9% ( -11% - -2%)  0.000
AndHighHighDayTaxoFacets  13.13 (3.4%)    12.63 (3.1%)    -3.9% ( -10% - 2%)   0.000
  OrHighMedDayTaxoFacets  13.71 (4.1%)    13.41 (4.7%)    -2.2% ( -10% - 6%)   0.118
                PKLookup  204.87 (3.9%)   203.03 (3.6%)   -0.9% ( -8% - 6%)    0.450
                 Prefix3  113.85 (3.6%)   113.32 (4.6%)   -0.5% ( -8% - 8%)    0.724
            HighSpanNear  25.34 (2.5%)    25.26 (3.1%)    -0.3% ( -5% - 5%)    0.714
             LowSpanNear  55.96 (2.0%)    55.80 (2.1%)    -0.3% ( -4% - 3%)    0.658
             MedSpanNear  56.84 (2.4%)    56.90 (2.2%)     0.1% ( -4% - 4%)    0.895
         MedSloppyPhrase  26.57 (1.8%)    26.60 (1.9%)     0.1% ( -3% - 3%)    0.831
        HighSloppyPhrase  30.20 (3.7%)    30.24 (3.6%)     0.2% ( -6% - 7%)    0.890
               OrHighMed  49.96 (2.1%)    50.06 (1.7%)     0.2% ( -3% - 4%)    0.742
              AndHighMed  96.70 (2.9%)    96.95 (2.6%)     0.3% ( -5% - 5%)    0.772
     LowIntervalsOrdered  23.32 (4.6%)    23.38 (4.5%)     0.3% ( -8% - 9%)    0.856
              OrHighHigh  38.09 (1.9%)    38.20 (1.8%)     0.3% ( -3% - 4%)    0.643
              TermDTSort  128.55 (14.7%)  128.94 (11.6%)   0.3% ( -22% - 31%)  0.942
                  Fuzzy1  99.54 (7.1%)    99.86 (8.0%)     0.3% ( -13% - 16%)  0.893
    HighIntervalsOrdered  15.58 (2.6%)    15.65 (2.6%)     0.4% ( -4% - 5%)    0.636
                 Respell  63.96 (1.9%)    64.22 (2.3%)     0.4% ( -3% - 4%)    0.542
           OrHighNotHigh  611.12 (5.8%)   613.85 (6.2%)    0.4% ( -10% - 13%)  0.814
     MedIntervalsOrdered  59.48 (5.2%)    59.75 (5.1%)     0.5% ( -9% - 11%)   0.780
             AndHighHigh  58.76 (3.0%)    59.16 (3.0%)     0.7% ( -5% - 6%)    0.478
           OrNotHighHigh  619.53 (6.0%)   623.79 (7.1%)    0.7% ( -11% - 14%)  0.740
              HighPhrase  31.00 (2.5%)    31.26 (2.7%)     0.8% ( -4% - 6%)    0.307
              AndHighLow  828.41 (5.9%)   835.65 (7.1%)    0.9% ( -11% - 14%)  0.672
            OrNotHighLow  986.46 (6.8%)   995.13 (10.5%)   0.9% ( -15% - 19%)  0.752
    HighTermTitleBDVSort  110.39 (12.3%)  111.38 (11.1%)   0.9% ( -20% - 27%)  0.807
                  IntNRQ  151.29 (2.6%)   152.96 (3.5%)    1.1% ( -4% - 7%)    0.262
                 LowTerm  1876.18 (7.8%)  1897.19 (8.3%)   1.1% ( -13% - 18%)  0.660
   HighTermDayOfYearSort  108.34 (18.9%)  109.87 (17.4%)   1.4% ( -29% - 46%)  0.805
       HighTermMonthSort  65.84 (11.0%)   66.78 (11.7%)    1.4% ( -19% - 27%)  0.689
            OrHighNotMed  770.05 (5.3%)   782.54 (8.8%)    1.6% ( -11% - 16%)  0.480
                Wildcard  182.10 (5.5%)   185.24 (7.2%)    1.7% ( -10% - 15%)  0.394
         LowSloppyPhrase  33.75 (6.6%)    34.35 (8.8%)     1.8% ( -12% - 18%)  0.478
               MedPhrase  161.57 (3.8%)   164.62 (6.1%)    1.9% ( -7%
{code}
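The second optimization above, binding the per-bpv decode routine once instead of switching on every call, can be sketched as follows. The `Decoders`/`Decoder` names and the decode bodies are illustrative assumptions, not the actual ForUtil or DirectReader API.

```java
// Bind the per-bpv decode implementation once at reader creation, the way
// DirectReader picks a reader per bit width, instead of running
// switch(bitsPerValue) inside every decode call.
public class Decoders {
    public interface Decoder {
        void decode(long[] in, long[] out);
    }

    // The switch runs once here; callers hold on to the returned Decoder.
    public static Decoder forBitsPerValue(int bpv) {
        switch (bpv) {
            case 1:
                // stand-in: each input long carries 64 one-bit values
                return (in, out) -> {
                    for (int i = 0; i < out.length; i++) {
                        out[i] = (in[i >>> 6] >>> (i & 63)) & 1L;
                    }
                };
            case 8:
                // stand-in: mask each input word down to its low 8 bits
                return (in, out) -> {
                    for (int i = 0; i < out.length; i++) {
                        out[i] = in[i] & 0xFFL;
                    }
                };
            default:
                throw new IllegalArgumentException("unsupported bpv: " + bpv);
        }
    }
}
```

Hoisting the dispatch out of the hot loop also makes each call site monomorphic, which is friendlier to JIT inlining.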
[jira] [Updated] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10334:
--
Description:
Previous talk is here: [https://github.com/apache/lucene/pull/557]

This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocValues.

-*Benchmark based on wiki10m*- (The previous benchmark results were wrong, so I deleted them to avoid misleading anyone; see the benchmarks in the comments.)

was:
Previous talk is here: https://github.com/apache/lucene/pull/557

This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocValues.

*Benchmark based on wiki10m*
{code:java}
                       Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
              OrNotHighHigh  694.17 (8.2%)    685.83 (7.0%)    -1.2% ( -15% - 15%)  0.618
                    Respell  75.15 (2.7%)     74.32 (2.0%)     -1.1% ( -5% - 3%)    0.146
                    Prefix3  220.11 (5.1%)    217.78 (5.8%)    -1.1% ( -11% - 10%)  0.541
                   Wildcard  129.75 (3.7%)    128.63 (2.5%)    -0.9% ( -6% - 5%)    0.383
                LowSpanNear  68.54 (2.1%)     68.00 (2.4%)     -0.8% ( -5% - 3%)    0.269
               OrNotHighMed  732.90 (6.8%)    727.49 (5.3%)    -0.7% ( -12% - 12%)  0.703
BrowseRandomLabelTaxoFacets  11879.03 (8.6%)  11799.33 (5.5%)  -0.7% ( -13% - 14%)  0.769
           HighSloppyPhrase  6.87 (2.9%)      6.83 (2.3%)      -0.6% ( -5% - 4%)    0.496
               OrHighNotMed  827.54 (9.2%)    822.94 (8.0%)    -0.6% ( -16% - 18%)  0.838
                MedSpanNear  18.92 (5.7%)     18.82 (5.6%)     -0.5% ( -11% - 11%)  0.759
     OrHighMedDayTaxoFacets  10.27 (4.0%)     10.21 (4.3%)     -0.5% ( -8% - 8%)    0.676
                   PKLookup  207.98 (4.0%)    206.85 (2.7%)    -0.5% ( -7% - 6%)    0.621
        LowIntervalsOrdered  159.17 (2.3%)    158.32 (2.2%)    -0.5% ( -4% - 3%)    0.445
               HighSpanNear  6.32 (4.2%)      6.28 (4.1%)      -0.5% ( -8% - 8%)    0.691
        MedIntervalsOrdered  85.31 (3.2%)     84.88 (2.9%)     -0.5% ( -6% - 5%)    0.607
                   HighTerm  1170.55 (5.8%)   1164.79 (3.9%)   -0.5% ( -9% - 9%)    0.753
            LowSloppyPhrase  14.54 (3.1%)     14.48 (2.9%)     -0.4% ( -6% - 5%)    0.651
                 HighPhrase  112.81 (4.4%)    112.39 (4.1%)    -0.4% ( -8% - 8%)    0.781
               OrNotHighLow  858.02 (5.9%)    854.99 (4.8%)    -0.4% ( -10% - 10%)  0.835
       HighIntervalsOrdered  25.08 (2.8%)     25.00 (2.6%)     -0.3% ( -5% - 5%)    0.701
                  MedPhrase  27.20 (2.1%)     27.11 (2.9%)     -0.3% ( -5% - 4%)    0.689
       MedTermDayTaxoFacets  81.55 (2.3%)     81.35 (2.9%)     -0.3% ( -5% - 5%)    0.762
                     IntNRQ  63.36 (2.0%)     63.21 (2.5%)     -0.2% ( -4% - 4%)    0.740
                     Fuzzy2  73.24 (5.5%)     73.10 (6.2%)     -0.2% ( -11% - 12%)  0.916
    AndHighMedDayTaxoFacets  76.08 (3.5%)     75.98 (3.4%)     -0.1% ( -6% - 7%)    0.905
                AndHighHigh  62.20 (2.0%)     62.18 (2.4%)     -0.0% ( -4% - 4%)    0.954
      BrowseMonthTaxoFacets  11993.48 (6.7%)  11989.53 (4.8%)  -0.0% ( -10% - 12%)  0.986
               OrHighNotLow  732.82 (7.2%)    732.80 (6.2%)    -0.0% ( -12% - 14%)  0.999
                     Fuzzy1  46.43 (5.3%)     46.45 (6.0%)      0.0% ( -10% - 11%)  0.989
                    LowTerm  1608.25 (6.0%)   1608.84 (4.9%)    0.0% ( -10% - 11%)  0.983
                  OrHighMed  75.90 (2.3%)     75.93 (1.8%)      0.0% ( -3% - 4%)    0.939
                  LowPhrase  273.81 (2.9%)    274.04 (3.3%)     0.1% ( -5% - 6%)    0.932
                 AndHighLow  717.24 (6.1%)    718.17 (3.3%)     0.1% ( -8% - 10%)   0.933
   AndHighHighDayTaxoFacets  39.63 (2.5%)     39.69 (2.6%)      0.1% ( -4% - 5%)    0.862
                 OrHighHigh  34.63 (1.8%)     34.68 (2.0%)      0.1% ( -3% - 4%)    0.821
            MedSloppyPhrase  158.80 (2.8%)    159.09 (2.6%)     0.2% ( -5% - 5%)    0.832
                  OrHighLow  257.77 (2.9%)
{code}
[jira] [Created] (LUCENE-10343) Remove MyRandom in favor of test framework random
Feng Guo created LUCENE-10343: - Summary: Remove MyRandom in favor of test framework random Key: LUCENE-10343 URL: https://issues.apache.org/jira/browse/LUCENE-10343 Project: Lucene - Core Issue Type: Test Reporter: Feng Guo -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/27/21, 2:30 PM: -- If we can not tolerate the regression, another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {code:java} 128th reading index - 1st reading index <= 128 * 1.5 {code} And choose block decoding for following gets if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( 
-20% - 26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0%
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/27/21, 11:42 AM: --- If we can not tolerate the regression, another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {code:java} 128th reading index - 1st reading index <= 128 * 1.5 {code} And we can choose block decoding if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 
26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0% ( -10% -
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/27/21, 9:35 AM: -- Another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {code:java} 128th reading index - 1st reading index <= 128 * 1.5 {code} And we can choose block decoding if dense. This way could be an alternative if we can not accept the regression of the ForUtil patch, here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 
HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/26/21, 6:09 PM: -- Another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {{{}128th index - 1st index <= 128 * 1.5{}}}. And if dense, we can do some block decoding there. This way could be an alternative if we can not accept the regression of the ForUtil patch, here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 
127.18 (11.7%)0.4% ( -20% - 26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%)
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo commented on LUCENE-10334: --- Another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{DirectReader}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {{128th index - 1st index <= 128 * 1.5}}. And if dense, we can do some block decoding there. This way could be an alternative if we can not accept the regression of the ForUtil patch, here is the POC code: https://github.com/apache/lucene/pull/570 and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 26%) 
0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0% ( -10% - 15%) 0.336
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465310#comment-17465310 ] Feng Guo edited comment on LUCENE-10334 at 12/26/21, 8:52 AM: -- Thanks [~rcmuir] for suggestion! I tried some optimizations on this patch: 1. I replaced {{DirectWriter#unsignedBitsRequired}} with {{PackedInts#unsignedBitsRequired}} at first since ForUtil can support all bpv, this change can reduce some index size. But now i rollbacked this change since the decode of 1,2,4,8,12,16... could also be a bit faster in ForUtil. 2. {{ForUtil#decode}} will do a {{switch}} for each call, this can be avoided by the way like what we do in {{{}DirectReader{}}}, choose a implementation of an interface at the beginning. I applied this change in ForUtil. I'm not sure which is the major optimization but the report seems better now: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value AndHighMedDayTaxoFacets 71.49 (2.1%) 64.72 (2.0%) -9.5% ( -13% - -5%) 0.000 MedTermDayTaxoFacets 25.79 (2.6%) 24.00 (1.8%) -6.9% ( -11% - -2%) 0.000 AndHighHighDayTaxoFacets 13.13 (3.4%) 12.63 (3.1%) -3.9% ( -10% -2%) 0.000 OrHighMedDayTaxoFacets 13.71 (4.1%) 13.41 (4.7%) -2.2% ( -10% -6%) 0.118 PKLookup 204.87 (3.9%) 203.03 (3.6%) -0.9% ( -8% -6%) 0.450 Prefix3 113.85 (3.6%) 113.32 (4.6%) -0.5% ( -8% -8%) 0.724 HighSpanNear 25.34 (2.5%) 25.26 (3.1%) -0.3% ( -5% -5%) 0.714 LowSpanNear 55.96 (2.0%) 55.80 (2.1%) -0.3% ( -4% -3%) 0.658 MedSpanNear 56.84 (2.4%) 56.90 (2.2%)0.1% ( -4% -4%) 0.895 MedSloppyPhrase 26.57 (1.8%) 26.60 (1.9%)0.1% ( -3% -3%) 0.831 HighSloppyPhrase 30.20 (3.7%) 30.24 (3.6%)0.2% ( -6% -7%) 0.890 OrHighMed 49.96 (2.1%) 50.06 (1.7%)0.2% ( -3% -4%) 0.742 AndHighMed 96.70 (2.9%) 96.95 (2.6%)0.3% ( -5% -5%) 0.772 LowIntervalsOrdered 23.32 (4.6%) 23.38 (4.5%)0.3% ( -8% -9%) 0.856 OrHighHigh 38.09 (1.9%) 38.20 (1.8%)0.3% ( -3% -4%) 0.643 TermDTSort 128.55 (14.7%) 128.94 (11.6%)0.3% ( -22% - 31%) 0.942 Fuzzy1 
99.54 (7.1%) 99.86 (8.0%)0.3% ( -13% - 16%) 0.893 HighIntervalsOrdered 15.58 (2.6%) 15.65 (2.6%)0.4% ( -4% -5%) 0.636 Respell 63.96 (1.9%) 64.22 (2.3%)0.4% ( -3% -4%) 0.542 OrHighNotHigh 611.12 (5.8%) 613.85 (6.2%)0.4% ( -10% - 13%) 0.814 MedIntervalsOrdered 59.48 (5.2%) 59.75 (5.1%)0.5% ( -9% - 11%) 0.780 AndHighHigh 58.76 (3.0%) 59.16 (3.0%)0.7% ( -5% -6%) 0.478 OrNotHighHigh 619.53 (6.0%) 623.79 (7.1%)0.7% ( -11% - 14%) 0.740 HighPhrase 31.00 (2.5%) 31.26 (2.7%)0.8% ( -4% -6%) 0.307 AndHighLow 828.41 (5.9%) 835.65 (7.1%)0.9% ( -11% - 14%) 0.672 OrNotHighLow 986.46 (6.8%) 995.13 (10.5%)0.9% ( -15% - 19%) 0.752 HighTermTitleBDVSort 110.39 (12.3%) 111.38 (11.1%)0.9% ( -20% - 27%) 0.807 IntNRQ 151.29 (2.6%) 152.96 (3.5%)1.1% ( -4% -7%) 0.262 LowTerm 1876.18 (7.8%) 1897.19 (8.3%)1.1% ( -13% - 18%) 0.660 HighTermDayOfYearSort 108.34 (18.9%) 109.87 (17.4%)1.4% ( -29% - 46%) 0.805 HighTermMonthSort 65.84 (11.0%) 66.78 (11.7%)1.4% ( -19% - 27%) 0.689 OrHighNotMed 770.05 (5.3%) 782.54 (8.8%)1.6% ( -11% - 16%) 0.480 Wildcard 182.10 (5.5%) 185.24 (7.2%)1.7% ( -10% - 15%) 0.394 LowSloppyPhrase 33.75 (6.6%) 34.35 (8.8%)1.8% ( -12% - 18%) 0.478 MedPhrase 161.57 (3.8%) 164.62 (6.1%)1.9% ( -7%
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464619#comment-17464619 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 4:11 PM: -- I'm sorry to say that there is something wrong with my benchmark: the localrun script was still using the facets described in the luceneutil [readme|https://github.com/mikemccand/luceneutil/blob/master/README.md], like this:

{code:python}
facets = (('taxonomy:Date', 'Date'),
          ('sortedset:Month', 'Month'),
          ('sortedset:DayOfYear', 'DayOfYear'))
index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                      indexSort='dayOfYearNumericDV:long')
{code}

I got the result mentioned above with these facets. But when I cloned a fresh luceneutil and reran setup.py, it became:

{code:python}
index = comp.newIndex('lucene_baseline', sourceData,
                      facets=(('taxonomy:Date', 'Date'),
                              ('taxonomy:Month', 'Month'),
                              ('taxonomy:DayOfYear', 'DayOfYear'),
                              ('sortedset:Month', 'Month'),
                              ('sortedset:DayOfYear', 'DayOfYear'),
                              ('taxonomy:RandomLabel', 'RandomLabel'),
                              ('sortedset:RandomLabel', 'RandomLabel')))
{code}

And with these facets the result is totally different:

{code:java}
Task                         QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseDayOfYearTaxoFacets    13.65 (8.9%)   10.49 (2.6%)  -23.2% ( -31% - -12%)  0.000
BrowseMonthTaxoFacets        13.54 (14.6%)  10.89 (2.9%)  -19.6% ( -32% -  -2%)  0.000
BrowseDateTaxoFacets         13.50 (8.8%)   11.11 (3.7%)  -17.7% ( -27% -  -5%)  0.000
BrowseRandomLabelTaxoFacets  11.78 (7.0%)    9.94 (5.1%)  -15.6% ( -25% -  -3%)  0.000
MedTermDayTaxoFacets         47.49 (2.4%)   41.45 (3.4%)  -12.7% ( -18% -  -7%)  0.000
AndHighMedDayTaxoFacets     130.24 (2.7%)  119.48 (3.9%)   -8.3% ( -14% -  -1%)  0.000
AndHighHighDayTaxoFacets     28.80 (2.8%)   27.09 (3.1%)   -5.9% ( -11% -   0%)  0.000
OrHighMedDayTaxoFacets        9.68 (2.7%)    9.35 (2.8%)   -3.4% (  -8% -   2%)  0.000
HighTermDayOfYearSort       139.73 (9.6%)  135.74 (10.2%)  -2.9% ( -20% -  18%)  0.361
TermDTSort                  151.46 (9.0%)  147.40 (7.7%)   -2.7% ( -17% -  15%)  0.311
Fuzzy2                       35.22 (6.3%)   34.38 (5.9%)   -2.4% ( -13% -  10%)  0.213
MedSloppyPhrase              78.99 (6.7%)   77.21 (7.1%)   -2.3% ( -15% -  12%)  0.300
LowTerm                    1636.38 (6.4%) 1600.26 (9.6%)   -2.2% ( -17% -  14%)  0.392
LowPhrase                   252.68 (3.8%)  247.11 (6.5%)   -2.2% ( -12% -   8%)  0.189
Respell                      61.23 (2.3%)   59.89 (5.0%)   -2.2% (  -9% -   5%)  0.078
AndHighHigh                  56.54 (2.6%)   55.43 (4.3%)   -2.0% (  -8% -   5%)  0.084
MedSpanNear                  99.37 (2.4%)   97.44 (5.2%)   -1.9% (  -9% -   5%)  0.128
HighSloppyPhrase             28.58 (5.4%)   28.05 (5.4%)   -1.8% ( -11% -   9%)  0.280
PKLookup                    198.95 (3.0%)  195.34 (4.8%)   -1.8% (  -9% -   6%)  0.148
AndHighMed                  116.50 (3.3%)  114.65 (4.5%)   -1.6% (  -9% -   6%)  0.204
Fuzzy1                       75.07 (6.4%)   73.99 (8.1%)   -1.4% ( -14% -  13%)  0.532
HighSpanNear                 10.73 (2.8%)   10.58 (3.9%)   -1.4% (  -7% -   5%)  0.180
LowSpanNear                  43.92 (2.4%)   43.30 (3.4%)   -1.4% (  -6% -   4%)  0.128
LowSloppyPhrase              14.70 (4.4%)   14.50 (4.2%)   -1.3% (  -9% -   7%)  0.329
HighTermMonthSort           148.80 (8.3%)  146.84 (8.1%)   -1.3% ( -16% -  16%)  0.612
OrHighMed                   103.00 (3.2%)  101.67 (5.1%)   -1.3% (  -9% -   7%)  0.341
MedIntervalsOrdered           5.44 (2.5%)    5.37 (2.2%)   -1.3% (  -5% -   3%)  0.092
OrHighNotHigh               648.74 (6.7%)  640.81 (8.8%)   -1.2% ( -15% -  15%)  0.621
MedPhrase                    80.35 (2.7%)   79.38 (4.8%)   -1.2% (  -8% -   6%)  0.327
HighTerm                   1384.91 (6.8%) 1369.27 (8.8%)   -1.1% ( -15% -  15%)  0.650
{code}
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464539#comment-17464539 ] Feng Guo commented on LUCENE-10334: --- Thanks [~rcmuir] for reply! No hurry here, feel free to ignore this and have a nice holiday :) > Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocvalues > *Benchmark based on wiki10m* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >OrNotHighHigh 694.17 (8.2%) 685.83 > (7.0%) -1.2% ( -15% - 15%) 0.618 > Respell 75.15 (2.7%) 74.32 > (2.0%) -1.1% ( -5% -3%) 0.146 > Prefix3 220.11 (5.1%) 217.78 > (5.8%) -1.1% ( -11% - 10%) 0.541 > Wildcard 129.75 (3.7%) 128.63 > (2.5%) -0.9% ( -6% -5%) 0.383 > LowSpanNear 68.54 (2.1%) 68.00 > (2.4%) -0.8% ( -5% -3%) 0.269 > OrNotHighMed 732.90 (6.8%) 727.49 > (5.3%) -0.7% ( -12% - 12%) 0.703 > BrowseRandomLabelTaxoFacets11879.03 (8.6%)11799.33 > (5.5%) -0.7% ( -13% - 14%) 0.769 > HighSloppyPhrase6.87 (2.9%)6.83 > (2.3%) -0.6% ( -5% -4%) 0.496 > OrHighNotMed 827.54 (9.2%) 822.94 > (8.0%) -0.6% ( -16% - 18%) 0.838 > MedSpanNear 18.92 (5.7%) 18.82 > (5.6%) -0.5% ( -11% - 11%) 0.759 > OrHighMedDayTaxoFacets 10.27 (4.0%) 10.21 > (4.3%) -0.5% ( -8% -8%) 0.676 > PKLookup 207.98 (4.0%) 206.85 > (2.7%) -0.5% ( -7% -6%) 0.621 > LowIntervalsOrdered 159.17 (2.3%) 158.32 > (2.2%) -0.5% ( -4% -3%) 0.445 > HighSpanNear6.32 (4.2%)6.28 > (4.1%) -0.5% ( -8% -8%) 0.691 > MedIntervalsOrdered 85.31 (3.2%) 84.88 > (2.9%) -0.5% ( -6% -5%) 0.607 > HighTerm 1170.55 (5.8%) 1164.79 > (3.9%) 
-0.5% ( -9% -9%) 0.753 > LowSloppyPhrase 14.54 (3.1%) 14.48 > (2.9%) -0.4% ( -6% -5%) 0.651 > HighPhrase 112.81 (4.4%) 112.39 > (4.1%) -0.4% ( -8% -8%) 0.781 > OrNotHighLow 858.02 (5.9%) 854.99 > (4.8%) -0.4% ( -10% - 10%) 0.835 > HighIntervalsOrdered 25.08 (2.8%) 25.00 > (2.6%) -0.3% ( -5% -5%) 0.701 >MedPhrase 27.20 (2.1%) 27.11 > (2.9%) -0.3% ( -5% -4%) 0.689 > MedTermDayTaxoFacets 81.55 (2.3%) 81.35 > (2.9%) -0.3% ( -5% -5%) 0.762 > IntNRQ 63.36 (2.0%) 63.21 > (2.5%) -0.2% ( -4% -4%) 0.740 > Fuzzy2 73.24 (5.5%) 73.10 > (6.2%) -0.2% ( -11% - 12%) 0.916 > AndHighMedDayTaxoFacets 76.08 (3.5%) 75.98 > (3.4%) -0.1% ( -6% -7%) 0.905 > AndHighHigh 62.20 (2.0%) 62.18 > (2.4%) -0.0% ( -4% -4%) 0.954 >BrowseMonthTaxoFacets11993.48 (6.7%)11989.53 > (4.8%) -0.0% ( -10% - 12%) 0.986 > OrHighNotLow 732.82 (7.2%) 732.80 > (6.2%) -0.0% ( -12% - 14%) 0.999 > Fuzzy1 46.43 (5.3%) 46.45 > (6.0%)0.0% ( -10% - 11%) 0.989 > LowTerm 1608.25 (6.0%) 1608.84 > (4.9%)0.0% ( -10% - 11%) 0.983 >OrHighMed 75.90 (2.3%) 75.93 > (1.8%)0.0% ( -3% -4%) 0.939 >LowPhrase 273.81 (2.9%) 274.04 > (3.3%)0.1% ( -5% -6%) 0.932 > AndHighLow 717.24 (6.1%) 718.17 > (3.3%)0.1% ( -8% -
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 12:22 PM: --- Thanks [~gsmiller]! Yes, I did think I would get some regression on the sparse-hits tasks, and the result surprised me too. Maybe we should thank the powerful {{ForUtil}}? :)

Reading the code in LUCENE-10033, I suspect there are two reasons that approach could show a more obvious regression than this one:

1. The 10033 approach computes bpv for each small block and needs to read the block pointer from a {{DirectMonotonicReader}} before seeking, while this approach uses a global bpv, so pointers can be computed directly as {{offset + blockBytes * block}}. This could be faster. A global bpv can lead to a larger index size, but I think that is acceptable since it is what we used to do.
2. The 10033 approach decodes offset/gcd/delta for each block; some of that can be auto-vectorized, but it is still a bit heavier. This approach tries to keep block decoding as simple as possible, and work such as gcd decoding is only done for hit docs.

I'm not really sure these are the major reasons, but they should make the benchmark result a bit more explainable.

By the way, here is my localrun script. I post it here in case there is something wrong with it. (I personally added ('sortedset:RandomLabel', 'RandomLabel') because luceneutil cannot run without it, but I'm not sure this is correct since the readme does not mention it.)

{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', 'RandomLabel'))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
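The addressing difference described in point 1 of the comment above can be illustrated with a small sketch. This is a hypothetical illustration, not code from either patch: with a single global bitsPerValue, the start of block b is pure arithmetic, while a per-block bpv (as in LUCENE-10033) requires materialized block pointers that must be read before seeking.

```java
public class BlockAddressing {

  // Global-bpv scheme: every block occupies the same number of bytes,
  // so a block's start offset is computed, not stored.
  static long blockStart(long baseOffset, int valuesPerBlock, int globalBpv, long block) {
    long blockBytes = ((long) valuesPerBlock * globalBpv) / 8; // bits -> bytes
    return baseOffset + blockBytes * block;
  }

  // Per-block-bpv scheme: block starts vary, so they must be stored
  // (e.g. behind a DirectMonotonicReader) and looked up before seeking.
  static long blockStart(long[] blockPointers, int block) {
    return blockPointers[block]; // an extra, possibly cache-missing, read
  }

  public static void main(String[] args) {
    // 128 values per block at a global bpv of 12 -> 192 bytes per block,
    // so block 3 starts at baseOffset + 3 * 192.
    long start = blockStart(100, 128, 12, 3);
    if (start != 100 + 192L * 3) throw new AssertionError();
    System.out.println(start);
  }
}
```

The trade-off stated in the comment follows directly: the computed form saves a lookup on every seek, at the cost of sizing every block for the worst-case (global) bpv.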
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464380#comment-17464380 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 9:15 AM: -- Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do whatever work is needed to test this further. Or, if we think the existing luceneutil tasks are enough to justify it, I've fixed the CI issues and the PR is probably ready for a review now :) In this PR, I only replaced the {{DirectReader}} used in {{NumericDocValues#longValue}} with {{BlockReader}}, but I suspect this could also be used in some other places (e.g. {{DirectMonotonicReader}}, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow-ups.
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464380#comment-17464380 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 9:14 AM: -- Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do any work to futher test this if any. Or if we think existing luceneUtil tasks are enough to justify this, I've fixed CI issues and the PR is probably ready for a reivew now :) in this PR, I only replaced the {{DirectReader}} used in {{NumericDocValues#longValue}} with {{BlockReader}} but i suspect this could probably be used in some other places (e.g. {{DirectMonotonicReader}}, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow ups. was (Author: gf2121): Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do any work to futher test this if any. Or if we think existing luceneUtil tasks are enough to justify this, I've fixed CI issues and the PR is probably ready for a reivew now :) n this PR, I only replaced the {{DirectReader}} used in {{NumericDocValues#longValue}} with {{BlockReader}} but i suspect this could probably be used in some other places (e.g. {{DirectMonotonicReader}}, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow ups. 
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 4:29 AM: -- Thanks [~gsmiller]! Yes, I did expect some regression in sparse-hits tasks, and the result surprised me too. Maybe we should thank the powerful {{ForUtil}}? :) From reading the code in LUCENE-10033, I suspect there are two reasons why that approach could show a more obvious regression than this one: 1. The 10033 approach computes a bpv for each small block and needs to read the block pointer from a {{DirectMonotonicReader}} before seeking, while this approach uses a global bpv, so pointers can be computed as {{offset + blockBytes * block}}. This could be faster. A global bpv can lead to a larger index size, but I think that is acceptable since it's what we used to do. 2. The 10033 approach decodes offset/gcd/delta for each block; some of that can be auto-vectorized, but it is still a bit heavier. This approach tries to make block decoding as simple as possible, so work like GCD decoding is only done for hit docs. I'm not really sure these are the major reasons, but they should make the benchmark result a bit more explainable. By the way, here is my localrun script. I post it here in case there is something wrong with it.
(I added ('sortedset:RandomLabel', "RandomLabel") myself because luceneutil cannot run without it, but I'm not sure this is correct since the README does not mention it.)
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparse
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
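The pointer arithmetic from point 1 above can be sketched as follows. This is only an illustration under assumed names — {{read_block}}, {{base}}, the in-memory store, and the layout are hypothetical, not the PR's actual code:

```python
# With one global bits-per-value, every block has the same byte size, so a
# block's start address is pure arithmetic: offset + blockBytes * block.
# No DirectMonotonicReader lookup is needed before seeking, and per-value
# work (e.g. multiplying by a GCD) is deferred to the doc actually hit.

BLOCK_SIZE = 128

def block_pointer(offset, bpv, block):
    """Byte address of `block`: offset + blockBytes * block."""
    block_bytes = BLOCK_SIZE * bpv // 8   # fixed because bpv is global
    return offset + block_bytes * block

def long_value(doc, offset, bpv, gcd, base, read_block):
    """Decode only the block containing `doc`; apply gcd/base per hit doc."""
    block = doc // BLOCK_SIZE
    packed = read_block(block_pointer(offset, bpv, block))
    return base + gcd * packed[doc % BLOCK_SIZE]

# Hypothetical in-memory "file": block 3 starts at byte 40 + 128*3 = 424.
store = {424: list(range(128))}
assert block_pointer(40, 8, 3) == 424
assert long_value(389, 40, 8, 4, 100, store.__getitem__) == 120  # 100 + 4*5
```

This mirrors the trade-off described in the comment: a global bpv wastes some bits in blocks that could use fewer, but removes an indirection from the seek path.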
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/22/21, 6:05 PM: -- Thanks [~gsmiller] ! Yes I do thought i would get some regression in sparse-hits tasks and the result suprised me too. Maybe we should thank to the powerful {{{}ForUtil{}}}? :) By reading codes in LUCENE-10033, i suspect there are two reasons that could probably lead to a more obvious regression than this approach: 1. 10033 approach computes bpv for each small block and need to read the pointer from a {{DirectMonotonicReader}} before seek. But this approach is using a global bpv and pointers can be computed by {{{}offset + blockBytes * block{}}}. A global bpv can lead larger index size but i think it acceptable since it's what we used to do. 2. 10033 approach decode offset/gcd/delta for each block, some of them could be auto-vectorized but still a bit heavier. This approach is trying to make the decoding of blocks as simple as possible and jobs like gcd decoding is only done for hit docs. I'm not really sure these are major reasons but just trying to explain the benchmark result here. By the way, here is my localrun script. I post it here in case there is something wrong with it. 
(I personally added ('sortedset:RandomLabel', "RandomLabel") because luceneutil cannot run without it, but I'm not sure this is correct since the README does not mention it.)
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
was (Author: gf2121): Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-hits tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}? By the way, here is my localrun script. I post it here in case there is something wrong with it (e.g. I added {{('sortedset:RandomLabel', "RandomLabel")}} to facets; this is not mentioned in the README, but luceneutil cannot work without it):
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
> Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 >
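The first point in the comment above can be made concrete with a minimal sketch. This is illustrative only, not Lucene's actual API; `BLOCK_SIZE` and both function names are assumptions. With a global bits-per-value, a block's byte pointer is plain arithmetic ({{offset + blockBytes * block}}); with a per-block bpv, a pointer table must be consulted before every seek.

```python
BLOCK_SIZE = 128  # values per block (hypothetical)


def block_pointer_global_bpv(offset, bpv, block):
    """Pointer for a block when every block uses the same bits-per-value."""
    block_bytes = BLOCK_SIZE * bpv // 8  # bytes occupied by one encoded block
    return offset + block_bytes * block


def block_pointer_per_block(pointers, block):
    """With per-block bpv, block sizes vary, so a pointer table is needed."""
    return pointers[block]


# e.g. 8 bits per value, encoded data starting at byte 1024:
assert block_pointer_global_bpv(1024, 8, 0) == 1024
assert block_pointer_global_bpv(1024, 8, 3) == 1024 + 3 * 128
```

The trade-off the comment describes is visible here: the global-bpv path needs no extra read, at the cost of every block being padded to the worst-case width.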
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/22/21, 4:34 PM: -- Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-hits tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}? By the way, here is my localrun script. I post it here in case there is something wrong with it (e.g. I added {{('sortedset:RandomLabel', "RandomLabel")}} to facets; this is not mentioned in the README, but luceneutil cannot work without it):
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
was (Author: gf2121): Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-result tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}?
And i post my localrun script here in case there is something wrong (e.g. I added {{'sortedset:RandomLabel', "RandomLabel"}} in facets. This is not mentioned in readme but luceneutil can not work without it ) {code:python} if __name__ == '__main__': sourceData = competition.sourceData() comp = competition.Competition() facets = (('taxonomy:Date', 'Date'),('sortedset:Month', 'Month'),('sortedset:DayOfYear', 'DayOfYear'),('sortedset:RandomLabel', "RandomLabel")) index = comp.newIndex('lucene_baseline', sourceData, facets=facets, indexSort='dayOfYearNumericDV:long') candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets, indexSort='dayOfYearNumericDV:long') #Warning -- Do not break the order of arguments #TODO -- Fix the following by using argparser if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches': concurrentSearches = True else: concurrentSearches = False # create a competitor named baseline with sources in the ../trunk folder comp.competitor('baseline', 'lucene_baseline', index = index, concurrentSearches = concurrentSearches) comp.competitor('my_modified_version', 'lucene_candidate', index = candidate_index, concurrentSearches = concurrentSearches) # start the benchmark - this can take long depending on your index and machines comp.benchmark("baseline_vs_patch") {code} > Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocvalues > *Benchmark based on wiki10m* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >OrNotHighHigh 694.17 (8.2%) 685.83 > (7.0%) 
-1.2% ( -15% - 15%) 0.618 > Respell 75.15 (2.7%) 74.32 > (2.0%) -1.1% ( -5% -3%) 0.146 > Prefix3 220.11 (5.1%) 217.78 > (5.8%) -1.1% ( -11% - 10%) 0.541 > Wildcard 129.75 (3.7%) 128.63 > (2.5%) -0.9% ( -6% -5%) 0.383 > LowSpanNear 68.54 (2.1%) 68.00 > (2.4%) -0.8% ( -5% -3%) 0.269 >
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo commented on LUCENE-10334: --- Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-result tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}? I'm posting my localrun script here in case there is something wrong with it (e.g. I added {{('sortedset:RandomLabel', "RandomLabel")}} to facets; this is not mentioned in the README, but luceneutil cannot work without it):
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
> Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > 
Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocvalues > *Benchmark based on wiki10m* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >OrNotHighHigh 694.17 (8.2%) 685.83 > (7.0%) -1.2% ( -15% - 15%) 0.618 > Respell 75.15 (2.7%) 74.32 > (2.0%) -1.1% ( -5% -3%) 0.146 > Prefix3 220.11 (5.1%) 217.78 > (5.8%) -1.1% ( -11% - 10%) 0.541 > Wildcard 129.75 (3.7%) 128.63 > (2.5%) -0.9% ( -6% -5%) 0.383 > LowSpanNear 68.54 (2.1%) 68.00 > (2.4%) -0.8% ( -5% -3%) 0.269 > OrNotHighMed 732.90 (6.8%) 727.49 > (5.3%) -0.7% ( -12% - 12%) 0.703 > BrowseRandomLabelTaxoFacets11879.03 (8.6%)11799.33 > (5.5%) -0.7% ( -13% - 14%) 0.769 > HighSloppyPhrase6.87 (2.9%)6.83 > (2.3%) -0.6% ( -5% -4%) 0.496 > OrHighNotMed 827.54 (9.2%) 822.94 > (8.0%) -0.6% ( -16% - 18%) 0.838 > MedSpanNear 18.92 (5.7%) 18.82 > (5.6%) -0.5% ( -11% - 11%) 0.759 > OrHighMedDayTaxoFacets 10.27 (4.0%) 10.21 > (4.3%) -0.5% ( -8% -8%) 0.676 > PKLookup 207.98 (4.0%) 206.85 > (2.7%) -0.5% ( -7% -6%) 0.621 > LowIntervalsOrdered 159.17 (2.3%) 158.32 > (2.2%) -0.5% ( -4% -3%) 0.445 > HighSpanNear6.32 (4.2%)6.28 > (4.1%) -0.5% ( -8% -8%) 0.691 > MedIntervalsOrdered 85.31 (3.2%) 84.88 > (2.9%) -0.5% ( -6% -5%) 0.607 > HighTerm 1170.55 (5.8%) 1164.79 > (3.9%) -0.5% ( -9% -9%) 0.753 > LowSloppyPhrase 14.54 (3.1%) 14.48 > (2.9%) -0.4% ( -6% -5%) 0.651 > HighPhrase 112.81 (4.4%) 112.39 > (4.1%) -0.4% ( -8% -8%) 0.781 > OrNotHighLow 858.02 (5.9%) 854.99 > (4.8%) -0.4% ( -10% - 10%) 0.835 > HighIntervalsOrdered 25.08 (2.8%) 25.00 > (2.6%) -0.3% ( -5%
[jira] [Created] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
Feng Guo created LUCENE-10334: - Summary: Introduce a BlockReader based on ForUtil and use it for NumericDocValues Key: LUCENE-10334 URL: https://issues.apache.org/jira/browse/LUCENE-10334 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Previous talk is here: https://github.com/apache/lucene/pull/557 This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocvalues *Benchmark based on wiki10m* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrNotHighHigh 694.17 (8.2%) 685.83 (7.0%) -1.2% ( -15% - 15%) 0.618 Respell 75.15 (2.7%) 74.32 (2.0%) -1.1% ( -5% -3%) 0.146 Prefix3 220.11 (5.1%) 217.78 (5.8%) -1.1% ( -11% - 10%) 0.541 Wildcard 129.75 (3.7%) 128.63 (2.5%) -0.9% ( -6% -5%) 0.383 LowSpanNear 68.54 (2.1%) 68.00 (2.4%) -0.8% ( -5% -3%) 0.269 OrNotHighMed 732.90 (6.8%) 727.49 (5.3%) -0.7% ( -12% - 12%) 0.703 BrowseRandomLabelTaxoFacets11879.03 (8.6%)11799.33 (5.5%) -0.7% ( -13% - 14%) 0.769 HighSloppyPhrase6.87 (2.9%)6.83 (2.3%) -0.6% ( -5% -4%) 0.496 OrHighNotMed 827.54 (9.2%) 822.94 (8.0%) -0.6% ( -16% - 18%) 0.838 MedSpanNear 18.92 (5.7%) 18.82 (5.6%) -0.5% ( -11% - 11%) 0.759 OrHighMedDayTaxoFacets 10.27 (4.0%) 10.21 (4.3%) -0.5% ( -8% -8%) 0.676 PKLookup 207.98 (4.0%) 206.85 (2.7%) -0.5% ( -7% -6%) 0.621 LowIntervalsOrdered 159.17 (2.3%) 158.32 (2.2%) -0.5% ( -4% -3%) 0.445 HighSpanNear6.32 (4.2%)6.28 (4.1%) -0.5% ( -8% -8%) 0.691 MedIntervalsOrdered 85.31 (3.2%) 84.88 (2.9%) -0.5% ( -6% -5%) 0.607 HighTerm 1170.55 (5.8%) 1164.79 (3.9%) -0.5% ( -9% -9%) 0.753 LowSloppyPhrase 14.54 (3.1%) 14.48 (2.9%) -0.4% ( -6% -5%) 0.651 HighPhrase 112.81 (4.4%) 112.39 (4.1%) -0.4% ( -8% -8%) 0.781 OrNotHighLow 858.02 (5.9%) 854.99 (4.8%) -0.4% ( -10% - 10%) 0.835 HighIntervalsOrdered 25.08 (2.8%) 25.00 (2.6%) -0.3% ( -5% -5%) 0.701 MedPhrase 27.20 (2.1%) 27.11 (2.9%) -0.3% ( -5% -4%) 0.689 MedTermDayTaxoFacets 81.55 (2.3%) 81.35 (2.9%) -0.3% ( -5% 
-5%) 0.762 IntNRQ 63.36 (2.0%) 63.21 (2.5%) -0.2% ( -4% -4%) 0.740 Fuzzy2 73.24 (5.5%) 73.10 (6.2%) -0.2% ( -11% - 12%) 0.916 AndHighMedDayTaxoFacets 76.08 (3.5%) 75.98 (3.4%) -0.1% ( -6% -7%) 0.905 AndHighHigh 62.20 (2.0%) 62.18 (2.4%) -0.0% ( -4% -4%) 0.954 BrowseMonthTaxoFacets11993.48 (6.7%)11989.53 (4.8%) -0.0% ( -10% - 12%) 0.986 OrHighNotLow 732.82 (7.2%) 732.80 (6.2%) -0.0% ( -12% - 14%) 0.999 Fuzzy1 46.43 (5.3%) 46.45 (6.0%)0.0% ( -10% - 11%) 0.989 LowTerm 1608.25 (6.0%) 1608.84 (4.9%)0.0% ( -10% - 11%) 0.983 OrHighMed 75.90 (2.3%) 75.93 (1.8%)0.0% ( -3% -4%) 0.939 LowPhrase 273.81 (2.9%) 274.04 (3.3%)0.1% ( -5% -6%) 0.932 AndHighLow 717.24 (6.1%) 718.17 (3.3%)0.1% ( -8% - 10%) 0.933 AndHighHighDayTaxoFacets 39.63 (2.5%) 39.69 (2.6%)0.1% ( -4% -5%) 0.862 OrHighHigh 34.63 (1.8%) 34.68 (2.0%)0.1% ( -3% -4%) 0.821 MedSloppyPhrase 158.80 (2.8%) 159.09 (2.6%)0.2% ( -5% -5%) 0.832 OrHighLow 257.77 (2.9%) 258.46 (4.6%)0.3% ( -7% -8%) 0.826 AndHighMed 133.43 (2.1%) 133.79 (2.7%)0.3% (
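As a quick aid for reading the luceneutil tables above: the "Pct diff" column is the relative QPS change of the candidate over the baseline. A minimal sketch (the helper name is mine, not luceneutil's):

```python
def pct_diff(baseline_qps, candidate_qps):
    """Relative QPS change of the candidate over the baseline, in percent."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0


# OrNotHighHigh row above: baseline 694.17 QPS, candidate 685.83 QPS
assert round(pct_diff(694.17, 685.83), 1) == -1.2
```

A negative value means the candidate is slower; the p-value column then indicates how likely a difference of that size is under run-to-run noise.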
[jira] [Updated] (LUCENE-10333) Speed up BinaryDocValues with a batch reading on LongValues
[ https://issues.apache.org/jira/browse/LUCENE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10333: -- Description: *Description* In {{Lucene90DocValuesProducer}}, {{BinaryDocValues}} (as well as {{SortedNumericDocValues}} in the non-singleton case) has code patterns like this:
{code:java}
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
{code}
This means we need to read 2 longs that are stored together. We could probably push this down to {{LongValues}} and read the 2 values together in one call. I think this can make sense because this code can be rather hot. *Benchmark* In today's luceneutil benchmark, all results look even. I suspect this is because we no longer use {{BinaryDocValues}} in any tasks, so I tried rolling back the baseline and candidate to a stale code version (before https://issues.apache.org/jira/browse/LUCENE-10062); we used {{BinaryDocValues}} to store taxonomy ordinals in that version, and a QPS increase can be seen there. (This is tricky; I wonder if there is a more official way to benchmark BinaryDocValues by changing some params or adding some tasks? 
) Anyway, I believe It is still worth optimizing {{BinarayDocValue}} though facets do not use it any more :) *Benchmark result on stale code version where taxonomy ordinals are stored in BinaryDocvalues (to justify a speed up in BinaryDocValues)* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 (17.8%) -2.7% ( -26% - 25%) 0.536 LowTerm 1458.66 (3.6%) 1438.15 (4.4%) -1.4% ( -9% -6%) 0.268 HighTermDayOfYearSort 108.55 (10.0%) 108.04 (9.1%) -0.5% ( -17% - 20%) 0.874 HighPhrase 168.65 (1.9%) 168.06 (2.3%) -0.3% ( -4% -3%) 0.602 OrNotHighLow 1201.79 (3.4%) 1197.93 (4.6%) -0.3% ( -8% -7%) 0.801 HighSpanNear 15.26 (1.6%) 15.21 (1.4%) -0.3% ( -3% -2%) 0.499 Respell 62.61 (1.8%) 62.45 (1.9%) -0.3% ( -3% -3%) 0.649 MedPhrase 57.57 (1.4%) 57.44 (1.8%) -0.2% ( -3% -2%) 0.648 OrHighMed 129.10 (3.0%) 128.83 (3.1%) -0.2% ( -6% -6%) 0.830 MedSpanNear 19.45 (2.3%) 19.41 (2.2%) -0.2% ( -4% -4%) 0.784 OrHighHigh 34.85 (1.5%) 34.79 (1.4%) -0.2% ( -3% -2%) 0.722 HighIntervalsOrdered 26.92 (4.7%) 26.89 (4.9%) -0.1% ( -9% -9%) 0.929 IntNRQ 343.52 (1.6%) 343.16 (2.0%) -0.1% ( -3% -3%) 0.855 OrHighNotHigh 595.61 (3.2%) 595.10 (4.3%) -0.1% ( -7% -7%) 0.944 MedIntervalsOrdered 17.66 (3.6%) 17.65 (3.8%) -0.1% ( -7% -7%) 0.961 LowIntervalsOrdered 109.23 (3.3%) 109.18 (3.5%) -0.0% ( -6% -7%) 0.969 AndHighHigh 81.09 (1.5%) 81.10 (2.0%)0.0% ( -3% -3%) 0.967 LowSpanNear 203.33 (2.1%) 203.41 (1.8%)0.0% ( -3% -3%) 0.948 MedSloppyPhrase 27.15 (1.5%) 27.17 (1.2%)0.1% ( -2% -2%) 0.907 LowPhrase 75.76 (1.8%) 75.81 (2.0%)0.1% ( -3% -3%) 0.904 AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 (1.9%)0.1% ( -3% -4%) 0.888 HighSloppyPhrase 14.32 (2.7%) 14.34 (1.8%)0.1% ( -4% -4%) 0.870 Fuzzy2 76.00 (3.9%) 76.12 (3.4%)0.2% ( -6% -7%) 0.894 Wildcard 123.51 (1.8%) 123.71 (2.1%)0.2% ( -3% -4%) 0.796 OrHighNotLow 722.64 (4.4%) 724.15 (5.4%)0.2% ( -9% - 10%) 0.894 AndHighLow 929.73 (4.0%) 931.75 (3.8%)0.2% ( -7% -8%) 0.859 Prefix3 240.13 (1.5%) 
240.69 (1.9%)0.2% ( -3% -3%) 0.675 AndHighMed 210.17 (1.7%) 210.84 (1.6%)0.3% ( -2% -3%) 0.532 LowSloppyPhrase 142.83 (1.8%) 143.54 (2.0%)0.5% ( -3% -4%) 0.410 OrNotHighMed 709.24 (4.4%) 712.78 (4.3%)0.5% ( -7% -9%)
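The batch-read idea in the description above can be sketched as follows. This is illustrative only, not Lucene's {{LongValues}} API; the class and the `get_pair` method are assumptions standing in for the proposed "read 2 values together in one call":

```python
class LongValues:
    """Toy stand-in for a random-access long reader."""

    def __init__(self, longs):
        self._longs = longs

    def get(self, index):
        return self._longs[index]

    def get_pair(self, index):
        # one call returning two adjacent values, instead of two lookups
        return self._longs[index], self._longs[index + 1]


addresses = LongValues([0, 5, 12, 20])

# current pattern: two separate random lookups per document
start = addresses.get(1)
length = addresses.get(2) - start

# proposed pattern: one batched call
s, e = addresses.get_pair(1)
assert (s, e - s) == (start, length)
```

Since the two addresses are adjacent, a real implementation could often decode both from the same packed block, which is where the saving on hot paths would come from.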
[jira] [Created] (LUCENE-10333) Speed up BinaryDocValues with a batch reading on LongValues
Feng Guo created LUCENE-10333: - Summary: Speed up BinaryDocValues with a batch reading on LongValues Key: LUCENE-10333 URL: https://issues.apache.org/jira/browse/LUCENE-10333 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo *Description* In {{Lucene90DocValuesProducer}}, {{BinaryDocValues}} (as well as {{SortedNumericDocValues}} in the non-singleton case) has code patterns like this:
{code:java}
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
{code}
This means we need to read 2 longs that are stored together. We could probably push this down to {{LongValues}} and read the 2 values together in one call. I think this can make sense because this code can be rather hot. *Benchmark* In today's luceneutil benchmark, all results look even. I suspect this is because we no longer use {{BinaryDocValues}} in any tasks, so I tried rolling back the baseline and candidate to a stale code version (before https://issues.apache.org/jira/browse/LUCENE-10062); we used {{BinaryDocValues}} to store taxonomy ordinals in that version, and a QPS increase can be seen there. (This is tricky; I wonder if we can have a more official way to benchmark BinaryDocValues by changing some params or adding some tasks? 
Anyway, I believe It is still worth optimizing {{BinarayDocValue}} though facets do not use it any more :) *Benchmark result on stale code version where taxonomy ordinals are stored in BinaryDocvalues (to justivy a speed up in BinaryDocValues)* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 (17.8%) -2.7% ( -26% - 25%) 0.536 LowTerm 1458.66 (3.6%) 1438.15 (4.4%) -1.4% ( -9% -6%) 0.268 HighTermDayOfYearSort 108.55 (10.0%) 108.04 (9.1%) -0.5% ( -17% - 20%) 0.874 HighPhrase 168.65 (1.9%) 168.06 (2.3%) -0.3% ( -4% -3%) 0.602 OrNotHighLow 1201.79 (3.4%) 1197.93 (4.6%) -0.3% ( -8% -7%) 0.801 HighSpanNear 15.26 (1.6%) 15.21 (1.4%) -0.3% ( -3% -2%) 0.499 Respell 62.61 (1.8%) 62.45 (1.9%) -0.3% ( -3% -3%) 0.649 MedPhrase 57.57 (1.4%) 57.44 (1.8%) -0.2% ( -3% -2%) 0.648 OrHighMed 129.10 (3.0%) 128.83 (3.1%) -0.2% ( -6% -6%) 0.830 MedSpanNear 19.45 (2.3%) 19.41 (2.2%) -0.2% ( -4% -4%) 0.784 OrHighHigh 34.85 (1.5%) 34.79 (1.4%) -0.2% ( -3% -2%) 0.722 HighIntervalsOrdered 26.92 (4.7%) 26.89 (4.9%) -0.1% ( -9% -9%) 0.929 IntNRQ 343.52 (1.6%) 343.16 (2.0%) -0.1% ( -3% -3%) 0.855 OrHighNotHigh 595.61 (3.2%) 595.10 (4.3%) -0.1% ( -7% -7%) 0.944 MedIntervalsOrdered 17.66 (3.6%) 17.65 (3.8%) -0.1% ( -7% -7%) 0.961 LowIntervalsOrdered 109.23 (3.3%) 109.18 (3.5%) -0.0% ( -6% -7%) 0.969 AndHighHigh 81.09 (1.5%) 81.10 (2.0%)0.0% ( -3% -3%) 0.967 LowSpanNear 203.33 (2.1%) 203.41 (1.8%)0.0% ( -3% -3%) 0.948 MedSloppyPhrase 27.15 (1.5%) 27.17 (1.2%)0.1% ( -2% -2%) 0.907 LowPhrase 75.76 (1.8%) 75.81 (2.0%)0.1% ( -3% -3%) 0.904 AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 (1.9%)0.1% ( -3% -4%) 0.888 HighSloppyPhrase 14.32 (2.7%) 14.34 (1.8%)0.1% ( -4% -4%) 0.870 Fuzzy2 76.00 (3.9%) 76.12 (3.4%)0.2% ( -6% -7%) 0.894 Wildcard 123.51 (1.8%) 123.71 (2.1%)0.2% ( -3% -4%) 0.796 OrHighNotLow 722.64 (4.4%) 724.15 (5.4%)0.2% ( -9% - 10%) 0.894 AndHighLow 929.73 (4.0%) 931.75 (3.8%)0.2% ( -7% -8%) 0.859 Prefix3 240.13 (1.5%) 
240.69 (1.9%)0.2% ( -3% -3%) 0.675 AndHighMed 210.17 (1.7%) 210.84 (1.6%)0.3% ( -2% -3%) 0.532 LowSloppyPhrase
[jira] [Commented] (LUCENE-10332) Speed up Facets by enable batch reading of LongValues
[ https://issues.apache.org/jira/browse/LUCENE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462758#comment-17462758 ] Feng Guo commented on LUCENE-10332: --- Sorry that i was developing on a old branch, missing this optimization: https://github.com/apache/lucene/pull/443, I'll take a further look but close this now. > Speed up Facets by enable batch reading of LongValues > - > > Key: LUCENE-10332 > URL: https://issues.apache.org/jira/browse/LUCENE-10332 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In {{Lucene90DocValuesProducer}}, there are several places reading LongValues > like this pattern: > {code:java} > long startOffset = addresses.get(doc); > bytes.length = (int) (addresses.get(doc + 1L) - startOffset); > {code} > In these cases, we are needing to read 2 numbers stored together. It would be > great if we can read 2 longs once. The luceneutil benchmark shows that some > Facets tasks were speed up nearly 20% by this approach: > *Benchmark* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 > (17.8%) -2.7% ( -26% - 25%) 0.536 > LowTerm 1458.66 (3.6%) 1438.15 > (4.4%) -1.4% ( -9% -6%) 0.268 >HighTermDayOfYearSort 108.55 (10.0%) 108.04 > (9.1%) -0.5% ( -17% - 20%) 0.874 > HighPhrase 168.65 (1.9%) 168.06 > (2.3%) -0.3% ( -4% -3%) 0.602 > OrNotHighLow 1201.79 (3.4%) 1197.93 > (4.6%) -0.3% ( -8% -7%) 0.801 > HighSpanNear 15.26 (1.6%) 15.21 > (1.4%) -0.3% ( -3% -2%) 0.499 > Respell 62.61 (1.8%) 62.45 > (1.9%) -0.3% ( -3% -3%) 0.649 >MedPhrase 57.57 (1.4%) 57.44 > (1.8%) -0.2% ( -3% -2%) 0.648 >OrHighMed 129.10 (3.0%) 128.83 > (3.1%) -0.2% ( -6% -6%) 0.830 > MedSpanNear 19.45 (2.3%) 19.41 > (2.2%) -0.2% ( -4% -4%) 0.784 > OrHighHigh 34.85 (1.5%) 34.79 > (1.4%) -0.2% ( -3% -2%) 0.722 > HighIntervalsOrdered 26.92 (4.7%) 26.89 > (4.9%) -0.1% ( 
-9% -9%) 0.929 > IntNRQ 343.52 (1.6%) 343.16 > (2.0%) -0.1% ( -3% -3%) 0.855 >OrHighNotHigh 595.61 (3.2%) 595.10 > (4.3%) -0.1% ( -7% -7%) 0.944 > MedIntervalsOrdered 17.66 (3.6%) 17.65 > (3.8%) -0.1% ( -7% -7%) 0.961 > LowIntervalsOrdered 109.23 (3.3%) 109.18 > (3.5%) -0.0% ( -6% -7%) 0.969 > AndHighHigh 81.09 (1.5%) 81.10 > (2.0%)0.0% ( -3% -3%) 0.967 > LowSpanNear 203.33 (2.1%) 203.41 > (1.8%)0.0% ( -3% -3%) 0.948 > MedSloppyPhrase 27.15 (1.5%) 27.17 > (1.2%)0.1% ( -2% -2%) 0.907 >LowPhrase 75.76 (1.8%) 75.81 > (2.0%)0.1% ( -3% -3%) 0.904 > AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 > (1.9%)0.1% ( -3% -4%) 0.888 > HighSloppyPhrase 14.32 (2.7%) 14.34 > (1.8%)0.1% ( -4% -4%) 0.870 > Fuzzy2 76.00 (3.9%) 76.12 > (3.4%)0.2% ( -6% -7%) 0.894 > Wildcard 123.51 (1.8%) 123.71 > (2.1%)0.2% ( -3% -4%) 0.796 > OrHighNotLow 722.64 (4.4%) 724.15 > (5.4%)0.2% ( -9% - 10%) 0.894 > AndHighLow 929.73 (4.0%) 931.75 > (3.8%)0.2% ( -7% -8%) 0.859 > Prefix3 240.13 (1.5%) 240.69 > (1.9%)0.2% ( -3% -3%) 0.675 > AndHighMed 210.17 (1.7%) 210.84 > (1.6%)0.3% ( -2% -3%) 0.532 > LowSloppyPhrase 142.83 (1.8%) 143.54 > (2.0%)0.5% ( -3% -4%) 0.410 > OrNotHighMed 709.24 (4.4%) 712.78 > (4.3%)0.5% ( -7% -9%) 0.715 > Fuzzy1 85.33 (5.7%)
[jira] [Resolved] (LUCENE-10332) Speed up Facets by enable batch reading of LongValues
[ https://issues.apache.org/jira/browse/LUCENE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-10332. --- Resolution: Won't Do > Speed up Facets by enable batch reading of LongValues > - > > Key: LUCENE-10332 > URL: https://issues.apache.org/jira/browse/LUCENE-10332 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In {{Lucene90DocValuesProducer}}, there are several places reading LongValues > like this pattern: > {code:java} > long startOffset = addresses.get(doc); > bytes.length = (int) (addresses.get(doc + 1L) - startOffset); > {code} > In these cases, we are needing to read 2 numbers stored together. It would be > great if we can read 2 longs once. The luceneutil benchmark shows that some > Facets tasks were speed up nearly 20% by this approach: > *Benchmark* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 > (17.8%) -2.7% ( -26% - 25%) 0.536 > LowTerm 1458.66 (3.6%) 1438.15 > (4.4%) -1.4% ( -9% -6%) 0.268 >HighTermDayOfYearSort 108.55 (10.0%) 108.04 > (9.1%) -0.5% ( -17% - 20%) 0.874 > HighPhrase 168.65 (1.9%) 168.06 > (2.3%) -0.3% ( -4% -3%) 0.602 > OrNotHighLow 1201.79 (3.4%) 1197.93 > (4.6%) -0.3% ( -8% -7%) 0.801 > HighSpanNear 15.26 (1.6%) 15.21 > (1.4%) -0.3% ( -3% -2%) 0.499 > Respell 62.61 (1.8%) 62.45 > (1.9%) -0.3% ( -3% -3%) 0.649 >MedPhrase 57.57 (1.4%) 57.44 > (1.8%) -0.2% ( -3% -2%) 0.648 >OrHighMed 129.10 (3.0%) 128.83 > (3.1%) -0.2% ( -6% -6%) 0.830 > MedSpanNear 19.45 (2.3%) 19.41 > (2.2%) -0.2% ( -4% -4%) 0.784 > OrHighHigh 34.85 (1.5%) 34.79 > (1.4%) -0.2% ( -3% -2%) 0.722 > HighIntervalsOrdered 26.92 (4.7%) 26.89 > (4.9%) -0.1% ( -9% -9%) 0.929 > IntNRQ 343.52 (1.6%) 343.16 > (2.0%) -0.1% ( -3% -3%) 0.855 >OrHighNotHigh 595.61 (3.2%) 595.10 > (4.3%) -0.1% ( -7% -7%) 0.944 > MedIntervalsOrdered 17.66 
(3.6%) 17.65 > (3.8%) -0.1% ( -7% -7%) 0.961 > LowIntervalsOrdered 109.23 (3.3%) 109.18 > (3.5%) -0.0% ( -6% -7%) 0.969 > AndHighHigh 81.09 (1.5%) 81.10 > (2.0%)0.0% ( -3% -3%) 0.967 > LowSpanNear 203.33 (2.1%) 203.41 > (1.8%)0.0% ( -3% -3%) 0.948 > MedSloppyPhrase 27.15 (1.5%) 27.17 > (1.2%)0.1% ( -2% -2%) 0.907 >LowPhrase 75.76 (1.8%) 75.81 > (2.0%)0.1% ( -3% -3%) 0.904 > AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 > (1.9%)0.1% ( -3% -4%) 0.888 > HighSloppyPhrase 14.32 (2.7%) 14.34 > (1.8%)0.1% ( -4% -4%) 0.870 > Fuzzy2 76.00 (3.9%) 76.12 > (3.4%)0.2% ( -6% -7%) 0.894 > Wildcard 123.51 (1.8%) 123.71 > (2.1%)0.2% ( -3% -4%) 0.796 > OrHighNotLow 722.64 (4.4%) 724.15 > (5.4%)0.2% ( -9% - 10%) 0.894 > AndHighLow 929.73 (4.0%) 931.75 > (3.8%)0.2% ( -7% -8%) 0.859 > Prefix3 240.13 (1.5%) 240.69 > (1.9%)0.2% ( -3% -3%) 0.675 > AndHighMed 210.17 (1.7%) 210.84 > (1.6%)0.3% ( -2% -3%) 0.532 > LowSloppyPhrase 142.83 (1.8%) 143.54 > (2.0%)0.5% ( -3% -4%) 0.410 > OrNotHighMed 709.24 (4.4%) 712.78 > (4.3%)0.5% ( -7% -9%) 0.715 > Fuzzy1 85.33 (5.7%) 85.77 > (6.3%)0.5% ( -10% - 13%) 0.786 > MedTerm 1466.50 (3.5%) 1474.85 > (3.9%)0.6% ( -6% -8%) 0.629 >
[jira] [Created] (LUCENE-10332) Speed up Facets by enable batch reading of LongValues
Feng Guo created LUCENE-10332: - Summary: Speed up Facets by enable batch reading of LongValues Key: LUCENE-10332 URL: https://issues.apache.org/jira/browse/LUCENE-10332 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo In {{Lucene90DocValuesProducer}}, there are several places reading LongValues like this pattern: {code:java} long startOffset = addresses.get(doc); bytes.length = (int) (addresses.get(doc + 1L) - startOffset); {code} In these cases, we are needing to read 2 numbers stored together. It would be great if we can read 2 longs once. The luceneutil benchmark shows that some Facets tasks were speed up nearly 20% by this approach: *Benchmark* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 (17.8%) -2.7% ( -26% - 25%) 0.536 LowTerm 1458.66 (3.6%) 1438.15 (4.4%) -1.4% ( -9% -6%) 0.268 HighTermDayOfYearSort 108.55 (10.0%) 108.04 (9.1%) -0.5% ( -17% - 20%) 0.874 HighPhrase 168.65 (1.9%) 168.06 (2.3%) -0.3% ( -4% -3%) 0.602 OrNotHighLow 1201.79 (3.4%) 1197.93 (4.6%) -0.3% ( -8% -7%) 0.801 HighSpanNear 15.26 (1.6%) 15.21 (1.4%) -0.3% ( -3% -2%) 0.499 Respell 62.61 (1.8%) 62.45 (1.9%) -0.3% ( -3% -3%) 0.649 MedPhrase 57.57 (1.4%) 57.44 (1.8%) -0.2% ( -3% -2%) 0.648 OrHighMed 129.10 (3.0%) 128.83 (3.1%) -0.2% ( -6% -6%) 0.830 MedSpanNear 19.45 (2.3%) 19.41 (2.2%) -0.2% ( -4% -4%) 0.784 OrHighHigh 34.85 (1.5%) 34.79 (1.4%) -0.2% ( -3% -2%) 0.722 HighIntervalsOrdered 26.92 (4.7%) 26.89 (4.9%) -0.1% ( -9% -9%) 0.929 IntNRQ 343.52 (1.6%) 343.16 (2.0%) -0.1% ( -3% -3%) 0.855 OrHighNotHigh 595.61 (3.2%) 595.10 (4.3%) -0.1% ( -7% -7%) 0.944 MedIntervalsOrdered 17.66 (3.6%) 17.65 (3.8%) -0.1% ( -7% -7%) 0.961 LowIntervalsOrdered 109.23 (3.3%) 109.18 (3.5%) -0.0% ( -6% -7%) 0.969 AndHighHigh 81.09 (1.5%) 81.10 (2.0%)0.0% ( -3% -3%) 0.967 LowSpanNear 203.33 (2.1%) 203.41 (1.8%)0.0% ( -3% -3%) 0.948 MedSloppyPhrase 27.15 (1.5%) 27.17 (1.2%)0.1% ( -2% -2%) 0.907 
LowPhrase 75.76 (1.8%) 75.81 (2.0%)0.1% ( -3% -3%) 0.904 AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 (1.9%)0.1% ( -3% -4%) 0.888 HighSloppyPhrase 14.32 (2.7%) 14.34 (1.8%)0.1% ( -4% -4%) 0.870 Fuzzy2 76.00 (3.9%) 76.12 (3.4%)0.2% ( -6% -7%) 0.894 Wildcard 123.51 (1.8%) 123.71 (2.1%)0.2% ( -3% -4%) 0.796 OrHighNotLow 722.64 (4.4%) 724.15 (5.4%)0.2% ( -9% - 10%) 0.894 AndHighLow 929.73 (4.0%) 931.75 (3.8%)0.2% ( -7% -8%) 0.859 Prefix3 240.13 (1.5%) 240.69 (1.9%)0.2% ( -3% -3%) 0.675 AndHighMed 210.17 (1.7%) 210.84 (1.6%)0.3% ( -2% -3%) 0.532 LowSloppyPhrase 142.83 (1.8%) 143.54 (2.0%)0.5% ( -3% -4%) 0.410 OrNotHighMed 709.24 (4.4%) 712.78 (4.3%)0.5% ( -7% -9%) 0.715 Fuzzy1 85.33 (5.7%) 85.77 (6.3%)0.5% ( -10% - 13%) 0.786 MedTerm 1466.50 (3.5%) 1474.85 (3.9%)0.6% ( -6% -8%) 0.629 TermDTSort 105.51 (7.7%) 106.33 (7.3%)0.8% ( -13% - 17%) 0.746 PKLookup 206.18 (2.9%) 208.68 (2.9%)1.2% ( -4% -7%) 0.179 OrHighNotMed 876.71 (3.0%) 887.84 (3.9%)1.3% ( -5% -8%) 0.251 OrNotHighHigh 774.25 (4.7%) 785.03 (6.0%)1.4% ( -8% - 12%)
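The two-call pattern quoted above, and the proposed batch read, can be sketched with a plain `long[]` standing in for the LongValues instance. This is a sketch only; the class and method names below are illustrative, not Lucene's actual API.

```java
// Sketch only: a plain long[] of offsets stands in for the LongValues returned
// by DirectMonotonicReader; class and method names here are illustrative, not
// Lucene's actual API.
final class OffsetsDemo {
  private final long[] offsets; // offsets[doc] = start of doc's bytes

  OffsetsDemo(long[] offsets) {
    this.offsets = offsets;
  }

  // Baseline pattern from Lucene90DocValuesProducer: two independent reads.
  long[] sliceTwoReads(int doc) {
    long startOffset = offsets[doc];
    long length = offsets[doc + 1] - startOffset;
    return new long[] {startOffset, length};
  }

  // Proposed pattern: fetch both adjacent longs in one "batch" call. A real
  // implementation would decode the two packed values from the same underlying
  // block in a single lookup instead of two.
  long[] sliceBatchRead(int doc) {
    long start = offsets[doc];
    long end = offsets[doc + 1]; // adjacent value, same block in practice
    return new long[] {start, end - start};
  }

  public static void main(String[] args) {
    OffsetsDemo demo = new OffsetsDemo(new long[] {0, 5, 12, 20});
    long[] a = demo.sliceTwoReads(1);
    long[] b = demo.sliceBatchRead(1);
    System.out.println(a[0] + "/" + a[1] + " vs " + b[0] + "/" + b[1]);
  }
}
```

The point of the batch variant is that the two values live next to each other, so one decode of the underlying packed block can serve both, instead of paying the `DirectMonotonicReader#get` cost twice per document.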
[jira] [Commented] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462559#comment-17462559 ] Feng Guo commented on LUCENE-10329: --- The PR 552 was not linked to this Jira issue so i opened a new 553, but it seems both of them are linked here now... Please ignore 552 and look at 553 then :) > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I saw {{DirectMonotonicReader#get}} was the hot method during when running > luceneutil test, So using a computed mask for DirectMonotonicReader#get > instead of computing it for every call may make a bit sense :) > {code:java} > PERCENT CPU SAMPLES STACK > 14.07%66936 > org.apache.lucene.util.packed.DirectMonotonicReader#get() > 5.93% 28198 > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() > 5.44% 25858 > org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() > 5.27% 25052 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() > 4.48% 21310 java.nio.ByteBuffer#get() > 1.83% 8722 java.nio.Buffer#position() > 1.80% 8573 > jdk.internal.misc.ScopedMemoryAccess#getByteInternal() > 1.80% 8558 > org.apache.lucene.store.ByteBufferGuard#ensureValid() > 1.79% 8537 java.nio.Buffer#scope() > 1.67% 7939 > org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() > 1.51% 7182 > org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() > 1.43% 6781 java.nio.Buffer#nextGetIndex() > 1.40% 6657 java.nio.Buffer#checkIndex() > 1.26% 5979 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance() > 1.19% 5670 jdk.internal.misc.Unsafe#convEndian() > 1.17% 5565 java.nio.DirectByteBuffer#ix() > 1.12% 5310 > 
org.apache.lucene.search.BooleanScorer$OrCollector#collect() > 1.07% 5075 org.apache.lucene.store.ByteBufferGuard#getShort() > 1.06% 5065 org.apache.lucene.search.ConjunctionDISI#doNext() > 1.03% 4914 > jdk.internal.util.Preconditions#checkFromIndexSize() > 1.02% 4869 > jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal() > 0.99% 4719 > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() > 0.96% 4587 java.nio.DirectByteBuffer#get() > 0.96% 4587 > org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect() > 0.94% 4460 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition() > 0.90% 4297 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#count() > 0.84% 3996 > org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() > 0.79% 3769 > org.apache.lucene.search.BooleanScorer#scoreDocument() > 0.77% 3648 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc() > 0.75% 3572 > org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462543#comment-17462543 ] Feng Guo commented on LUCENE-10329: --- *Benchmark result (a bit improved)* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighTermDayOfYearSort 90.49 (18.0%) 88.62 (11.1%) -2.1% ( -26% - 32%) 0.661 TermDTSort 86.84 (10.6%) 85.83 (8.7%) -1.2% ( -18% - 20%) 0.705 Prefix3 140.90 (5.7%) 139.70 (7.0%) -0.8% ( -12% - 12%) 0.675 HighTermMonthSort 176.44 (14.0%) 175.30 (11.8%) -0.6% ( -23% - 29%) 0.875 HighPhrase 430.23 (3.2%) 428.04 (3.6%) -0.5% ( -7% -6%) 0.635 OrNotHighHigh 676.66 (4.2%) 673.70 (3.2%) -0.4% ( -7% -7%) 0.711 OrHighNotLow 823.43 (3.5%) 822.17 (4.5%) -0.2% ( -7% -8%) 0.905 Wildcard 121.59 (2.0%) 121.41 (2.6%) -0.2% ( -4% -4%) 0.840 AndHighMedDayTaxoFacets 55.32 (1.8%) 55.26 (1.9%) -0.1% ( -3% -3%) 0.851 HighSpanNear 14.79 (2.6%) 14.77 (2.5%) -0.1% ( -5% -5%) 0.899 BrowseMonthSSDVFacets 17.99 (9.8%) 17.98 (9.6%) -0.1% ( -17% - 21%) 0.982 HighIntervalsOrdered 21.05 (3.1%) 21.04 (3.2%) -0.0% ( -6% -6%) 0.966 MedIntervalsOrdered 18.57 (3.0%) 18.57 (2.9%) -0.0% ( -5% -6%) 0.983 MedSpanNear 88.82 (2.0%) 88.86 (2.1%)0.0% ( -3% -4%) 0.943 LowSpanNear 154.86 (1.9%) 154.95 (1.6%)0.1% ( -3% -3%) 0.916 Respell 65.43 (2.2%) 65.51 (2.3%)0.1% ( -4% -4%) 0.862 BrowseDayOfYearSSDVFacets 16.76 (11.3%) 16.79 (11.6%)0.2% ( -20% - 25%) 0.963 LowPhrase 513.12 (2.7%) 514.01 (3.1%)0.2% ( -5% -6%) 0.850 IntNRQ 288.28 (1.3%) 288.90 (1.2%)0.2% ( -2% -2%) 0.586 LowSloppyPhrase 214.50 (2.4%) 215.09 (2.2%)0.3% ( -4% -5%) 0.706 LowIntervalsOrdered 202.73 (2.8%) 203.30 (2.9%)0.3% ( -5% -6%) 0.757 OrHighHigh 41.48 (1.8%) 41.64 (2.0%)0.4% ( -3% -4%) 0.524 OrNotHighMed 809.16 (5.0%) 812.45 (3.1%)0.4% ( -7% -8%) 0.757 AndHighLow 665.08 (3.1%) 668.14 (3.3%)0.5% ( -5% -7%) 0.649 PKLookup 211.67 (3.1%) 212.66 (3.3%)0.5% ( -5% -7%) 0.644 MedPhrase 304.39 (2.5%) 305.90 (2.3%)0.5% ( -4% -5%) 0.519 AndHighMed 
157.06 (4.0%) 157.89 (4.0%)0.5% ( -7% -8%) 0.678 AndHighHigh 99.07 (2.6%) 99.69 (3.6%)0.6% ( -5% -7%) 0.534 Fuzzy2 36.32 (3.6%) 36.55 (3.7%)0.6% ( -6% -8%) 0.579 OrHighMed 80.10 (2.4%) 80.62 (1.8%)0.6% ( -3% -4%) 0.329 LowTerm 1411.61 (3.0%) 1421.53 (4.3%)0.7% ( -6% -8%) 0.549 AndHighHighDayTaxoFacets 12.47 (2.6%) 12.56 (2.5%)0.8% ( -4% -6%) 0.343 MedSloppyPhrase 37.22 (1.6%) 37.51 (1.7%)0.8% ( -2% -4%) 0.138 Fuzzy1 60.37 (4.8%) 60.87 (4.3%)0.8% ( -7% - 10%) 0.564 OrHighLow 565.69 (4.2%) 570.55 (4.3%)0.9% ( -7% -9%) 0.523 HighTerm 1167.96 (5.0%) 1178.00 (5.8%)0.9% ( -9% - 12%) 0.615 MedTerm 1392.49 (4.3%) 1404.95 (4.0%)0.9% ( -7% -9%) 0.496 OrHighMedDayTaxoFacets4.17 (2.1%)4.21 (2.4%)1.0% ( -3% -5%) 0.189 MedTermDayTaxoFacets 21.41 (1.8%) 21.61 (2.0%)1.0% ( -2% -4%) 0.115 HighSloppyPhrase3.74 (2.8%)3.77 (3.0%)1.0% ( -4% -6%) 0.298 OrNotHighLow 1020.98 (4.6%) 1034.25 (3.9%)1.3% ( -6% - 10%)
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Attachment: (was: screenshot-1.png) > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > > I saw {{DirectMonotonicReader#get}} was the hot method during when running > luceneutil test, So using a computed mask for DirectMonotonicReader#get > instead of computing it for every call may make a bit sense :) > {code:java} > PERCENT CPU SAMPLES STACK > 14.07%66936 > org.apache.lucene.util.packed.DirectMonotonicReader#get() > 5.93% 28198 > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() > 5.44% 25858 > org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() > 5.27% 25052 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() > 4.48% 21310 java.nio.ByteBuffer#get() > 1.83% 8722 java.nio.Buffer#position() > 1.80% 8573 > jdk.internal.misc.ScopedMemoryAccess#getByteInternal() > 1.80% 8558 > org.apache.lucene.store.ByteBufferGuard#ensureValid() > 1.79% 8537 java.nio.Buffer#scope() > 1.67% 7939 > org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() > 1.51% 7182 > org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() > 1.43% 6781 java.nio.Buffer#nextGetIndex() > 1.40% 6657 java.nio.Buffer#checkIndex() > 1.26% 5979 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance() > 1.19% 5670 jdk.internal.misc.Unsafe#convEndian() > 1.17% 5565 java.nio.DirectByteBuffer#ix() > 1.12% 5310 > org.apache.lucene.search.BooleanScorer$OrCollector#collect() > 1.07% 5075 org.apache.lucene.store.ByteBufferGuard#getShort() > 1.06% 5065 org.apache.lucene.search.ConjunctionDISI#doNext() > 1.03% 4914 > 
jdk.internal.util.Preconditions#checkFromIndexSize() > 1.02% 4869 > jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal() > 0.99% 4719 > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() > 0.96% 4587 java.nio.DirectByteBuffer#get() > 0.96% 4587 > org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect() > 0.94% 4460 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition() > 0.90% 4297 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#count() > 0.84% 3996 > org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() > 0.79% 3769 > org.apache.lucene.search.BooleanScorer#scoreDocument() > 0.77% 3648 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc() > 0.75% 3572 > org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() > {code}
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Description: I saw {{DirectMonotonicReader#get}} was the hot method during when running luceneutil test, So using a computed mask for DirectMonotonicReader#get instead of computing it for every call may make a bit sense :) {code:java} PERCENT CPU SAMPLES STACK 14.07%66936 org.apache.lucene.util.packed.DirectMonotonicReader#get() 5.93% 28198 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() 5.44% 25858 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() 5.27% 25052 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() 4.48% 21310 java.nio.ByteBuffer#get() 1.83% 8722 java.nio.Buffer#position() 1.80% 8573 jdk.internal.misc.ScopedMemoryAccess#getByteInternal() 1.80% 8558 org.apache.lucene.store.ByteBufferGuard#ensureValid() 1.79% 8537 java.nio.Buffer#scope() 1.67% 7939 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() 1.51% 7182 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() 1.43% 6781 java.nio.Buffer#nextGetIndex() 1.40% 6657 java.nio.Buffer#checkIndex() 1.26% 5979 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance() 1.19% 5670 jdk.internal.misc.Unsafe#convEndian() 1.17% 5565 java.nio.DirectByteBuffer#ix() 1.12% 5310 org.apache.lucene.search.BooleanScorer$OrCollector#collect() 1.07% 5075 org.apache.lucene.store.ByteBufferGuard#getShort() 1.06% 5065 org.apache.lucene.search.ConjunctionDISI#doNext() 1.03% 4914 jdk.internal.util.Preconditions#checkFromIndexSize() 1.02% 4869 jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal() 0.99% 4719 org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() 0.96% 4587 java.nio.DirectByteBuffer#get() 0.96% 4587 org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect() 0.94% 4460 
org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition() 0.90% 4297 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#count() 0.84% 3996 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() 0.79% 3769 org.apache.lucene.search.BooleanScorer#scoreDocument() 0.77% 3648 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc() 0.75% 3572 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() {code} was:Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call. > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Attachments: screenshot-1.png > > > I saw {{DirectMonotonicReader#get}} was the hot method during when running > luceneutil test, So using a computed mask for DirectMonotonicReader#get > instead of computing it for every call may make a bit sense :) > {code:java} > PERCENT CPU SAMPLES STACK > 14.07%66936 > org.apache.lucene.util.packed.DirectMonotonicReader#get() > 5.93% 28198 > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() > 5.44% 25858 > org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() > 5.27% 25052 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() > 4.48% 21310 java.nio.ByteBuffer#get() > 1.83% 8722 java.nio.Buffer#position() > 1.80% 8573 > jdk.internal.misc.ScopedMemoryAccess#getByteInternal() > 1.80% 8558 > org.apache.lucene.store.ByteBufferGuard#ensureValid() > 1.79% 8537 java.nio.Buffer#scope() > 1.67% 7939 > org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() > 1.51% 7182 > org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() > 1.43% 6781 java.nio.Buffer#nextGetIndex() > 1.40% 6657 java.nio.Buffer#checkIndex() > 1.26% 
5979 >
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Attachment: screenshot-1.png > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Attachments: screenshot-1.png > > > Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it > for every call.
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Description: Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call. (was: Uss a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call.) > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > > Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it > for every call.
[jira] [Created] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
Feng Guo created LUCENE-10329: - Summary: Use Computed Mask For DirectMonotonicReader#get Key: LUCENE-10329 URL: https://issues.apache.org/jira/browse/LUCENE-10329 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call.
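The proposed change amounts to hoisting the mask computation out of the hot `get` path into the constructor. A minimal sketch of the before/after, assuming illustrative names (these are not `DirectMonotonicReader`'s actual internals):

```java
// Illustrative only: field and method names are not DirectMonotonicReader's
// actual internals; this just shows hoisting the mask computation.
final class PackedReaderDemo {
  private final int bitsPerValue;
  private final long mask; // computed once at construction time

  PackedReaderDemo(int bitsPerValue) {
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
  }

  // Before: the mask is rebuilt on every call to get().
  long getRecomputing(long word, int index) {
    long m = (1L << bitsPerValue) - 1;
    return (word >>> (index * bitsPerValue)) & m;
  }

  // After: the hot path reuses the precomputed field.
  long getPrecomputed(long word, int index) {
    return (word >>> (index * bitsPerValue)) & mask;
  }

  public static void main(String[] args) {
    PackedReaderDemo reader = new PackedReaderDemo(12);
    long word = (7L << 12) | 5L; // two 12-bit values packed into one long
    System.out.println(reader.getRecomputing(word, 0) + "," + reader.getPrecomputed(word, 1));
  }
}
```

Both variants return the same values; the point is only that the shift-and-subtract to build the mask moves out of a method that profiles as the hottest in the run.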
[jira] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315 ] Feng Guo deleted comment on LUCENE-10315: --- was (Author: gf2121): The optimization can only be triggered when {{count == BKDConfig#DEFAULT_MAX_POINTS_IN_LEAF_NODE}}, This is fragile because users can customize the {{maxPointsInLeaf}} in the Codec, leading the optimization meaningless. Here are some ways i can think of to address this: 1. Directly drop the support of customizing {{maxPointsInLeaf}}, like what we do in postings. 2. Generate a series of ForUtils, like {{ForUitil128}}, {{ForUitil256}}, {{ForUitil512}}, {{ForUtil1024}} ... and make some notes to hint users to choose values from them. > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > Attachments: addall.svg > > Time Spent: 10m > Remaining Estimate: 0h > > Elasticsearch (which based on lucene) can automatically infers types for > users with its dynamic mapping feature. When users index some low cardinality > fields, such as gender / age / status... they often use some numbers to > represent the values, while ES will infer these fields as {{{}long{}}}, and > ES uses BKD as the index of {{long}} fields. When the data volume grows, > building the result set of low-cardinality fields will make the CPU usage and > load very high. > This is a flame graph we obtained from the production environment: > [^addall.svg] > It can be seen that almost all CPU is used in addAll. When we reindex > {{long}} to {{{}keyword{}}}, the cluster load and search latency are greatly > reduced ( We spent weeks of time to reindex all indices... ). 
I know that ES > recommended to use {{keyword}} for term/terms query and {{long}} for range > query in the document, but there are always some users who didn't realize > this and keep their habit of using sql database, or dynamic mapping > automatically selects the type for them. All in all, users won't realize that > there would be such a big difference in performance between {{long}} and > {{keyword}} fields in low cardinality fields. So from my point of view it > will make sense if we can make BKD works better for the low/medium > cardinality fields. > As far as i can see, for low cardinality fields, there are two advantages of > {{keyword}} over {{{}long{}}}: > 1. {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's > delta VInt, because its batch reading (readLongs) and SIMD decode. > 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily > materialize of its result set, and when another small result clause > intersects with this low cardinality condition, the low cardinality field can > avoid reading all docIds into memory. > This ISSUE is targeting to solve the first point. The basic idea is trying to > use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked this optimization > by mocking some random {{LongPoint}} and querying them with > {{PointInSetQuery}}. 
> *Benchmark Result* > |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff > percentage| > |1|32|1|51.44|148.26|188.22%| > |1|32|2|26.8|101.88|280.15%| > |1|32|4|14.04|53.52|281.20%| > |1|32|8|7.04|28.54|305.40%| > |1|32|16|3.54|14.61|312.71%| > |1|128|1|110.56|350.26|216.81%| > |1|128|8|16.6|89.81|441.02%| > |1|128|16|8.45|48.07|468.88%| > |1|128|32|4.2|25.35|503.57%| > |1|128|64|2.13|13.02|511.27%| > |1|1024|1|536.19|843.88|57.38%| > |1|1024|8|109.71|251.89|129.60%| > |1|1024|32|33.24|104.11|213.21%| > |1|1024|128|8.87|30.47|243.52%| > |1|1024|512|2.24|8.3|270.54%| > |1|8192|1|.33|5000|50.00%| > |1|8192|32|139.47|214.59|53.86%| > |1|8192|128|54.59|109.23|100.09%| > |1|8192|512|15.61|36.15|131.58%| > |1|8192|2048|4.11|11.14|171.05%| > |1|1048576|1|2597.4|3030.3|16.67%| > |1|1048576|32|314.96|371.75|18.03%| > |1|1048576|128|99.7|116.28|16.63%| > |1|1048576|512|30.5|37.15|21.80%| > |1|1048576|2048|10.38|12.3|18.50%| > |1|8388608|1|2564.1|3174.6|23.81%| > |1|8388608|32|196.27|238.95|21.75%| > |1|8388608|128|55.36|68.03|22.89%| > |1|8388608|512|15.58|19.24|23.49%| > |1|8388608|2048|4.56|5.71|25.22%| > The indices size is reduced for low cardinality fields and flat for high > cardinality fields. > {code:java} > 113Mindex_1_doc_32_cardinality_baseline > 114Mindex_1_doc_32_cardinality_candidate > 140Mindex_1_doc_128_cardinality_baseline > 133M
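The first advantage described above, fixed-width batch decoding versus delta-VInt, can be illustrated with a simplified pure-Java sketch. This is not Lucene's actual ForUtil: it packs only at a fixed 16-bit width, whereas the real ForUtil supports many bit widths and decodes whole 128- or 512-value blocks per call with SIMD-friendly code.

```java
// Simplified contrast, not Lucene's actual ForUtil: BKD leaf doc IDs are
// sorted, so they can be stored as deltas; packing those deltas at a fixed
// bit width (16 bits here) gives a uniform, branch-free decode loop, unlike
// one-at-a-time delta-VInt decoding.
final class BlockIdsDemo {
  // Pack deltas as 16-bit fields, four per long.
  static long[] pack16(int[] deltas) {
    long[] out = new long[(deltas.length + 3) / 4];
    for (int i = 0; i < deltas.length; i++) {
      out[i / 4] |= ((long) deltas[i] & 0xFFFFL) << ((i % 4) * 16);
    }
    return out;
  }

  static int[] unpack16(long[] packed, int count) {
    int[] out = new int[count];
    for (int i = 0; i < count; i++) { // fixed-stride loop, no data-dependent branches
      out[i] = (int) ((packed[i / 4] >>> ((i % 4) * 16)) & 0xFFFFL);
    }
    return out;
  }

  // Rebuild the sorted doc IDs via a prefix sum over the decoded deltas.
  static int[] docIds(int base, int[] deltas) {
    int[] ids = new int[deltas.length];
    int acc = base;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      ids[i] = acc;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[] deltas = {3, 1, 4, 1, 5};
    int[] decoded = unpack16(pack16(deltas), deltas.length);
    System.out.println(java.util.Arrays.toString(docIds(100, decoded)));
  }
}
```

The uniform stride of `unpack16` is what lets the JIT auto-vectorize the decode, which VInt's per-byte continuation checks prevent.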
[jira] [Comment Edited] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461367#comment-17461367 ] Feng Guo edited comment on LUCENE-10319 at 12/19/21, 9:25 AM: -- Out of curiosity, I tried to run the luceneutil wikimedium1m for block size = 256, but got an error there: {code:java} WARNING: cat=AndHighHigh: hit counts differ: 10274+ vs 10884+ WARNING: cat=HighTerm: hit counts differ: 5969+ vs 9423+ WARNING: cat=LowTerm: hit counts differ: 2394+ vs 3325+ WARNING: cat=MedTerm: hit counts differ: 4558+ vs 7118+ WARNING: cat=OrHighHigh: hit counts differ: 5986+ vs 5987+ WARNING: cat=OrHighMed: hit counts differ: 3044+ vs 3445+ Traceback (most recent call last): File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/localrun.py", line 60, in comp.benchmark("baseline_vs_patch") File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/competition.py", line 494, in benchmark searchBench.run(id, base, challenger, File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/searchBench.py", line 196, in run raise RuntimeError('errors occurred: %s' % str(cmpDiffs)) RuntimeError: errors occurred: ([], ['query=+body:web +body:up filter=None sort=None groupField=None hitCount=10274+: wrong hitCount: 10274+ vs 10884+', 'query=body:he body:resulting filter=None sort=None groupField=None hitCount=3044+: wrong hitCount: 3044+ vs 3445+', 'query=body:official filter=None sort=None groupField=None hitCount=4558+: wrong hitCount: 4558+ vs 7118+', 'query=body:thumb filter=None sort=None groupField=None hitCount=5969+: wrong hitCount: 5969+ vs 9423+', 'query=body:years body:pages filter=None sort=None groupField=None hitCount=5986+: wrong hitCount: 5986+ vs 5987+', 'query=body:goods filter=None sort=None groupField=None hitCount=2394+: wrong hitCount: 2394+ vs 3325+'], 1.0) {code} I guess this error may be something about MaxScore optimization? 
So i changed the {{#TOTAL_HITS_THRESHOLD}} to a very large number for both baseline and candidate and rerun the benchmark, everything looks good now and i got a rather good report. But notice that this report does *not* really make sense since we changed the {{{}#TOTAL_HITS_THRESHOLD{}}}, this is just to verify the results are right. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value Fuzzy1 118.73 (11.5%) 114.82 (13.0%) -3.3% ( -24% - 23%) 0.407 LowTerm 2369.88 (9.2%) 2323.31 (5.7%) -2.0% ( -15% - 14%) 0.428 PKLookup 250.07 (5.0%) 245.42 (4.3%) -1.9% ( -10% -7%) 0.214 Prefix3 306.43 (6.9%) 301.82 (7.0%) -1.5% ( -14% - 13%) 0.502 Wildcard 221.77 (5.2%) 218.64 (4.0%) -1.4% ( -10% -8%) 0.348 HighTermMonthSort 1161.02 (12.7%) 1156.95 (11.1%) -0.4% ( -21% - 26%) 0.928 BrowseDayOfYearSSDVFacets 140.62 (1.3%) 140.48 (1.1%) -0.1% ( -2% -2%) 0.791 Fuzzy2 47.51 (8.9%) 47.57 (7.0%)0.1% ( -14% - 17%) 0.961 Respell 200.51 (2.7%) 200.82 (1.4%)0.2% ( -3% -4%) 0.823 OrHighMed 197.90 (3.0%) 198.36 (3.6%)0.2% ( -6% -7%) 0.830 BrowseMonthSSDVFacets 152.24 (2.8%) 152.74 (1.0%)0.3% ( -3% -4%) 0.630 OrHighLow 245.11 (3.5%) 245.97 (3.1%)0.4% ( -6% -7%) 0.744 AndHighLow 1598.05 (7.2%) 1604.55 (4.6%)0.4% ( -10% - 13%) 0.836 BrowseDayOfYearTaxoFacets 28.84 (3.0%) 28.99 (3.3%)0.5% ( -5% -7%) 0.603 OrHighHigh 109.37 (4.2%) 110.14 (4.0%)0.7% ( -7% -9%) 0.599 BrowseMonthTaxoFacets 30.77 (3.5%) 31.00 (4.1%)0.8% ( -6% -8%) 0.541 BrowseDateTaxoFacets 28.71 (3.2%) 28.93 (3.3%)0.8% ( -5% -7%) 0.461 HighTermDayOfYearSort 593.30 (13.5%) 599.82 (13.2%)1.1% ( -22% - 32%) 0.800 AndHighHigh 441.62 (5.0%) 452.99 (4.1%)2.6% ( -6% - 12%) 0.083 IntNRQ 121.71 (6.2%) 124.89 (4.2%)2.6% ( -7% - 13%) 0.127 HighTerm 599.78 (4.2%) 615.86 (2.6%)2.7% ( -3% -9%) 0.019 MedSloppyPhrase 397.69 (3.1%) 411.46 (3.3%)3.5% ( -2% - 10%)
[jira] [Updated] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10319: -- Description: In LUCENE-10315, I tried to generate a {{ForUtil}} whose {{{}BLOCK_SIZE=512{}}}, I thought it could be simple since it looks like i only need to change the {{BLOCK_SIZE}}, but it turns out that there are a lot of values related to the {{BLOCK_SIZE}} but hard coded. So this approach is trying to make all hard code value related to BLOCK_SIZE to be generated from the {{BLOCK_SIZE}} in case we need a different {{BLOCK_SIZE}} {{ForUtil}} somewhere else or want to change {{BLOCK_SIZE}} in postings in feature. I tried to make the {{BLOCK_SIZE = 64 / 256}} and all tests passed. was: In LUCENE-10315, I tried to generate a {{ForUtil}} whose {{{}BLOCK_SIZE=512{}}}, I thought it could be simple since it looks like i only need to change the BLOCK_SIZE, but it turns out that there are a lot of values related to the BLOCK_SIZE but hard coded. So this is trying to make all hard code value generated from the BLOCK_SIZE in case we need a ForUtil somewhere else or want to change BLOCK_SIZE in postings in feature. I tried to make the BLOCK_SIZE = 64 / 256 and all tests passed. > Make ForUtil#BLOCK_SIZE changeable > -- > > Key: LUCENE-10319 > URL: https://issues.apache.org/jira/browse/LUCENE-10319 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > In LUCENE-10315, I tried to generate a {{ForUtil}} whose > {{{}BLOCK_SIZE=512{}}}, I thought it could be simple since it looks like i > only need to change the {{BLOCK_SIZE}}, but it turns out that there are a lot > of values related to the {{BLOCK_SIZE}} but hard coded. 
> So this approach makes all hard-coded values related to {{BLOCK_SIZE}} > be derived from {{BLOCK_SIZE}}, in case we need a {{ForUtil}} with a different > {{BLOCK_SIZE}} somewhere else or want to change {{BLOCK_SIZE}} in > postings in the future. > I tried {{BLOCK_SIZE = 64 / 256}} and all tests passed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
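The refactor described in this issue can be sketched roughly as follows. This is a hypothetical illustration, not the actual Lucene {{ForUtil}} generator; the class and constant names are invented. The point is that every derived quantity is computed from the single {{BLOCK_SIZE}} constant, so a code generator can emit a consistent {{ForUtil}} for 64, 128, 256, or 512 ints:

```java
// Hypothetical sketch (not Lucene's real ForUtil): every constant is derived
// from BLOCK_SIZE instead of being hard-coded, so changing BLOCK_SIZE is the
// only edit needed to regenerate a consistent codec.
public final class ForUtilConstants {
    /** The only tunable; must be a power of two. */
    public static final int BLOCK_SIZE = 512;
    public static final int BLOCK_SIZE_LOG2 = Integer.numberOfTrailingZeros(BLOCK_SIZE);
    /** A block of ints stored as longs, two ints per long. */
    public static final int LONGS_PER_BLOCK = BLOCK_SIZE / 2;

    /** Encoded size in bytes of one block at the given bits per value. */
    public static int numBytes(int bitsPerValue) {
        return bitsPerValue * BLOCK_SIZE / Byte.SIZE;
    }

    public static void main(String[] args) {
        // 512 ints at 5 bits each pack into 320 bytes.
        System.out.println(numBytes(5));
    }
}
```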
[jira] [Comment Edited] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461367#comment-17461367 ] Feng Guo edited comment on LUCENE-10319 at 12/17/21, 2:08 PM: -- Out of curiosity, I tried to run the luceneutil wikimedium1m for block size = 256, but got an error there: {code:java} WARNING: cat=AndHighHigh: hit counts differ: 10274+ vs 10884+ WARNING: cat=HighTerm: hit counts differ: 5969+ vs 9423+ WARNING: cat=LowTerm: hit counts differ: 2394+ vs 3325+ WARNING: cat=MedTerm: hit counts differ: 4558+ vs 7118+ WARNING: cat=OrHighHigh: hit counts differ: 5986+ vs 5987+ WARNING: cat=OrHighMed: hit counts differ: 3044+ vs 3445+ Traceback (most recent call last): File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/localrun.py", line 60, in comp.benchmark("baseline_vs_patch") File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/competition.py", line 494, in benchmark searchBench.run(id, base, challenger, File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/searchBench.py", line 196, in run raise RuntimeError('errors occurred: %s' % str(cmpDiffs)) RuntimeError: errors occurred: ([], ['query=+body:web +body:up filter=None sort=None groupField=None hitCount=10274+: wrong hitCount: 10274+ vs 10884+', 'query=body:he body:resulting filter=None sort=None groupField=None hitCount=3044+: wrong hitCount: 3044+ vs 3445+', 'query=body:official filter=None sort=None groupField=None hitCount=4558+: wrong hitCount: 4558+ vs 7118+', 'query=body:thumb filter=None sort=None groupField=None hitCount=5969+: wrong hitCount: 5969+ vs 9423+', 'query=body:years body:pages filter=None sort=None groupField=None hitCount=5986+: wrong hitCount: 5986+ vs 5987+', 'query=body:goods filter=None sort=None groupField=None hitCount=2394+: wrong hitCount: 2394+ vs 3325+'], 1.0) {code} I guess this error may be something about Impacts? 
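The differing hit counts above ("10274+ vs 10884+") have the shape you get when total hit counts are approximate lower bounds under dynamic pruning (Impacts/WAND): once the collector has enough competitive hits, scorers skip non-competitive blocks and the exact count is never computed, and two codecs that skip at different block granularities can legitimately report different bounds. A toy sketch of that effect in plain Java (not Lucene code; the method name is invented):

```java
import java.util.List;

// Toy illustration (not Lucene code) of why hit counts printed as "10274+"
// are only lower bounds when early termination is enabled: counting stops
// once a threshold of hits has been seen, so the reported value depends on
// where the scorer happened to stop, not on the true match count.
public final class PruningCountDemo {
    /** Counts matches, but stops counting once `threshold` hits are seen. */
    static long approximateCount(List<Integer> docs, int threshold) {
        long count = 0;
        for (int doc : docs) {
            count++;
            if (count >= threshold) {
                break; // a real scorer would now skip ahead block by block
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> hits = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(approximateCount(hits, 3));                 // lower bound: "3+"
        System.out.println(approximateCount(hits, Integer.MAX_VALUE)); // exact: 8
    }
}
```

Raising the threshold to a very large value, as done below, forces exact counting, which is why the baseline and candidate then agree.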
So I changed {{#TOTAL_HITS_THRESHOLD}} to a very large number for both baseline and candidate and reran the benchmark. Everything looks good now and I got a rather good report. But note that this report does *not* really make sense, since we changed {{{}#TOTAL_HITS_THRESHOLD{}}}; it is only meant to verify that the results are correct. {code:java}
TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value
Fuzzy1 118.73 (11.5%) 114.82 (13.0%) -3.3% ( -24% - 23%) 0.407
LowTerm 2369.88 (9.2%) 2323.31 (5.7%) -2.0% ( -15% - 14%) 0.428
PKLookup 250.07 (5.0%) 245.42 (4.3%) -1.9% ( -10% -7%) 0.214
Prefix3 306.43 (6.9%) 301.82 (7.0%) -1.5% ( -14% - 13%) 0.502
Wildcard 221.77 (5.2%) 218.64 (4.0%) -1.4% ( -10% -8%) 0.348
HighTermMonthSort 1161.02 (12.7%) 1156.95 (11.1%) -0.4% ( -21% - 26%) 0.928
BrowseDayOfYearSSDVFacets 140.62 (1.3%) 140.48 (1.1%) -0.1% ( -2% -2%) 0.791
Fuzzy2 47.51 (8.9%) 47.57 (7.0%) 0.1% ( -14% - 17%) 0.961
Respell 200.51 (2.7%) 200.82 (1.4%) 0.2% ( -3% -4%) 0.823
OrHighMed 197.90 (3.0%) 198.36 (3.6%) 0.2% ( -6% -7%) 0.830
BrowseMonthSSDVFacets 152.24 (2.8%) 152.74 (1.0%) 0.3% ( -3% -4%) 0.630
OrHighLow 245.11 (3.5%) 245.97 (3.1%) 0.4% ( -6% -7%) 0.744
AndHighLow 1598.05 (7.2%) 1604.55 (4.6%) 0.4% ( -10% - 13%) 0.836
BrowseDayOfYearTaxoFacets 28.84 (3.0%) 28.99 (3.3%) 0.5% ( -5% -7%) 0.603
OrHighHigh 109.37 (4.2%) 110.14 (4.0%) 0.7% ( -7% -9%) 0.599
BrowseMonthTaxoFacets 30.77 (3.5%) 31.00 (4.1%) 0.8% ( -6% -8%) 0.541
BrowseDateTaxoFacets 28.71 (3.2%) 28.93 (3.3%) 0.8% ( -5% -7%) 0.461
HighTermDayOfYearSort 593.30 (13.5%) 599.82 (13.2%) 1.1% ( -22% - 32%) 0.800
AndHighHigh 441.62 (5.0%) 452.99 (4.1%) 2.6% ( -6% - 12%) 0.083
IntNRQ 121.71 (6.2%) 124.89 (4.2%) 2.6% ( -7% - 13%) 0.127
HighTerm 599.78 (4.2%) 615.86 (2.6%) 2.7% ( -3% -9%) 0.019
MedSloppyPhrase 397.69 (3.1%) 411.46 (3.3%) 3.5% ( -2% - 10%) 0.001
[jira] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319 ] Feng Guo deleted comment on LUCENE-10319: --- was (Author: gf2121): Out of curiosity, I ran the luceneutil wikimedium1m benchmark for block size = 64 / 256; I post the results here in case someone is interested :) *BLOCK_SIZE=64* {{Index size:}} {{434M (block size = 128)}} {{446M (block size = 64)}} {code:java}
TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value
AndHighMed 742.46 (6.2%) 632.83 (3.9%) -14.8% ( -23% - -4%) 0.000
MedSpanNear 106.50 (2.8%) 92.48 (3.7%) -13.2% ( -19% - -6%) 0.000
MedSloppyPhrase 147.88 (3.0%) 128.80 (2.2%) -12.9% ( -17% - -7%) 0.000
LowSloppyPhrase 491.02 (3.7%) 428.92 (3.5%) -12.6% ( -19% - -5%) 0.000
LowSpanNear 332.59 (3.0%) 292.64 (3.8%) -12.0% ( -18% - -5%) 0.000
MedIntervalsOrdered 80.37 (3.3%) 71.33 (2.6%) -11.2% ( -16% - -5%) 0.000
LowIntervalsOrdered 163.87 (3.1%) 145.73 (2.2%) -11.1% ( -15% - -5%) 0.000
HighSloppyPhrase 137.71 (3.8%) 122.61 (3.4%) -11.0% ( -17% - -3%) 0.000
LowTerm 2787.22 (6.1%) 2488.95 (6.1%) -10.7% ( -21% -1%) 0.000
OrHighHigh 160.41 (3.1%) 144.06 (3.7%) -10.2% ( -16% - -3%) 0.000
HighSpanNear 140.00 (1.7%) 127.69 (3.0%) -8.8% ( -13% - -4%) 0.000
OrHighMed 258.10 (4.3%) 235.96 (4.6%) -8.6% ( -16% -0%) 0.000
HighIntervalsOrdered 257.27 (3.0%) 242.95 (4.8%) -5.6% ( -12% -2%) 0.000
AndHighHigh 248.63 (3.0%) 234.84 (3.2%) -5.5% ( -11% -0%) 0.000
HighTermDayOfYearSort 954.02 (9.5%) 905.20 (7.4%) -5.1% ( -20% - 13%) 0.058
AndHighLow 1550.86 (5.0%) 1498.68 (4.5%) -3.4% ( -12% -6%) 0.026
HighTermMonthSort 633.80 (10.4%) 613.68 (5.9%) -3.2% ( -17% - 14%) 0.236
LowPhrase 547.94 (3.9%) 534.39 (3.1%) -2.5% ( -9% -4%) 0.027
Prefix3 566.20 (11.3%) 554.74 (8.9%) -2.0% ( -19% - 20%) 0.529
MedPhrase 468.94 (3.0%) 461.20 (4.8%) -1.7% ( -9% -6%) 0.192
Respell 149.39 (3.9%) 147.07 (5.3%) -1.6% ( -10% -7%) 0.287
OrHighLow 908.68 (5.2%) 899.50 (5.3%) -1.0% ( -10% - 10%) 0.542
Fuzzy2 75.80 (10.0%) 75.37 (12.6%) -0.6% ( -21% - 24%) 0.876
BrowseMonthSSDVFacets 151.56 (0.7%) 150.73 (2.8%) -0.5% ( -4% -2%) 0.399
Fuzzy1 117.46 (14.0%) 116.84 (12.6%) -0.5% ( -23% - 30%) 0.899
BrowseDayOfYearSSDVFacets 139.72 (0.9%) 139.01 (1.8%) -0.5% ( -3% -2%) 0.250
Wildcard 418.32 (11.7%) 416.56 (11.3%) -0.4% ( -20% - 25%) 0.908
IntNRQ 641.72 (5.4%) 643.10 (5.5%) 0.2% ( -10% - 11%) 0.900
HighPhrase 547.62 (6.0%) 549.35 (11.0%) 0.3% ( -15% - 18%) 0.910
BrowseDateTaxoFacets 29.02 (2.9%) 29.40 (5.3%) 1.3% ( -6% -9%) 0.336
BrowseMonthTaxoFacets 31.12 (3.7%) 31.52 (6.4%) 1.3% ( -8% - 11%) 0.430
BrowseDayOfYearTaxoFacets 29.03 (3.2%) 29.42 (5.3%) 1.4% ( -6% - 10%) 0.328
PKLookup 239.41 (2.5%) 242.82 (4.0%) 1.4% ( -4% -8%) 0.174
MedTerm 2332.72 (4.5%) 2445.01 (4.6%) 4.8% ( -4% - 14%) 0.001
HighTerm 1835.22 (5.3%) 1935.28 (6.0%) 5.5% ( -5% - 17%) 0.002
{code} *BLOCK_SIZE=256* {{Index size:}} {{434M (block size = 128)}} {{438M (block size = 256)}} {code:java}
TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value
AndHighHigh 214.93 (3.8%) 183.83 (2.6%) -14.5% ( -20% - -8%) 0.000
MedTerm 2589.52 (4.5%) 2303.67 (5.5%) -11.0% ( -20% - -1%) 0.000
HighTerm 1750.90 (4.0%) 1560.54
[jira] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315 ] Feng Guo deleted comment on LUCENE-10315: --- was (Author: gf2121): I noticed that the benchmark in luceneutil is mainly for geo scenarios (BKD's support for multi-dimensional points is a really powerful feature!), but the main focus of this optimization is the low/medium-cardinality 1D point scenario (high-cardinality 1D fields have also been improved by nearly 20%), so here I'd like to describe some background on optimizing medium/low-cardinality fields in BKD: I'm a user of Elasticsearch (which is based on Lucene). ES can automatically infer types for users with its dynamic mapping feature. When users index low-cardinality fields, such as gender / age / status... they often use numbers to represent the values, so ES infers these fields as {{{}long{}}}, and ES uses BKD as the index for {{long}} fields. When the data volume grows, building the result set of low-cardinality fields makes the CPU usage and load very high. This is a flame graph we obtained from the production environment: [^addall.svg] It can be seen that almost all CPU is used in addAll. When we reindexed {{long}} to {{{}keyword{}}}, the cluster load and search latency were greatly reduced (we spent weeks reindexing all indices...). I know that ES recommends using {{keyword}} for term/terms queries and {{long}} for range queries in its documentation, but there are always some users who don't realize this and keep their habit of using SQL databases, or dynamic mapping automatically selects the type for them. All in all, users won't realize that there is such a big difference in performance between {{long}} and {{keyword}} fields for low-cardinality data. So from my point of view it makes sense to make BKD work better for low/medium-cardinality fields. As far as I can see, for low-cardinality fields, there are two advantages of {{keyword}} over {{{}long{}}}: 1.
The {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's delta VInt, because of its batch reading (readLongs) and SIMD decoding. 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily materialize its result set, and when another small-result clause intersects with this low-cardinality condition, the low-cardinality field can avoid reading all docIds into memory. This issue targets the first point. I hope these words explain a bit of the motivation for this issue :) > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > Attachments: addall.svg > > Time Spent: 10m > Remaining Estimate: 0h > > Elasticsearch (which is based on Lucene) can automatically infer types for > users with its dynamic mapping feature. When users index low-cardinality > fields, such as gender / age / status... they often use numbers to > represent the values, so ES infers these fields as {{{}long{}}}, and > ES uses BKD as the index for {{long}} fields. When the data volume grows, > building the result set of low-cardinality fields makes the CPU usage and > load very high. > This is a flame graph we obtained from the production environment: > [^addall.svg] > It can be seen that almost all CPU is used in addAll. When we reindexed > {{long}} to {{{}keyword{}}}, the cluster load and search latency were greatly > reduced (we spent weeks reindexing all indices...). I know that ES > recommends using {{keyword}} for term/terms queries and {{long}} for range > queries in the documentation, but there are always some users who don't realize > this and keep their habit of using SQL databases, or dynamic mapping > automatically selects the type for them.
All in all, users won't realize that > there would be such a big difference in performance between {{long}} and > {{keyword}} fields in low cardinality fields. So from my point of view it > will make sense if we can make BKD works better for the low/medium > cardinality fields. > As far as i can see, for low cardinality fields, there are two advantages of > {{keyword}} over {{{}long{}}}: > 1. {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's > delta VInt, because its batch reading (readLongs) and SIMD decode. > 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily > materialize of its result set, and when another small result clause > intersects with this low cardinality condition, the low cardinality field can > avoid reading all docIds into memory. > This ISSUE is targeting to solve the first point. The basic idea is trying to > use a 512 ints {{ForUtil}}
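As a rough illustration of the first advantage above, here is a minimal sketch in plain Java contrasting the two decoding styles. This is not Lucene's actual codec; the class and method names are invented, and this simple variant assumes the bit width divides 64 evenly. Delta VInt reads one variable-length byte sequence per value with a data dependency between iterations, while a FOR-style codec packs a whole block at a fixed bit width and decodes it in a branch-free loop that the JIT can vectorize:

```java
import java.util.Arrays;

// Illustrative sketch (not Lucene's real ForUtil) of fixed-width FOR-style
// packing and batch decoding. Because every value occupies exactly `bits`
// bits, decoding is a tight, branch-free loop over the packed longs, in
// contrast to per-value variable-length (VInt) decoding.
public final class DecodeStyles {
    /** Pack `values` (each < 2^bits) into longs at a fixed bit width.
     *  Simplification: `bits` must divide 64 so no value crosses a word. */
    static long[] pack(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bits;
            packed[bitPos / 64] |= ((long) values[i]) << (bitPos % 64);
        }
        return packed;
    }

    /** Branch-free batch decode of a fixed-width packed block. */
    static int[] unpack(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            out[i] = (int) ((packed[bitPos / 64] >>> (bitPos % 64)) & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] decoded = unpack(pack(deltas, 8), deltas.length, 8);
        System.out.println(Arrays.equals(deltas, decoded)); // round-trips
    }
}
```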