[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523700#comment-17523700 ]

Feng Guo commented on LUCENE-10315:
---

Thanks [~ivera]! +1 to remove the int24 ForUtil implementation. I have updated the branch: https://github.com/apache/lucene/pull/797

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> ---
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Major
> Attachments: addall.svg, cpu_profile_baseline.html, cpu_profile_path.html
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Elasticsearch (which is based on Lucene) can automatically infer types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender / age / status, they often use numbers to represent the values, so ES infers these fields as {{long}}, and ES uses BKD as the index for {{long}} fields. When the data volume grows, building the result set of low-cardinality fields makes the CPU usage and load very high.
> This is a flame graph we obtained from the production environment: [^addall.svg]
> It can be seen that almost all CPU is used in addAll. When we reindexed {{long}} to {{keyword}}, the cluster load and search latency were greatly reduced (we spent weeks reindexing all indices...). I know the ES documentation recommends {{keyword}} for term/terms queries and {{long}} for range queries, but there are always users who don't realize this and keep their habit of using a SQL database, or dynamic mapping automatically selects the type for them. All in all, users won't realize that there can be such a big difference in performance between {{long}} and {{keyword}} for low-cardinality fields. So from my point of view it makes sense to make BKD work better for low/medium-cardinality fields.
> As far as I can see, for low-cardinality fields there are two advantages of {{keyword}} over {{long}}:
> 1. The {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's delta VInt, because of its batch reading (readLongs) and SIMD decode.
> 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily materialize its result set, and when another small result clause intersects with this low-cardinality condition, the low-cardinality field can avoid reading all docIds into memory.
> This issue targets the first point. The basic idea is to use a 512-int {{ForUtil}} for the BKD ids codec. I benchmarked this optimization by mocking some random {{LongPoint}} values and querying them with {{PointInSetQuery}}.
>
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
> |1|32|1|51.44|148.26|188.22%|
> |1|32|2|26.8|101.88|280.15%|
> |1|32|4|14.04|53.52|281.20%|
> |1|32|8|7.04|28.54|305.40%|
> |1|32|16|3.54|14.61|312.71%|
> |1|128|1|110.56|350.26|216.81%|
> |1|128|8|16.6|89.81|441.02%|
> |1|128|16|8.45|48.07|468.88%|
> |1|128|32|4.2|25.35|503.57%|
> |1|128|64|2.13|13.02|511.27%|
> |1|1024|1|536.19|843.88|57.38%|
> |1|1024|8|109.71|251.89|129.60%|
> |1|1024|32|33.24|104.11|213.21%|
> |1|1024|128|8.87|30.47|243.52%|
> |1|1024|512|2.24|8.3|270.54%|
> |1|8192|1|.33|5000|50.00%|
> |1|8192|32|139.47|214.59|53.86%|
> |1|8192|128|54.59|109.23|100.09%|
> |1|8192|512|15.61|36.15|131.58%|
> |1|8192|2048|4.11|11.14|171.05%|
> |1|1048576|1|2597.4|3030.3|16.67%|
> |1|1048576|32|314.96|371.75|18.03%|
> |1|1048576|128|99.7|116.28|16.63%|
> |1|1048576|512|30.5|37.15|21.80%|
> |1|1048576|2048|10.38|12.3|18.50%|
> |1|8388608|1|2564.1|3174.6|23.81%|
> |1|8388608|32|196.27|238.95|21.75%|
> |1|8388608|128|55.36|68.03|22.89%|
> |1|8388608|512|15.58|19.24|23.49%|
> |1|8388608|2048|4.56|5.71|25.22%|
> The index size is reduced for low-cardinality fields and flat for high-cardinality fields.
> {code:java}
> 113M    index_1_doc_32_cardinality_baseline
> 114M    index_1_doc_32_cardinality_candidate
> 140M    index_1_doc_128_cardinality_baseline
> 133M    index_1_doc_128_cardinality_candidate
> 193M    index_1_doc_1024_cardinality_baseline
> 174M    index_1_doc_1024_cardinality_candidate
> 241M    index_1_doc_8192_cardinality_baseline
> 233M    index_1_doc_8192_cardinality_candidate
> 314M
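The contrast drawn above between delta-VInt decoding and a FOR-style codec can be sketched as plain fixed-width bit packing. The class and method names below are illustrative, not Lucene's actual ForUtil: every value in a block is stored with the same bit width, so decoding is a tight loop of shifts and masks instead of a per-value VInt branch.

```java
// Illustrative FOR-style bit packing; NOT Lucene's actual ForUtil.
public class ForPackSketch {

    // Pack `values` (each assumed to fit in `bits` bits, bits < 64) into longs.
    static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & 0xFFFFFFFFL;
            int bitPos = i * bits;
            int word = bitPos >>> 6;
            int shift = bitPos & 63;
            out[word] |= v << shift;
            if (shift + bits > 64) { // value straddles two words
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    // Decode `count` values of `bits` bits each from the packed buffer.
    static int[] unpack(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            int word = bitPos >>> 6;
            int shift = bitPos & 63;
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 17, 4096, (1 << 24) - 1};
        int[] back = unpack(pack(docIds, 24), docIds.length, 24);
        System.out.println(java.util.Arrays.equals(back, docIds)); // prints true
    }
}
```

With 24 bits per value, a 512-int block packs into 192 longs and can be decoded without any data-dependent branches, which is what makes SIMD-friendly batch decoding possible.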
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518473#comment-17518473 ]

Feng Guo commented on LUCENE-10315:
---

Here is the benchmark result I got on my machine with [https://github.com/iverase/benchmark_forutil]:

{code:java}
Benchmark                                            Mode  Cnt   Score   Error  Units
ReadInts24Benchmark.readInts24ForUtil               thrpt   25   9.086 ± 0.089  ops/us
ReadInts24Benchmark.readInts24ForUtilVisitor        thrpt   25   0.764 ± 0.005  ops/us
ReadInts24Benchmark.readInts24Legacy                thrpt   25   2.877 ± 0.013  ops/us
ReadInts24Benchmark.readInts24Visitor               thrpt   25   0.778 ± 0.006  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong1         thrpt   25   3.329 ± 0.023  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong2         thrpt   25   3.218 ± 0.037  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong3         thrpt   25   3.755 ± 0.017  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLong4         thrpt   25   3.862 ± 0.025  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor1  thrpt   25   0.710 ± 0.008  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor2  thrpt   25   0.849 ± 0.013  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor3  thrpt   25   0.804 ± 0.006  ops/us
ReadIntsAsLongBenchmark.readIntsLegacyLongVisitor4  thrpt   25   0.768 ± 0.007  ops/us
ReadIntsBenchmark.readIntsForUtil                   thrpt   25  18.957 ± 0.194  ops/us
ReadIntsBenchmark.readIntsForUtilVisitor            thrpt   25   0.817 ± 0.004  ops/us
ReadIntsBenchmark.readIntsLegacy                    thrpt   25   2.456 ± 0.016  ops/us
ReadIntsBenchmark.readIntsLegacyVisitor             thrpt   25   0.608 ± 0.007  ops/us
{code}

In this result, {{readInts24ForUtil}} runs about 3 times faster than {{readInts24Legacy}}. This speedup is attractive to me, so I'm trying to find a way to resolve the regression when calling the visitor. One approach I'm considering is to introduce {{visit(int[] docs, int count)}} on {{IntersectVisitor}}. The benefits of this method:
1. It reduces the number of virtual function calls.
2. {{BufferAdder}} can directly use {{System#arraycopy}} to append doc ids.
3. {{InverseIntersectVisitor}} can count cost faster.

Based on luceneutil, I reproduced the regression on my local machine with the nightly benchmark tasks and random seed = 10:

{code:java}
Task      QPS baseline  StdDev    QPS my_modified_version  StdDev    Pct diff              p-value
IntNRQ    27.43         (1.8%)    24.12                    (1.1%)    -12.1% ( -14% - -9%)  0.000
{code}

After the optimization, I can see the speedup with the same seed:

{code:java}
Task      QPS baseline  StdDev    QPS my_modified_version  StdDev    Pct diff            p-value
IntNRQ    27.68         (1.7%)    31.89                    (2.0%)    15.2% ( 11% - 19%)  0.000
{code}

I posted the draft code here: [https://github.com/apache/lucene/pull/797]. This commit [https://github.com/apache/lucene/pull/797/commits/7fb6ac3f5901a29d87e9fa427ba429d1e1749b14] shows what was changed.
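The proposed bulk {{visit(int[] docs, int count)}} hook can be sketched as follows. The interface and class names here are illustrative, not Lucene's exact IntersectVisitor API: a default method preserves the per-doc behavior, while a buffering collector overrides it with a single arraycopy.

```java
// Sketch of a bulk visit hook; names are illustrative, not Lucene's exact API.
interface Visitor {
    void visit(int docId); // one virtual call per matching doc

    // Default bulk path: a caller that decodes a whole 512-int block can hand
    // it over in a single call instead of `count` virtual calls.
    default void visit(int[] docs, int count) {
        for (int i = 0; i < count; i++) {
            visit(docs[i]);
        }
    }
}

// A buffering collector overrides the bulk path with one arraycopy.
class BufferAdder implements Visitor {
    final int[] buffer = new int[1024];
    int size;

    @Override
    public void visit(int docId) {
        buffer[size++] = docId;
    }

    @Override
    public void visit(int[] docs, int count) {
        System.arraycopy(docs, 0, buffer, size, count);
        size += count;
    }
}

public class BulkVisitSketch {
    public static void main(String[] args) {
        BufferAdder adder = new BufferAdder();
        adder.visit(new int[] {5, 8, 13, 21}, 4); // one call for the whole block
        System.out.println(adder.size); // prints 4
    }
}
```

The default method keeps existing single-doc visitors working unchanged, while decoders that produce whole blocks get the cheaper bulk path automatically wherever it is overridden.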
[jira] [Commented] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518272#comment-17518272 ]

Feng Guo commented on LUCENE-10315:
---

Thanks [~ivera], [~jpountz] for all the effort and suggestions here! FYI, here is something interesting: I tried to change

{code:java}
@Benchmark
public void readInts24ForUtilVisitor(IntDecodeState state, Blackhole bh) {
  decode24(state);
  for (int i = 0; i < state.count; i++) {
    bh.consume(state.outputInts[i]);
  }
}
{code}

to

{code:java}
@Benchmark
public void readInts24ForUtilVisitorImproved(IntDecodeState state, Blackhole bh) {
  decode24(state);
  int[] ints = state.outputInts;
  for (int i = 0; i < state.count; i++) {
    bh.consume(ints[i]);
  }
}
{code}

And here is the result:

{code:java}
Benchmark                                              Mode  Cnt  Score   Error  Units
ReadInts24Benchmark.readInts24ForUtilVisitor          thrpt   10  0.776 ± 0.012  ops/us
ReadInts24Benchmark.readInts24ForUtilVisitorImproved  thrpt   10  0.848 ± 0.012  ops/us
ReadInts24Benchmark.readInts24Visitor                 thrpt   10  0.786 ± 0.006  ops/us

$ java -version
openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
{code}
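The gap between the two benchmark variants plausibly comes from hoisting the {{outputInts}} field into a local: with an opaque call like {{Blackhole#consume}} inside the loop, the JIT may not be able to prove the field is unchanged across iterations and can re-read it each time. A minimal plain-Java sketch of the two loop shapes follows; the class and method names are made up, and it illustrates only the pattern, not the JMH timing effect.

```java
// Illustration of the field-hoisting pattern; names are hypothetical.
class IntDecodeStateSketch {
    int[] outputInts = new int[512];
    int count = 512;
}

public class HoistSketch {

    // Field read inside the loop: with an opaque call in the body, the JIT
    // may reload state.outputInts on every iteration.
    static long sumViaField(IntDecodeStateSketch state) {
        long sum = 0;
        for (int i = 0; i < state.count; i++) {
            sum += state.outputInts[i];
        }
        return sum;
    }

    // Field hoisted into a local once, then indexed: the shape of
    // readInts24ForUtilVisitorImproved above.
    static long sumViaLocal(IntDecodeStateSketch state) {
        int[] ints = state.outputInts;
        long sum = 0;
        for (int i = 0; i < state.count; i++) {
            sum += ints[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        IntDecodeStateSketch state = new IntDecodeStateSketch();
        for (int i = 0; i < state.count; i++) {
            state.outputInts[i] = i;
        }
        // Both shapes compute the same result; only the generated code differs.
        System.out.println(sumViaField(state) == sumViaLocal(state)); // prints true
    }
}
```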
[jira] [Assigned] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo reassigned LUCENE-10417:
-
Assignee: Feng Guo

> IntNRQ task performance decreased in nightly benchmark
> --
>
> Key: LUCENE-10417
> URL: https://issues.apache.org/jira/browse/LUCENE-10417
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Major
>
> Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
> Probably related to LUCENE-10315, I'll dig.

--
This message was sent by Atlassian Jira (v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10417:
--
Description:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-10315, I'll dig.

was:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Updated] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10417:
--
Description:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-LUCENE-10315, I'll dig.

was:
Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Created] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
Feng Guo created LUCENE-10417:
-
Summary: IntNRQ task performance decreased in nightly benchmark
Key: LUCENE-10417
URL: https://issues.apache.org/jira/browse/LUCENE-10417
Project: Lucene - Core
Issue Type: Bug
Components: core/codecs
Reporter: Feng Guo

Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Commented] (LUCENE-10409) Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
[ https://issues.apache.org/jira/browse/LUCENE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489439#comment-17489439 ]

Feng Guo commented on LUCENE-10409:
---

Hi [~jpountz]! I see this issue is marked as a {{Task}}, and I'm not exactly sure what that means. Does it mean that anyone who is interested can work on it? Feel free to ignore me if you already plan to work on this; I just want to say that I'd like to take this on if you don't have the time :)

> Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
> -
>
> Key: LUCENE-10409
> URL: https://issues.apache.org/jira/browse/LUCENE-10409
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
>
> [~gf2121] recently improved DocIdsWriter for the case when doc IDs are dense and come in the same order as values via the CONTINUOUS_IDS and BITSET_IDS encodings.
> We could do the same for the case when doc IDs come in the opposite order to values. This would be used whenever searching on a field that is used for index sorting in the descending order. This would be a frequent case for Elasticsearch users as we're planning on using index sorting more and more on time-based data with a descending sort on the timestamp as the last sort field.
[jira] [Resolved] (LUCENE-10410) Add some more tests for legacy encoding logic in DocIdsWriter
[ https://issues.apache.org/jira/browse/LUCENE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo resolved LUCENE-10410.
---
Fix Version/s: 9.1
Resolution: Fixed

> Add some more tests for legacy encoding logic in DocIdsWriter
> -
>
> Key: LUCENE-10410
> URL: https://issues.apache.org/jira/browse/LUCENE-10410
> Project: Lucene - Core
> Issue Type: Test
> Components: core/codecs
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Trivial
> Fix For: 9.1
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> This is a follow-up of LUCENE-10315: add some more tests for the legacy encoding logic in DocIdsWriter.
[jira] [Comment Edited] (LUCENE-10409) Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
[ https://issues.apache.org/jira/browse/LUCENE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488607#comment-17488607 ]

Feng Guo edited comment on LUCENE-10409 at 2/8/22, 8:15 AM:
---
+1, Great idea!

was (Author: gf2121): +1, Great idea! I'd like to take on this if you agree.
[jira] [Assigned] (LUCENE-10410) Add some more tests for legacy encoding logic in DocIdsWriter
[ https://issues.apache.org/jira/browse/LUCENE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo reassigned LUCENE-10410:
-
Assignee: Feng Guo
[jira] [Commented] (LUCENE-10409) Improve BKDWriter's DocIdsWriter to better encode decreasing sequences of doc IDs
[ https://issues.apache.org/jira/browse/LUCENE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488607#comment-17488607 ]

Feng Guo commented on LUCENE-10409:
---
+1, Great idea! I'd like to take on this if you agree.
[jira] [Created] (LUCENE-10410) Add some more tests for legacy encoding logic in DocIdsWriter
Feng Guo created LUCENE-10410:
-
Summary: Add some more tests for legacy encoding logic in DocIdsWriter
Key: LUCENE-10410
URL: https://issues.apache.org/jira/browse/LUCENE-10410
Project: Lucene - Core
Issue Type: Test
Components: core/codecs
Reporter: Feng Guo

This is a follow-up of LUCENE-10315: add some more tests for the legacy encoding logic in DocIdsWriter.
[jira] [Resolved] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo resolved LUCENE-10315.
---
Fix Version/s: 9.1
Resolution: Fixed
[jira] [Assigned] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo reassigned LUCENE-10315: - Assignee: Feng Guo > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Major > Attachments: addall.svg > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Elasticsearch (which based on lucene) can automatically infers types for > users with its dynamic mapping feature. When users index some low cardinality > fields, such as gender / age / status... they often use some numbers to > represent the values, while ES will infer these fields as {{{}long{}}}, and > ES uses BKD as the index of {{long}} fields. When the data volume grows, > building the result set of low-cardinality fields will make the CPU usage and > load very high. > This is a flame graph we obtained from the production environment: > [^addall.svg] > It can be seen that almost all CPU is used in addAll. When we reindex > {{long}} to {{{}keyword{}}}, the cluster load and search latency are greatly > reduced ( We spent weeks of time to reindex all indices... ). I know that ES > recommended to use {{keyword}} for term/terms query and {{long}} for range > query in the document, but there are always some users who didn't realize > this and keep their habit of using sql database, or dynamic mapping > automatically selects the type for them. All in all, users won't realize that > there would be such a big difference in performance between {{long}} and > {{keyword}} fields in low cardinality fields. So from my point of view it > will make sense if we can make BKD works better for the low/medium > cardinality fields. > As far as i can see, for low cardinality fields, there are two advantages of > {{keyword}} over {{{}long{}}}: > 1. 
{{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's > delta VInt, because its batch reading (readLongs) and SIMD decode. > 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily > materialize of its result set, and when another small result clause > intersects with this low cardinality condition, the low cardinality field can > avoid reading all docIds into memory. > This ISSUE is targeting to solve the first point. The basic idea is trying to > use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked this optimization > by mocking some random {{LongPoint}} and querying them with > {{PointInSetQuery}}. > *Benchmark Result* > |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff > percentage| > |1|32|1|51.44|148.26|188.22%| > |1|32|2|26.8|101.88|280.15%| > |1|32|4|14.04|53.52|281.20%| > |1|32|8|7.04|28.54|305.40%| > |1|32|16|3.54|14.61|312.71%| > |1|128|1|110.56|350.26|216.81%| > |1|128|8|16.6|89.81|441.02%| > |1|128|16|8.45|48.07|468.88%| > |1|128|32|4.2|25.35|503.57%| > |1|128|64|2.13|13.02|511.27%| > |1|1024|1|536.19|843.88|57.38%| > |1|1024|8|109.71|251.89|129.60%| > |1|1024|32|33.24|104.11|213.21%| > |1|1024|128|8.87|30.47|243.52%| > |1|1024|512|2.24|8.3|270.54%| > |1|8192|1|.33|5000|50.00%| > |1|8192|32|139.47|214.59|53.86%| > |1|8192|128|54.59|109.23|100.09%| > |1|8192|512|15.61|36.15|131.58%| > |1|8192|2048|4.11|11.14|171.05%| > |1|1048576|1|2597.4|3030.3|16.67%| > |1|1048576|32|314.96|371.75|18.03%| > |1|1048576|128|99.7|116.28|16.63%| > |1|1048576|512|30.5|37.15|21.80%| > |1|1048576|2048|10.38|12.3|18.50%| > |1|8388608|1|2564.1|3174.6|23.81%| > |1|8388608|32|196.27|238.95|21.75%| > |1|8388608|128|55.36|68.03|22.89%| > |1|8388608|512|15.58|19.24|23.49%| > |1|8388608|2048|4.56|5.71|25.22%| > The indices size is reduced for low cardinality fields and flat for high > cardinality fields. 
> {code:java} > 113Mindex_1_doc_32_cardinality_baseline > 114Mindex_1_doc_32_cardinality_candidate > 140Mindex_1_doc_128_cardinality_baseline > 133Mindex_1_doc_128_cardinality_candidate > 193Mindex_1_doc_1024_cardinality_baseline > 174Mindex_1_doc_1024_cardinality_candidate > 241Mindex_1_doc_8192_cardinality_baseline > 233Mindex_1_doc_8192_cardinality_candidate > 314Mindex_1_doc_1048576_cardinality_baseline > 315Mindex_1_doc_1048576_cardinality_candidate > 392Mindex_1_doc_8388608_cardinality_baseline > 391M
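The core trick behind a 512-int ForUtil codec is frame-of-reference bit packing: delta-encode the sorted docIDs of a leaf block, then store every delta with the same fixed bit width so the whole block decodes in a tight, SIMD-friendly loop. Below is a minimal sketch of that idea, not Lucene's actual ForUtil implementation; the class and method names are made up for illustration:

```java
// Hypothetical sketch of ForUtil-style packing for a block of sorted docIDs.
public class PackedBlockSketch {
  // Delta-encode a sorted docID block; the largest delta decides the bit width.
  static int[] deltas(int[] sortedDocIds) {
    int[] d = new int[sortedDocIds.length];
    int prev = 0;
    for (int i = 0; i < sortedDocIds.length; i++) {
      d[i] = sortedDocIds[i] - prev;
      prev = sortedDocIds[i];
    }
    return d;
  }

  // OR-ing all deltas yields a value whose highest set bit bounds the width.
  static int bitsRequired(int[] deltas) {
    int max = 0;
    for (int v : deltas) max |= v;
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
  }

  // Pack all deltas into a long[], bitsPerValue bits each, LSB-first.
  static long[] pack(int[] deltas, int bitsPerValue) {
    long[] out = new long[(deltas.length * bitsPerValue + 63) / 64];
    for (int i = 0; i < deltas.length; i++) {
      long bitPos = (long) i * bitsPerValue;
      int idx = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      out[idx] |= ((long) deltas[i]) << shift;
      if (shift + bitsPerValue > 64) {          // value spills into the next word
        out[idx + 1] |= ((long) deltas[i]) >>> (64 - shift);
      }
    }
    return out;
  }

  static int[] unpack(long[] packed, int count, int bitsPerValue) {
    int[] out = new int[count];
    long mask = (1L << bitsPerValue) - 1;
    for (int i = 0; i < count; i++) {
      long bitPos = (long) i * bitsPerValue;
      int idx = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long v = packed[idx] >>> shift;
      if (shift + bitsPerValue > 64) {          // pull in the spilled high bits
        v |= packed[idx + 1] << (64 - shift);
      }
      out[i] = (int) (v & mask);
    }
    return out;
  }
}
```

Per-value delta VInt needs a data-dependent branch per byte, while the packed form reads fixed-width lanes; that regularity is what lets a JIT vectorize the decode loop.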
[jira] [Updated] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
[ https://issues.apache.org/jira/browse/LUCENE-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10388: -- Fix Version/s: 9.1 Affects Version/s: 9.1 > Remove MultiLevelSkipListReader#SkipBuffer > -- > > Key: LUCENE-10388 > URL: https://issues.apache.org/jira/browse/LUCENE-10388 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 9.1 >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Minor > Fix For: 9.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Previous talk can be found in [https://github.com/apache/lucene/pull/592] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
[ https://issues.apache.org/jira/browse/LUCENE-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-10388. --- Resolution: Fixed > Remove MultiLevelSkipListReader#SkipBuffer > -- > > Key: LUCENE-10388 > URL: https://issues.apache.org/jira/browse/LUCENE-10388 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 9.1 >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Minor > Fix For: 9.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Previous talk can be found in [https://github.com/apache/lucene/pull/592]
[jira] [Assigned] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
[ https://issues.apache.org/jira/browse/LUCENE-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo reassigned LUCENE-10388: - Assignee: Feng Guo > Remove MultiLevelSkipListReader#SkipBuffer > -- > > Key: LUCENE-10388 > URL: https://issues.apache.org/jira/browse/LUCENE-10388 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Previous talk can be found in [https://github.com/apache/lucene/pull/592]
[jira] [Assigned] (LUCENE-10387) Clean unused lastPayloadByteUpto in Lucene90SkipWriter
[ https://issues.apache.org/jira/browse/LUCENE-10387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo reassigned LUCENE-10387: - Fix Version/s: 10.0 (main) Affects Version/s: 10.0 (main) Assignee: Feng Guo > Clean unused lastPayloadByteUpto in Lucene90SkipWriter > -- > > Key: LUCENE-10387 > URL: https://issues.apache.org/jira/browse/LUCENE-10387 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 10.0 (main) >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Trivial > Fix For: 10.0 (main) > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Resolved] (LUCENE-10387) Clean unused lastPayloadByteUpto in Lucene90SkipWriter
[ https://issues.apache.org/jira/browse/LUCENE-10387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-10387. --- Resolution: Fixed > Clean unused lastPayloadByteUpto in Lucene90SkipWriter > -- > > Key: LUCENE-10387 > URL: https://issues.apache.org/jira/browse/LUCENE-10387 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 10.0 (main) >Reporter: Feng Guo >Assignee: Feng Guo >Priority: Trivial > Fix For: 10.0 (main) > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Created] (LUCENE-10388) Remove MultiLevelSkipListReader#SkipBuffer
Feng Guo created LUCENE-10388: - Summary: Remove MultiLevelSkipListReader#SkipBuffer Key: LUCENE-10388 URL: https://issues.apache.org/jira/browse/LUCENE-10388 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Previous talk can be found in [https://github.com/apache/lucene/pull/592]
[jira] [Created] (LUCENE-10387) Clean unused lastPayloadByteUpto in Lucene90SkipWriter
Feng Guo created LUCENE-10387: - Summary: Clean unused lastPayloadByteUpto in Lucene90SkipWriter Key: LUCENE-10387 URL: https://issues.apache.org/jira/browse/LUCENE-10387 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo
[jira] [Created] (LUCENE-10376) Roll up the loop in vint/vlong in DataInput
Feng Guo created LUCENE-10376: - Summary: Roll up the loop in vint/vlong in DataInput Key: LUCENE-10376 URL: https://issues.apache.org/jira/browse/LUCENE-10376 Project: Lucene - Core Issue Type: Improvement Components: core/store Reporter: Feng Guo This issue proposes to roll up the loop in {{{}DataInput#readVInt and {{DataInput#readVLong{}}}{}}}. Previous talk can be found here: [https://github.com/apache/lucene/pull/592.] Benchmark: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthTaxoFacets5.17 (15.9%)5.00 (12.1%) -3.4% ( -27% - 29%) 0.446 OrNotHighLow 1010.74 (4.0%) 978.71 (4.6%) -3.2% ( -11% -5%) 0.021 HighPhrase 171.95 (3.6%) 166.92 (4.6%) -2.9% ( -10% -5%) 0.025 AndHighLow 594.12 (4.2%) 577.24 (5.4%) -2.8% ( -11% -7%) 0.064 OrHighLow 540.46 (4.1%) 526.17 (5.4%) -2.6% ( -11% -7%) 0.083 OrHighMedDayTaxoFacets6.01 (5.3%)5.88 (3.9%) -2.2% ( -10% -7%) 0.136 AndHighMedDayTaxoFacets 14.78 (2.6%) 14.51 (2.1%) -1.8% ( -6% -2%) 0.013 MedPhrase 142.26 (2.9%) 139.67 (3.1%) -1.8% ( -7% -4%) 0.058 LowPhrase 21.22 (2.8%) 20.85 (3.1%) -1.8% ( -7% -4%) 0.061 AndHighHighDayTaxoFacets4.31 (4.5%)4.24 (3.2%) -1.7% ( -8% -6%) 0.158 BrowseDayOfYearTaxoFacets4.70 (17.3%)4.63 (12.9%) -1.3% ( -26% - 34%) 0.787 BrowseDateTaxoFacets4.65 (16.9%)4.59 (12.9%) -1.2% ( -26% - 34%) 0.803 MedSloppyPhrase 34.40 (2.9%) 34.02 (4.0%) -1.1% ( -7% -5%) 0.318 MedTermDayTaxoFacets 13.85 (6.7%) 13.70 (4.5%) -1.0% ( -11% - 10%) 0.563 BrowseRandomLabelTaxoFacets4.16 (12.7%)4.11 (9.7%) -1.0% ( -20% - 24%) 0.772 LowSloppyPhrase5.77 (2.2%)5.72 (3.3%) -0.9% ( -6% -4%) 0.307 LowSpanNear 53.67 (3.6%) 53.22 (3.9%) -0.8% ( -8% -6%) 0.481 HighSpanNear2.66 (4.8%)2.63 (5.4%) -0.8% ( -10% -9%) 0.616 MedIntervalsOrdered 25.88 (9.4%) 25.68 (9.5%) -0.8% ( -17% - 20%) 0.797 OrHighNotHigh 1043.34 (3.7%) 1037.43 (4.4%) -0.6% ( -8% -7%) 0.658 HighSloppyPhrase1.47 (3.4%)1.46 (4.2%) -0.6% ( -7% -7%) 0.645 MedSpanNear 11.52 (3.5%) 11.46 (4.3%) -0.5% ( -7% -7%) 0.685 OrNotHighHigh 
1615.92 (3.4%) 1608.09 (3.6%) -0.5% ( -7% -6%) 0.663 BrowseRandomLabelSSDVFacets3.11 (6.0%)3.10 (4.4%) -0.2% ( -10% - 10%) 0.881 LowIntervalsOrdered4.06 (8.9%)4.06 (8.9%) -0.2% ( -16% - 19%) 0.957 OrHighNotMed 1188.76 (3.8%) 1187.46 (4.4%) -0.1% ( -7% -8%) 0.933 OrNotHighMed 1220.26 (3.1%) 1219.23 (3.7%) -0.1% ( -6% -6%) 0.938 AndHighMed 115.92 (3.6%) 116.03 (3.3%)0.1% ( -6% -7%) 0.928 Fuzzy1 111.98 (3.2%) 112.15 (3.5%)0.1% ( -6% -7%) 0.889 HighIntervalsOrdered5.14 (7.5%)5.15 (7.3%)0.2% ( -13% - 16%) 0.937 OrHighNotLow 1222.80 (4.1%) 1226.76 (4.7%)0.3% ( -8% -9%) 0.817 TermDTSort 51.02 (14.1%) 51.21 (18.9%)0.4% ( -28% - 38%) 0.944 HighTerm 1570.53 (3.7%) 1578.45 (4.4%)0.5% ( -7% -8%) 0.693 BrowseDayOfYearSSDVFacets4.26 (3.9%)4.28 (9.1%)0.5% ( -12% - 14%) 0.811 AndHighHigh 40.61 (4.1%) 40.83 (4.1%)0.5% ( -7% -9%) 0.681 MedTerm 2002.17 (3.6%) 2013.12 (4.3%)0.5% ( -7% -8%) 0.659 Respell 67.74 (3.8%) 68.14 (3.3%)0.6% ( -6% -8%) 0.594 LowTerm 1633.26 (2.8%) 1643.86 (2.6%)0.6% ( -4% -6%) 0.444 OrHighMed
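The "roll up the loop" change in LUCENE-10376 is easiest to see side by side. The sketch below contrasts a generic readVInt loop with a hand-unrolled variant that returns early on each non-continuation byte; this illustrates the technique only and is not the exact code of Lucene's {{DataInput}} (the helper names are made up):

```java
import java.nio.ByteBuffer;

// Sketch: looped vs unrolled variable-length int decoding.
public class VIntSketch {
  // Generic loop form: one data-dependent branch per continuation byte.
  static int readVIntLoop(ByteBuffer in) {
    byte b = in.get();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.get();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  // Unrolled form: up to five explicit steps with early returns, giving the
  // JIT straight-line code for the common one- and two-byte values.
  static int readVIntUnrolled(ByteBuffer in) {
    byte b = in.get();
    if (b >= 0) return b;               // high bit clear: single-byte value
    int i = b & 0x7F;
    b = in.get();
    i |= (b & 0x7F) << 7;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x7F) << 14;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x7F) << 21;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x0F) << 28;              // fifth byte carries at most 4 bits
    return i;
  }

  // Standard VInt encoder: 7 payload bits per byte, high bit = continuation.
  static void writeVInt(ByteBuffer out, int v) {
    while ((v & ~0x7F) != 0) {
      out.put((byte) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.put((byte) v);
  }
}
```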
[jira] [Updated] (LUCENE-10372) Performance of TaxoFacets in Nightly benchmark decreased
[ https://issues.apache.org/jira/browse/LUCENE-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10372: -- Description: link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html {code:java} BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 {code} I'm not sure why, but it should be related to https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it. was: link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html {code:java} BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 {code} I'm not sure why, but it should be related to https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it. > Performance of TaxoFacets in Nightly benchmark decreased > > > Key: LUCENE-10372 > URL: https://issues.apache.org/jira/browse/LUCENE-10372 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > > link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html > {code:java} > BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 > BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 > BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 > BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 > {code} > I'm not sure why, but it should be related to > https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it.
[jira] [Created] (LUCENE-10372) Performance of TaxoFacets in Nightly benchmark decreased
Feng Guo created LUCENE-10372: - Summary: Performance of TaxoFacets in Nightly benchmark decreased Key: LUCENE-10372 URL: https://issues.apache.org/jira/browse/LUCENE-10372 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo link: https://home.apache.org/~mikemccand/lucenebench/2022.01.10.18.03.12.html {code:java} BrowseDayOfYearTaxoFacets 7.6 (12.1%) 6.3 (26.7%) 0.8 X 0.010 BrowseDateTaxoFacets 7.6 (11.9%) 6.3 (26.4%) 0.8 X 0.010 BrowseRandomLabelTaxoFacets 6.4 (7.5%) 5.7 (16.2%) 0.9 X 0.004 BrowseMonthTaxoFacets 6.6 (2.6%) 8.2 (87.0%) 1.2 X 0.218 {code} I'm not sure why, but it should be related to https://issues.apache.org/jira/browse/LUCENE-10350; I'll raise a PR to revert it.
[jira] [Created] (LUCENE-10366) Reduce the number of valid checks for ByteBufferIndexInput#readVInt
Feng Guo created LUCENE-10366: - Summary: Reduce the number of valid checks for ByteBufferIndexInput#readVInt Key: LUCENE-10366 URL: https://issues.apache.org/jira/browse/LUCENE-10366 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Today, we do not rewrite {{#readVInt}} and {{#readVLong}} for {{ByteBufferIndexInput}}. By default, the logic will call {{#readByte}} several times, and we need to check whether ByteBuffer is valid every time. This may not be necessary as we just need a final check. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseDayOfYearSSDVFacets 16.74 (17.3%) 15.91 (12.3%) -5.0% ( -29% - 29%) 0.295 MedTermDayTaxoFacets 27.01 (6.9%) 26.56 (5.9%) -1.7% ( -13% - 11%) 0.402 Wildcard 111.55 (8.1%) 109.67 (7.6%) -1.7% ( -16% - 15%) 0.499 Respell 58.06 (2.6%) 57.20 (2.6%) -1.5% ( -6% -3%) 0.074 OrHighMedDayTaxoFacets8.91 (4.7%)8.81 (7.2%) -1.1% ( -12% - 11%) 0.557 Fuzzy1 117.17 (3.8%) 116.14 (3.3%) -0.9% ( -7% -6%) 0.437 Fuzzy2 103.70 (3.2%) 102.82 (4.3%) -0.9% ( -8% -6%) 0.472 HighIntervalsOrdered 10.11 (7.9%) 10.05 (7.4%) -0.6% ( -14% - 15%) 0.797 HighTermDayOfYearSort 183.18 (8.8%) 182.92 (10.8%) -0.1% ( -18% - 21%) 0.964 AndHighHighDayTaxoFacets 11.44 (3.8%) 11.43 (3.1%) -0.1% ( -6% -7%) 0.936 Prefix3 161.90 (13.5%) 161.80 (13.3%) -0.1% ( -23% - 30%) 0.989 HighSpanNear 11.43 (4.8%) 11.45 (4.2%)0.1% ( -8% -9%) 0.928 PKLookup 220.15 (3.3%) 220.69 (6.2%)0.2% ( -8% - 10%) 0.874 MedSpanNear 92.60 (4.0%) 93.11 (3.7%)0.5% ( -6% -8%) 0.656 TermDTSort 143.26 (9.0%) 144.14 (10.9%)0.6% ( -17% - 22%) 0.847 MedIntervalsOrdered 63.74 (6.6%) 64.21 (6.1%)0.8% ( -11% - 14%) 0.707 HighTermTitleBDVSort 99.61 (9.1%) 100.49 (12.4%)0.9% ( -18% - 24%) 0.796 LowSpanNear 126.43 (3.6%) 127.61 (3.2%)0.9% ( -5% -8%) 0.383 LowIntervalsOrdered 12.45 (5.4%) 12.58 (5.2%)1.0% ( -9% - 12%) 0.535 LowTerm 1767.08 (3.7%) 1788.83 (3.1%)1.2% ( -5% -8%) 0.257 HighSloppyPhrase 11.45 (7.0%) 11.61 (7.1%)1.5% ( -11% 
- 16%) 0.515 AndHighMedDayTaxoFacets 69.41 (3.7%) 70.46 (2.8%)1.5% ( -4% -8%) 0.147 BrowseRandomLabelSSDVFacets 10.85 (6.1%) 11.04 (5.1%)1.7% ( -9% - 13%) 0.342 MedTerm 2083.04 (5.3%) 2119.48 (5.7%)1.7% ( -8% - 13%) 0.316 LowSloppyPhrase 148.79 (3.6%) 151.76 (3.2%)2.0% ( -4% -9%) 0.062 HighPhrase 98.67 (3.4%) 100.80 (3.5%)2.2% ( -4% -9%) 0.048 OrHighNotLow 1371.31 (7.1%) 1400.91 (7.9%)2.2% ( -12% - 18%) 0.365 BrowseMonthTaxoFacets 16.65 (11.6%) 17.03 (13.1%)2.2% ( -20% - 30%) 0.565 OrHighNotHigh 1267.37 (6.8%) 1297.42 (8.9%)2.4% ( -12% - 19%) 0.344 MedSloppyPhrase 39.35 (3.6%) 40.42 (4.2%)2.7% ( -4% - 10%) 0.028 OrNotHighHigh 1190.01 (6.6%) 1224.72 (7.6%)2.9% ( -10% - 18%) 0.194 OrHighHigh 37.72 (4.3%) 39.00 (3.4%)3.4% ( -4% - 11%) 0.005 AndHighHigh 92.46 (4.5%) 95.76 (4.9%)3.6% ( -5% - 13%) 0.017 OrHighNotMed 1231.31 (6.3%) 1275.65 (7.9%)3.6% ( -9% - 18%) 0.109 OrHighMed 174.32 (3.8%) 181.43 (2.9%)4.1% ( -2% - 11%) 0.000 AndHighLow 2761.91 (10.7%) 2885.28 (10.1%)4.5% ( -14% - 28%) 0.175 MedPhrase 214.87 (4.9%) 224.55 (4.8%)4.5% ( -4% - 14%) 0.003 LowPhrase 333.03
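The idea in LUCENE-10366 (one up-front validity check instead of a check on every {{#readByte}}) can be sketched as follows. This is an illustration, not Lucene's actual ByteBufferIndexInput internals: when at least five bytes remain, the decoder uses absolute gets guarded by a single `remaining()` check and updates the position once at the end, so the per-byte checks become trivially predictable.

```java
import java.nio.ByteBuffer;

// Sketch: decode a VInt with one up-front range decision instead of a
// position/limit update per byte. Names are illustrative only.
public class OneCheckVInt {
  static int readVInt(ByteBuffer in) {
    if (in.remaining() >= 5) {          // a VInt is at most 5 bytes
      int pos = in.position();
      byte b = in.get(pos++);           // absolute get: position untouched
      int i = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = in.get(pos++);
        i |= (b & 0x7F) << shift;
      }
      in.position(pos);                 // single position update at the end
      return i;
    }
    // Slow path near the end of the buffer: fall back to checked relative reads.
    byte b = in.get();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.get();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }
}
```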
[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10355: -- Description: Remove EMPTY LongValues in favor of LongValues#ZEROS > Remove EMPTY LongValues in favor of LongValues#ZERO > --- > > Key: LUCENE-10355 > URL: https://issues.apache.org/jira/browse/LUCENE-10355 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial > > Remove EMPTY LongValues in favor of LongValues#ZEROS
[jira] [Updated] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10355: -- Component/s: core/codecs > Remove EMPTY LongValues in favor of LongValues#ZERO > --- > > Key: LUCENE-10355 > URL: https://issues.apache.org/jira/browse/LUCENE-10355 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Trivial >
[jira] [Created] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO
Feng Guo created LUCENE-10355: - Summary: Remove EMPTY LongValues in favor of LongValues#ZERO Key: LUCENE-10355 URL: https://issues.apache.org/jira/browse/LUCENE-10355 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo
[jira] [Created] (LUCENE-10350) Avoid some null checking for FastTaxonomyFacetCounts#countAll()
Feng Guo created LUCENE-10350: - Summary: Avoid some null checking for FastTaxonomyFacetCounts#countAll() Key: LUCENE-10350 URL: https://issues.apache.org/jira/browse/LUCENE-10350 Project: Lucene - Core Issue Type: Improvement Reporter: Feng Guo I find that {{org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()}} is using about 2% cpu of luceneutil, this could probably be replaced with {{values[doc]++}} since {{#countAll}} will never use hashTable. Two changes: # No need to check liveDocs null again and again. # Call {{values[doc]++}} instead of {{#increment}} since {{#countAll}} will never use hashTable. *Benchmark* (baseline is the newest main, including LUCENE-10346) {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value IntNRQ 128.51 (27.8%) 120.13 (27.4%) -6.5% ( -48% - 67%) 0.455 PKLookup 232.55 (5.0%) 226.26 (4.2%) -2.7% ( -11% -6%) 0.065 Wildcard 178.54 (5.5%) 175.13 (5.7%) -1.9% ( -12% -9%) 0.283 BrowseMonthSSDVFacets 16.37 (6.9%) 16.13 (4.6%) -1.5% ( -12% - 10%) 0.422 HighPhrase 211.52 (3.7%) 209.59 (3.3%) -0.9% ( -7% -6%) 0.414 MedPhrase 239.31 (3.2%) 237.14 (2.5%) -0.9% ( -6% -4%) 0.311 HighSloppyPhrase 33.08 (3.3%) 32.79 (3.5%) -0.9% ( -7% -6%) 0.407 Prefix3 171.63 (7.5%) 170.33 (8.3%) -0.8% ( -15% - 16%) 0.762 Respell 80.21 (3.3%) 79.74 (2.7%) -0.6% ( -6% -5%) 0.530 LowPhrase 26.21 (3.6%) 26.05 (2.5%) -0.6% ( -6% -5%) 0.549 LowSloppyPhrase 165.34 (2.4%) 164.47 (2.7%) -0.5% ( -5% -4%) 0.516 OrHighNotLow 1984.04 (3.9%) 1974.07 (5.2%) -0.5% ( -9% -8%) 0.730 OrHighMed 93.69 (4.2%) 93.23 (4.1%) -0.5% ( -8% -8%) 0.711 MedSpanNear 12.19 (3.6%) 12.14 (4.0%) -0.3% ( -7% -7%) 0.777 Fuzzy2 98.86 (3.0%) 98.56 (2.6%) -0.3% ( -5% -5%) 0.735 HighTerm 2284.28 (4.3%) 2277.92 (3.4%) -0.3% ( -7% -7%) 0.819 BrowseDayOfYearSSDVFacets 14.65 (4.8%) 14.61 (4.0%) -0.3% ( -8% -8%) 0.844 LowSpanNear 101.85 (1.7%) 101.58 (2.0%) -0.3% ( -3% -3%) 0.662 BrowseRandomLabelSSDVFacets 11.04 (5.4%) 11.02 (7.2%) -0.2% ( -12% - 13%) 0.902 OrHighHigh 
39.59 (4.2%) 39.49 (4.1%) -0.2% ( -8% -8%) 0.859 Fuzzy1 84.27 (3.1%) 84.11 (2.3%) -0.2% ( -5% -5%) 0.826 AndHighMed 94.85 (5.1%) 94.77 (6.9%) -0.1% ( -11% - 12%) 0.969 HighTermDayOfYearSort 179.66 (17.0%) 179.56 (12.8%) -0.1% ( -25% - 35%) 0.991 LowTerm 2016.63 (3.5%) 2015.71 (3.9%) -0.0% ( -7% -7%) 0.969 AndHighLow 1011.34 (4.1%) 1011.05 (5.3%) -0.0% ( -9% -9%) 0.985 HighTermTitleBDVSort 121.48 (14.4%) 121.49 (15.9%)0.0% ( -26% - 35%) 0.998 MedTerm 2239.73 (4.6%) 2245.65 (3.1%)0.3% ( -7% -8%) 0.830 AndHighHigh 102.09 (3.1%) 102.48 (5.3%)0.4% ( -7% -9%) 0.778 OrNotHighLow 1113.23 (2.3%) 1117.98 (2.4%)0.4% ( -4% -5%) 0.568 HighSpanNear1.92 (4.7%)1.93 (5.4%)0.5% ( -9% - 11%) 0.738 OrHighNotMed 1322.20 (4.3%) 1330.58 (3.1%)0.6% ( -6% -8%) 0.592 AndHighMedDayTaxoFacets 65.82 (1.8%) 66.30 (2.5%)0.7% ( -3% -5%) 0.295 OrNotHighMed 1262.49 (3.0%) 1272.12 (3.8%)0.8% ( -5% -7%) 0.480 MedTermDayTaxoFacets 52.07 (4.7%) 52.54 (6.9%)0.9% ( -10% - 13%) 0.628 OrNotHighHigh 944.56 (3.7%) 953.87 (3.0%)1.0% ( -5% -7%) 0.352 MedSloppyPhrase 64.28 (5.4%) 64.92 (4.7%)1.0% ( -8% - 11%) 0.531
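The two changes described in LUCENE-10350 (hoisting the liveDocs null check out of the per-document loop, and replacing the virtual {{#increment}} with a plain {{values[doc]++}}) can be sketched like this. The types below are simplified stand-ins, not Lucene's actual facet classes:

```java
// Sketch: hoisted null check + direct array increment for countAll().
public class CountAllSketch {
  interface Bits { boolean get(int index); }   // stand-in for liveDocs

  // Before: one liveDocs null check per document inside the hot loop.
  static void countAllNaive(int[] ords, Bits liveDocs, int[] counts) {
    for (int doc = 0; doc < ords.length; doc++) {
      if (liveDocs != null && !liveDocs.get(doc)) continue;
      counts[ords[doc]]++;
    }
  }

  // After: decide on liveDocs once, so the common liveDocs == null case
  // becomes a tight branch-free array walk.
  static void countAll(int[] ords, Bits liveDocs, int[] counts) {
    if (liveDocs == null) {
      for (int doc = 0; doc < ords.length; doc++) {
        counts[ords[doc]]++;                   // direct increment, no virtual call
      }
    } else {
      for (int doc = 0; doc < ords.length; doc++) {
        if (liveDocs.get(doc)) {
          counts[ords[doc]]++;
        }
      }
    }
  }
}
```

Both variants produce identical counts; the split only removes a per-document branch and a virtual dispatch from the common path.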
[jira] [Updated] (LUCENE-10346) Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll()
[ https://issues.apache.org/jira/browse/LUCENE-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10346: -- Description: CPU profile often tells {{SingletonSortedNumericDocValues#nextDoc()}} is using a high percentage of CPU when running luceneutil, but the {{nextDoc()}} of dense cases should be rather simple. So I suspect that it is too many layers of abstraction (and wrap) that cause the stress of JVM. Unwraping it to {{NumericDocvalues}} shows around 30% speed up. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighTermTitleBDVSort 132.24 (20.6%) 125.67 (9.9%) -5.0% ( -29% - 32%) 0.330 LowTerm 1424.13 (3.2%) 1381.34 (4.4%) -3.0% ( -10% -4%) 0.014 OrHighNotHigh 707.82 (3.3%) 687.49 (6.0%) -2.9% ( -11% -6%) 0.062 TermDTSort 155.32 (10.9%) 151.02 (10.2%) -2.8% ( -21% - 20%) 0.406 OrNotHighMed 618.46 (3.7%) 602.65 (4.4%) -2.6% ( -10% -5%) 0.047 Fuzzy1 76.22 (5.3%) 74.71 (6.6%) -2.0% ( -13% - 10%) 0.293 HighTermMonthSort 174.89 (10.4%) 171.45 (10.6%) -2.0% ( -20% - 21%) 0.554 OrHighNotMed 776.08 (4.9%) 761.70 (7.8%) -1.9% ( -13% - 11%) 0.367 HighTermDayOfYearSort 56.23 (10.7%) 55.26 (10.9%) -1.7% ( -21% - 22%) 0.615 MedTerm 1449.48 (3.7%) 1425.87 (5.1%) -1.6% ( -10% -7%) 0.250 OrNotHighHigh 687.92 (4.9%) 677.06 (5.5%) -1.6% ( -11% -9%) 0.339 OrHighNotLow 742.99 (4.7%) 732.23 (5.9%) -1.4% ( -11% -9%) 0.390 OrNotHighLow 789.37 (2.7%) 778.80 (4.7%) -1.3% ( -8% -6%) 0.270 HighPhrase 75.84 (2.2%) 75.14 (3.0%) -0.9% ( -6% -4%) 0.269 HighSloppyPhrase 20.71 (5.9%) 20.56 (5.2%) -0.7% ( -11% - 11%) 0.678 IntNRQ 106.38 (18.4%) 105.67 (18.2%) -0.7% ( -31% - 44%) 0.908 OrHighMed 45.10 (1.5%) 44.83 (1.8%) -0.6% ( -3% -2%) 0.261 MedSpanNear 192.49 (2.5%) 191.51 (3.5%) -0.5% ( -6% -5%) 0.593 OrHighLow 489.82 (5.5%) 487.79 (5.7%) -0.4% ( -11% - 11%) 0.815 MedSloppyPhrase 27.33 (2.9%) 27.22 (2.3%) -0.4% ( -5% -5%) 0.623 MedPhrase 208.94 (2.9%) 208.09 (3.7%) -0.4% ( -6% -6%) 0.696 Respell 71.84 (2.4%) 
71.55 (2.4%) -0.4% ( -5% -4%) 0.600 OrHighHigh 36.26 (1.3%) 36.13 (1.1%) -0.4% ( -2% -2%) 0.344 BrowseMonthSSDVFacets 15.95 (2.7%) 15.90 (2.5%) -0.4% ( -5% -5%) 0.672 AndHighMed 85.83 (2.2%) 85.53 (2.7%) -0.3% ( -5% -4%) 0.658 Prefix3 123.15 (2.6%) 122.74 (2.5%) -0.3% ( -5% -4%) 0.678 Fuzzy2 76.41 (4.7%) 76.23 (4.2%) -0.2% ( -8% -9%) 0.867 BrowseDayOfYearSSDVFacets 14.52 (2.4%) 14.49 (2.2%) -0.2% ( -4% -4%) 0.747 MedIntervalsOrdered 56.39 (4.2%) 56.27 (4.1%) -0.2% ( -8% -8%) 0.871 HighIntervalsOrdered9.29 (4.7%)9.27 (4.4%) -0.2% ( -8% -9%) 0.896 AndHighMedDayTaxoFacets 119.76 (2.5%) 119.53 (2.9%) -0.2% ( -5% -5%) 0.831 HighSpanNear 20.89 (2.0%) 20.85 (2.3%) -0.2% ( -4% -4%) 0.803 LowIntervalsOrdered 45.51 (4.9%) 45.47 (4.8%) -0.1% ( -9% - 10%) 0.952 LowPhrase 64.17 (2.6%) 64.14 (2.6%) -0.1% ( -5% -5%) 0.951 LowSpanNear 104.45 (2.2%) 104.41 (1.9%) -0.0% ( -4% -4%) 0.959 Wildcard 103.83 (2.8%) 103.80 (2.8%) -0.0% ( -5% -5%) 0.970 AndHighHigh 42.33 (2.6%) 42.33 (2.4%) -0.0% ( -4% -5%) 0.991 BrowseRandomLabelSSDVFacets 10.62 (2.5%) 10.62 (1.8%)0.0% ( -4% -4%) 0.981 AndHighHighDayTaxoFacets
[jira] [Created] (LUCENE-10346) Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll()
Feng Guo created LUCENE-10346: - Summary: Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll() Key: LUCENE-10346 URL: https://issues.apache.org/jira/browse/LUCENE-10346 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Feng Guo CPU profile often tells {{SingletonSortedNumericDocValues#nextDoc()}} is using a high percentage of CPU when running luceneutil, but the {{nextDoc()}} of dense cases should be rather simple. So I suspect that it is too many layers of abstraction that cause the stress of JVM. Unwraping it to {{NumericDocvalues}} shows around 30% speed up. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighTermTitleBDVSort 132.24 (20.6%) 125.67 (9.9%) -5.0% ( -29% - 32%) 0.330 LowTerm 1424.13 (3.2%) 1381.34 (4.4%) -3.0% ( -10% -4%) 0.014 OrHighNotHigh 707.82 (3.3%) 687.49 (6.0%) -2.9% ( -11% -6%) 0.062 TermDTSort 155.32 (10.9%) 151.02 (10.2%) -2.8% ( -21% - 20%) 0.406 OrNotHighMed 618.46 (3.7%) 602.65 (4.4%) -2.6% ( -10% -5%) 0.047 Fuzzy1 76.22 (5.3%) 74.71 (6.6%) -2.0% ( -13% - 10%) 0.293 HighTermMonthSort 174.89 (10.4%) 171.45 (10.6%) -2.0% ( -20% - 21%) 0.554 OrHighNotMed 776.08 (4.9%) 761.70 (7.8%) -1.9% ( -13% - 11%) 0.367 HighTermDayOfYearSort 56.23 (10.7%) 55.26 (10.9%) -1.7% ( -21% - 22%) 0.615 MedTerm 1449.48 (3.7%) 1425.87 (5.1%) -1.6% ( -10% -7%) 0.250 OrNotHighHigh 687.92 (4.9%) 677.06 (5.5%) -1.6% ( -11% -9%) 0.339 OrHighNotLow 742.99 (4.7%) 732.23 (5.9%) -1.4% ( -11% -9%) 0.390 OrNotHighLow 789.37 (2.7%) 778.80 (4.7%) -1.3% ( -8% -6%) 0.270 HighPhrase 75.84 (2.2%) 75.14 (3.0%) -0.9% ( -6% -4%) 0.269 HighSloppyPhrase 20.71 (5.9%) 20.56 (5.2%) -0.7% ( -11% - 11%) 0.678 IntNRQ 106.38 (18.4%) 105.67 (18.2%) -0.7% ( -31% - 44%) 0.908 OrHighMed 45.10 (1.5%) 44.83 (1.8%) -0.6% ( -3% -2%) 0.261 MedSpanNear 192.49 (2.5%) 191.51 (3.5%) -0.5% ( -6% -5%) 0.593 OrHighLow 489.82 (5.5%) 487.79 (5.7%) -0.4% ( -11% - 11%) 0.815 MedSloppyPhrase 27.33 (2.9%) 27.22 
(2.3%) -0.4% ( -5% -5%) 0.623 MedPhrase 208.94 (2.9%) 208.09 (3.7%) -0.4% ( -6% -6%) 0.696 Respell 71.84 (2.4%) 71.55 (2.4%) -0.4% ( -5% -4%) 0.600 OrHighHigh 36.26 (1.3%) 36.13 (1.1%) -0.4% ( -2% -2%) 0.344 BrowseMonthSSDVFacets 15.95 (2.7%) 15.90 (2.5%) -0.4% ( -5% -5%) 0.672 AndHighMed 85.83 (2.2%) 85.53 (2.7%) -0.3% ( -5% -4%) 0.658 Prefix3 123.15 (2.6%) 122.74 (2.5%) -0.3% ( -5% -4%) 0.678 Fuzzy2 76.41 (4.7%) 76.23 (4.2%) -0.2% ( -8% -9%) 0.867 BrowseDayOfYearSSDVFacets 14.52 (2.4%) 14.49 (2.2%) -0.2% ( -4% -4%) 0.747 MedIntervalsOrdered 56.39 (4.2%) 56.27 (4.1%) -0.2% ( -8% -8%) 0.871 HighIntervalsOrdered9.29 (4.7%)9.27 (4.4%) -0.2% ( -8% -9%) 0.896 AndHighMedDayTaxoFacets 119.76 (2.5%) 119.53 (2.9%) -0.2% ( -5% -5%) 0.831 HighSpanNear 20.89 (2.0%) 20.85 (2.3%) -0.2% ( -4% -4%) 0.803 LowIntervalsOrdered 45.51 (4.9%) 45.47 (4.8%) -0.1% ( -9% - 10%) 0.952 LowPhrase 64.17 (2.6%) 64.14 (2.6%) -0.1% ( -5% -5%) 0.951 LowSpanNear 104.45 (2.2%) 104.41 (1.9%) -0.0% ( -4% -4%) 0.959 Wildcard 103.83 (2.8%) 103.80 (2.8%) -0.0% ( -5% -5%) 0.970 AndHighHigh 42.33 (2.6%)
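The unwrapping idea in LUCENE-10346 can be sketched with minimal stand-in types (these are not Lucene's real {{SortedNumericDocValues}} / {{NumericDocValues}} interfaces): when the multi-valued view is known to be a singleton wrapper, iterate the underlying single-valued view directly and skip a layer of virtual calls per document.

```java
// Sketch: unwrap a singleton multi-valued wrapper before the hot loop.
public class UnwrapSketch {
  interface NumericValues { long get(int doc); }   // single-valued view

  // Multi-valued facade that always wraps exactly one value per document.
  static class SingletonSortedNumericValues {
    final NumericValues in;
    SingletonSortedNumericValues(NumericValues in) { this.in = in; }
    int docValueCount() { return 1; }
    long nextValue(int doc) { return in.get(doc); }
  }

  // Through the wrapper: an inner loop and two virtual calls per document.
  static long sumViaWrapper(SingletonSortedNumericValues values, int maxDoc) {
    long sum = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      for (int i = 0; i < values.docValueCount(); i++) {
        sum += values.nextValue(doc);
      }
    }
    return sum;
  }

  // Unwrap once up front; the hot loop then hits the narrow interface directly.
  static long sumUnwrapped(SingletonSortedNumericValues values, int maxDoc) {
    NumericValues unwrapped = values.in;
    long sum = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      sum += unwrapped.get(doc);
    }
    return sum;
  }
}
```

Both paths return the same result; the unwrapped loop simply gives the JIT a flat, monomorphic call site to inline.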
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466863#comment-17466863 ] Feng Guo commented on LUCENE-10334: --- OK! I've prepared the [PR for the first patch|https://github.com/apache/lucene/pull/562/files], which is ready for review now; please take a look when you have time. Thanks [~rcmuir]! > Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Previous talk is here: [https://github.com/apache/lucene/pull/557] > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocValues > -*Benchmark based on wiki10m*- (Previous benchmark results were wrong, so I deleted them to avoid misleading readers; see the benchmark in the comments.)
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466709#comment-17466709 ] Feng Guo edited comment on LUCENE-10334 at 12/30/21, 10:06 AM: --- To save reading time, I deleted some previous progress comments and will try to make a final summary here. ??one idea is we could try using the new block compression just for ordinals as a start?? Thanks [~rcmuir] for the suggestion! I made some optimizations in this approach, and browse taxo tasks (Browse*TaxoFacets) are getting sped up too. So the benchmark is now showing "dense faster, sparse slower" instead of "SSDV faster, Taxo slower". I suspect we did not see an SSDV regression only because we have not added tasks that read sparse SSDV values, e.g. a {{MedTermDaySSDVFacets}}. I've got two schemes in mind so far: *ForUtil Approach* This approach makes the file format friendly to block decoding and decodes blocks with the efficient ForUtil (with SIMD optimization) on each get. As a result we get a substantial (130%) speedup in {{Browse*}} tasks. But we also get a slight (10%) regression in tasks that read facets with a query (like MedTermDayTaxoFacets), since we read sparse values there and must decompress the whole 128-value block even if we only need one value from it.
Here is the [code|https://github.com/apache/lucene/pull/562] and luceneutil benchmark: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value AndHighMedDayTaxoFacets 71.49 (2.1%) 64.72 (2.0%) -9.5% ( -13% - -5%) 0.000 MedTermDayTaxoFacets 25.79 (2.6%) 24.00 (1.8%) -6.9% ( -11% - -2%) 0.000 AndHighHighDayTaxoFacets 13.13 (3.4%) 12.63 (3.1%) -3.9% ( -10% -2%) 0.000 OrHighMedDayTaxoFacets 13.71 (4.1%) 13.41 (4.7%) -2.2% ( -10% -6%) 0.118 PKLookup 204.87 (3.9%) 203.03 (3.6%) -0.9% ( -8% -6%) 0.450 Prefix3 113.85 (3.6%) 113.32 (4.6%) -0.5% ( -8% -8%) 0.724 HighSpanNear 25.34 (2.5%) 25.26 (3.1%) -0.3% ( -5% -5%) 0.714 LowSpanNear 55.96 (2.0%) 55.80 (2.1%) -0.3% ( -4% -3%) 0.658 MedSpanNear 56.84 (2.4%) 56.90 (2.2%)0.1% ( -4% -4%) 0.895 MedSloppyPhrase 26.57 (1.8%) 26.60 (1.9%)0.1% ( -3% -3%) 0.831 HighSloppyPhrase 30.20 (3.7%) 30.24 (3.6%)0.2% ( -6% -7%) 0.890 OrHighMed 49.96 (2.1%) 50.06 (1.7%)0.2% ( -3% -4%) 0.742 AndHighMed 96.70 (2.9%) 96.95 (2.6%)0.3% ( -5% -5%) 0.772 LowIntervalsOrdered 23.32 (4.6%) 23.38 (4.5%)0.3% ( -8% -9%) 0.856 OrHighHigh 38.09 (1.9%) 38.20 (1.8%)0.3% ( -3% -4%) 0.643 TermDTSort 128.55 (14.7%) 128.94 (11.6%)0.3% ( -22% - 31%) 0.942 Fuzzy1 99.54 (7.1%) 99.86 (8.0%)0.3% ( -13% - 16%) 0.893 HighIntervalsOrdered 15.58 (2.6%) 15.65 (2.6%)0.4% ( -4% -5%) 0.636 Respell 63.96 (1.9%) 64.22 (2.3%)0.4% ( -3% -4%) 0.542 OrHighNotHigh 611.12 (5.8%) 613.85 (6.2%)0.4% ( -10% - 13%) 0.814 MedIntervalsOrdered 59.48 (5.2%) 59.75 (5.1%)0.5% ( -9% - 11%) 0.780 AndHighHigh 58.76 (3.0%) 59.16 (3.0%)0.7% ( -5% -6%) 0.478 OrNotHighHigh 619.53 (6.0%) 623.79 (7.1%)0.7% ( -11% - 14%) 0.740 HighPhrase 31.00 (2.5%) 31.26 (2.7%)0.8% ( -4% -6%) 0.307 AndHighLow 828.41 (5.9%) 835.65 (7.1%)0.9% ( -11% - 14%) 0.672 OrNotHighLow 986.46 (6.8%) 995.13 (10.5%)0.9% ( -15% - 19%) 0.752 HighTermTitleBDVSort 110.39 (12.3%) 111.38 (11.1%)0.9% ( -20% - 27%) 0.807 IntNRQ 151.29 (2.6%) 152.96 (3.5%)1.1% ( -4% -7%) 0.262 LowTerm 1876.18 (7.8%) 1897.19 (8.3%)1.1% 
( -13% - 18%) 0.660 HighTermDayOfYearSort 108.34 (18.9%) 109.87 (17.4%)
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466709#comment-17466709 ] Feng Guo edited comment on LUCENE-10334 at 12/30/21, 9:18 AM: -- In order to save reading time, I deleted some previous progress comments and try to make a final summary here. ??one idea is we could try using the new block compression just for ordinals as a start?? Thanks [~rcmuir] for the suggestion! I made some optimizations in this approach and browse taxo tasks (Browse*TaxoFacets) are getting speed up too. So the benchmark is telling "dense faster sparse slower" instead of "SSDV faster Taxos slower" now. I suspect we probably did not see a SSDV regression just because we have not added reading sparse SSDV values tasks, e.g. a {{{}MedTermDaySSDVFacets{}}}. I've got two schemes in mind so far: *ForUtil Approach* This approach tends to make file format friendly to block decoding and decode block based on the efficient ForUtil (with SIMD opt) for each get. As a result we can get a rather delicious (130%) speed up in {{Browse*}} tasks. But we also get a slight (10%) regression in tasks that reading facets with a query (like MedTermDayTaxoFacets) since we are reading sparse values there and we need to decompress the whole 128 values block even we only need one value in that block. 
Here is the [code|https://github.com/apache/lucene/pull/562] and the luceneutil benchmark:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
AndHighMedDayTaxoFacets      71.49  (2.1%)    64.72  (2.0%)   -9.5% ( -13% - -5%)   0.000
MedTermDayTaxoFacets         25.79  (2.6%)    24.00  (1.8%)   -6.9% ( -11% - -2%)   0.000
AndHighHighDayTaxoFacets     13.13  (3.4%)    12.63  (3.1%)   -3.9% ( -10% -  2%)   0.000
OrHighMedDayTaxoFacets       13.71  (4.1%)    13.41  (4.7%)   -2.2% ( -10% -  6%)   0.118
PKLookup                    204.87  (3.9%)   203.03  (3.6%)   -0.9% (  -8% -  6%)   0.450
Prefix3                     113.85  (3.6%)   113.32  (4.6%)   -0.5% (  -8% -  8%)   0.724
HighSpanNear                 25.34  (2.5%)    25.26  (3.1%)   -0.3% (  -5% -  5%)   0.714
LowSpanNear                  55.96  (2.0%)    55.80  (2.1%)   -0.3% (  -4% -  3%)   0.658
MedSpanNear                  56.84  (2.4%)    56.90  (2.2%)    0.1% (  -4% -  4%)   0.895
MedSloppyPhrase              26.57  (1.8%)    26.60  (1.9%)    0.1% (  -3% -  3%)   0.831
HighSloppyPhrase             30.20  (3.7%)    30.24  (3.6%)    0.2% (  -6% -  7%)   0.890
OrHighMed                    49.96  (2.1%)    50.06  (1.7%)    0.2% (  -3% -  4%)   0.742
AndHighMed                   96.70  (2.9%)    96.95  (2.6%)    0.3% (  -5% -  5%)   0.772
LowIntervalsOrdered          23.32  (4.6%)    23.38  (4.5%)    0.3% (  -8% -  9%)   0.856
OrHighHigh                   38.09  (1.9%)    38.20  (1.8%)    0.3% (  -3% -  4%)   0.643
TermDTSort                  128.55 (14.7%)   128.94 (11.6%)    0.3% ( -22% - 31%)   0.942
Fuzzy1                       99.54  (7.1%)    99.86  (8.0%)    0.3% ( -13% - 16%)   0.893
HighIntervalsOrdered         15.58  (2.6%)    15.65  (2.6%)    0.4% (  -4% -  5%)   0.636
Respell                      63.96  (1.9%)    64.22  (2.3%)    0.4% (  -3% -  4%)   0.542
OrHighNotHigh               611.12  (5.8%)   613.85  (6.2%)    0.4% ( -10% - 13%)   0.814
MedIntervalsOrdered          59.48  (5.2%)    59.75  (5.1%)    0.5% (  -9% - 11%)   0.780
AndHighHigh                  58.76  (3.0%)    59.16  (3.0%)    0.7% (  -5% -  6%)   0.478
OrNotHighHigh               619.53  (6.0%)   623.79  (7.1%)    0.7% ( -11% - 14%)   0.740
HighPhrase                   31.00  (2.5%)    31.26  (2.7%)    0.8% (  -4% -  6%)   0.307
AndHighLow                  828.41  (5.9%)   835.65  (7.1%)    0.9% ( -11% - 14%)   0.672
OrNotHighLow                986.46  (6.8%)   995.13 (10.5%)    0.9% ( -15% - 19%)   0.752
HighTermTitleBDVSort        110.39 (12.3%)   111.38 (11.1%)    0.9% ( -20% - 27%)   0.807
IntNRQ                      151.29  (2.6%)   152.96  (3.5%)    1.1% (  -4% -  7%)   0.262
LowTerm                    1876.18  (7.8%)  1897.19  (8.3%)    1.1% ( -13% - 18%)   0.660
HighTermDayOfYearSort       108.34 (18.9%)   109.87 (17.4%)    1.4% ( -29% - 46%)   0.805
{code}
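The block-decoding idea behind the ForUtil approach can be sketched in plain Java. This is a simplified scalar stand-in, not Lucene's actual SIMD-friendly ForUtil, and all names here are illustrative: 128 values are packed at a fixed bit width (bpv) and always decoded a whole block at a time, instead of extracting one value per #get.

```java
// Simplified sketch of 128-value block packing/decoding (illustrative, not
// Lucene's ForUtil): decoding the whole block in one tight pass is what makes
// the dense (Browse*) case fast, and is also why a sparse reader pays for 127
// values it does not need.
public class BlockPackSketch {
  static final int BLOCK_SIZE = 128;

  // Pack BLOCK_SIZE values of bpv bits each into a long[] buffer.
  static long[] pack(long[] values, int bpv) {
    long[] packed = new long[(BLOCK_SIZE * bpv + 63) / 64];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int bit = i * bpv;
      int idx = bit >>> 6, shift = bit & 63;
      packed[idx] |= values[i] << shift;
      if (shift + bpv > 64) { // value spills into the next long
        packed[idx + 1] |= values[i] >>> (64 - shift);
      }
    }
    return packed;
  }

  // Decode the whole block in one pass; callers then read values[i] directly.
  static long[] decode(long[] packed, int bpv) {
    long mask = bpv == 64 ? -1L : (1L << bpv) - 1;
    long[] values = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int bit = i * bpv;
      int idx = bit >>> 6, shift = bit & 63;
      long v = packed[idx] >>> shift;
      if (shift + bpv > 64) { // pull the spilled high bits back in
        v |= packed[idx + 1] << (64 - shift);
      }
      values[i] = v & mask;
    }
    return values;
  }

  public static void main(String[] args) {
    long[] values = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) values[i] = i % 7; // fits in bpv = 3
    long[] decoded = decode(pack(values, 3), 3);
    for (int i = 0; i < BLOCK_SIZE; i++) {
      if (decoded[i] != values[i]) throw new AssertionError("mismatch at " + i);
    }
    System.out.println("round-trip ok");
  }
}
```

The fixed block size is what lets the real implementation unroll and vectorize the decode loop for each supported bpv.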
[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334 ] Feng Guo deleted comment on LUCENE-10334: --- was (Author: gf2121): Thanks [~rcmuir] for the suggestion! I tried some optimizations on this patch: 1. I first replaced {{DirectWriter#unsignedBitsRequired}} with {{PackedInts#unsignedBitsRequired}}, since ForUtil can support all bpv; this change reduced the index size a bit. But I have now rolled back this change, since decoding bpv of 1, 2, 4, 8, 12, 16... is also a bit faster in ForUtil. 2. {{ForUtil#decode}} does a {{switch}} on every call; this can be avoided, as in {{DirectReader}}, by choosing an implementation of an interface up front. I applied this change to ForUtil. I'm not sure which is the major optimization, but the report looks better now:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
AndHighMedDayTaxoFacets      71.49  (2.1%)    64.72  (2.0%)   -9.5% ( -13% - -5%)   0.000
MedTermDayTaxoFacets         25.79  (2.6%)    24.00  (1.8%)   -6.9% ( -11% - -2%)   0.000
AndHighHighDayTaxoFacets     13.13  (3.4%)    12.63  (3.1%)   -3.9% ( -10% -  2%)   0.000
OrHighMedDayTaxoFacets       13.71  (4.1%)    13.41  (4.7%)   -2.2% ( -10% -  6%)   0.118
PKLookup                    204.87  (3.9%)   203.03  (3.6%)   -0.9% (  -8% -  6%)   0.450
Prefix3                     113.85  (3.6%)   113.32  (4.6%)   -0.5% (  -8% -  8%)   0.724
HighSpanNear                 25.34  (2.5%)    25.26  (3.1%)   -0.3% (  -5% -  5%)   0.714
LowSpanNear                  55.96  (2.0%)    55.80  (2.1%)   -0.3% (  -4% -  3%)   0.658
MedSpanNear                  56.84  (2.4%)    56.90  (2.2%)    0.1% (  -4% -  4%)   0.895
MedSloppyPhrase              26.57  (1.8%)    26.60  (1.9%)    0.1% (  -3% -  3%)   0.831
HighSloppyPhrase             30.20  (3.7%)    30.24  (3.6%)    0.2% (  -6% -  7%)   0.890
OrHighMed                    49.96  (2.1%)    50.06  (1.7%)    0.2% (  -3% -  4%)   0.742
AndHighMed                   96.70  (2.9%)    96.95  (2.6%)    0.3% (  -5% -  5%)   0.772
LowIntervalsOrdered          23.32  (4.6%)    23.38  (4.5%)    0.3% (  -8% -  9%)   0.856
OrHighHigh                   38.09  (1.9%)    38.20  (1.8%)    0.3% (  -3% -  4%)   0.643
TermDTSort                  128.55 (14.7%)   128.94 (11.6%)    0.3% ( -22% - 31%)   0.942
Fuzzy1                       99.54  (7.1%)    99.86  (8.0%)    0.3% ( -13% - 16%)   0.893
HighIntervalsOrdered         15.58  (2.6%)    15.65  (2.6%)    0.4% (  -4% -  5%)   0.636
Respell                      63.96  (1.9%)    64.22  (2.3%)    0.4% (  -3% -  4%)   0.542
OrHighNotHigh               611.12  (5.8%)   613.85  (6.2%)    0.4% ( -10% - 13%)   0.814
MedIntervalsOrdered          59.48  (5.2%)    59.75  (5.1%)    0.5% (  -9% - 11%)   0.780
AndHighHigh                  58.76  (3.0%)    59.16  (3.0%)    0.7% (  -5% -  6%)   0.478
OrNotHighHigh               619.53  (6.0%)   623.79  (7.1%)    0.7% ( -11% - 14%)   0.740
HighPhrase                   31.00  (2.5%)    31.26  (2.7%)    0.8% (  -4% -  6%)   0.307
AndHighLow                  828.41  (5.9%)   835.65  (7.1%)    0.9% ( -11% - 14%)   0.672
OrNotHighLow                986.46  (6.8%)   995.13 (10.5%)    0.9% ( -15% - 19%)   0.752
HighTermTitleBDVSort        110.39 (12.3%)   111.38 (11.1%)    0.9% ( -20% - 27%)   0.807
IntNRQ                      151.29  (2.6%)   152.96  (3.5%)    1.1% (  -4% -  7%)   0.262
LowTerm                    1876.18  (7.8%)  1897.19  (8.3%)    1.1% ( -13% - 18%)   0.660
HighTermDayOfYearSort       108.34 (18.9%)   109.87 (17.4%)    1.4% ( -29% - 46%)   0.805
HighTermMonthSort            65.84 (11.0%)    66.78 (11.7%)    1.4% ( -19% - 27%)   0.689
OrHighNotMed                770.05  (5.3%)   782.54  (8.8%)    1.6% ( -11% - 16%)   0.480
Wildcard                    182.10  (5.5%)   185.24  (7.2%)    1.7% ( -10% - 15%)   0.394
LowSloppyPhrase              33.75  (6.6%)    34.35  (8.8%)    1.8% ( -12% - 18%)   0.478
MedPhrase                   161.57  (3.8%)   164.62  (6.1%)    1.9% (  -7% - 12%)   0.242
OrHighNotLow                679.46  (7.2%)   693.59  (7.6%)    2.1% ( -11% -
{code}
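The second optimization above, replacing a per-call {{switch}} with an implementation selected once, can be illustrated with a small sketch (hypothetical names, not the actual Lucene patch): the bpv-specific decoder is chosen when the reader is created, so the hot path carries no branch on bpv.

```java
// Sketch of "dispatch once instead of switch per call", the same trick
// DirectReader uses. Only bpv 8 and 16 are shown; names are illustrative.
public class DecoderSelection {
  interface BlockDecoder {
    long get(long[] packed, int index);
  }

  // Per-call switch: every get() re-dispatches on bpv.
  static long getWithSwitch(long[] packed, int index, int bpv) {
    switch (bpv) {
      case 8:  return (packed[index >>> 3] >>> ((index & 7) << 3)) & 0xFFL;
      case 16: return (packed[index >>> 2] >>> ((index & 3) << 4)) & 0xFFFFL;
      default: throw new UnsupportedOperationException("bpv=" + bpv);
    }
  }

  // Dispatch once: pick the decoder when the reader is opened.
  static BlockDecoder forBpv(int bpv) {
    switch (bpv) {
      case 8:  return (p, i) -> (p[i >>> 3] >>> ((i & 7) << 3)) & 0xFFL;
      case 16: return (p, i) -> (p[i >>> 2] >>> ((i & 3) << 4)) & 0xFFFFL;
      default: throw new UnsupportedOperationException("bpv=" + bpv);
    }
  }

  public static void main(String[] args) {
    long[] packed = { 0x0807060504030201L }; // eight 8-bit values: 1..8
    BlockDecoder dec = forBpv(8);            // selected once, used many times
    for (int i = 0; i < 8; i++) {
      if (dec.get(packed, i) != i + 1) throw new AssertionError();
      if (getWithSwitch(packed, i, 8) != i + 1) throw new AssertionError();
    }
    System.out.println("ok");
  }
}
```

Hoisting the dispatch also gives the JIT a monomorphic (or at worst bimorphic) call site per reader instance, which it can inline.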
[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334 ] Feng Guo deleted comment on LUCENE-10334: --- was (Author: gf2121): In the 'detect warm up' approach, I unrolled the block decode code, which sped it up a bit:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
MedTermDayTaxoFacets         55.17  (6.5%)    53.38  (7.1%)   -3.2% ( -15% - 11%)   0.129
Wildcard                    309.31 (12.9%)   299.80 (12.6%)   -3.1% ( -25% - 25%)   0.446
OrNotHighLow                696.18  (8.7%)   677.46 (10.1%)   -2.7% ( -19% - 17%)   0.367
HighTerm                   1183.05  (9.5%)  1151.67  (9.2%)   -2.7% ( -19% - 17%)   0.368
OrHighMed                   120.60  (7.0%)   117.55  (7.8%)   -2.5% ( -16% - 13%)   0.279
OrHighMedDayTaxoFacets        9.46  (7.5%)     9.25  (6.7%)   -2.2% ( -15% - 12%)   0.320
Prefix3                     177.41  (7.3%)   173.96 (10.3%)   -1.9% ( -18% - 16%)   0.489
AndHighHighDayTaxoFacets     28.81  (6.7%)    28.35  (6.3%)   -1.6% ( -13% - 12%)   0.433
BrowseMonthTaxoFacets        13.50 (13.4%)    13.30  (6.6%)   -1.5% ( -18% - 21%)   0.658
AndHighMedDayTaxoFacets      46.43  (7.8%)    45.75  (7.3%)   -1.5% ( -15% - 14%)   0.540
MedPhrase                   360.70  (8.5%)   355.47  (8.3%)   -1.4% ( -16% - 16%)   0.587
AndHighMed                  233.52  (6.7%)   230.19  (7.0%)   -1.4% ( -14% - 13%)   0.510
HighTermTitleBDVSort         72.17 (16.9%)    71.14 (15.4%)   -1.4% ( -28% - 37%)   0.780
OrHighNotMed                659.68  (9.7%)   650.38 (12.6%)   -1.4% ( -21% - 23%)   0.691
HighPhrase                   73.05  (7.6%)    72.25  (9.3%)   -1.1% ( -16% - 17%)   0.685
TermDTSort                  123.29 (15.5%)   122.10 (13.8%)   -1.0% ( -26% - 33%)   0.835
IntNRQ                      167.75  (7.4%)   166.17  (8.6%)   -0.9% ( -15% - 16%)   0.710
OrHighNotHigh               890.84 (13.2%)   883.31 (11.5%)   -0.8% ( -22% - 27%)   0.828
OrHighLow                   279.24  (7.5%)   276.97  (6.6%)   -0.8% ( -13% - 14%)   0.718
PKLookup                    198.13  (6.6%)   196.54  (6.9%)   -0.8% ( -13% - 13%)   0.707
MedSloppyPhrase              94.28  (8.0%)    93.55  (6.5%)   -0.8% ( -14% - 14%)   0.737
AndHighLow                  574.70  (8.2%)   570.50  (9.6%)   -0.7% ( -17% - 18%)   0.795
OrNotHighMed                717.34 (11.5%)   712.33 (12.4%)   -0.7% ( -22% - 26%)   0.853
AndHighHigh                  61.26  (7.2%)    60.84  (6.3%)   -0.7% ( -13% - 13%)   0.753
HighSloppyPhrase              6.56  (6.5%)     6.52  (5.3%)   -0.7% ( -11% - 11%)   0.729
LowSloppyPhrase             159.12  (6.9%)   158.23  (6.5%)   -0.6% ( -13% - 13%)   0.794
LowPhrase                    88.55  (8.6%)    88.07  (8.6%)   -0.5% ( -16% - 18%)   0.844
MedSpanNear                  14.63  (6.1%)    14.55  (5.3%)   -0.5% ( -11% - 11%)   0.786
Fuzzy2                       24.31  (9.5%)    24.19  (7.7%)   -0.5% ( -16% - 18%)   0.858
MedTerm                    1440.59  (9.8%)  1433.53 (11.5%)   -0.5% ( -19% - 23%)   0.885
HighSpanNear                 23.52  (6.1%)    23.40  (5.9%)   -0.5% ( -11% - 12%)   0.797
OrHighHigh                   32.01  (8.4%)    31.96  (5.7%)   -0.1% ( -13% - 15%)   0.948
Fuzzy1                       82.31 (11.7%)    82.30 (13.5%)   -0.0% ( -22% - 28%)   0.998
OrHighNotLow                724.27  (9.5%)   724.70 (10.1%)    0.1% ( -17% - 21%)   0.985
HighTermDayOfYearSort       156.00 (14.4%)   156.22 (14.6%)    0.1% ( -25% - 33%)   0.975
Respell                      68.40  (8.2%)    68.49  (7.6%)    0.1% ( -14% - 17%)   0.955
MedIntervalsOrdered           9.22  (7.4%)     9.23  (6.9%)    0.2% ( -13% - 15%)   0.936
OrNotHighHigh               571.66  (8.1%)   572.72 (10.9%)    0.2% ( -17% - 20%)   0.951
LowSpanNear                  82.39  (7.1%)    82.57  (4.4%)    0.2% ( -10% - 12%)   0.907
LowTerm                    1355.15 (10.2%)  1358.89 (10.0%)    0.3% ( -18% - 22%)   0.931
HighTermMonthSort            58.72 (20.8%)    58.95 (20.1%)    0.4% ( -33% - 52%)   0.950
{code}
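Unrolling the block decode loop means roughly the following. This is an illustrative sketch (16-bit values, four per long), not the code from the deleted comment: each iteration emits all the values packed in one long with straight-line code, removing the per-value shift computation and most of the loop overhead.

```java
// Toy comparison of a plain per-value decode loop against an unrolled one
// that emits four 16-bit values per long. Illustrative only.
public class UnrolledDecode {
  static final int BLOCK_SIZE = 128;

  // Straightforward loop: one value per iteration.
  static void decodeSimple(long[] packed, long[] out) {
    for (int i = 0; i < BLOCK_SIZE; i++) {
      out[i] = (packed[i >>> 2] >>> ((i & 3) << 4)) & 0xFFFFL;
    }
  }

  // Unrolled: each iteration emits the four values packed in one long.
  static void decodeUnrolled(long[] packed, long[] out) {
    for (int i = 0, j = 0; i < BLOCK_SIZE / 4; i++, j += 4) {
      long w = packed[i];
      out[j]     =  w          & 0xFFFFL;
      out[j + 1] = (w >>> 16)  & 0xFFFFL;
      out[j + 2] = (w >>> 32)  & 0xFFFFL;
      out[j + 3] =  w >>> 48;
    }
  }

  public static void main(String[] args) {
    long[] packed = new long[BLOCK_SIZE / 4];
    java.util.Random r = new java.util.Random(42);
    for (int i = 0; i < packed.length; i++) packed[i] = r.nextLong();
    long[] a = new long[BLOCK_SIZE], b = new long[BLOCK_SIZE];
    decodeSimple(packed, a);
    decodeUnrolled(packed, b);
    if (!java.util.Arrays.equals(a, b)) throw new AssertionError();
    System.out.println("decoders agree");
  }
}
```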
[jira] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334 ] Feng Guo deleted comment on LUCENE-10334: --- was (Author: gf2121): If we cannot tolerate the regression, another idea that comes to mind is introducing a 'detect warm up' phase for {{DirectReader}}. Since most usage of DirectReader in the DocValuesProducer is forward reading, we can probably judge whether hits are dense or sparse from the first 128 calls to #get: e.g. we can assume the reading is dense if more than 80% of those gets land in the first block, and choose block decoding for the following gets if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and the benchmark result:
{code:java}
Task                        QPS baseline     StdDev   QPS my_modified_version   StdDev   Pct diff           p-value
OrHighMedDayTaxoFacets       12.08  (5.6%)    11.85  (4.4%)   -1.9% ( -11% -  8%)   0.228
MedTermDayTaxoFacets         35.50  (2.9%)    35.09  (2.1%)   -1.2% (  -5% -  3%)   0.148
AndHighHighDayTaxoFacets     20.35  (2.5%)    20.18  (2.2%)   -0.8% (  -5% -  4%)   0.275
BrowseMonthTaxoFacets        14.09 (12.4%)    13.99  (7.2%)   -0.7% ( -18% - 21%)   0.817
AndHighMedDayTaxoFacets     100.43  (2.2%)    99.96  (2.2%)   -0.5% (  -4% -  3%)   0.501
LowIntervalsOrdered          31.96  (3.6%)    31.90  (2.7%)   -0.2% (  -6% -  6%)   0.853
HighIntervalsOrdered          9.82  (4.8%)     9.81  (3.8%)   -0.1% (  -8% -  8%)   0.925
HighTermDayOfYearSort        58.36  (8.2%)    58.29  (7.2%)   -0.1% ( -14% - 16%)   0.962
MedIntervalsOrdered          16.33  (3.3%)    16.33  (2.5%)   -0.0% (  -5% -  6%)   0.967
HighTermTitleBDVSort         82.38 (11.9%)    82.52 (13.2%)    0.2% ( -22% - 28%)   0.966
HighSpanNear                 38.08  (1.9%)    38.17  (1.5%)    0.2% (  -3% -  3%)   0.687
AndHighHigh                  73.02  (4.1%)    73.20  (4.4%)    0.2% (  -7% -  9%)   0.854
OrHighHigh                   38.67  (2.1%)    38.77  (1.9%)    0.3% (  -3% -  4%)   0.669
LowSloppyPhrase              48.05  (5.4%)    48.20  (5.5%)    0.3% ( -10% - 11%)   0.856
MedSloppyPhrase              34.55  (2.7%)    34.66  (2.6%)    0.3% (  -4% -  5%)   0.696
TermDTSort                  200.08 (11.2%)   200.74 (11.3%)    0.3% ( -19% - 25%)   0.926
HighTermMonthSort           126.69 (11.4%)   127.18 (11.7%)    0.4% ( -20% - 26%)   0.917
HighSloppyPhrase             14.03  (3.5%)    14.09  (3.7%)    0.4% (  -6% -  7%)   0.703
MedSpanNear                 103.61  (2.1%)   104.14  (1.2%)    0.5% (  -2% -  3%)   0.332
IntNRQ                      126.16  (2.3%)   126.81  (2.7%)    0.5% (  -4% -  5%)   0.508
AndHighMed                  164.27  (4.2%)   165.20  (4.4%)    0.6% (  -7% -  9%)   0.676
LowSpanNear                 167.58  (2.7%)   168.63  (2.6%)    0.6% (  -4% -  6%)   0.460
PKLookup                    201.62  (3.8%)   203.05  (4.7%)    0.7% (  -7% -  9%)   0.599
Respell                      73.56  (2.1%)    74.43  (2.7%)    1.2% (  -3% -  6%)   0.121
MedPhrase                   266.51  (5.2%)   270.42  (5.9%)    1.5% (  -9% - 13%)   0.405
OrHighMed                   116.57  (4.0%)   118.30  (3.3%)    1.5% (  -5% -  9%)   0.202
Prefix3                     136.44  (3.9%)   138.51  (3.6%)    1.5% (  -5% -  9%)   0.204
OrNotHighMed                669.05  (5.3%)   679.79  (7.7%)    1.6% ( -10% - 15%)   0.443
OrNotHighLow                907.93  (5.8%)   922.66 (10.1%)    1.6% ( -13% - 18%)   0.533
Wildcard                    146.59  (3.2%)   149.19  (4.9%)    1.8% (  -6% - 10%)   0.172
OrHighLow                   383.74  (8.5%)   390.67  (8.0%)    1.8% ( -13% - 20%)   0.489
HighPhrase                   96.06  (4.4%)    97.81  (6.8%)    1.8% (  -8% - 13%)   0.316
Fuzzy2                       65.58 (12.9%)    66.81 (11.3%)    1.9% ( -19% - 29%)   0.624
LowPhrase                   145.74  (4.0%)   148.50  (5.1%)    1.9% (  -6% - 11%)   0.192
MedTerm                    1470.64  (7.1%)  1498.96  (9.5%)    1.9% ( -13% - 19%)   0.468
OrHighNotHigh               562.56  (5.7%)   573.78  (7.3%)    2.0% ( -10% - 15%)   0.336
Fuzzy1                       95.47  (5.7%)    97.51  (7.3%)    2.1% ( -10% - 16%)   0.303
{code}
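The 'detect warm up' heuristic described above could look roughly like this (a toy sketch with hypothetical names; the real POC is in the linked PR): count how many of the first 128 gets land in the first 128-value block, and if more than 80% do, treat the access pattern as dense and switch to whole-block decoding for subsequent reads.

```java
// Toy sketch of dense/sparse detection during a warm-up window of
// BLOCK_SIZE #get calls. All names and thresholds are illustrative.
public class WarmupDetectingReader {
  static final int BLOCK_SIZE = 128;
  static final double DENSE_RATIO = 0.8;

  private int calls = 0;
  private int hitsInFirstBlock = 0;
  private boolean decided = false;
  private boolean dense = false;

  /** Observe the index of each #get; returns true once decided dense. */
  boolean observe(long index) {
    if (!decided) {
      calls++;
      if (index < BLOCK_SIZE) hitsInFirstBlock++;
      if (calls == BLOCK_SIZE) {
        dense = hitsInFirstBlock > DENSE_RATIO * BLOCK_SIZE;
        decided = true;
      }
    }
    return decided && dense; // dense => caller switches to block decoding
  }

  public static void main(String[] args) {
    // Forward, dense reading: indices 0..127 all fall in the first block.
    WarmupDetectingReader denseReader = new WarmupDetectingReader();
    boolean useBlocks = false;
    for (int i = 0; i < BLOCK_SIZE; i++) useBlocks = denseReader.observe(i);
    if (!useBlocks) throw new AssertionError("expected dense");

    // Sparse reading: indices spread far apart, almost none in the first block.
    WarmupDetectingReader sparseReader = new WarmupDetectingReader();
    for (int i = 0; i < BLOCK_SIZE; i++) useBlocks = sparseReader.observe(i * 1000L);
    if (useBlocks) throw new AssertionError("expected sparse");
    System.out.println("heuristic ok");
  }
}
```

The appeal of this scheme is that a wrong guess only costs the warm-up window; after that, dense readers get block decoding and sparse readers keep per-value access.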
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/29/21, 4:41 AM: -- If we can not tolerate the regression, another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128 #get, e.g. we can assume the reading is dense if we get more than 80% times in the first block, and choose block decoding for following gets if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 26%) 0.917 
HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0% ( -10% - 15%) 0.336
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466112#comment-17466112 ]

Feng Guo commented on LUCENE-10334:
---

In the 'detect warm-up' approach, I unrolled the block decoding code, which speeds it up a bit:

{code:java}
                    Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
    MedTermDayTaxoFacets  55.17 (6.5%)    53.38 (7.1%)    -3.2% ( -15% - 11%)  0.129
                Wildcard  309.31 (12.9%)  299.80 (12.6%)  -3.1% ( -25% - 25%)  0.446
            OrNotHighLow  696.18 (8.7%)   677.46 (10.1%)  -2.7% ( -19% - 17%)  0.367
                HighTerm  1183.05 (9.5%)  1151.67 (9.2%)  -2.7% ( -19% - 17%)  0.368
               OrHighMed  120.60 (7.0%)   117.55 (7.8%)   -2.5% ( -16% - 13%)  0.279
  OrHighMedDayTaxoFacets  9.46 (7.5%)     9.25 (6.7%)     -2.2% ( -15% - 12%)  0.320
                 Prefix3  177.41 (7.3%)   173.96 (10.3%)  -1.9% ( -18% - 16%)  0.489
AndHighHighDayTaxoFacets  28.81 (6.7%)    28.35 (6.3%)    -1.6% ( -13% - 12%)  0.433
   BrowseMonthTaxoFacets  13.50 (13.4%)   13.30 (6.6%)    -1.5% ( -18% - 21%)  0.658
 AndHighMedDayTaxoFacets  46.43 (7.8%)    45.75 (7.3%)    -1.5% ( -15% - 14%)  0.540
               MedPhrase  360.70 (8.5%)   355.47 (8.3%)   -1.4% ( -16% - 16%)  0.587
              AndHighMed  233.52 (6.7%)   230.19 (7.0%)   -1.4% ( -14% - 13%)  0.510
    HighTermTitleBDVSort  72.17 (16.9%)   71.14 (15.4%)   -1.4% ( -28% - 37%)  0.780
            OrHighNotMed  659.68 (9.7%)   650.38 (12.6%)  -1.4% ( -21% - 23%)  0.691
              HighPhrase  73.05 (7.6%)    72.25 (9.3%)    -1.1% ( -16% - 17%)  0.685
              TermDTSort  123.29 (15.5%)  122.10 (13.8%)  -1.0% ( -26% - 33%)  0.835
                  IntNRQ  167.75 (7.4%)   166.17 (8.6%)   -0.9% ( -15% - 16%)  0.710
           OrHighNotHigh  890.84 (13.2%)  883.31 (11.5%)  -0.8% ( -22% - 27%)  0.828
               OrHighLow  279.24 (7.5%)   276.97 (6.6%)   -0.8% ( -13% - 14%)  0.718
                PKLookup  198.13 (6.6%)   196.54 (6.9%)   -0.8% ( -13% - 13%)  0.707
         MedSloppyPhrase  94.28 (8.0%)    93.55 (6.5%)    -0.8% ( -14% - 14%)  0.737
              AndHighLow  574.70 (8.2%)   570.50 (9.6%)   -0.7% ( -17% - 18%)  0.795
            OrNotHighMed  717.34 (11.5%)  712.33 (12.4%)  -0.7% ( -22% - 26%)  0.853
             AndHighHigh  61.26 (7.2%)    60.84 (6.3%)    -0.7% ( -13% - 13%)  0.753
        HighSloppyPhrase  6.56 (6.5%)     6.52 (5.3%)     -0.7% ( -11% - 11%)  0.729
         LowSloppyPhrase  159.12 (6.9%)   158.23 (6.5%)   -0.6% ( -13% - 13%)  0.794
               LowPhrase  88.55 (8.6%)    88.07 (8.6%)    -0.5% ( -16% - 18%)  0.844
             MedSpanNear  14.63 (6.1%)    14.55 (5.3%)    -0.5% ( -11% - 11%)  0.786
                  Fuzzy2  24.31 (9.5%)    24.19 (7.7%)    -0.5% ( -16% - 18%)  0.858
                 MedTerm  1440.59 (9.8%)  1433.53 (11.5%) -0.5% ( -19% - 23%)  0.885
            HighSpanNear  23.52 (6.1%)    23.40 (5.9%)    -0.5% ( -11% - 12%)  0.797
              OrHighHigh  32.01 (8.4%)    31.96 (5.7%)    -0.1% ( -13% - 15%)  0.948
                  Fuzzy1  82.31 (11.7%)   82.30 (13.5%)   -0.0% ( -22% - 28%)  0.998
            OrHighNotLow  724.27 (9.5%)   724.70 (10.1%)   0.1% ( -17% - 21%)  0.985
   HighTermDayOfYearSort  156.00 (14.4%)  156.22 (14.6%)   0.1% ( -25% - 33%)  0.975
                 Respell  68.40 (8.2%)    68.49 (7.6%)     0.1% ( -14% - 17%)  0.955
     MedIntervalsOrdered  9.22 (7.4%)     9.23 (6.9%)      0.2% ( -13% - 15%)  0.936
           OrNotHighHigh  571.66 (8.1%)   572.72 (10.9%)   0.2% ( -17% - 20%)  0.951
             LowSpanNear  82.39 (7.1%)    82.57 (4.4%)     0.2% ( -10% - 12%)  0.907
                 LowTerm  1355.15 (10.2%) 1358.89 (10.0%)  0.3% ( -18% - 22%)  0.931
       HighTermMonthSort  58.72 (20.8%)
{code}
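The kind of manual loop unrolling described above can be illustrated with a toy decode routine. This is not the actual patch; `decode8` and the 4x unroll factor are assumptions chosen purely to show the shape of the change.

```java
// Illustrative only: decode 128 packed 8-bit values into longs with the inner
// loop manually unrolled 4x, the style of change described in the comment.
public class UnrolledDecode {
    public static void decode8(byte[] in, long[] out) {
        for (int i = 0; i < 128; i += 4) { // unrolled by 4: fewer loop checks
            out[i]     = in[i]     & 0xFFL;
            out[i + 1] = in[i + 1] & 0xFFL;
            out[i + 2] = in[i + 2] & 0xFFL;
            out[i + 3] = in[i + 3] & 0xFFL;
        }
    }
}
```

Unrolling reduces per-iteration branch overhead and gives the JIT more freedom to schedule the independent loads.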
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465310#comment-17465310 ]

Feng Guo edited comment on LUCENE-10334 at 12/28/21, 4:22 AM:
--

Thanks [~rcmuir] for the suggestion! I tried some optimizations on this patch:

1. At first I replaced {{DirectWriter#unsignedBitsRequired}} with {{PackedInts#unsignedBitsRequired}}, since ForUtil can support all bpv, and this change can reduce the index size a little. But I have rolled it back, since decoding bpv 1, 2, 4, 8, 12, 16... can also be a bit faster in ForUtil.

2. {{ForUtil#decode}} does a {{switch}} on every call. This can be avoided the way we do it in {{DirectReader}}: choose an implementation of an interface once at the beginning. I applied this change to ForUtil.

I'm not sure which is the major optimization, but the report looks better now:

{code:java}
                    Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
 AndHighMedDayTaxoFacets  71.49 (2.1%)    64.72 (2.0%)    -9.5% ( -13% - -5%)  0.000
    MedTermDayTaxoFacets  25.79 (2.6%)    24.00 (1.8%)    -6.9% ( -11% - -2%)  0.000
AndHighHighDayTaxoFacets  13.13 (3.4%)    12.63 (3.1%)    -3.9% ( -10% - 2%)   0.000
  OrHighMedDayTaxoFacets  13.71 (4.1%)    13.41 (4.7%)    -2.2% ( -10% - 6%)   0.118
                PKLookup  204.87 (3.9%)   203.03 (3.6%)   -0.9% ( -8% - 6%)    0.450
                 Prefix3  113.85 (3.6%)   113.32 (4.6%)   -0.5% ( -8% - 8%)    0.724
            HighSpanNear  25.34 (2.5%)    25.26 (3.1%)    -0.3% ( -5% - 5%)    0.714
             LowSpanNear  55.96 (2.0%)    55.80 (2.1%)    -0.3% ( -4% - 3%)    0.658
             MedSpanNear  56.84 (2.4%)    56.90 (2.2%)     0.1% ( -4% - 4%)    0.895
         MedSloppyPhrase  26.57 (1.8%)    26.60 (1.9%)     0.1% ( -3% - 3%)    0.831
        HighSloppyPhrase  30.20 (3.7%)    30.24 (3.6%)     0.2% ( -6% - 7%)    0.890
               OrHighMed  49.96 (2.1%)    50.06 (1.7%)     0.2% ( -3% - 4%)    0.742
              AndHighMed  96.70 (2.9%)    96.95 (2.6%)     0.3% ( -5% - 5%)    0.772
     LowIntervalsOrdered  23.32 (4.6%)    23.38 (4.5%)     0.3% ( -8% - 9%)    0.856
              OrHighHigh  38.09 (1.9%)    38.20 (1.8%)     0.3% ( -3% - 4%)    0.643
              TermDTSort  128.55 (14.7%)  128.94 (11.6%)   0.3% ( -22% - 31%)  0.942
                  Fuzzy1  99.54 (7.1%)    99.86 (8.0%)     0.3% ( -13% - 16%)  0.893
    HighIntervalsOrdered  15.58 (2.6%)    15.65 (2.6%)     0.4% ( -4% - 5%)    0.636
                 Respell  63.96 (1.9%)    64.22 (2.3%)     0.4% ( -3% - 4%)    0.542
           OrHighNotHigh  611.12 (5.8%)   613.85 (6.2%)    0.4% ( -10% - 13%)  0.814
     MedIntervalsOrdered  59.48 (5.2%)    59.75 (5.1%)     0.5% ( -9% - 11%)   0.780
             AndHighHigh  58.76 (3.0%)    59.16 (3.0%)     0.7% ( -5% - 6%)    0.478
           OrNotHighHigh  619.53 (6.0%)   623.79 (7.1%)    0.7% ( -11% - 14%)  0.740
              HighPhrase  31.00 (2.5%)    31.26 (2.7%)     0.8% ( -4% - 6%)    0.307
              AndHighLow  828.41 (5.9%)   835.65 (7.1%)    0.9% ( -11% - 14%)  0.672
            OrNotHighLow  986.46 (6.8%)   995.13 (10.5%)   0.9% ( -15% - 19%)  0.752
    HighTermTitleBDVSort  110.39 (12.3%)  111.38 (11.1%)   0.9% ( -20% - 27%)  0.807
                  IntNRQ  151.29 (2.6%)   152.96 (3.5%)    1.1% ( -4% - 7%)    0.262
                 LowTerm  1876.18 (7.8%)  1897.19 (8.3%)   1.1% ( -13% - 18%)  0.660
   HighTermDayOfYearSort  108.34 (18.9%)  109.87 (17.4%)   1.4% ( -29% - 46%)  0.805
       HighTermMonthSort  65.84 (11.0%)   66.78 (11.7%)    1.4% ( -19% - 27%)  0.689
            OrHighNotMed  770.05 (5.3%)   782.54 (8.8%)    1.6% ( -11% - 16%)  0.480
                Wildcard  182.10 (5.5%)   185.24 (7.2%)    1.7% ( -10% - 15%)  0.394
         LowSloppyPhrase  33.75 (6.6%)    34.35 (8.8%)     1.8% ( -12% - 18%)  0.478
               MedPhrase  161.57 (3.8%)   164.62 (6.1%)    1.9% ( -7%
{code}
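The second optimization above, binding the per-bpv decode routine once instead of switching on every call, can be sketched as follows. The `Decoders`/`Decoder` names and the decode bodies are illustrative assumptions, not the actual ForUtil or DirectReader API.

```java
// Bind the per-bpv decode implementation once at reader creation, the way
// DirectReader picks a reader per bit width, instead of running
// switch(bitsPerValue) inside every decode call.
public class Decoders {
    public interface Decoder {
        void decode(long[] in, long[] out);
    }

    // The switch runs once here; callers hold on to the returned Decoder.
    public static Decoder forBitsPerValue(int bpv) {
        switch (bpv) {
            case 1:
                // stand-in: each input long carries 64 one-bit values
                return (in, out) -> {
                    for (int i = 0; i < out.length; i++) {
                        out[i] = (in[i >>> 6] >>> (i & 63)) & 1L;
                    }
                };
            case 8:
                // stand-in: mask each input word down to its low 8 bits
                return (in, out) -> {
                    for (int i = 0; i < out.length; i++) {
                        out[i] = in[i] & 0xFFL;
                    }
                };
            default:
                throw new IllegalArgumentException("unsupported bpv: " + bpv);
        }
    }
}
```

Hoisting the dispatch out of the hot loop also makes each call site monomorphic, which is friendlier to JIT inlining.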
[jira] [Updated] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10334:
--
Description:
Previous talk is here: [https://github.com/apache/lucene/pull/557]

This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocValues.

-*Benchmark based on wiki10m*- (The previous benchmark results were wrong, so I deleted them to avoid misleading anyone; see the benchmarks in the comments.)

was:
Previous talk is here: https://github.com/apache/lucene/pull/557

This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocValues.

*Benchmark based on wiki10m*
{code:java}
                       Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
              OrNotHighHigh  694.17 (8.2%)    685.83 (7.0%)    -1.2% ( -15% - 15%)  0.618
                    Respell  75.15 (2.7%)     74.32 (2.0%)     -1.1% ( -5% - 3%)    0.146
                    Prefix3  220.11 (5.1%)    217.78 (5.8%)    -1.1% ( -11% - 10%)  0.541
                   Wildcard  129.75 (3.7%)    128.63 (2.5%)    -0.9% ( -6% - 5%)    0.383
                LowSpanNear  68.54 (2.1%)     68.00 (2.4%)     -0.8% ( -5% - 3%)    0.269
               OrNotHighMed  732.90 (6.8%)    727.49 (5.3%)    -0.7% ( -12% - 12%)  0.703
BrowseRandomLabelTaxoFacets  11879.03 (8.6%)  11799.33 (5.5%)  -0.7% ( -13% - 14%)  0.769
           HighSloppyPhrase  6.87 (2.9%)      6.83 (2.3%)      -0.6% ( -5% - 4%)    0.496
               OrHighNotMed  827.54 (9.2%)    822.94 (8.0%)    -0.6% ( -16% - 18%)  0.838
                MedSpanNear  18.92 (5.7%)     18.82 (5.6%)     -0.5% ( -11% - 11%)  0.759
     OrHighMedDayTaxoFacets  10.27 (4.0%)     10.21 (4.3%)     -0.5% ( -8% - 8%)    0.676
                   PKLookup  207.98 (4.0%)    206.85 (2.7%)    -0.5% ( -7% - 6%)    0.621
        LowIntervalsOrdered  159.17 (2.3%)    158.32 (2.2%)    -0.5% ( -4% - 3%)    0.445
               HighSpanNear  6.32 (4.2%)      6.28 (4.1%)      -0.5% ( -8% - 8%)    0.691
        MedIntervalsOrdered  85.31 (3.2%)     84.88 (2.9%)     -0.5% ( -6% - 5%)    0.607
                   HighTerm  1170.55 (5.8%)   1164.79 (3.9%)   -0.5% ( -9% - 9%)    0.753
            LowSloppyPhrase  14.54 (3.1%)     14.48 (2.9%)     -0.4% ( -6% - 5%)    0.651
                 HighPhrase  112.81 (4.4%)    112.39 (4.1%)    -0.4% ( -8% - 8%)    0.781
               OrNotHighLow  858.02 (5.9%)    854.99 (4.8%)    -0.4% ( -10% - 10%)  0.835
       HighIntervalsOrdered  25.08 (2.8%)     25.00 (2.6%)     -0.3% ( -5% - 5%)    0.701
                  MedPhrase  27.20 (2.1%)     27.11 (2.9%)     -0.3% ( -5% - 4%)    0.689
       MedTermDayTaxoFacets  81.55 (2.3%)     81.35 (2.9%)     -0.3% ( -5% - 5%)    0.762
                     IntNRQ  63.36 (2.0%)     63.21 (2.5%)     -0.2% ( -4% - 4%)    0.740
                     Fuzzy2  73.24 (5.5%)     73.10 (6.2%)     -0.2% ( -11% - 12%)  0.916
    AndHighMedDayTaxoFacets  76.08 (3.5%)     75.98 (3.4%)     -0.1% ( -6% - 7%)    0.905
                AndHighHigh  62.20 (2.0%)     62.18 (2.4%)     -0.0% ( -4% - 4%)    0.954
      BrowseMonthTaxoFacets  11993.48 (6.7%)  11989.53 (4.8%)  -0.0% ( -10% - 12%)  0.986
               OrHighNotLow  732.82 (7.2%)    732.80 (6.2%)    -0.0% ( -12% - 14%)  0.999
                     Fuzzy1  46.43 (5.3%)     46.45 (6.0%)      0.0% ( -10% - 11%)  0.989
                    LowTerm  1608.25 (6.0%)   1608.84 (4.9%)    0.0% ( -10% - 11%)  0.983
                  OrHighMed  75.90 (2.3%)     75.93 (1.8%)      0.0% ( -3% - 4%)    0.939
                  LowPhrase  273.81 (2.9%)    274.04 (3.3%)     0.1% ( -5% - 6%)    0.932
                 AndHighLow  717.24 (6.1%)    718.17 (3.3%)     0.1% ( -8% - 10%)   0.933
   AndHighHighDayTaxoFacets  39.63 (2.5%)     39.69 (2.6%)      0.1% ( -4% - 5%)    0.862
                 OrHighHigh  34.63 (1.8%)     34.68 (2.0%)      0.1% ( -3% - 4%)    0.821
            MedSloppyPhrase  158.80 (2.8%)    159.09 (2.6%)     0.2% ( -5% - 5%)    0.832
                  OrHighLow  257.77 (2.9%)
{code}
[jira] [Created] (LUCENE-10343) Remove MyRandom in favor of test framework random
Feng Guo created LUCENE-10343: - Summary: Remove MyRandom in favor of test framework random Key: LUCENE-10343 URL: https://issues.apache.org/jira/browse/LUCENE-10343 Project: Lucene - Core Issue Type: Test Reporter: Feng Guo -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/27/21, 2:30 PM: -- If we can not tolerate the regression, another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {code:java} 128th reading index - 1st reading index <= 128 * 1.5 {code} And choose block decoding for following gets if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( 
-20% - 26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0%
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/27/21, 11:42 AM: --- If we can not tolerate the regression, another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {code:java} 128th reading index - 1st reading index <= 128 * 1.5 {code} And we can choose block decoding if dense. Here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 
26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0% ( -10% -
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/27/21, 9:35 AM: -- Another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {code:java} 128th reading index - 1st reading index <= 128 * 1.5 {code} And we can choose block decoding if dense. This way could be an alternative if we can not accept the regression of the ForUtil patch, here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 
HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo edited comment on LUCENE-10334 at 12/26/21, 6:09 PM: -- Another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{{}DirectReader{}}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {{{}128th index - 1st index <= 128 * 1.5{}}}. And if dense, we can do some block decoding there. This way could be an alternative if we can not accept the regression of the ForUtil patch, here is the POC code: [https://github.com/apache/lucene/pull/570] and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 
127.18 (11.7%)0.4% ( -20% - 26%) 0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%)
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465416#comment-17465416 ] Feng Guo commented on LUCENE-10334: --- Another idea coming to my mind to solve the regression is introducing a 'detect warm up' phase for {{DirectReader}}. As most of the usage of DirectReader in DocvaluesProducer is a forward reading, we can probably judge hits is dense/sparse by first 128th #get, e.g. we can say the reading is dense if {{128th index - 1st index <= 128 * 1.5}}. And if dense, we can do some block decoding there. This way could be an alternative if we can not accept the regression of the ForUtil patch, here is the POC code: https://github.com/apache/lucene/pull/570 and benchmark result: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighMedDayTaxoFacets 12.08 (5.6%) 11.85 (4.4%) -1.9% ( -11% -8%) 0.228 MedTermDayTaxoFacets 35.50 (2.9%) 35.09 (2.1%) -1.2% ( -5% -3%) 0.148 AndHighHighDayTaxoFacets 20.35 (2.5%) 20.18 (2.2%) -0.8% ( -5% -4%) 0.275 BrowseMonthTaxoFacets 14.09 (12.4%) 13.99 (7.2%) -0.7% ( -18% - 21%) 0.817 AndHighMedDayTaxoFacets 100.43 (2.2%) 99.96 (2.2%) -0.5% ( -4% -3%) 0.501 LowIntervalsOrdered 31.96 (3.6%) 31.90 (2.7%) -0.2% ( -6% -6%) 0.853 HighIntervalsOrdered9.82 (4.8%)9.81 (3.8%) -0.1% ( -8% -8%) 0.925 HighTermDayOfYearSort 58.36 (8.2%) 58.29 (7.2%) -0.1% ( -14% - 16%) 0.962 MedIntervalsOrdered 16.33 (3.3%) 16.33 (2.5%) -0.0% ( -5% -6%) 0.967 HighTermTitleBDVSort 82.38 (11.9%) 82.52 (13.2%)0.2% ( -22% - 28%) 0.966 HighSpanNear 38.08 (1.9%) 38.17 (1.5%)0.2% ( -3% -3%) 0.687 AndHighHigh 73.02 (4.1%) 73.20 (4.4%)0.2% ( -7% -9%) 0.854 OrHighHigh 38.67 (2.1%) 38.77 (1.9%)0.3% ( -3% -4%) 0.669 LowSloppyPhrase 48.05 (5.4%) 48.20 (5.5%)0.3% ( -10% - 11%) 0.856 MedSloppyPhrase 34.55 (2.7%) 34.66 (2.6%)0.3% ( -4% -5%) 0.696 TermDTSort 200.08 (11.2%) 200.74 (11.3%)0.3% ( -19% - 25%) 0.926 HighTermMonthSort 126.69 (11.4%) 127.18 (11.7%)0.4% ( -20% - 26%) 
0.917 HighSloppyPhrase 14.03 (3.5%) 14.09 (3.7%)0.4% ( -6% -7%) 0.703 MedSpanNear 103.61 (2.1%) 104.14 (1.2%)0.5% ( -2% -3%) 0.332 IntNRQ 126.16 (2.3%) 126.81 (2.7%)0.5% ( -4% -5%) 0.508 AndHighMed 164.27 (4.2%) 165.20 (4.4%)0.6% ( -7% -9%) 0.676 LowSpanNear 167.58 (2.7%) 168.63 (2.6%)0.6% ( -4% -6%) 0.460 PKLookup 201.62 (3.8%) 203.05 (4.7%)0.7% ( -7% -9%) 0.599 Respell 73.56 (2.1%) 74.43 (2.7%)1.2% ( -3% -6%) 0.121 MedPhrase 266.51 (5.2%) 270.42 (5.9%)1.5% ( -9% - 13%) 0.405 OrHighMed 116.57 (4.0%) 118.30 (3.3%)1.5% ( -5% -9%) 0.202 Prefix3 136.44 (3.9%) 138.51 (3.6%)1.5% ( -5% -9%) 0.204 OrNotHighMed 669.05 (5.3%) 679.79 (7.7%)1.6% ( -10% - 15%) 0.443 OrNotHighLow 907.93 (5.8%) 922.66 (10.1%)1.6% ( -13% - 18%) 0.533 Wildcard 146.59 (3.2%) 149.19 (4.9%)1.8% ( -6% - 10%) 0.172 OrHighLow 383.74 (8.5%) 390.67 (8.0%)1.8% ( -13% - 20%) 0.489 HighPhrase 96.06 (4.4%) 97.81 (6.8%)1.8% ( -8% - 13%) 0.316 Fuzzy2 65.58 (12.9%) 66.81 (11.3%)1.9% ( -19% - 29%) 0.624 LowPhrase 145.74 (4.0%) 148.50 (5.1%)1.9% ( -6% - 11%) 0.192 MedTerm 1470.64 (7.1%) 1498.96 (9.5%)1.9% ( -13% - 19%) 0.468 OrHighNotHigh 562.56 (5.7%) 573.78 (7.3%)2.0% ( -10% - 15%) 0.336
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465310#comment-17465310 ] Feng Guo edited comment on LUCENE-10334 at 12/26/21, 8:52 AM: -- Thanks [~rcmuir] for suggestion! I tried some optimizations on this patch: 1. I replaced {{DirectWriter#unsignedBitsRequired}} with {{PackedInts#unsignedBitsRequired}} at first since ForUtil can support all bpv, this change can reduce some index size. But now i rollbacked this change since the decode of 1,2,4,8,12,16... could also be a bit faster in ForUtil. 2. {{ForUtil#decode}} will do a {{switch}} for each call, this can be avoided by the way like what we do in {{{}DirectReader{}}}, choose a implementation of an interface at the beginning. I applied this change in ForUtil. I'm not sure which is the major optimization but the report seems better now: {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value AndHighMedDayTaxoFacets 71.49 (2.1%) 64.72 (2.0%) -9.5% ( -13% - -5%) 0.000 MedTermDayTaxoFacets 25.79 (2.6%) 24.00 (1.8%) -6.9% ( -11% - -2%) 0.000 AndHighHighDayTaxoFacets 13.13 (3.4%) 12.63 (3.1%) -3.9% ( -10% -2%) 0.000 OrHighMedDayTaxoFacets 13.71 (4.1%) 13.41 (4.7%) -2.2% ( -10% -6%) 0.118 PKLookup 204.87 (3.9%) 203.03 (3.6%) -0.9% ( -8% -6%) 0.450 Prefix3 113.85 (3.6%) 113.32 (4.6%) -0.5% ( -8% -8%) 0.724 HighSpanNear 25.34 (2.5%) 25.26 (3.1%) -0.3% ( -5% -5%) 0.714 LowSpanNear 55.96 (2.0%) 55.80 (2.1%) -0.3% ( -4% -3%) 0.658 MedSpanNear 56.84 (2.4%) 56.90 (2.2%)0.1% ( -4% -4%) 0.895 MedSloppyPhrase 26.57 (1.8%) 26.60 (1.9%)0.1% ( -3% -3%) 0.831 HighSloppyPhrase 30.20 (3.7%) 30.24 (3.6%)0.2% ( -6% -7%) 0.890 OrHighMed 49.96 (2.1%) 50.06 (1.7%)0.2% ( -3% -4%) 0.742 AndHighMed 96.70 (2.9%) 96.95 (2.6%)0.3% ( -5% -5%) 0.772 LowIntervalsOrdered 23.32 (4.6%) 23.38 (4.5%)0.3% ( -8% -9%) 0.856 OrHighHigh 38.09 (1.9%) 38.20 (1.8%)0.3% ( -3% -4%) 0.643 TermDTSort 128.55 (14.7%) 128.94 (11.6%)0.3% ( -22% - 31%) 0.942 Fuzzy1 
99.54 (7.1%) 99.86 (8.0%)0.3% ( -13% - 16%) 0.893 HighIntervalsOrdered 15.58 (2.6%) 15.65 (2.6%)0.4% ( -4% -5%) 0.636 Respell 63.96 (1.9%) 64.22 (2.3%)0.4% ( -3% -4%) 0.542 OrHighNotHigh 611.12 (5.8%) 613.85 (6.2%)0.4% ( -10% - 13%) 0.814 MedIntervalsOrdered 59.48 (5.2%) 59.75 (5.1%)0.5% ( -9% - 11%) 0.780 AndHighHigh 58.76 (3.0%) 59.16 (3.0%)0.7% ( -5% -6%) 0.478 OrNotHighHigh 619.53 (6.0%) 623.79 (7.1%)0.7% ( -11% - 14%) 0.740 HighPhrase 31.00 (2.5%) 31.26 (2.7%)0.8% ( -4% -6%) 0.307 AndHighLow 828.41 (5.9%) 835.65 (7.1%)0.9% ( -11% - 14%) 0.672 OrNotHighLow 986.46 (6.8%) 995.13 (10.5%)0.9% ( -15% - 19%) 0.752 HighTermTitleBDVSort 110.39 (12.3%) 111.38 (11.1%)0.9% ( -20% - 27%) 0.807 IntNRQ 151.29 (2.6%) 152.96 (3.5%)1.1% ( -4% -7%) 0.262 LowTerm 1876.18 (7.8%) 1897.19 (8.3%)1.1% ( -13% - 18%) 0.660 HighTermDayOfYearSort 108.34 (18.9%) 109.87 (17.4%)1.4% ( -29% - 46%) 0.805 HighTermMonthSort 65.84 (11.0%) 66.78 (11.7%)1.4% ( -19% - 27%) 0.689 OrHighNotMed 770.05 (5.3%) 782.54 (8.8%)1.6% ( -11% - 16%) 0.480 Wildcard 182.10 (5.5%) 185.24 (7.2%)1.7% ( -10% - 15%) 0.394 LowSloppyPhrase 33.75 (6.6%) 34.35 (8.8%)1.8% ( -12% - 18%) 0.478 MedPhrase 161.57 (3.8%) 164.62 (6.1%)1.9% ( -7%
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464619#comment-17464619 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 4:11 PM: -- I'm sorry to say that there is something wrong with my benchmark: the localrun script was still using the facets described in the luceneutil [readme|https://github.com/mikemccand/luceneutil/blob/master/README.md], like this:

{code:python}
facets = (('taxonomy:Date', 'Date'),
          ('sortedset:Month', 'Month'),
          ('sortedset:DayOfYear', 'DayOfYear'))
index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                      indexSort='dayOfYearNumericDV:long')
{code}

I got the result mentioned above with these facets. But when I cloned a fresh luceneutil and reran setup.py, it became:

{code:python}
index = comp.newIndex('lucene_baseline', sourceData,
                      facets=(('taxonomy:Date', 'Date'),
                              ('taxonomy:Month', 'Month'),
                              ('taxonomy:DayOfYear', 'DayOfYear'),
                              ('sortedset:Month', 'Month'),
                              ('sortedset:DayOfYear', 'DayOfYear'),
                              ('taxonomy:RandomLabel', 'RandomLabel'),
                              ('sortedset:RandomLabel', 'RandomLabel')))
{code}

And with these facets the result is totally different:

{code:java}
Task                         QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseDayOfYearTaxoFacets    13.65 (8.9%)   10.49 (2.6%)  -23.2% ( -31% - -12%)  0.000
BrowseMonthTaxoFacets        13.54 (14.6%)  10.89 (2.9%)  -19.6% ( -32% -  -2%)  0.000
BrowseDateTaxoFacets         13.50 (8.8%)   11.11 (3.7%)  -17.7% ( -27% -  -5%)  0.000
BrowseRandomLabelTaxoFacets  11.78 (7.0%)    9.94 (5.1%)  -15.6% ( -25% -  -3%)  0.000
MedTermDayTaxoFacets         47.49 (2.4%)   41.45 (3.4%)  -12.7% ( -18% -  -7%)  0.000
AndHighMedDayTaxoFacets     130.24 (2.7%)  119.48 (3.9%)   -8.3% ( -14% -  -1%)  0.000
AndHighHighDayTaxoFacets     28.80 (2.8%)   27.09 (3.1%)   -5.9% ( -11% -   0%)  0.000
OrHighMedDayTaxoFacets        9.68 (2.7%)    9.35 (2.8%)   -3.4% (  -8% -   2%)  0.000
HighTermDayOfYearSort       139.73 (9.6%)  135.74 (10.2%)  -2.9% ( -20% -  18%)  0.361
TermDTSort                  151.46 (9.0%)  147.40 (7.7%)   -2.7% ( -17% -  15%)  0.311
Fuzzy2                       35.22 (6.3%)   34.38 (5.9%)   -2.4% ( -13% -  10%)  0.213
MedSloppyPhrase              78.99 (6.7%)   77.21 (7.1%)   -2.3% ( -15% -  12%)  0.300
LowTerm                    1636.38 (6.4%) 1600.26 (9.6%)   -2.2% ( -17% -  14%)  0.392
LowPhrase                   252.68 (3.8%)  247.11 (6.5%)   -2.2% ( -12% -   8%)  0.189
Respell                      61.23 (2.3%)   59.89 (5.0%)   -2.2% (  -9% -   5%)  0.078
AndHighHigh                  56.54 (2.6%)   55.43 (4.3%)   -2.0% (  -8% -   5%)  0.084
MedSpanNear                  99.37 (2.4%)   97.44 (5.2%)   -1.9% (  -9% -   5%)  0.128
HighSloppyPhrase             28.58 (5.4%)   28.05 (5.4%)   -1.8% ( -11% -   9%)  0.280
PKLookup                    198.95 (3.0%)  195.34 (4.8%)   -1.8% (  -9% -   6%)  0.148
AndHighMed                  116.50 (3.3%)  114.65 (4.5%)   -1.6% (  -9% -   6%)  0.204
Fuzzy1                       75.07 (6.4%)   73.99 (8.1%)   -1.4% ( -14% -  13%)  0.532
HighSpanNear                 10.73 (2.8%)   10.58 (3.9%)   -1.4% (  -7% -   5%)  0.180
LowSpanNear                  43.92 (2.4%)   43.30 (3.4%)   -1.4% (  -6% -   4%)  0.128
LowSloppyPhrase              14.70 (4.4%)   14.50 (4.2%)   -1.3% (  -9% -   7%)  0.329
HighTermMonthSort           148.80 (8.3%)  146.84 (8.1%)   -1.3% ( -16% -  16%)  0.612
OrHighMed                   103.00 (3.2%)  101.67 (5.1%)   -1.3% (  -9% -   7%)  0.341
MedIntervalsOrdered           5.44 (2.5%)    5.37 (2.2%)   -1.3% (  -5% -   3%)  0.092
OrHighNotHigh               648.74 (6.7%)  640.81 (8.8%)   -1.2% ( -15% -  15%)  0.621
MedPhrase                    80.35 (2.7%)   79.38 (4.8%)   -1.2% (  -8% -   6%)  0.327
HighTerm                   1384.91 (6.8%) 1369.27 (8.8%)   -1.1% ( -15% -  15%)  0.650
{code}
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464539#comment-17464539 ] Feng Guo commented on LUCENE-10334: --- Thanks [~rcmuir] for reply! No hurry here, feel free to ignore this and have a nice holiday :) > Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocvalues > *Benchmark based on wiki10m* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >OrNotHighHigh 694.17 (8.2%) 685.83 > (7.0%) -1.2% ( -15% - 15%) 0.618 > Respell 75.15 (2.7%) 74.32 > (2.0%) -1.1% ( -5% -3%) 0.146 > Prefix3 220.11 (5.1%) 217.78 > (5.8%) -1.1% ( -11% - 10%) 0.541 > Wildcard 129.75 (3.7%) 128.63 > (2.5%) -0.9% ( -6% -5%) 0.383 > LowSpanNear 68.54 (2.1%) 68.00 > (2.4%) -0.8% ( -5% -3%) 0.269 > OrNotHighMed 732.90 (6.8%) 727.49 > (5.3%) -0.7% ( -12% - 12%) 0.703 > BrowseRandomLabelTaxoFacets11879.03 (8.6%)11799.33 > (5.5%) -0.7% ( -13% - 14%) 0.769 > HighSloppyPhrase6.87 (2.9%)6.83 > (2.3%) -0.6% ( -5% -4%) 0.496 > OrHighNotMed 827.54 (9.2%) 822.94 > (8.0%) -0.6% ( -16% - 18%) 0.838 > MedSpanNear 18.92 (5.7%) 18.82 > (5.6%) -0.5% ( -11% - 11%) 0.759 > OrHighMedDayTaxoFacets 10.27 (4.0%) 10.21 > (4.3%) -0.5% ( -8% -8%) 0.676 > PKLookup 207.98 (4.0%) 206.85 > (2.7%) -0.5% ( -7% -6%) 0.621 > LowIntervalsOrdered 159.17 (2.3%) 158.32 > (2.2%) -0.5% ( -4% -3%) 0.445 > HighSpanNear6.32 (4.2%)6.28 > (4.1%) -0.5% ( -8% -8%) 0.691 > MedIntervalsOrdered 85.31 (3.2%) 84.88 > (2.9%) -0.5% ( -6% -5%) 0.607 > HighTerm 1170.55 (5.8%) 1164.79 > (3.9%) 
-0.5% ( -9% -9%) 0.753 > LowSloppyPhrase 14.54 (3.1%) 14.48 > (2.9%) -0.4% ( -6% -5%) 0.651 > HighPhrase 112.81 (4.4%) 112.39 > (4.1%) -0.4% ( -8% -8%) 0.781 > OrNotHighLow 858.02 (5.9%) 854.99 > (4.8%) -0.4% ( -10% - 10%) 0.835 > HighIntervalsOrdered 25.08 (2.8%) 25.00 > (2.6%) -0.3% ( -5% -5%) 0.701 >MedPhrase 27.20 (2.1%) 27.11 > (2.9%) -0.3% ( -5% -4%) 0.689 > MedTermDayTaxoFacets 81.55 (2.3%) 81.35 > (2.9%) -0.3% ( -5% -5%) 0.762 > IntNRQ 63.36 (2.0%) 63.21 > (2.5%) -0.2% ( -4% -4%) 0.740 > Fuzzy2 73.24 (5.5%) 73.10 > (6.2%) -0.2% ( -11% - 12%) 0.916 > AndHighMedDayTaxoFacets 76.08 (3.5%) 75.98 > (3.4%) -0.1% ( -6% -7%) 0.905 > AndHighHigh 62.20 (2.0%) 62.18 > (2.4%) -0.0% ( -4% -4%) 0.954 >BrowseMonthTaxoFacets11993.48 (6.7%)11989.53 > (4.8%) -0.0% ( -10% - 12%) 0.986 > OrHighNotLow 732.82 (7.2%) 732.80 > (6.2%) -0.0% ( -12% - 14%) 0.999 > Fuzzy1 46.43 (5.3%) 46.45 > (6.0%)0.0% ( -10% - 11%) 0.989 > LowTerm 1608.25 (6.0%) 1608.84 > (4.9%)0.0% ( -10% - 11%) 0.983 >OrHighMed 75.90 (2.3%) 75.93 > (1.8%)0.0% ( -3% -4%) 0.939 >LowPhrase 273.81 (2.9%) 274.04 > (3.3%)0.1% ( -5% -6%) 0.932 > AndHighLow 717.24 (6.1%) 718.17 > (3.3%)0.1% ( -8% -
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 12:22 PM: --- Thanks [~gsmiller]! Yes, I did think I would get some regression on the sparse-hits tasks, and the result surprised me too. Maybe we should thank the powerful {{ForUtil}}? :)

Reading the code in LUCENE-10033, I suspect there are two reasons that approach could show a more obvious regression than this one:

1. The 10033 approach computes bpv for each small block and needs to read the block pointer from a {{DirectMonotonicReader}} before seeking, while this approach uses a global bpv, so pointers can be computed directly as {{offset + blockBytes * block}}. This could be faster. A global bpv can lead to a larger index size, but I think that is acceptable since it is what we used to do.
2. The 10033 approach decodes offset/gcd/delta for each block; some of that can be auto-vectorized, but it is still a bit heavier. This approach tries to keep block decoding as simple as possible, and work such as gcd decoding is only done for hit docs.

I'm not really sure these are the major reasons, but they should make the benchmark result a bit more explainable.

By the way, here is my localrun script. I post it here in case there is something wrong with it. (I personally added ('sortedset:RandomLabel', 'RandomLabel') because luceneutil cannot run without it, but I'm not sure this is correct since the readme does not mention it.)

{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', 'RandomLabel'))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
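The addressing difference described in point 1 of the comment above can be illustrated with a small sketch. This is a hypothetical illustration, not code from either patch: with a single global bitsPerValue, the start of block b is pure arithmetic, while a per-block bpv (as in LUCENE-10033) requires materialized block pointers that must be read before seeking.

```java
public class BlockAddressing {

  // Global-bpv scheme: every block occupies the same number of bytes,
  // so a block's start offset is computed, not stored.
  static long blockStart(long baseOffset, int valuesPerBlock, int globalBpv, long block) {
    long blockBytes = ((long) valuesPerBlock * globalBpv) / 8; // bits -> bytes
    return baseOffset + blockBytes * block;
  }

  // Per-block-bpv scheme: block starts vary, so they must be stored
  // (e.g. behind a DirectMonotonicReader) and looked up before seeking.
  static long blockStart(long[] blockPointers, int block) {
    return blockPointers[block]; // an extra, possibly cache-missing, read
  }

  public static void main(String[] args) {
    // 128 values per block at a global bpv of 12 -> 192 bytes per block,
    // so block 3 starts at baseOffset + 3 * 192.
    long start = blockStart(100, 128, 12, 3);
    if (start != 100 + 192L * 3) throw new AssertionError();
    System.out.println(start);
  }
}
```

The trade-off stated in the comment follows directly: the computed form saves a lookup on every seek, at the cost of sizing every block for the worst-case (global) bpv.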
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464380#comment-17464380 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 9:15 AM: -- Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do whatever work is needed to test this further. Or, if we think the existing luceneutil tasks are enough to justify it, I've fixed the CI issues and the PR is probably ready for a review now :) In this PR, I only replaced the {{DirectReader}} used in {{NumericDocValues#longValue}} with {{BlockReader}}, but I suspect this could also be used in some other places (e.g. {{DirectMonotonicReader}}, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow-ups.
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464380#comment-17464380 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 9:14 AM: -- Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do any work to futher test this if any. Or if we think existing luceneUtil tasks are enough to justify this, I've fixed CI issues and the PR is probably ready for a reivew now :) in this PR, I only replaced the {{DirectReader}} used in {{NumericDocValues#longValue}} with {{BlockReader}} but i suspect this could probably be used in some other places (e.g. {{DirectMonotonicReader}}, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow ups. was (Author: gf2121): Hi all! Since all existing luceneutil tasks look good, I wonder if we need to add some more tasks or try this approach in Amazon's product search engine benchmark (like what we did in https://issues.apache.org/jira/browse/LUCENE-10033) to justify this change? I'm willing to do any work to futher test this if any. Or if we think existing luceneUtil tasks are enough to justify this, I've fixed CI issues and the PR is probably ready for a reivew now :) n this PR, I only replaced the {{DirectReader}} used in {{NumericDocValues#longValue}} with {{BlockReader}} but i suspect this could probably be used in some other places (e.g. {{DirectMonotonicReader}}, stored fields, even in BKD https://issues.apache.org/jira/browse/LUCENE-10315). I'll justify those changes in follow ups. 
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/23/21, 4:29 AM: -- Thanks [~gsmiller]! Yes, I did expect some regression in sparse-hits tasks, and the result surprised me too. Maybe we should thank the powerful {{ForUtil}}? :) From reading the code in LUCENE-10033, I suspect there are two reasons why that approach could show a more obvious regression than this one: 1. The 10033 approach computes a bpv for each small block and needs to read the block pointer from a {{DirectMonotonicReader}} before seeking, while this approach uses a global bpv, so pointers can be computed as {{offset + blockBytes * block}}. This could be faster. A global bpv can lead to a larger index size, but I think that is acceptable since it's what we used to do. 2. The 10033 approach decodes offset/gcd/delta for each block; some of that can be auto-vectorized, but it is still a bit heavier. This approach tries to make block decoding as simple as possible, so work like GCD decoding is only done for hit docs. I'm not really sure these are the major reasons, but they should make the benchmark result a bit more explainable. By the way, here is my localrun script. I post it here in case there is something wrong with it.
(I added ('sortedset:RandomLabel', "RandomLabel") myself because luceneutil cannot run without it, but I'm not sure this is correct since the README does not mention it.)
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparse
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
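The pointer arithmetic from point 1 above can be sketched as follows. This is only an illustration under assumed names — {{read_block}}, {{base}}, the in-memory store, and the layout are hypothetical, not the PR's actual code:

```python
# With one global bits-per-value, every block has the same byte size, so a
# block's start address is pure arithmetic: offset + blockBytes * block.
# No DirectMonotonicReader lookup is needed before seeking, and per-value
# work (e.g. multiplying by a GCD) is deferred to the doc actually hit.

BLOCK_SIZE = 128

def block_pointer(offset, bpv, block):
    """Byte address of `block`: offset + blockBytes * block."""
    block_bytes = BLOCK_SIZE * bpv // 8   # fixed because bpv is global
    return offset + block_bytes * block

def long_value(doc, offset, bpv, gcd, base, read_block):
    """Decode only the block containing `doc`; apply gcd/base per hit doc."""
    block = doc // BLOCK_SIZE
    packed = read_block(block_pointer(offset, bpv, block))
    return base + gcd * packed[doc % BLOCK_SIZE]

# Hypothetical in-memory "file": block 3 starts at byte 40 + 128*3 = 424.
store = {424: list(range(128))}
assert block_pointer(40, 8, 3) == 424
assert long_value(389, 40, 8, 4, 100, store.__getitem__) == 120  # 100 + 4*5
```

This mirrors the trade-off described in the comment: a global bpv wastes some bits in blocks that could use fewer, but removes an indirection from the seek path.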
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/22/21, 6:05 PM: -- Thanks [~gsmiller] ! Yes I do thought i would get some regression in sparse-hits tasks and the result suprised me too. Maybe we should thank to the powerful {{{}ForUtil{}}}? :) By reading codes in LUCENE-10033, i suspect there are two reasons that could probably lead to a more obvious regression than this approach: 1. 10033 approach computes bpv for each small block and need to read the pointer from a {{DirectMonotonicReader}} before seek. But this approach is using a global bpv and pointers can be computed by {{{}offset + blockBytes * block{}}}. A global bpv can lead larger index size but i think it acceptable since it's what we used to do. 2. 10033 approach decode offset/gcd/delta for each block, some of them could be auto-vectorized but still a bit heavier. This approach is trying to make the decoding of blocks as simple as possible and jobs like gcd decoding is only done for hit docs. I'm not really sure these are major reasons but just trying to explain the benchmark result here. By the way, here is my localrun script. I post it here in case there is something wrong with it. 
(I personally added ('sortedset:RandomLabel', "RandomLabel") because luceneutil cannot run without it, but I'm not sure this is correct since the README does not mention it.)
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
was (Author: gf2121): Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-hits tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}? By the way, here is my localrun script. I post it here in case there is something wrong with it (e.g. I added {{('sortedset:RandomLabel', "RandomLabel")}} to facets; this is not mentioned in the README, but luceneutil cannot work without it):
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
> Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 >
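The first point in the comment above can be made concrete with a minimal sketch. This is illustrative only, not Lucene's actual API; `BLOCK_SIZE` and both function names are assumptions. With a global bits-per-value, a block's byte pointer is plain arithmetic ({{offset + blockBytes * block}}); with a per-block bpv, a pointer table must be consulted before every seek.

```python
BLOCK_SIZE = 128  # values per block (hypothetical)


def block_pointer_global_bpv(offset, bpv, block):
    """Pointer for a block when every block uses the same bits-per-value."""
    block_bytes = BLOCK_SIZE * bpv // 8  # bytes occupied by one encoded block
    return offset + block_bytes * block


def block_pointer_per_block(pointers, block):
    """With per-block bpv, block sizes vary, so a pointer table is needed."""
    return pointers[block]


# e.g. 8 bits per value, encoded data starting at byte 1024:
assert block_pointer_global_bpv(1024, 8, 0) == 1024
assert block_pointer_global_bpv(1024, 8, 3) == 1024 + 3 * 128
```

The trade-off the comment describes is visible here: the global-bpv path needs no extra read, at the cost of every block being padded to the worst-case width.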
[jira] [Comment Edited] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo edited comment on LUCENE-10334 at 12/22/21, 4:34 PM: -- Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-hits tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}? By the way, here is my localrun script. I post it here in case there is something wrong with it (e.g. I added {{('sortedset:RandomLabel', "RandomLabel")}} to facets; this is not mentioned in the README, but luceneutil cannot work without it):
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
was (Author: gf2121): Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-result tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}?
And i post my localrun script here in case there is something wrong (e.g. I added {{'sortedset:RandomLabel', "RandomLabel"}} in facets. This is not mentioned in readme but luceneutil can not work without it ) {code:python} if __name__ == '__main__': sourceData = competition.sourceData() comp = competition.Competition() facets = (('taxonomy:Date', 'Date'),('sortedset:Month', 'Month'),('sortedset:DayOfYear', 'DayOfYear'),('sortedset:RandomLabel', "RandomLabel")) index = comp.newIndex('lucene_baseline', sourceData, facets=facets, indexSort='dayOfYearNumericDV:long') candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets, indexSort='dayOfYearNumericDV:long') #Warning -- Do not break the order of arguments #TODO -- Fix the following by using argparser if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches': concurrentSearches = True else: concurrentSearches = False # create a competitor named baseline with sources in the ../trunk folder comp.competitor('baseline', 'lucene_baseline', index = index, concurrentSearches = concurrentSearches) comp.competitor('my_modified_version', 'lucene_candidate', index = candidate_index, concurrentSearches = concurrentSearches) # start the benchmark - this can take long depending on your index and machines comp.benchmark("baseline_vs_patch") {code} > Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocvalues > *Benchmark based on wiki10m* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >OrNotHighHigh 694.17 (8.2%) 685.83 > (7.0%) 
-1.2% ( -15% - 15%) 0.618 > Respell 75.15 (2.7%) 74.32 > (2.0%) -1.1% ( -5% -3%) 0.146 > Prefix3 220.11 (5.1%) 217.78 > (5.8%) -1.1% ( -11% - 10%) 0.541 > Wildcard 129.75 (3.7%) 128.63 > (2.5%) -0.9% ( -6% -5%) 0.383 > LowSpanNear 68.54 (2.1%) 68.00 > (2.4%) -0.8% ( -5% -3%) 0.269 >
[jira] [Commented] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463911#comment-17463911 ] Feng Guo commented on LUCENE-10334: --- Thanks [~gsmiller]! Yes, I did think I would get some regression in sparse-result tasks, and the result surprised me too. Maybe it should thank the powerful implementation of {{ForUtil}}? I'm posting my localrun script here in case there is something wrong with it (e.g. I added {{('sortedset:RandomLabel', "RandomLabel")}} to facets; this is not mentioned in the README, but luceneutil cannot work without it):
{code:python}
if __name__ == '__main__':
    sourceData = competition.sourceData()
    comp = competition.Competition()
    facets = (('taxonomy:Date', 'Date'),
              ('sortedset:Month', 'Month'),
              ('sortedset:DayOfYear', 'DayOfYear'),
              ('sortedset:RandomLabel', "RandomLabel"))
    index = comp.newIndex('lucene_baseline', sourceData, facets=facets,
                          indexSort='dayOfYearNumericDV:long')
    candidate_index = comp.newIndex('lucene_candidate', sourceData, facets=facets,
                                    indexSort='dayOfYearNumericDV:long')
    # Warning -- Do not break the order of arguments
    # TODO -- Fix the following by using argparser
    if len(sys.argv) > 3 and sys.argv[3] == '-concurrentSearches':
        concurrentSearches = True
    else:
        concurrentSearches = False
    # create a competitor named baseline with sources in the ../trunk folder
    comp.competitor('baseline', 'lucene_baseline',
                    index=index, concurrentSearches=concurrentSearches)
    comp.competitor('my_modified_version', 'lucene_candidate',
                    index=candidate_index, concurrentSearches=concurrentSearches)
    # start the benchmark - this can take long depending on your index and machines
    comp.benchmark("baseline_vs_patch")
{code}
> Introduce a BlockReader based on ForUtil and use it for NumericDocValues > > > Key: LUCENE-10334 > URL: https://issues.apache.org/jira/browse/LUCENE-10334 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > 
Remaining Estimate: 0h > > Previous talk is here: https://github.com/apache/lucene/pull/557 > This is trying to add a new BlockReader based on ForUtil to replace the > DirectReader we are using for NumericDocvalues > *Benchmark based on wiki10m* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >OrNotHighHigh 694.17 (8.2%) 685.83 > (7.0%) -1.2% ( -15% - 15%) 0.618 > Respell 75.15 (2.7%) 74.32 > (2.0%) -1.1% ( -5% -3%) 0.146 > Prefix3 220.11 (5.1%) 217.78 > (5.8%) -1.1% ( -11% - 10%) 0.541 > Wildcard 129.75 (3.7%) 128.63 > (2.5%) -0.9% ( -6% -5%) 0.383 > LowSpanNear 68.54 (2.1%) 68.00 > (2.4%) -0.8% ( -5% -3%) 0.269 > OrNotHighMed 732.90 (6.8%) 727.49 > (5.3%) -0.7% ( -12% - 12%) 0.703 > BrowseRandomLabelTaxoFacets11879.03 (8.6%)11799.33 > (5.5%) -0.7% ( -13% - 14%) 0.769 > HighSloppyPhrase6.87 (2.9%)6.83 > (2.3%) -0.6% ( -5% -4%) 0.496 > OrHighNotMed 827.54 (9.2%) 822.94 > (8.0%) -0.6% ( -16% - 18%) 0.838 > MedSpanNear 18.92 (5.7%) 18.82 > (5.6%) -0.5% ( -11% - 11%) 0.759 > OrHighMedDayTaxoFacets 10.27 (4.0%) 10.21 > (4.3%) -0.5% ( -8% -8%) 0.676 > PKLookup 207.98 (4.0%) 206.85 > (2.7%) -0.5% ( -7% -6%) 0.621 > LowIntervalsOrdered 159.17 (2.3%) 158.32 > (2.2%) -0.5% ( -4% -3%) 0.445 > HighSpanNear6.32 (4.2%)6.28 > (4.1%) -0.5% ( -8% -8%) 0.691 > MedIntervalsOrdered 85.31 (3.2%) 84.88 > (2.9%) -0.5% ( -6% -5%) 0.607 > HighTerm 1170.55 (5.8%) 1164.79 > (3.9%) -0.5% ( -9% -9%) 0.753 > LowSloppyPhrase 14.54 (3.1%) 14.48 > (2.9%) -0.4% ( -6% -5%) 0.651 > HighPhrase 112.81 (4.4%) 112.39 > (4.1%) -0.4% ( -8% -8%) 0.781 > OrNotHighLow 858.02 (5.9%) 854.99 > (4.8%) -0.4% ( -10% - 10%) 0.835 > HighIntervalsOrdered 25.08 (2.8%) 25.00 > (2.6%) -0.3% ( -5%
[jira] [Created] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues
Feng Guo created LUCENE-10334: - Summary: Introduce a BlockReader based on ForUtil and use it for NumericDocValues Key: LUCENE-10334 URL: https://issues.apache.org/jira/browse/LUCENE-10334 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Previous talk is here: https://github.com/apache/lucene/pull/557 This is trying to add a new BlockReader based on ForUtil to replace the DirectReader we are using for NumericDocvalues *Benchmark based on wiki10m* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrNotHighHigh 694.17 (8.2%) 685.83 (7.0%) -1.2% ( -15% - 15%) 0.618 Respell 75.15 (2.7%) 74.32 (2.0%) -1.1% ( -5% -3%) 0.146 Prefix3 220.11 (5.1%) 217.78 (5.8%) -1.1% ( -11% - 10%) 0.541 Wildcard 129.75 (3.7%) 128.63 (2.5%) -0.9% ( -6% -5%) 0.383 LowSpanNear 68.54 (2.1%) 68.00 (2.4%) -0.8% ( -5% -3%) 0.269 OrNotHighMed 732.90 (6.8%) 727.49 (5.3%) -0.7% ( -12% - 12%) 0.703 BrowseRandomLabelTaxoFacets11879.03 (8.6%)11799.33 (5.5%) -0.7% ( -13% - 14%) 0.769 HighSloppyPhrase6.87 (2.9%)6.83 (2.3%) -0.6% ( -5% -4%) 0.496 OrHighNotMed 827.54 (9.2%) 822.94 (8.0%) -0.6% ( -16% - 18%) 0.838 MedSpanNear 18.92 (5.7%) 18.82 (5.6%) -0.5% ( -11% - 11%) 0.759 OrHighMedDayTaxoFacets 10.27 (4.0%) 10.21 (4.3%) -0.5% ( -8% -8%) 0.676 PKLookup 207.98 (4.0%) 206.85 (2.7%) -0.5% ( -7% -6%) 0.621 LowIntervalsOrdered 159.17 (2.3%) 158.32 (2.2%) -0.5% ( -4% -3%) 0.445 HighSpanNear6.32 (4.2%)6.28 (4.1%) -0.5% ( -8% -8%) 0.691 MedIntervalsOrdered 85.31 (3.2%) 84.88 (2.9%) -0.5% ( -6% -5%) 0.607 HighTerm 1170.55 (5.8%) 1164.79 (3.9%) -0.5% ( -9% -9%) 0.753 LowSloppyPhrase 14.54 (3.1%) 14.48 (2.9%) -0.4% ( -6% -5%) 0.651 HighPhrase 112.81 (4.4%) 112.39 (4.1%) -0.4% ( -8% -8%) 0.781 OrNotHighLow 858.02 (5.9%) 854.99 (4.8%) -0.4% ( -10% - 10%) 0.835 HighIntervalsOrdered 25.08 (2.8%) 25.00 (2.6%) -0.3% ( -5% -5%) 0.701 MedPhrase 27.20 (2.1%) 27.11 (2.9%) -0.3% ( -5% -4%) 0.689 MedTermDayTaxoFacets 81.55 (2.3%) 81.35 (2.9%) -0.3% ( -5% 
-5%) 0.762 IntNRQ 63.36 (2.0%) 63.21 (2.5%) -0.2% ( -4% -4%) 0.740 Fuzzy2 73.24 (5.5%) 73.10 (6.2%) -0.2% ( -11% - 12%) 0.916 AndHighMedDayTaxoFacets 76.08 (3.5%) 75.98 (3.4%) -0.1% ( -6% -7%) 0.905 AndHighHigh 62.20 (2.0%) 62.18 (2.4%) -0.0% ( -4% -4%) 0.954 BrowseMonthTaxoFacets11993.48 (6.7%)11989.53 (4.8%) -0.0% ( -10% - 12%) 0.986 OrHighNotLow 732.82 (7.2%) 732.80 (6.2%) -0.0% ( -12% - 14%) 0.999 Fuzzy1 46.43 (5.3%) 46.45 (6.0%)0.0% ( -10% - 11%) 0.989 LowTerm 1608.25 (6.0%) 1608.84 (4.9%)0.0% ( -10% - 11%) 0.983 OrHighMed 75.90 (2.3%) 75.93 (1.8%)0.0% ( -3% -4%) 0.939 LowPhrase 273.81 (2.9%) 274.04 (3.3%)0.1% ( -5% -6%) 0.932 AndHighLow 717.24 (6.1%) 718.17 (3.3%)0.1% ( -8% - 10%) 0.933 AndHighHighDayTaxoFacets 39.63 (2.5%) 39.69 (2.6%)0.1% ( -4% -5%) 0.862 OrHighHigh 34.63 (1.8%) 34.68 (2.0%)0.1% ( -3% -4%) 0.821 MedSloppyPhrase 158.80 (2.8%) 159.09 (2.6%)0.2% ( -5% -5%) 0.832 OrHighLow 257.77 (2.9%) 258.46 (4.6%)0.3% ( -7% -8%) 0.826 AndHighMed 133.43 (2.1%) 133.79 (2.7%)0.3% (
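As a quick aid for reading the luceneutil tables above: the "Pct diff" column is the relative QPS change of the candidate over the baseline. A minimal sketch (the helper name is mine, not luceneutil's):

```python
def pct_diff(baseline_qps, candidate_qps):
    """Relative QPS change of the candidate over the baseline, in percent."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0


# OrNotHighHigh row above: baseline 694.17 QPS, candidate 685.83 QPS
assert round(pct_diff(694.17, 685.83), 1) == -1.2
```

A negative value means the candidate is slower; the p-value column then indicates how likely a difference of that size is under run-to-run noise.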
[jira] [Updated] (LUCENE-10333) Speed up BinaryDocValues with a batch reading on LongValues
[ https://issues.apache.org/jira/browse/LUCENE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10333: -- Description: *Description* In {{Lucene90DocValuesProducer}}, {{BinaryDocValues}} (as well as {{SortedNumericDocValues}} in the non-singleton case) has code patterns like this:
{code:java}
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
{code}
This means we need to read 2 longs that are stored together. We could probably push this down to {{LongValues}} and read the 2 values together in one call. I think this can make sense because this code can be rather hot. *Benchmark* In today's luceneutil benchmark, all results look even. I suspect this is because we no longer use {{BinaryDocValues}} in any tasks, so I tried rolling back the baseline and candidate to a stale code version (before https://issues.apache.org/jira/browse/LUCENE-10062); we used {{BinaryDocValues}} to store taxonomy ordinals in that version, and a QPS increase can be seen there. (This is tricky; I wonder if there is a more official way to benchmark BinaryDocValues by changing some params or adding some tasks? 
) Anyway, I believe It is still worth optimizing {{BinarayDocValue}} though facets do not use it any more :) *Benchmark result on stale code version where taxonomy ordinals are stored in BinaryDocvalues (to justify a speed up in BinaryDocValues)* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 (17.8%) -2.7% ( -26% - 25%) 0.536 LowTerm 1458.66 (3.6%) 1438.15 (4.4%) -1.4% ( -9% -6%) 0.268 HighTermDayOfYearSort 108.55 (10.0%) 108.04 (9.1%) -0.5% ( -17% - 20%) 0.874 HighPhrase 168.65 (1.9%) 168.06 (2.3%) -0.3% ( -4% -3%) 0.602 OrNotHighLow 1201.79 (3.4%) 1197.93 (4.6%) -0.3% ( -8% -7%) 0.801 HighSpanNear 15.26 (1.6%) 15.21 (1.4%) -0.3% ( -3% -2%) 0.499 Respell 62.61 (1.8%) 62.45 (1.9%) -0.3% ( -3% -3%) 0.649 MedPhrase 57.57 (1.4%) 57.44 (1.8%) -0.2% ( -3% -2%) 0.648 OrHighMed 129.10 (3.0%) 128.83 (3.1%) -0.2% ( -6% -6%) 0.830 MedSpanNear 19.45 (2.3%) 19.41 (2.2%) -0.2% ( -4% -4%) 0.784 OrHighHigh 34.85 (1.5%) 34.79 (1.4%) -0.2% ( -3% -2%) 0.722 HighIntervalsOrdered 26.92 (4.7%) 26.89 (4.9%) -0.1% ( -9% -9%) 0.929 IntNRQ 343.52 (1.6%) 343.16 (2.0%) -0.1% ( -3% -3%) 0.855 OrHighNotHigh 595.61 (3.2%) 595.10 (4.3%) -0.1% ( -7% -7%) 0.944 MedIntervalsOrdered 17.66 (3.6%) 17.65 (3.8%) -0.1% ( -7% -7%) 0.961 LowIntervalsOrdered 109.23 (3.3%) 109.18 (3.5%) -0.0% ( -6% -7%) 0.969 AndHighHigh 81.09 (1.5%) 81.10 (2.0%)0.0% ( -3% -3%) 0.967 LowSpanNear 203.33 (2.1%) 203.41 (1.8%)0.0% ( -3% -3%) 0.948 MedSloppyPhrase 27.15 (1.5%) 27.17 (1.2%)0.1% ( -2% -2%) 0.907 LowPhrase 75.76 (1.8%) 75.81 (2.0%)0.1% ( -3% -3%) 0.904 AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 (1.9%)0.1% ( -3% -4%) 0.888 HighSloppyPhrase 14.32 (2.7%) 14.34 (1.8%)0.1% ( -4% -4%) 0.870 Fuzzy2 76.00 (3.9%) 76.12 (3.4%)0.2% ( -6% -7%) 0.894 Wildcard 123.51 (1.8%) 123.71 (2.1%)0.2% ( -3% -4%) 0.796 OrHighNotLow 722.64 (4.4%) 724.15 (5.4%)0.2% ( -9% - 10%) 0.894 AndHighLow 929.73 (4.0%) 931.75 (3.8%)0.2% ( -7% -8%) 0.859 Prefix3 240.13 (1.5%) 
240.69 (1.9%)0.2% ( -3% -3%) 0.675 AndHighMed 210.17 (1.7%) 210.84 (1.6%)0.3% ( -2% -3%) 0.532 LowSloppyPhrase 142.83 (1.8%) 143.54 (2.0%)0.5% ( -3% -4%) 0.410 OrNotHighMed 709.24 (4.4%) 712.78 (4.3%)0.5% ( -7% -9%)
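The batch-read idea in the description above can be sketched as follows. This is illustrative only, not Lucene's {{LongValues}} API; the class and the `get_pair` method are assumptions standing in for the proposed "read 2 values together in one call":

```python
class LongValues:
    """Toy stand-in for a random-access long reader."""

    def __init__(self, longs):
        self._longs = longs

    def get(self, index):
        return self._longs[index]

    def get_pair(self, index):
        # one call returning two adjacent values, instead of two lookups
        return self._longs[index], self._longs[index + 1]


addresses = LongValues([0, 5, 12, 20])

# current pattern: two separate random lookups per document
start = addresses.get(1)
length = addresses.get(2) - start

# proposed pattern: one batched call
s, e = addresses.get_pair(1)
assert (s, e - s) == (start, length)
```

Since the two addresses are adjacent, a real implementation could often decode both from the same packed block, which is where the saving on hot paths would come from.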
[jira] [Created] (LUCENE-10333) Speed up BinaryDocValues with a batch reading on LongValues
Feng Guo created LUCENE-10333: - Summary: Speed up BinaryDocValues with a batch reading on LongValues Key: LUCENE-10333 URL: https://issues.apache.org/jira/browse/LUCENE-10333 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo *Description* In {{Lucene90DocValuesProducer}}, {{BinaryDocValues}} (as well as {{SortedNumericDocValues}} in the non-singleton case) has code patterns like this:
{code:java}
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
{code}
This means we need to read 2 longs that are stored together. We could probably push this down to {{LongValues}} and read the 2 values together in one call. I think this can make sense because this code can be rather hot. *Benchmark* In today's luceneutil benchmark, all results look even. I suspect this is because we no longer use {{BinaryDocValues}} in any tasks, so I tried rolling back the baseline and candidate to a stale code version (before https://issues.apache.org/jira/browse/LUCENE-10062); we used {{BinaryDocValues}} to store taxonomy ordinals in that version, and a QPS increase can be seen there. (This is tricky; I wonder if we can have a more official way to benchmark BinaryDocValues by changing some params or adding some tasks? 
Anyway, I believe It is still worth optimizing {{BinarayDocValue}} though facets do not use it any more :) *Benchmark result on stale code version where taxonomy ordinals are stored in BinaryDocvalues (to justivy a speed up in BinaryDocValues)* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 (17.8%) -2.7% ( -26% - 25%) 0.536 LowTerm 1458.66 (3.6%) 1438.15 (4.4%) -1.4% ( -9% -6%) 0.268 HighTermDayOfYearSort 108.55 (10.0%) 108.04 (9.1%) -0.5% ( -17% - 20%) 0.874 HighPhrase 168.65 (1.9%) 168.06 (2.3%) -0.3% ( -4% -3%) 0.602 OrNotHighLow 1201.79 (3.4%) 1197.93 (4.6%) -0.3% ( -8% -7%) 0.801 HighSpanNear 15.26 (1.6%) 15.21 (1.4%) -0.3% ( -3% -2%) 0.499 Respell 62.61 (1.8%) 62.45 (1.9%) -0.3% ( -3% -3%) 0.649 MedPhrase 57.57 (1.4%) 57.44 (1.8%) -0.2% ( -3% -2%) 0.648 OrHighMed 129.10 (3.0%) 128.83 (3.1%) -0.2% ( -6% -6%) 0.830 MedSpanNear 19.45 (2.3%) 19.41 (2.2%) -0.2% ( -4% -4%) 0.784 OrHighHigh 34.85 (1.5%) 34.79 (1.4%) -0.2% ( -3% -2%) 0.722 HighIntervalsOrdered 26.92 (4.7%) 26.89 (4.9%) -0.1% ( -9% -9%) 0.929 IntNRQ 343.52 (1.6%) 343.16 (2.0%) -0.1% ( -3% -3%) 0.855 OrHighNotHigh 595.61 (3.2%) 595.10 (4.3%) -0.1% ( -7% -7%) 0.944 MedIntervalsOrdered 17.66 (3.6%) 17.65 (3.8%) -0.1% ( -7% -7%) 0.961 LowIntervalsOrdered 109.23 (3.3%) 109.18 (3.5%) -0.0% ( -6% -7%) 0.969 AndHighHigh 81.09 (1.5%) 81.10 (2.0%)0.0% ( -3% -3%) 0.967 LowSpanNear 203.33 (2.1%) 203.41 (1.8%)0.0% ( -3% -3%) 0.948 MedSloppyPhrase 27.15 (1.5%) 27.17 (1.2%)0.1% ( -2% -2%) 0.907 LowPhrase 75.76 (1.8%) 75.81 (2.0%)0.1% ( -3% -3%) 0.904 AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 (1.9%)0.1% ( -3% -4%) 0.888 HighSloppyPhrase 14.32 (2.7%) 14.34 (1.8%)0.1% ( -4% -4%) 0.870 Fuzzy2 76.00 (3.9%) 76.12 (3.4%)0.2% ( -6% -7%) 0.894 Wildcard 123.51 (1.8%) 123.71 (2.1%)0.2% ( -3% -4%) 0.796 OrHighNotLow 722.64 (4.4%) 724.15 (5.4%)0.2% ( -9% - 10%) 0.894 AndHighLow 929.73 (4.0%) 931.75 (3.8%)0.2% ( -7% -8%) 0.859 Prefix3 240.13 (1.5%) 
240.69 (1.9%)0.2% ( -3% -3%) 0.675 AndHighMed 210.17 (1.7%) 210.84 (1.6%)0.3% ( -2% -3%) 0.532 LowSloppyPhrase
[jira] [Commented] (LUCENE-10332) Speed up Facets by enable batch reading of LongValues
[ https://issues.apache.org/jira/browse/LUCENE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462758#comment-17462758 ] Feng Guo commented on LUCENE-10332: --- Sorry that i was developing on a old branch, missing this optimization: https://github.com/apache/lucene/pull/443, I'll take a further look but close this now. > Speed up Facets by enable batch reading of LongValues > - > > Key: LUCENE-10332 > URL: https://issues.apache.org/jira/browse/LUCENE-10332 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In {{Lucene90DocValuesProducer}}, there are several places reading LongValues > like this pattern: > {code:java} > long startOffset = addresses.get(doc); > bytes.length = (int) (addresses.get(doc + 1L) - startOffset); > {code} > In these cases, we are needing to read 2 numbers stored together. It would be > great if we can read 2 longs once. The luceneutil benchmark shows that some > Facets tasks were speed up nearly 20% by this approach: > *Benchmark* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 > (17.8%) -2.7% ( -26% - 25%) 0.536 > LowTerm 1458.66 (3.6%) 1438.15 > (4.4%) -1.4% ( -9% -6%) 0.268 >HighTermDayOfYearSort 108.55 (10.0%) 108.04 > (9.1%) -0.5% ( -17% - 20%) 0.874 > HighPhrase 168.65 (1.9%) 168.06 > (2.3%) -0.3% ( -4% -3%) 0.602 > OrNotHighLow 1201.79 (3.4%) 1197.93 > (4.6%) -0.3% ( -8% -7%) 0.801 > HighSpanNear 15.26 (1.6%) 15.21 > (1.4%) -0.3% ( -3% -2%) 0.499 > Respell 62.61 (1.8%) 62.45 > (1.9%) -0.3% ( -3% -3%) 0.649 >MedPhrase 57.57 (1.4%) 57.44 > (1.8%) -0.2% ( -3% -2%) 0.648 >OrHighMed 129.10 (3.0%) 128.83 > (3.1%) -0.2% ( -6% -6%) 0.830 > MedSpanNear 19.45 (2.3%) 19.41 > (2.2%) -0.2% ( -4% -4%) 0.784 > OrHighHigh 34.85 (1.5%) 34.79 > (1.4%) -0.2% ( -3% -2%) 0.722 > HighIntervalsOrdered 26.92 (4.7%) 26.89 > (4.9%) -0.1% ( 
-9% -9%) 0.929 > IntNRQ 343.52 (1.6%) 343.16 > (2.0%) -0.1% ( -3% -3%) 0.855 >OrHighNotHigh 595.61 (3.2%) 595.10 > (4.3%) -0.1% ( -7% -7%) 0.944 > MedIntervalsOrdered 17.66 (3.6%) 17.65 > (3.8%) -0.1% ( -7% -7%) 0.961 > LowIntervalsOrdered 109.23 (3.3%) 109.18 > (3.5%) -0.0% ( -6% -7%) 0.969 > AndHighHigh 81.09 (1.5%) 81.10 > (2.0%)0.0% ( -3% -3%) 0.967 > LowSpanNear 203.33 (2.1%) 203.41 > (1.8%)0.0% ( -3% -3%) 0.948 > MedSloppyPhrase 27.15 (1.5%) 27.17 > (1.2%)0.1% ( -2% -2%) 0.907 >LowPhrase 75.76 (1.8%) 75.81 > (2.0%)0.1% ( -3% -3%) 0.904 > AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 > (1.9%)0.1% ( -3% -4%) 0.888 > HighSloppyPhrase 14.32 (2.7%) 14.34 > (1.8%)0.1% ( -4% -4%) 0.870 > Fuzzy2 76.00 (3.9%) 76.12 > (3.4%)0.2% ( -6% -7%) 0.894 > Wildcard 123.51 (1.8%) 123.71 > (2.1%)0.2% ( -3% -4%) 0.796 > OrHighNotLow 722.64 (4.4%) 724.15 > (5.4%)0.2% ( -9% - 10%) 0.894 > AndHighLow 929.73 (4.0%) 931.75 > (3.8%)0.2% ( -7% -8%) 0.859 > Prefix3 240.13 (1.5%) 240.69 > (1.9%)0.2% ( -3% -3%) 0.675 > AndHighMed 210.17 (1.7%) 210.84 > (1.6%)0.3% ( -2% -3%) 0.532 > LowSloppyPhrase 142.83 (1.8%) 143.54 > (2.0%)0.5% ( -3% -4%) 0.410 > OrNotHighMed 709.24 (4.4%) 712.78 > (4.3%)0.5% ( -7% -9%) 0.715 > Fuzzy1 85.33 (5.7%)
[jira] [Resolved] (LUCENE-10332) Speed up Facets by enable batch reading of LongValues
[ https://issues.apache.org/jira/browse/LUCENE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo resolved LUCENE-10332. --- Resolution: Won't Do > Speed up Facets by enable batch reading of LongValues > - > > Key: LUCENE-10332 > URL: https://issues.apache.org/jira/browse/LUCENE-10332 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In {{Lucene90DocValuesProducer}}, there are several places reading LongValues > like this pattern: > {code:java} > long startOffset = addresses.get(doc); > bytes.length = (int) (addresses.get(doc + 1L) - startOffset); > {code} > In these cases, we are needing to read 2 numbers stored together. It would be > great if we can read 2 longs once. The luceneutil benchmark shows that some > Facets tasks were speed up nearly 20% by this approach: > *Benchmark* > {code:java} > TaskQPS baseline StdDevQPS > my_modified_version StdDevPct diff p-value >BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 > (17.8%) -2.7% ( -26% - 25%) 0.536 > LowTerm 1458.66 (3.6%) 1438.15 > (4.4%) -1.4% ( -9% -6%) 0.268 >HighTermDayOfYearSort 108.55 (10.0%) 108.04 > (9.1%) -0.5% ( -17% - 20%) 0.874 > HighPhrase 168.65 (1.9%) 168.06 > (2.3%) -0.3% ( -4% -3%) 0.602 > OrNotHighLow 1201.79 (3.4%) 1197.93 > (4.6%) -0.3% ( -8% -7%) 0.801 > HighSpanNear 15.26 (1.6%) 15.21 > (1.4%) -0.3% ( -3% -2%) 0.499 > Respell 62.61 (1.8%) 62.45 > (1.9%) -0.3% ( -3% -3%) 0.649 >MedPhrase 57.57 (1.4%) 57.44 > (1.8%) -0.2% ( -3% -2%) 0.648 >OrHighMed 129.10 (3.0%) 128.83 > (3.1%) -0.2% ( -6% -6%) 0.830 > MedSpanNear 19.45 (2.3%) 19.41 > (2.2%) -0.2% ( -4% -4%) 0.784 > OrHighHigh 34.85 (1.5%) 34.79 > (1.4%) -0.2% ( -3% -2%) 0.722 > HighIntervalsOrdered 26.92 (4.7%) 26.89 > (4.9%) -0.1% ( -9% -9%) 0.929 > IntNRQ 343.52 (1.6%) 343.16 > (2.0%) -0.1% ( -3% -3%) 0.855 >OrHighNotHigh 595.61 (3.2%) 595.10 > (4.3%) -0.1% ( -7% -7%) 0.944 > MedIntervalsOrdered 17.66 
(3.6%) 17.65 > (3.8%) -0.1% ( -7% -7%) 0.961 > LowIntervalsOrdered 109.23 (3.3%) 109.18 > (3.5%) -0.0% ( -6% -7%) 0.969 > AndHighHigh 81.09 (1.5%) 81.10 > (2.0%)0.0% ( -3% -3%) 0.967 > LowSpanNear 203.33 (2.1%) 203.41 > (1.8%)0.0% ( -3% -3%) 0.948 > MedSloppyPhrase 27.15 (1.5%) 27.17 > (1.2%)0.1% ( -2% -2%) 0.907 >LowPhrase 75.76 (1.8%) 75.81 > (2.0%)0.1% ( -3% -3%) 0.904 > AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 > (1.9%)0.1% ( -3% -4%) 0.888 > HighSloppyPhrase 14.32 (2.7%) 14.34 > (1.8%)0.1% ( -4% -4%) 0.870 > Fuzzy2 76.00 (3.9%) 76.12 > (3.4%)0.2% ( -6% -7%) 0.894 > Wildcard 123.51 (1.8%) 123.71 > (2.1%)0.2% ( -3% -4%) 0.796 > OrHighNotLow 722.64 (4.4%) 724.15 > (5.4%)0.2% ( -9% - 10%) 0.894 > AndHighLow 929.73 (4.0%) 931.75 > (3.8%)0.2% ( -7% -8%) 0.859 > Prefix3 240.13 (1.5%) 240.69 > (1.9%)0.2% ( -3% -3%) 0.675 > AndHighMed 210.17 (1.7%) 210.84 > (1.6%)0.3% ( -2% -3%) 0.532 > LowSloppyPhrase 142.83 (1.8%) 143.54 > (2.0%)0.5% ( -3% -4%) 0.410 > OrNotHighMed 709.24 (4.4%) 712.78 > (4.3%)0.5% ( -7% -9%) 0.715 > Fuzzy1 85.33 (5.7%) 85.77 > (6.3%)0.5% ( -10% - 13%) 0.786 > MedTerm 1466.50 (3.5%) 1474.85 > (3.9%)0.6% ( -6% -8%) 0.629 >
[jira] [Created] (LUCENE-10332) Speed up Facets by enable batch reading of LongValues
Feng Guo created LUCENE-10332: - Summary: Speed up Facets by enable batch reading of LongValues Key: LUCENE-10332 URL: https://issues.apache.org/jira/browse/LUCENE-10332 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo In {{Lucene90DocValuesProducer}}, there are several places reading LongValues like this pattern: {code:java} long startOffset = addresses.get(doc); bytes.length = (int) (addresses.get(doc + 1L) - startOffset); {code} In these cases, we are needing to read 2 numbers stored together. It would be great if we can read 2 longs once. The luceneutil benchmark shows that some Facets tasks were speed up nearly 20% by this approach: *Benchmark* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets 17.25 (8.6%) 16.78 (17.8%) -2.7% ( -26% - 25%) 0.536 LowTerm 1458.66 (3.6%) 1438.15 (4.4%) -1.4% ( -9% -6%) 0.268 HighTermDayOfYearSort 108.55 (10.0%) 108.04 (9.1%) -0.5% ( -17% - 20%) 0.874 HighPhrase 168.65 (1.9%) 168.06 (2.3%) -0.3% ( -4% -3%) 0.602 OrNotHighLow 1201.79 (3.4%) 1197.93 (4.6%) -0.3% ( -8% -7%) 0.801 HighSpanNear 15.26 (1.6%) 15.21 (1.4%) -0.3% ( -3% -2%) 0.499 Respell 62.61 (1.8%) 62.45 (1.9%) -0.3% ( -3% -3%) 0.649 MedPhrase 57.57 (1.4%) 57.44 (1.8%) -0.2% ( -3% -2%) 0.648 OrHighMed 129.10 (3.0%) 128.83 (3.1%) -0.2% ( -6% -6%) 0.830 MedSpanNear 19.45 (2.3%) 19.41 (2.2%) -0.2% ( -4% -4%) 0.784 OrHighHigh 34.85 (1.5%) 34.79 (1.4%) -0.2% ( -3% -2%) 0.722 HighIntervalsOrdered 26.92 (4.7%) 26.89 (4.9%) -0.1% ( -9% -9%) 0.929 IntNRQ 343.52 (1.6%) 343.16 (2.0%) -0.1% ( -3% -3%) 0.855 OrHighNotHigh 595.61 (3.2%) 595.10 (4.3%) -0.1% ( -7% -7%) 0.944 MedIntervalsOrdered 17.66 (3.6%) 17.65 (3.8%) -0.1% ( -7% -7%) 0.961 LowIntervalsOrdered 109.23 (3.3%) 109.18 (3.5%) -0.0% ( -6% -7%) 0.969 AndHighHigh 81.09 (1.5%) 81.10 (2.0%)0.0% ( -3% -3%) 0.967 LowSpanNear 203.33 (2.1%) 203.41 (1.8%)0.0% ( -3% -3%) 0.948 MedSloppyPhrase 27.15 (1.5%) 27.17 (1.2%)0.1% ( -2% -2%) 0.907 
LowPhrase 75.76 (1.8%) 75.81 (2.0%)0.1% ( -3% -3%) 0.904 AndHighMedDayTaxoFacets 97.27 (1.9%) 97.35 (1.9%)0.1% ( -3% -4%) 0.888 HighSloppyPhrase 14.32 (2.7%) 14.34 (1.8%)0.1% ( -4% -4%) 0.870 Fuzzy2 76.00 (3.9%) 76.12 (3.4%)0.2% ( -6% -7%) 0.894 Wildcard 123.51 (1.8%) 123.71 (2.1%)0.2% ( -3% -4%) 0.796 OrHighNotLow 722.64 (4.4%) 724.15 (5.4%)0.2% ( -9% - 10%) 0.894 AndHighLow 929.73 (4.0%) 931.75 (3.8%)0.2% ( -7% -8%) 0.859 Prefix3 240.13 (1.5%) 240.69 (1.9%)0.2% ( -3% -3%) 0.675 AndHighMed 210.17 (1.7%) 210.84 (1.6%)0.3% ( -2% -3%) 0.532 LowSloppyPhrase 142.83 (1.8%) 143.54 (2.0%)0.5% ( -3% -4%) 0.410 OrNotHighMed 709.24 (4.4%) 712.78 (4.3%)0.5% ( -7% -9%) 0.715 Fuzzy1 85.33 (5.7%) 85.77 (6.3%)0.5% ( -10% - 13%) 0.786 MedTerm 1466.50 (3.5%) 1474.85 (3.9%)0.6% ( -6% -8%) 0.629 TermDTSort 105.51 (7.7%) 106.33 (7.3%)0.8% ( -13% - 17%) 0.746 PKLookup 206.18 (2.9%) 208.68 (2.9%)1.2% ( -4% -7%) 0.179 OrHighNotMed 876.71 (3.0%) 887.84 (3.9%)1.3% ( -5% -8%) 0.251 OrNotHighHigh 774.25 (4.7%) 785.03 (6.0%)1.4% ( -8% - 12%)
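The two-call pattern quoted above, and the proposed batch read, can be sketched with a plain `long[]` standing in for the LongValues instance. This is a sketch only; the class and method names below are illustrative, not Lucene's actual API.

```java
// Sketch only: a plain long[] of offsets stands in for the LongValues returned
// by DirectMonotonicReader; class and method names here are illustrative, not
// Lucene's actual API.
final class OffsetsDemo {
  private final long[] offsets; // offsets[doc] = start of doc's bytes

  OffsetsDemo(long[] offsets) {
    this.offsets = offsets;
  }

  // Baseline pattern from Lucene90DocValuesProducer: two independent reads.
  long[] sliceTwoReads(int doc) {
    long startOffset = offsets[doc];
    long length = offsets[doc + 1] - startOffset;
    return new long[] {startOffset, length};
  }

  // Proposed pattern: fetch both adjacent longs in one "batch" call. A real
  // implementation would decode the two packed values from the same underlying
  // block in a single lookup instead of two.
  long[] sliceBatchRead(int doc) {
    long start = offsets[doc];
    long end = offsets[doc + 1]; // adjacent value, same block in practice
    return new long[] {start, end - start};
  }

  public static void main(String[] args) {
    OffsetsDemo demo = new OffsetsDemo(new long[] {0, 5, 12, 20});
    long[] a = demo.sliceTwoReads(1);
    long[] b = demo.sliceBatchRead(1);
    System.out.println(a[0] + "/" + a[1] + " vs " + b[0] + "/" + b[1]);
  }
}
```

The point of the batch variant is that the two values live next to each other, so one decode of the underlying packed block can serve both, instead of paying the `DirectMonotonicReader#get` cost twice per document.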
[jira] [Commented] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462559#comment-17462559 ] Feng Guo commented on LUCENE-10329: --- The PR 552 was not linked to this Jira issue so i opened a new 553, but it seems both of them are linked here now... Please ignore 552 and look at 553 then :) > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I saw {{DirectMonotonicReader#get}} was the hot method during when running > luceneutil test, So using a computed mask for DirectMonotonicReader#get > instead of computing it for every call may make a bit sense :) > {code:java} > PERCENT CPU SAMPLES STACK > 14.07%66936 > org.apache.lucene.util.packed.DirectMonotonicReader#get() > 5.93% 28198 > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() > 5.44% 25858 > org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() > 5.27% 25052 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() > 4.48% 21310 java.nio.ByteBuffer#get() > 1.83% 8722 java.nio.Buffer#position() > 1.80% 8573 > jdk.internal.misc.ScopedMemoryAccess#getByteInternal() > 1.80% 8558 > org.apache.lucene.store.ByteBufferGuard#ensureValid() > 1.79% 8537 java.nio.Buffer#scope() > 1.67% 7939 > org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() > 1.51% 7182 > org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() > 1.43% 6781 java.nio.Buffer#nextGetIndex() > 1.40% 6657 java.nio.Buffer#checkIndex() > 1.26% 5979 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance() > 1.19% 5670 jdk.internal.misc.Unsafe#convEndian() > 1.17% 5565 java.nio.DirectByteBuffer#ix() > 1.12% 5310 > 
org.apache.lucene.search.BooleanScorer$OrCollector#collect() > 1.07% 5075 org.apache.lucene.store.ByteBufferGuard#getShort() > 1.06% 5065 org.apache.lucene.search.ConjunctionDISI#doNext() > 1.03% 4914 > jdk.internal.util.Preconditions#checkFromIndexSize() > 1.02% 4869 > jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal() > 0.99% 4719 > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() > 0.96% 4587 java.nio.DirectByteBuffer#get() > 0.96% 4587 > org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect() > 0.94% 4460 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition() > 0.90% 4297 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#count() > 0.84% 3996 > org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() > 0.79% 3769 > org.apache.lucene.search.BooleanScorer#scoreDocument() > 0.77% 3648 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc() > 0.75% 3572 > org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462543#comment-17462543 ] Feng Guo commented on LUCENE-10329: --- *Benchmark result (a bit improved)* {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighTermDayOfYearSort 90.49 (18.0%) 88.62 (11.1%) -2.1% ( -26% - 32%) 0.661 TermDTSort 86.84 (10.6%) 85.83 (8.7%) -1.2% ( -18% - 20%) 0.705 Prefix3 140.90 (5.7%) 139.70 (7.0%) -0.8% ( -12% - 12%) 0.675 HighTermMonthSort 176.44 (14.0%) 175.30 (11.8%) -0.6% ( -23% - 29%) 0.875 HighPhrase 430.23 (3.2%) 428.04 (3.6%) -0.5% ( -7% -6%) 0.635 OrNotHighHigh 676.66 (4.2%) 673.70 (3.2%) -0.4% ( -7% -7%) 0.711 OrHighNotLow 823.43 (3.5%) 822.17 (4.5%) -0.2% ( -7% -8%) 0.905 Wildcard 121.59 (2.0%) 121.41 (2.6%) -0.2% ( -4% -4%) 0.840 AndHighMedDayTaxoFacets 55.32 (1.8%) 55.26 (1.9%) -0.1% ( -3% -3%) 0.851 HighSpanNear 14.79 (2.6%) 14.77 (2.5%) -0.1% ( -5% -5%) 0.899 BrowseMonthSSDVFacets 17.99 (9.8%) 17.98 (9.6%) -0.1% ( -17% - 21%) 0.982 HighIntervalsOrdered 21.05 (3.1%) 21.04 (3.2%) -0.0% ( -6% -6%) 0.966 MedIntervalsOrdered 18.57 (3.0%) 18.57 (2.9%) -0.0% ( -5% -6%) 0.983 MedSpanNear 88.82 (2.0%) 88.86 (2.1%)0.0% ( -3% -4%) 0.943 LowSpanNear 154.86 (1.9%) 154.95 (1.6%)0.1% ( -3% -3%) 0.916 Respell 65.43 (2.2%) 65.51 (2.3%)0.1% ( -4% -4%) 0.862 BrowseDayOfYearSSDVFacets 16.76 (11.3%) 16.79 (11.6%)0.2% ( -20% - 25%) 0.963 LowPhrase 513.12 (2.7%) 514.01 (3.1%)0.2% ( -5% -6%) 0.850 IntNRQ 288.28 (1.3%) 288.90 (1.2%)0.2% ( -2% -2%) 0.586 LowSloppyPhrase 214.50 (2.4%) 215.09 (2.2%)0.3% ( -4% -5%) 0.706 LowIntervalsOrdered 202.73 (2.8%) 203.30 (2.9%)0.3% ( -5% -6%) 0.757 OrHighHigh 41.48 (1.8%) 41.64 (2.0%)0.4% ( -3% -4%) 0.524 OrNotHighMed 809.16 (5.0%) 812.45 (3.1%)0.4% ( -7% -8%) 0.757 AndHighLow 665.08 (3.1%) 668.14 (3.3%)0.5% ( -5% -7%) 0.649 PKLookup 211.67 (3.1%) 212.66 (3.3%)0.5% ( -5% -7%) 0.644 MedPhrase 304.39 (2.5%) 305.90 (2.3%)0.5% ( -4% -5%) 0.519 AndHighMed 
157.06 (4.0%) 157.89 (4.0%)0.5% ( -7% -8%) 0.678 AndHighHigh 99.07 (2.6%) 99.69 (3.6%)0.6% ( -5% -7%) 0.534 Fuzzy2 36.32 (3.6%) 36.55 (3.7%)0.6% ( -6% -8%) 0.579 OrHighMed 80.10 (2.4%) 80.62 (1.8%)0.6% ( -3% -4%) 0.329 LowTerm 1411.61 (3.0%) 1421.53 (4.3%)0.7% ( -6% -8%) 0.549 AndHighHighDayTaxoFacets 12.47 (2.6%) 12.56 (2.5%)0.8% ( -4% -6%) 0.343 MedSloppyPhrase 37.22 (1.6%) 37.51 (1.7%)0.8% ( -2% -4%) 0.138 Fuzzy1 60.37 (4.8%) 60.87 (4.3%)0.8% ( -7% - 10%) 0.564 OrHighLow 565.69 (4.2%) 570.55 (4.3%)0.9% ( -7% -9%) 0.523 HighTerm 1167.96 (5.0%) 1178.00 (5.8%)0.9% ( -9% - 12%) 0.615 MedTerm 1392.49 (4.3%) 1404.95 (4.0%)0.9% ( -7% -9%) 0.496 OrHighMedDayTaxoFacets4.17 (2.1%)4.21 (2.4%)1.0% ( -3% -5%) 0.189 MedTermDayTaxoFacets 21.41 (1.8%) 21.61 (2.0%)1.0% ( -2% -4%) 0.115 HighSloppyPhrase3.74 (2.8%)3.77 (3.0%)1.0% ( -4% -6%) 0.298 OrNotHighLow 1020.98 (4.6%) 1034.25 (3.9%)1.3% ( -6% - 10%)
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Attachment: (was: screenshot-1.png) > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > > I saw {{DirectMonotonicReader#get}} was the hot method during when running > luceneutil test, So using a computed mask for DirectMonotonicReader#get > instead of computing it for every call may make a bit sense :) > {code:java} > PERCENT CPU SAMPLES STACK > 14.07%66936 > org.apache.lucene.util.packed.DirectMonotonicReader#get() > 5.93% 28198 > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() > 5.44% 25858 > org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() > 5.27% 25052 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() > 4.48% 21310 java.nio.ByteBuffer#get() > 1.83% 8722 java.nio.Buffer#position() > 1.80% 8573 > jdk.internal.misc.ScopedMemoryAccess#getByteInternal() > 1.80% 8558 > org.apache.lucene.store.ByteBufferGuard#ensureValid() > 1.79% 8537 java.nio.Buffer#scope() > 1.67% 7939 > org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() > 1.51% 7182 > org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() > 1.43% 6781 java.nio.Buffer#nextGetIndex() > 1.40% 6657 java.nio.Buffer#checkIndex() > 1.26% 5979 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance() > 1.19% 5670 jdk.internal.misc.Unsafe#convEndian() > 1.17% 5565 java.nio.DirectByteBuffer#ix() > 1.12% 5310 > org.apache.lucene.search.BooleanScorer$OrCollector#collect() > 1.07% 5075 org.apache.lucene.store.ByteBufferGuard#getShort() > 1.06% 5065 org.apache.lucene.search.ConjunctionDISI#doNext() > 1.03% 4914 > 
jdk.internal.util.Preconditions#checkFromIndexSize() > 1.02% 4869 > jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal() > 0.99% 4719 > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() > 0.96% 4587 java.nio.DirectByteBuffer#get() > 0.96% 4587 > org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect() > 0.94% 4460 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition() > 0.90% 4297 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#count() > 0.84% 3996 > org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() > 0.79% 3769 > org.apache.lucene.search.BooleanScorer#scoreDocument() > 0.77% 3648 > org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc() > 0.75% 3572 > org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() > {code}
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Description: I saw {{DirectMonotonicReader#get}} was the hot method during when running luceneutil test, So using a computed mask for DirectMonotonicReader#get instead of computing it for every call may make a bit sense :) {code:java} PERCENT CPU SAMPLES STACK 14.07%66936 org.apache.lucene.util.packed.DirectMonotonicReader#get() 5.93% 28198 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() 5.44% 25858 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() 5.27% 25052 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() 4.48% 21310 java.nio.ByteBuffer#get() 1.83% 8722 java.nio.Buffer#position() 1.80% 8573 jdk.internal.misc.ScopedMemoryAccess#getByteInternal() 1.80% 8558 org.apache.lucene.store.ByteBufferGuard#ensureValid() 1.79% 8537 java.nio.Buffer#scope() 1.67% 7939 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() 1.51% 7182 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() 1.43% 6781 java.nio.Buffer#nextGetIndex() 1.40% 6657 java.nio.Buffer#checkIndex() 1.26% 5979 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance() 1.19% 5670 jdk.internal.misc.Unsafe#convEndian() 1.17% 5565 java.nio.DirectByteBuffer#ix() 1.12% 5310 org.apache.lucene.search.BooleanScorer$OrCollector#collect() 1.07% 5075 org.apache.lucene.store.ByteBufferGuard#getShort() 1.06% 5065 org.apache.lucene.search.ConjunctionDISI#doNext() 1.03% 4914 jdk.internal.util.Preconditions#checkFromIndexSize() 1.02% 4869 jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal() 0.99% 4719 org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() 0.96% 4587 java.nio.DirectByteBuffer#get() 0.96% 4587 org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect() 0.94% 4460 
org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition() 0.90% 4297 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#count() 0.84% 3996 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() 0.79% 3769 org.apache.lucene.search.BooleanScorer#scoreDocument() 0.77% 3648 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc() 0.75% 3572 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue() {code} was:Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call. > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Attachments: screenshot-1.png > > > I saw {{DirectMonotonicReader#get}} was the hot method during when running > luceneutil test, So using a computed mask for DirectMonotonicReader#get > instead of computing it for every call may make a bit sense :) > {code:java} > PERCENT CPU SAMPLES STACK > 14.07%66936 > org.apache.lucene.util.packed.DirectMonotonicReader#get() > 5.93% 28198 > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue() > 5.44% 25858 > org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() > 5.27% 25052 > org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() > 4.48% 21310 java.nio.ByteBuffer#get() > 1.83% 8722 java.nio.Buffer#position() > 1.80% 8573 > jdk.internal.misc.ScopedMemoryAccess#getByteInternal() > 1.80% 8558 > org.apache.lucene.store.ByteBufferGuard#ensureValid() > 1.79% 8537 java.nio.Buffer#scope() > 1.67% 7939 > org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() > 1.51% 7182 > org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() > 1.43% 6781 java.nio.Buffer#nextGetIndex() > 1.40% 6657 java.nio.Buffer#checkIndex() > 1.26% 
5979 >
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Attachment: screenshot-1.png > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Attachments: screenshot-1.png > > > Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it > for every call.
[jira] [Updated] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
[ https://issues.apache.org/jira/browse/LUCENE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10329: -- Description: Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call. (was: Uss a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call.) > Use Computed Mask For DirectMonotonicReader#get > --- > > Key: LUCENE-10329 > URL: https://issues.apache.org/jira/browse/LUCENE-10329 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > > Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it > for every call.
[jira] [Created] (LUCENE-10329) Use Computed Mask For DirectMonotonicReader#get
Feng Guo created LUCENE-10329: - Summary: Use Computed Mask For DirectMonotonicReader#get Key: LUCENE-10329 URL: https://issues.apache.org/jira/browse/LUCENE-10329 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo Use a computed mask for {{DirectMonotonicReader#get}} instead of computing it for every call.
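The proposed change amounts to hoisting the mask computation out of the hot `get` path into the constructor. A minimal sketch of the before/after, assuming illustrative names (these are not `DirectMonotonicReader`'s actual internals):

```java
// Illustrative only: field and method names are not DirectMonotonicReader's
// actual internals; this just shows hoisting the mask computation.
final class PackedReaderDemo {
  private final int bitsPerValue;
  private final long mask; // computed once at construction time

  PackedReaderDemo(int bitsPerValue) {
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
  }

  // Before: the mask is rebuilt on every call to get().
  long getRecomputing(long word, int index) {
    long m = (1L << bitsPerValue) - 1;
    return (word >>> (index * bitsPerValue)) & m;
  }

  // After: the hot path reuses the precomputed field.
  long getPrecomputed(long word, int index) {
    return (word >>> (index * bitsPerValue)) & mask;
  }

  public static void main(String[] args) {
    PackedReaderDemo reader = new PackedReaderDemo(12);
    long word = (7L << 12) | 5L; // two 12-bit values packed into one long
    System.out.println(reader.getRecomputing(word, 0) + "," + reader.getPrecomputed(word, 1));
  }
}
```

Both variants return the same values; the point is only that the shift-and-subtract to build the mask moves out of a method that profiles as the hottest in the run.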
[jira] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315 ] Feng Guo deleted comment on LUCENE-10315: --- was (Author: gf2121): The optimization can only be triggered when {{count == BKDConfig#DEFAULT_MAX_POINTS_IN_LEAF_NODE}}, This is fragile because users can customize the {{maxPointsInLeaf}} in the Codec, leading the optimization meaningless. Here are some ways i can think of to address this: 1. Directly drop the support of customizing {{maxPointsInLeaf}}, like what we do in postings. 2. Generate a series of ForUtils, like {{ForUitil128}}, {{ForUitil256}}, {{ForUitil512}}, {{ForUtil1024}} ... and make some notes to hint users to choose values from them. > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > Attachments: addall.svg > > Time Spent: 10m > Remaining Estimate: 0h > > Elasticsearch (which based on lucene) can automatically infers types for > users with its dynamic mapping feature. When users index some low cardinality > fields, such as gender / age / status... they often use some numbers to > represent the values, while ES will infer these fields as {{{}long{}}}, and > ES uses BKD as the index of {{long}} fields. When the data volume grows, > building the result set of low-cardinality fields will make the CPU usage and > load very high. > This is a flame graph we obtained from the production environment: > [^addall.svg] > It can be seen that almost all CPU is used in addAll. When we reindex > {{long}} to {{{}keyword{}}}, the cluster load and search latency are greatly > reduced ( We spent weeks of time to reindex all indices... ). 
I know that ES > recommended to use {{keyword}} for term/terms query and {{long}} for range > query in the document, but there are always some users who didn't realize > this and keep their habit of using sql database, or dynamic mapping > automatically selects the type for them. All in all, users won't realize that > there would be such a big difference in performance between {{long}} and > {{keyword}} fields in low cardinality fields. So from my point of view it > will make sense if we can make BKD works better for the low/medium > cardinality fields. > As far as i can see, for low cardinality fields, there are two advantages of > {{keyword}} over {{{}long{}}}: > 1. {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's > delta VInt, because its batch reading (readLongs) and SIMD decode. > 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily > materialize of its result set, and when another small result clause > intersects with this low cardinality condition, the low cardinality field can > avoid reading all docIds into memory. > This ISSUE is targeting to solve the first point. The basic idea is trying to > use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked this optimization > by mocking some random {{LongPoint}} and querying them with > {{PointInSetQuery}}. 
> *Benchmark Result* > |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff > percentage| > |1|32|1|51.44|148.26|188.22%| > |1|32|2|26.8|101.88|280.15%| > |1|32|4|14.04|53.52|281.20%| > |1|32|8|7.04|28.54|305.40%| > |1|32|16|3.54|14.61|312.71%| > |1|128|1|110.56|350.26|216.81%| > |1|128|8|16.6|89.81|441.02%| > |1|128|16|8.45|48.07|468.88%| > |1|128|32|4.2|25.35|503.57%| > |1|128|64|2.13|13.02|511.27%| > |1|1024|1|536.19|843.88|57.38%| > |1|1024|8|109.71|251.89|129.60%| > |1|1024|32|33.24|104.11|213.21%| > |1|1024|128|8.87|30.47|243.52%| > |1|1024|512|2.24|8.3|270.54%| > |1|8192|1|.33|5000|50.00%| > |1|8192|32|139.47|214.59|53.86%| > |1|8192|128|54.59|109.23|100.09%| > |1|8192|512|15.61|36.15|131.58%| > |1|8192|2048|4.11|11.14|171.05%| > |1|1048576|1|2597.4|3030.3|16.67%| > |1|1048576|32|314.96|371.75|18.03%| > |1|1048576|128|99.7|116.28|16.63%| > |1|1048576|512|30.5|37.15|21.80%| > |1|1048576|2048|10.38|12.3|18.50%| > |1|8388608|1|2564.1|3174.6|23.81%| > |1|8388608|32|196.27|238.95|21.75%| > |1|8388608|128|55.36|68.03|22.89%| > |1|8388608|512|15.58|19.24|23.49%| > |1|8388608|2048|4.56|5.71|25.22%| > The indices size is reduced for low cardinality fields and flat for high > cardinality fields. > {code:java} > 113Mindex_1_doc_32_cardinality_baseline > 114Mindex_1_doc_32_cardinality_candidate > 140Mindex_1_doc_128_cardinality_baseline > 133M
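The first advantage described above, fixed-width batch decoding versus delta-VInt, can be illustrated with a simplified pure-Java sketch. This is not Lucene's actual ForUtil: it packs only at a fixed 16-bit width, whereas the real ForUtil supports many bit widths and decodes whole 128- or 512-value blocks per call with SIMD-friendly code.

```java
// Simplified contrast, not Lucene's actual ForUtil: BKD leaf doc IDs are
// sorted, so they can be stored as deltas; packing those deltas at a fixed
// bit width (16 bits here) gives a uniform, branch-free decode loop, unlike
// one-at-a-time delta-VInt decoding.
final class BlockIdsDemo {
  // Pack deltas as 16-bit fields, four per long.
  static long[] pack16(int[] deltas) {
    long[] out = new long[(deltas.length + 3) / 4];
    for (int i = 0; i < deltas.length; i++) {
      out[i / 4] |= ((long) deltas[i] & 0xFFFFL) << ((i % 4) * 16);
    }
    return out;
  }

  static int[] unpack16(long[] packed, int count) {
    int[] out = new int[count];
    for (int i = 0; i < count; i++) { // fixed-stride loop, no data-dependent branches
      out[i] = (int) ((packed[i / 4] >>> ((i % 4) * 16)) & 0xFFFFL);
    }
    return out;
  }

  // Rebuild the sorted doc IDs via a prefix sum over the decoded deltas.
  static int[] docIds(int base, int[] deltas) {
    int[] ids = new int[deltas.length];
    int acc = base;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      ids[i] = acc;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[] deltas = {3, 1, 4, 1, 5};
    int[] decoded = unpack16(pack16(deltas), deltas.length);
    System.out.println(java.util.Arrays.toString(docIds(100, decoded)));
  }
}
```

The uniform stride of `unpack16` is what lets the JIT auto-vectorize the decode, which VInt's per-byte continuation checks prevent.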
[jira] [Comment Edited] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461367#comment-17461367 ] Feng Guo edited comment on LUCENE-10319 at 12/19/21, 9:25 AM: -- Out of curiosity, I tried to run the luceneutil wikimedium1m for block size = 256, but got an error there: {code:java} WARNING: cat=AndHighHigh: hit counts differ: 10274+ vs 10884+ WARNING: cat=HighTerm: hit counts differ: 5969+ vs 9423+ WARNING: cat=LowTerm: hit counts differ: 2394+ vs 3325+ WARNING: cat=MedTerm: hit counts differ: 4558+ vs 7118+ WARNING: cat=OrHighHigh: hit counts differ: 5986+ vs 5987+ WARNING: cat=OrHighMed: hit counts differ: 3044+ vs 3445+ Traceback (most recent call last): File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/localrun.py", line 60, in comp.benchmark("baseline_vs_patch") File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/competition.py", line 494, in benchmark searchBench.run(id, base, challenger, File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/searchBench.py", line 196, in run raise RuntimeError('errors occurred: %s' % str(cmpDiffs)) RuntimeError: errors occurred: ([], ['query=+body:web +body:up filter=None sort=None groupField=None hitCount=10274+: wrong hitCount: 10274+ vs 10884+', 'query=body:he body:resulting filter=None sort=None groupField=None hitCount=3044+: wrong hitCount: 3044+ vs 3445+', 'query=body:official filter=None sort=None groupField=None hitCount=4558+: wrong hitCount: 4558+ vs 7118+', 'query=body:thumb filter=None sort=None groupField=None hitCount=5969+: wrong hitCount: 5969+ vs 9423+', 'query=body:years body:pages filter=None sort=None groupField=None hitCount=5986+: wrong hitCount: 5986+ vs 5987+', 'query=body:goods filter=None sort=None groupField=None hitCount=2394+: wrong hitCount: 2394+ vs 3325+'], 1.0) {code} I guess this error may be something about MaxScore optimization? 
So i changed the {{#TOTAL_HITS_THRESHOLD}} to a very large number for both baseline and candidate and rerun the benchmark, everything looks good now and i got a rather good report. But notice that this report does *not* really make sense since we changed the {{{}#TOTAL_HITS_THRESHOLD{}}}, this is just to verify the results are right. {code:java} TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value Fuzzy1 118.73 (11.5%) 114.82 (13.0%) -3.3% ( -24% - 23%) 0.407 LowTerm 2369.88 (9.2%) 2323.31 (5.7%) -2.0% ( -15% - 14%) 0.428 PKLookup 250.07 (5.0%) 245.42 (4.3%) -1.9% ( -10% -7%) 0.214 Prefix3 306.43 (6.9%) 301.82 (7.0%) -1.5% ( -14% - 13%) 0.502 Wildcard 221.77 (5.2%) 218.64 (4.0%) -1.4% ( -10% -8%) 0.348 HighTermMonthSort 1161.02 (12.7%) 1156.95 (11.1%) -0.4% ( -21% - 26%) 0.928 BrowseDayOfYearSSDVFacets 140.62 (1.3%) 140.48 (1.1%) -0.1% ( -2% -2%) 0.791 Fuzzy2 47.51 (8.9%) 47.57 (7.0%)0.1% ( -14% - 17%) 0.961 Respell 200.51 (2.7%) 200.82 (1.4%)0.2% ( -3% -4%) 0.823 OrHighMed 197.90 (3.0%) 198.36 (3.6%)0.2% ( -6% -7%) 0.830 BrowseMonthSSDVFacets 152.24 (2.8%) 152.74 (1.0%)0.3% ( -3% -4%) 0.630 OrHighLow 245.11 (3.5%) 245.97 (3.1%)0.4% ( -6% -7%) 0.744 AndHighLow 1598.05 (7.2%) 1604.55 (4.6%)0.4% ( -10% - 13%) 0.836 BrowseDayOfYearTaxoFacets 28.84 (3.0%) 28.99 (3.3%)0.5% ( -5% -7%) 0.603 OrHighHigh 109.37 (4.2%) 110.14 (4.0%)0.7% ( -7% -9%) 0.599 BrowseMonthTaxoFacets 30.77 (3.5%) 31.00 (4.1%)0.8% ( -6% -8%) 0.541 BrowseDateTaxoFacets 28.71 (3.2%) 28.93 (3.3%)0.8% ( -5% -7%) 0.461 HighTermDayOfYearSort 593.30 (13.5%) 599.82 (13.2%)1.1% ( -22% - 32%) 0.800 AndHighHigh 441.62 (5.0%) 452.99 (4.1%)2.6% ( -6% - 12%) 0.083 IntNRQ 121.71 (6.2%) 124.89 (4.2%)2.6% ( -7% - 13%) 0.127 HighTerm 599.78 (4.2%) 615.86 (2.6%)2.7% ( -3% -9%) 0.019 MedSloppyPhrase 397.69 (3.1%) 411.46 (3.3%)3.5% ( -2% - 10%)
[jira] [Updated] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-10319: -- Description: In LUCENE-10315, I tried to generate a {{ForUtil}} whose {{{}BLOCK_SIZE=512{}}}, I thought it could be simple since it looks like i only need to change the {{BLOCK_SIZE}}, but it turns out that there are a lot of values related to the {{BLOCK_SIZE}} but hard coded. So this approach is trying to make all hard code value related to BLOCK_SIZE to be generated from the {{BLOCK_SIZE}} in case we need a different {{BLOCK_SIZE}} {{ForUtil}} somewhere else or want to change {{BLOCK_SIZE}} in postings in feature. I tried to make the {{BLOCK_SIZE = 64 / 256}} and all tests passed. was: In LUCENE-10315, I tried to generate a {{ForUtil}} whose {{{}BLOCK_SIZE=512{}}}, I thought it could be simple since it looks like i only need to change the BLOCK_SIZE, but it turns out that there are a lot of values related to the BLOCK_SIZE but hard coded. So this is trying to make all hard code value generated from the BLOCK_SIZE in case we need a ForUtil somewhere else or want to change BLOCK_SIZE in postings in feature. I tried to make the BLOCK_SIZE = 64 / 256 and all tests passed. > Make ForUtil#BLOCK_SIZE changeable > -- > > Key: LUCENE-10319 > URL: https://issues.apache.org/jira/browse/LUCENE-10319 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > In LUCENE-10315, I tried to generate a {{ForUtil}} whose > {{{}BLOCK_SIZE=512{}}}, I thought it could be simple since it looks like i > only need to change the {{BLOCK_SIZE}}, but it turns out that there are a lot > of values related to the {{BLOCK_SIZE}} but hard coded. 
> So this approach makes all hard-coded values related to {{BLOCK_SIZE}} > be derived from {{BLOCK_SIZE}}, in case we need a {{ForUtil}} with a different > {{BLOCK_SIZE}} somewhere else or want to change {{BLOCK_SIZE}} in > postings in the future. > I tried {{BLOCK_SIZE = 64 / 256}} and all tests passed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
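The refactor described in this issue can be sketched roughly as follows. This is a hypothetical illustration, not the actual Lucene {{ForUtil}} generator; the class and constant names are invented. The point is that every derived quantity is computed from the single {{BLOCK_SIZE}} constant, so a code generator can emit a consistent {{ForUtil}} for 64, 128, 256, or 512 ints:

```java
// Hypothetical sketch (not Lucene's real ForUtil): every constant is derived
// from BLOCK_SIZE instead of being hard-coded, so changing BLOCK_SIZE is the
// only edit needed to regenerate a consistent codec.
public final class ForUtilConstants {
    /** The only tunable; must be a power of two. */
    public static final int BLOCK_SIZE = 512;
    public static final int BLOCK_SIZE_LOG2 = Integer.numberOfTrailingZeros(BLOCK_SIZE);
    /** A block of ints stored as longs, two ints per long. */
    public static final int LONGS_PER_BLOCK = BLOCK_SIZE / 2;

    /** Encoded size in bytes of one block at the given bits per value. */
    public static int numBytes(int bitsPerValue) {
        return bitsPerValue * BLOCK_SIZE / Byte.SIZE;
    }

    public static void main(String[] args) {
        // 512 ints at 5 bits each pack into 320 bytes.
        System.out.println(numBytes(5));
    }
}
```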
[jira] [Comment Edited] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461367#comment-17461367 ] Feng Guo edited comment on LUCENE-10319 at 12/17/21, 2:08 PM: -- Out of curiosity, I tried to run the luceneutil wikimedium1m for block size = 256, but got an error there: {code:java} WARNING: cat=AndHighHigh: hit counts differ: 10274+ vs 10884+ WARNING: cat=HighTerm: hit counts differ: 5969+ vs 9423+ WARNING: cat=LowTerm: hit counts differ: 2394+ vs 3325+ WARNING: cat=MedTerm: hit counts differ: 4558+ vs 7118+ WARNING: cat=OrHighHigh: hit counts differ: 5986+ vs 5987+ WARNING: cat=OrHighMed: hit counts differ: 3044+ vs 3445+ Traceback (most recent call last): File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/localrun.py", line 60, in comp.benchmark("baseline_vs_patch") File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/competition.py", line 494, in benchmark searchBench.run(id, base, challenger, File "/Users/gf/Documents/projects/luceneutil/lucene_benchmark/src/python/searchBench.py", line 196, in run raise RuntimeError('errors occurred: %s' % str(cmpDiffs)) RuntimeError: errors occurred: ([], ['query=+body:web +body:up filter=None sort=None groupField=None hitCount=10274+: wrong hitCount: 10274+ vs 10884+', 'query=body:he body:resulting filter=None sort=None groupField=None hitCount=3044+: wrong hitCount: 3044+ vs 3445+', 'query=body:official filter=None sort=None groupField=None hitCount=4558+: wrong hitCount: 4558+ vs 7118+', 'query=body:thumb filter=None sort=None groupField=None hitCount=5969+: wrong hitCount: 5969+ vs 9423+', 'query=body:years body:pages filter=None sort=None groupField=None hitCount=5986+: wrong hitCount: 5986+ vs 5987+', 'query=body:goods filter=None sort=None groupField=None hitCount=2394+: wrong hitCount: 2394+ vs 3325+'], 1.0) {code} I guess this error may be something about Impacts? 
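The differing hit counts above ("10274+ vs 10884+") have the shape you get when total hit counts are approximate lower bounds under dynamic pruning (Impacts/WAND): once the collector has enough competitive hits, scorers skip non-competitive blocks and the exact count is never computed, and two codecs that skip at different block granularities can legitimately report different bounds. A toy sketch of that effect in plain Java (not Lucene code; the method name is invented):

```java
import java.util.List;

// Toy illustration (not Lucene code) of why hit counts printed as "10274+"
// are only lower bounds when early termination is enabled: counting stops
// once a threshold of hits has been seen, so the reported value depends on
// where the scorer happened to stop, not on the true match count.
public final class PruningCountDemo {
    /** Counts matches, but stops counting once `threshold` hits are seen. */
    static long approximateCount(List<Integer> docs, int threshold) {
        long count = 0;
        for (int doc : docs) {
            count++;
            if (count >= threshold) {
                break; // a real scorer would now skip ahead block by block
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> hits = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(approximateCount(hits, 3));                 // lower bound: "3+"
        System.out.println(approximateCount(hits, Integer.MAX_VALUE)); // exact: 8
    }
}
```

Raising the threshold to a very large value, as done below, forces exact counting, which is why the baseline and candidate then agree.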
So I changed {{#TOTAL_HITS_THRESHOLD}} to a very large number for both baseline and candidate and reran the benchmark. Everything looks good now and I got a rather good report. But note that this report does *not* really make sense, since we changed {{{}#TOTAL_HITS_THRESHOLD{}}}; it is only meant to verify that the results are correct. {code:java}
TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value
Fuzzy1 118.73 (11.5%) 114.82 (13.0%) -3.3% ( -24% - 23%) 0.407
LowTerm 2369.88 (9.2%) 2323.31 (5.7%) -2.0% ( -15% - 14%) 0.428
PKLookup 250.07 (5.0%) 245.42 (4.3%) -1.9% ( -10% -7%) 0.214
Prefix3 306.43 (6.9%) 301.82 (7.0%) -1.5% ( -14% - 13%) 0.502
Wildcard 221.77 (5.2%) 218.64 (4.0%) -1.4% ( -10% -8%) 0.348
HighTermMonthSort 1161.02 (12.7%) 1156.95 (11.1%) -0.4% ( -21% - 26%) 0.928
BrowseDayOfYearSSDVFacets 140.62 (1.3%) 140.48 (1.1%) -0.1% ( -2% -2%) 0.791
Fuzzy2 47.51 (8.9%) 47.57 (7.0%) 0.1% ( -14% - 17%) 0.961
Respell 200.51 (2.7%) 200.82 (1.4%) 0.2% ( -3% -4%) 0.823
OrHighMed 197.90 (3.0%) 198.36 (3.6%) 0.2% ( -6% -7%) 0.830
BrowseMonthSSDVFacets 152.24 (2.8%) 152.74 (1.0%) 0.3% ( -3% -4%) 0.630
OrHighLow 245.11 (3.5%) 245.97 (3.1%) 0.4% ( -6% -7%) 0.744
AndHighLow 1598.05 (7.2%) 1604.55 (4.6%) 0.4% ( -10% - 13%) 0.836
BrowseDayOfYearTaxoFacets 28.84 (3.0%) 28.99 (3.3%) 0.5% ( -5% -7%) 0.603
OrHighHigh 109.37 (4.2%) 110.14 (4.0%) 0.7% ( -7% -9%) 0.599
BrowseMonthTaxoFacets 30.77 (3.5%) 31.00 (4.1%) 0.8% ( -6% -8%) 0.541
BrowseDateTaxoFacets 28.71 (3.2%) 28.93 (3.3%) 0.8% ( -5% -7%) 0.461
HighTermDayOfYearSort 593.30 (13.5%) 599.82 (13.2%) 1.1% ( -22% - 32%) 0.800
AndHighHigh 441.62 (5.0%) 452.99 (4.1%) 2.6% ( -6% - 12%) 0.083
IntNRQ 121.71 (6.2%) 124.89 (4.2%) 2.6% ( -7% - 13%) 0.127
HighTerm 599.78 (4.2%) 615.86 (2.6%) 2.7% ( -3% -9%) 0.019
MedSloppyPhrase 397.69 (3.1%) 411.46 (3.3%) 3.5% ( -2% - 10%) 0.001
[jira] (LUCENE-10319) Make ForUtil#BLOCK_SIZE changeable
[ https://issues.apache.org/jira/browse/LUCENE-10319 ] Feng Guo deleted comment on LUCENE-10319: --- was (Author: gf2121): Out of curiosity, I ran the luceneutil wikimedium1m benchmark for block size = 64 / 256; I post the results here in case someone is interested :) *BLOCK_SIZE=64* {{Index size:}} {{434M (block size = 128)}} {{446M (block size = 64)}} {code:java}
TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value
AndHighMed 742.46 (6.2%) 632.83 (3.9%) -14.8% ( -23% - -4%) 0.000
MedSpanNear 106.50 (2.8%) 92.48 (3.7%) -13.2% ( -19% - -6%) 0.000
MedSloppyPhrase 147.88 (3.0%) 128.80 (2.2%) -12.9% ( -17% - -7%) 0.000
LowSloppyPhrase 491.02 (3.7%) 428.92 (3.5%) -12.6% ( -19% - -5%) 0.000
LowSpanNear 332.59 (3.0%) 292.64 (3.8%) -12.0% ( -18% - -5%) 0.000
MedIntervalsOrdered 80.37 (3.3%) 71.33 (2.6%) -11.2% ( -16% - -5%) 0.000
LowIntervalsOrdered 163.87 (3.1%) 145.73 (2.2%) -11.1% ( -15% - -5%) 0.000
HighSloppyPhrase 137.71 (3.8%) 122.61 (3.4%) -11.0% ( -17% - -3%) 0.000
LowTerm 2787.22 (6.1%) 2488.95 (6.1%) -10.7% ( -21% -1%) 0.000
OrHighHigh 160.41 (3.1%) 144.06 (3.7%) -10.2% ( -16% - -3%) 0.000
HighSpanNear 140.00 (1.7%) 127.69 (3.0%) -8.8% ( -13% - -4%) 0.000
OrHighMed 258.10 (4.3%) 235.96 (4.6%) -8.6% ( -16% -0%) 0.000
HighIntervalsOrdered 257.27 (3.0%) 242.95 (4.8%) -5.6% ( -12% -2%) 0.000
AndHighHigh 248.63 (3.0%) 234.84 (3.2%) -5.5% ( -11% -0%) 0.000
HighTermDayOfYearSort 954.02 (9.5%) 905.20 (7.4%) -5.1% ( -20% - 13%) 0.058
AndHighLow 1550.86 (5.0%) 1498.68 (4.5%) -3.4% ( -12% -6%) 0.026
HighTermMonthSort 633.80 (10.4%) 613.68 (5.9%) -3.2% ( -17% - 14%) 0.236
LowPhrase 547.94 (3.9%) 534.39 (3.1%) -2.5% ( -9% -4%) 0.027
Prefix3 566.20 (11.3%) 554.74 (8.9%) -2.0% ( -19% - 20%) 0.529
MedPhrase 468.94 (3.0%) 461.20 (4.8%) -1.7% ( -9% -6%) 0.192
Respell 149.39 (3.9%) 147.07 (5.3%) -1.6% ( -10% -7%) 0.287
OrHighLow 908.68 (5.2%) 899.50 (5.3%) -1.0% ( -10% - 10%) 0.542
Fuzzy2 75.80 (10.0%) 75.37 (12.6%) -0.6% ( -21% - 24%) 0.876
BrowseMonthSSDVFacets 151.56 (0.7%) 150.73 (2.8%) -0.5% ( -4% -2%) 0.399
Fuzzy1 117.46 (14.0%) 116.84 (12.6%) -0.5% ( -23% - 30%) 0.899
BrowseDayOfYearSSDVFacets 139.72 (0.9%) 139.01 (1.8%) -0.5% ( -3% -2%) 0.250
Wildcard 418.32 (11.7%) 416.56 (11.3%) -0.4% ( -20% - 25%) 0.908
IntNRQ 641.72 (5.4%) 643.10 (5.5%) 0.2% ( -10% - 11%) 0.900
HighPhrase 547.62 (6.0%) 549.35 (11.0%) 0.3% ( -15% - 18%) 0.910
BrowseDateTaxoFacets 29.02 (2.9%) 29.40 (5.3%) 1.3% ( -6% -9%) 0.336
BrowseMonthTaxoFacets 31.12 (3.7%) 31.52 (6.4%) 1.3% ( -8% - 11%) 0.430
BrowseDayOfYearTaxoFacets 29.03 (3.2%) 29.42 (5.3%) 1.4% ( -6% - 10%) 0.328
PKLookup 239.41 (2.5%) 242.82 (4.0%) 1.4% ( -4% -8%) 0.174
MedTerm 2332.72 (4.5%) 2445.01 (4.6%) 4.8% ( -4% - 14%) 0.001
HighTerm 1835.22 (5.3%) 1935.28 (6.0%) 5.5% ( -5% - 17%) 0.002
{code} *BLOCK_SIZE=256* {{Index size:}} {{434M (block size = 128)}} {{438M (block size = 256)}} {code:java}
TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value
AndHighHigh 214.93 (3.8%) 183.83 (2.6%) -14.5% ( -20% - -8%) 0.000
MedTerm 2589.52 (4.5%) 2303.67 (5.5%) -11.0% ( -20% - -1%) 0.000
HighTerm 1750.90 (4.0%) 1560.54
[jira] (LUCENE-10315) Speed up BKD leaf block ids codec by a 512 ints ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-10315 ] Feng Guo deleted comment on LUCENE-10315: --- was (Author: gf2121): I noticed that the benchmark in luceneutil is mainly for geo scenarios (BKD's support for multi-dimensional points is a really powerful feature!), but the main focus of this optimization is the low/medium-cardinality 1D point scenario (high-cardinality 1D fields have also been improved by nearly 20%), so here I'd like to describe some background on optimizing medium/low-cardinality fields in BKD: I'm a user of Elasticsearch (which is based on Lucene). ES can automatically infer types for users with its dynamic mapping feature. When users index low-cardinality fields, such as gender / age / status... they often use numbers to represent the values, so ES infers these fields as {{{}long{}}}, and ES uses BKD as the index for {{long}} fields. When the data volume grows, building the result set of low-cardinality fields makes the CPU usage and load very high. This is a flame graph we obtained from the production environment: [^addall.svg] It can be seen that almost all CPU is used in addAll. When we reindexed {{long}} to {{{}keyword{}}}, the cluster load and search latency were greatly reduced (we spent weeks reindexing all indices...). I know that ES recommends using {{keyword}} for term/terms queries and {{long}} for range queries in its documentation, but there are always some users who don't realize this and keep their habit of using SQL databases, or dynamic mapping automatically selects the type for them. All in all, users won't realize that there is such a big difference in performance between {{long}} and {{keyword}} fields for low-cardinality data. So from my point of view it makes sense to make BKD work better for low/medium-cardinality fields. As far as I can see, for low-cardinality fields, there are two advantages of {{keyword}} over {{{}long{}}}: 1.
The {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's delta VInt, because of its batch reading (readLongs) and SIMD decoding. 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily materialize its result set, and when another small-result clause intersects with this low-cardinality condition, the low-cardinality field can avoid reading all docIds into memory. This issue targets the first point. I hope these words explain a bit of the motivation for this issue :) > Speed up BKD leaf block ids codec by a 512 ints ForUtil > --- > > Key: LUCENE-10315 > URL: https://issues.apache.org/jira/browse/LUCENE-10315 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Feng Guo >Priority: Major > Attachments: addall.svg > > Time Spent: 10m > Remaining Estimate: 0h > > Elasticsearch (which is based on Lucene) can automatically infer types for > users with its dynamic mapping feature. When users index low-cardinality > fields, such as gender / age / status... they often use numbers to > represent the values, so ES infers these fields as {{{}long{}}}, and > ES uses BKD as the index for {{long}} fields. When the data volume grows, > building the result set of low-cardinality fields makes the CPU usage and > load very high. > This is a flame graph we obtained from the production environment: > [^addall.svg] > It can be seen that almost all CPU is used in addAll. When we reindexed > {{long}} to {{{}keyword{}}}, the cluster load and search latency were greatly > reduced (we spent weeks reindexing all indices...). I know that ES > recommends using {{keyword}} for term/terms queries and {{long}} for range > queries in the documentation, but there are always some users who don't realize > this and keep their habit of using SQL databases, or dynamic mapping > automatically selects the type for them.
All in all, users won't realize that > there would be such a big difference in performance between {{long}} and > {{keyword}} fields in low cardinality fields. So from my point of view it > will make sense if we can make BKD works better for the low/medium > cardinality fields. > As far as i can see, for low cardinality fields, there are two advantages of > {{keyword}} over {{{}long{}}}: > 1. {{ForUtil}} used in {{keyword}} postings is much more efficient than BKD's > delta VInt, because its batch reading (readLongs) and SIMD decode. > 2. When the query term count is less than 16, {{TermsInSetQuery}} can lazily > materialize of its result set, and when another small result clause > intersects with this low cardinality condition, the low cardinality field can > avoid reading all docIds into memory. > This ISSUE is targeting to solve the first point. The basic idea is trying to > use a 512 ints {{ForUtil}}
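As a rough illustration of the first advantage above, here is a minimal sketch in plain Java contrasting the two decoding styles. This is not Lucene's actual codec; the class and method names are invented, and this simple variant assumes the bit width divides 64 evenly. Delta VInt reads one variable-length byte sequence per value with a data dependency between iterations, while a FOR-style codec packs a whole block at a fixed bit width and decodes it in a branch-free loop that the JIT can vectorize:

```java
import java.util.Arrays;

// Illustrative sketch (not Lucene's real ForUtil) of fixed-width FOR-style
// packing and batch decoding. Because every value occupies exactly `bits`
// bits, decoding is a tight, branch-free loop over the packed longs, in
// contrast to per-value variable-length (VInt) decoding.
public final class DecodeStyles {
    /** Pack `values` (each < 2^bits) into longs at a fixed bit width.
     *  Simplification: `bits` must divide 64 so no value crosses a word. */
    static long[] pack(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bits;
            packed[bitPos / 64] |= ((long) values[i]) << (bitPos % 64);
        }
        return packed;
    }

    /** Branch-free batch decode of a fixed-width packed block. */
    static int[] unpack(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            out[i] = (int) ((packed[bitPos / 64] >>> (bitPos % 64)) & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] decoded = unpack(pack(deltas, 8), deltas.length, 8);
        System.out.println(Arrays.equals(deltas, decoded)); // round-trips
    }
}
```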