Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2024-01-14 Thread via GitHub


easyice commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1890884774

   Close this in favor of https://github.com/apache/lucene/pull/12997


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2024-01-14 Thread via GitHub


easyice closed pull request #12954: Reduce frequencies buffer size when they 
are not needed
URL: https://github.com/apache/lucene/pull/12954


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2024-01-08 Thread via GitHub


github-actions[bot] commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1880898477

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-21 Thread via GitHub


easyice commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1866446008

   Here is the benchmark for new approach (avoid for-loop in `reset()`), the 
`PKLookup` task still has a speedup, but the speedup for `Wildcard` task is 
disappeared, i checked the memory allocate flamegraph for `Wildcard` task, the 
`BlockDocsEnum.` call account for about 8%, so the 128-size long array 
allocate has performance impact on this task. this is different from 
`PKLookup`. i'm sorry i only checked the memory allocate for `PKLookup` 
yesterday.
   
   so:
   * `PKLookup` is speedup by avoid for-loop in `reset()`
   * `Wildcard` is speedup by reduce frequencies buffer allocation
   
   
   The Benchmark delta about new approach:
   
   * baseline: main
   * my_modified_version: new approach (avoid for-loop in `reset()`)
   
   
   Benchmark result
   
   ```
  TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
 OrHighHigh   15.94  (4.3%)   15.56  
(5.1%)   -2.4% ( -11% -7%) 0.112
  OrHighMed   49.58  (5.0%)   48.77  
(4.7%)   -1.6% ( -10% -8%) 0.293
  OrNotHighHigh  146.95  (3.5%)  144.64  
(4.8%)   -1.6% (  -9% -6%) 0.237
LowSloppyPhrase   15.35  (4.1%)   15.13  
(3.6%)   -1.4% (  -8% -6%) 0.234
   HighSloppyPhrase5.61  (2.8%)5.53  
(3.6%)   -1.4% (  -7% -5%) 0.176
 AndHighMed   21.66  (3.9%)   21.43  
(4.1%)   -1.1% (  -8% -7%) 0.390
  MedPhrase9.56  (5.0%)9.47  
(5.0%)   -1.0% ( -10% -9%) 0.512
AndHighHigh   15.95  (3.6%)   15.80  
(3.7%)   -1.0% (  -8% -6%) 0.409
MedSloppyPhrase2.03  (2.7%)2.02  
(3.4%)   -0.9% (  -6% -5%) 0.344
  OrHighLow  179.58  (3.1%)  177.95  
(3.7%)   -0.9% (  -7% -6%) 0.399
LowTerm  177.24  (6.0%)  176.00  
(5.2%)   -0.7% ( -11% -   11%) 0.695
  HighTermDayOfYearSort  128.70  (3.0%)  127.85  
(3.0%)   -0.7% (  -6% -5%) 0.482
   OrHighNotLow  204.26  (3.4%)  203.10  
(3.5%)   -0.6% (  -7% -6%) 0.601
MedTerm  235.64  (3.5%)  234.40  
(4.0%)   -0.5% (  -7% -7%) 0.662
   OrNotHighMed  123.29  (4.0%)  122.67  
(3.9%)   -0.5% (  -8% -7%) 0.689
   HighTerm  286.35  (4.1%)  284.95  
(4.3%)   -0.5% (  -8% -8%) 0.713
 TermDTSort   60.32 (10.4%)   60.06 
(10.8%)   -0.4% ( -19% -   23%) 0.895
   OrHighNotMed  135.57  (4.2%)  135.07  
(5.3%)   -0.4% (  -9% -9%) 0.806
  OrHighNotHigh  118.48  (3.6%)  118.06  
(5.5%)   -0.4% (  -9% -9%) 0.808
LowIntervalsOrdered   37.27  (4.7%)   37.15  
(4.9%)   -0.3% (  -9% -9%) 0.834
MedIntervalsOrdered2.24  (4.5%)2.24  
(5.0%)   -0.1% (  -9% -9%) 0.928
  LowPhrase   20.47  (3.3%)   20.44  
(3.3%)   -0.1% (  -6% -6%) 0.916
 Fuzzy2   30.51  (2.2%)   30.50  
(1.3%)   -0.0% (  -3% -3%) 0.948
   HighIntervalsOrdered2.55  (5.1%)2.55  
(5.4%)0.1% (  -9% -   11%) 0.975
 IntNRQ   24.26  (3.5%)   24.27  
(3.7%)0.1% (  -6% -7%) 0.962
 AndHighLow  270.76  (3.7%)  271.29  
(5.0%)0.2% (  -8% -9%) 0.889
 HighPhrase   32.64  (4.5%)   32.71  
(4.1%)0.2% (  -8% -9%) 0.878
  HighTermTitleSort   72.27  (5.1%)   72.44  
(4.1%)0.2% (  -8% -9%) 0.871
MedSpanNear6.40  (4.3%)6.42  
(3.7%)0.4% (  -7% -8%) 0.782
   HighTermTitleBDVSort2.31  (3.4%)2.32  
(3.5%)0.4% (  -6% -7%) 0.716
   OrNotHighLow  235.32  (3.0%)  236.80  
(4.7%)0.6% (  -6% -8%) 0.613
Respell   25.22  (2.7%)   25.40  
(2.2%)0.7% (  -4% -5%) 0.353
 Fuzzy1   12.56  (2.0%)   12.65  
(2.0%)0.7% (  -3% -4%) 0.251
   HighSpanNear4.94  (4.5%)4.98  
(4.1%)0.8% (  -7% -9%) 0.566
LowSpanNear1.10  (6.4%)1.11   

Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-20 Thread via GitHub


jpountz commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864765767

   > so maybe we can consider an other approach: try to avoid the for-loop in 
reset() if the instance can be reused
   
   +1 this sounds like a good idea!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-20 Thread via GitHub


easyice commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864749851

   
   I took several hours to confirm the results, the benchmark shows it became 
faster, this exceeded my expectation, we think the speedup is due to remove the 
loop that initializes the `freqBuffer` to 1 in `reset()` like below:
   
   ```
 if (indexHasFreq == false || needsFreq == false) {
   for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
 freqBuffer[i] = 1;
   }
 }
   ```
   
   Since if we always allocate the 128-size `freqBuffer` for this PR, the 
benchmark shows it still has a speedup. therefore, performance improvement has 
no relevance to reducing memory allocation. so maybe we can consider the other 
approach: try to avoid the for-loop in `reset()` if the instance can be reused. 
thanks for the suggestions from @gf2121 when i investigating the cause of the 
performance speedup.
   
   
   Benchmark output for the PR(using `wikimediumall`):
   
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
   HighSloppyPhrase0.38  (5.3%)0.37  
(4.6%)   -1.6% ( -10% -8%) 0.323
MedTerm  226.03  (4.8%)  223.08  
(5.2%)   -1.3% ( -10% -9%) 0.409
   HighIntervalsOrdered2.19  (6.4%)2.17  
(6.5%)   -0.9% ( -12% -   12%) 0.676
MedSloppyPhrase   18.52  (2.7%)   18.39  
(2.5%)   -0.7% (  -5% -4%) 0.402
 Fuzzy2   36.10  (1.8%)   35.86  
(1.6%)   -0.7% (  -3% -2%) 0.219
 Fuzzy1   43.40  (1.6%)   43.16  
(1.7%)   -0.6% (  -3% -2%) 0.276
Respell   21.69  (1.8%)   21.58  
(1.8%)   -0.5% (  -4% -3%) 0.375
LowTerm  232.03  (3.0%)  231.08  
(2.9%)   -0.4% (  -6% -5%) 0.659
LowSloppyPhrase   18.26  (2.0%)   18.20  
(2.1%)   -0.3% (  -4% -3%) 0.660
   HighTerm  267.11  (5.2%)  266.50  
(5.6%)   -0.2% ( -10% -   11%) 0.893
   HighSpanNear1.85  (5.7%)1.84  
(6.7%)   -0.2% ( -11% -   12%) 0.935
   OrHighNotLow  167.52  (5.7%)  167.26  
(5.6%)   -0.2% ( -10% -   11%) 0.931
   HighTermTitleBDVSort1.90  (3.7%)1.90  
(4.5%)   -0.1% (  -7% -8%) 0.915
MedIntervalsOrdered7.07  (3.4%)7.06  
(3.8%)   -0.1% (  -7% -7%) 0.910
MedSpanNear   24.97  (2.1%)   24.94  
(2.7%)   -0.1% (  -4% -4%) 0.874
 HighPhrase   10.67  (6.0%)   10.66  
(5.7%)   -0.1% ( -11% -   12%) 0.950
  LowPhrase4.70  (4.0%)4.70  
(3.8%)   -0.0% (  -7% -8%) 0.979
   OrHighNotMed  130.98  (6.2%)  131.01  
(6.1%)0.0% ( -11% -   13%) 0.989
  OrNotHighHigh  171.61  (5.4%)  171.67  
(5.3%)0.0% ( -10% -   11%) 0.984
LowIntervalsOrdered   28.65  (4.3%)   28.68  
(4.3%)0.1% (  -8% -9%) 0.947
 OrHighHigh   18.94  (2.9%)   19.00  
(3.6%)0.3% (  -5% -6%) 0.766
  OrHighNotHigh  125.97  (5.5%)  126.41  
(6.0%)0.3% ( -10% -   12%) 0.848
   OrNotHighMed  181.48  (4.0%)  182.38  
(3.6%)0.5% (  -6% -8%) 0.679
LowSpanNear6.89  (2.5%)6.93  
(3.2%)0.6% (  -5% -6%) 0.516
  MedPhrase  110.79  (2.8%)  111.45  
(2.9%)0.6% (  -5% -6%) 0.515
  OrHighMed   38.51  (2.4%)   38.79  
(2.1%)0.7% (  -3% -5%) 0.311
 AndHighMed   40.73  (2.4%)   41.06  
(2.5%)0.8% (  -4% -5%) 0.304
 TermDTSort   74.72  (4.1%)   75.32  
(2.6%)0.8% (  -5% -7%) 0.460
AndHighHigh   10.24  (5.6%)   10.33  
(3.9%)0.8% (  -8% -   11%) 0.600
  HighTermMonthSort 1071.18  (2.9%) 1079.84  
(4.5%)0.8% (  -6% -8%) 0.499
 AndHighLow  167.91  (5.2%)  170.10  
(5.6%)1.3% (  -9% -   12%) 0.446
 IntNRQ   13.84  (4.0%)   14.05  
(3.3%)1.5% (  -5% -9%) 0.208
   OrNotHighLow  241.06  (4.5%)  244.91  
(4.8%)1.6% (  -7% -   11%) 0.276
  OrHighLow  175.82  

Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-19 Thread via GitHub


easyice commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1862859903

   ohhh.. You said makes sense, i will check it. Thank you Adrien!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

2023-12-19 Thread via GitHub


jpountz commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1862482539

   I wonder if there is a performance impact, since this is moving a condition 
from something that runs once per block of 128 docs to something that map run 
on every doc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org