Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1890884774 Close this in favor of https://github.com/apache/lucene/pull/12997 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
easyice closed pull request #12954: Reduce frequencies buffer size when they are not needed URL: https://github.com/apache/lucene/pull/12954 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
github-actions[bot] commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1880898477 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1866446008 Here is the benchmark for new approach (avoid for-loop in `reset()`), the `PKLookup` task still has a speedup, but the speedup for `Wildcard` task is disappeared, i checked the memory allocate flamegraph for `Wildcard` task, the `BlockDocsEnum.` call account for about 8%, so the 128-size long array allocate has performance impact on this task. this is different from `PKLookup`. i'm sorry i only checked the memory allocate for `PKLookup` yesterday. so: * `PKLookup` is speedup by avoid for-loop in `reset()` * `Wildcard` is speedup by reduce frequencies buffer allocation The Benchmark delta about new approach: * baseline: main * my_modified_version: new approach (avoid for-loop in `reset()`) Benchmark result ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrHighHigh 15.94 (4.3%) 15.56 (5.1%) -2.4% ( -11% -7%) 0.112 OrHighMed 49.58 (5.0%) 48.77 (4.7%) -1.6% ( -10% -8%) 0.293 OrNotHighHigh 146.95 (3.5%) 144.64 (4.8%) -1.6% ( -9% -6%) 0.237 LowSloppyPhrase 15.35 (4.1%) 15.13 (3.6%) -1.4% ( -8% -6%) 0.234 HighSloppyPhrase5.61 (2.8%)5.53 (3.6%) -1.4% ( -7% -5%) 0.176 AndHighMed 21.66 (3.9%) 21.43 (4.1%) -1.1% ( -8% -7%) 0.390 MedPhrase9.56 (5.0%)9.47 (5.0%) -1.0% ( -10% -9%) 0.512 AndHighHigh 15.95 (3.6%) 15.80 (3.7%) -1.0% ( -8% -6%) 0.409 MedSloppyPhrase2.03 (2.7%)2.02 (3.4%) -0.9% ( -6% -5%) 0.344 OrHighLow 179.58 (3.1%) 177.95 (3.7%) -0.9% ( -7% -6%) 0.399 LowTerm 177.24 (6.0%) 176.00 (5.2%) -0.7% ( -11% - 11%) 0.695 HighTermDayOfYearSort 128.70 (3.0%) 127.85 (3.0%) -0.7% ( -6% -5%) 0.482 OrHighNotLow 204.26 (3.4%) 203.10 (3.5%) -0.6% ( -7% -6%) 0.601 MedTerm 235.64 (3.5%) 234.40 (4.0%) -0.5% ( -7% -7%) 0.662 OrNotHighMed 123.29 (4.0%) 122.67 (3.9%) -0.5% ( -8% -7%) 0.689 HighTerm 286.35 (4.1%) 284.95 (4.3%) -0.5% ( -8% -8%) 0.713 TermDTSort 60.32 (10.4%) 60.06 (10.8%) -0.4% ( -19% - 23%) 0.895 OrHighNotMed 135.57 (4.2%) 135.07 (5.3%) -0.4% ( -9% -9%) 0.806 OrHighNotHigh 118.48 (3.6%) 118.06 (5.5%) -0.4% ( -9% -9%) 0.808 LowIntervalsOrdered 37.27 (4.7%) 37.15 (4.9%) -0.3% ( -9% -9%) 0.834 MedIntervalsOrdered2.24 (4.5%)2.24 (5.0%) -0.1% ( -9% -9%) 0.928 LowPhrase 20.47 (3.3%) 20.44 (3.3%) -0.1% ( -6% -6%) 0.916 Fuzzy2 30.51 (2.2%) 30.50 (1.3%) -0.0% ( -3% -3%) 0.948 HighIntervalsOrdered2.55 (5.1%)2.55 (5.4%)0.1% ( -9% - 11%) 0.975 IntNRQ 24.26 (3.5%) 24.27 (3.7%)0.1% ( -6% -7%) 0.962 AndHighLow 270.76 (3.7%) 271.29 (5.0%)0.2% ( -8% -9%) 0.889 HighPhrase 32.64 (4.5%) 32.71 (4.1%)0.2% ( -8% -9%) 0.878 HighTermTitleSort 72.27 (5.1%) 72.44 (4.1%)0.2% ( -8% -9%) 0.871 MedSpanNear6.40 (4.3%)6.42 (3.7%)0.4% ( -7% -8%) 0.782 HighTermTitleBDVSort2.31 (3.4%)2.32 (3.5%)0.4% ( -6% -7%) 0.716 OrNotHighLow 235.32 (3.0%) 236.80 (4.7%)0.6% ( -6% -8%) 0.613 Respell 25.22 (2.7%) 25.40 (2.2%)0.7% ( -4% -5%) 0.353 Fuzzy1 12.56 (2.0%) 12.65 (2.0%)0.7% ( -3% -4%) 0.251 HighSpanNear4.94 (4.5%)4.98 (4.1%)0.8% ( -7% -9%) 0.566 LowSpanNear1.10 (6.4%)1.11
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
jpountz commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864765767 > so maybe we can consider an other approach: try to avoid the for-loop in reset() if the instance can be reused +1 this sounds like a good idea! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864749851 I took several hours to confirm the results, the benchmark shows it became faster, this exceeded my expectation, we think the speedup is due to remove the loop that initializes the `freqBuffer` to 1 in `reset()` like below: ``` if (indexHasFreq == false || needsFreq == false) { for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { freqBuffer[i] = 1; } } ``` Since if we always allocate the 128-size `freqBuffer` for this PR, the benchmark shows it still has a speedup. therefore, performance improvement has no relevance to reducing memory allocation. so maybe we can consider the other approach: try to avoid the for-loop in `reset()` if the instance can be reused. thanks for the suggestions from @gf2121 when i investigating the cause of the performance speedup. Benchmark output for the PR(using `wikimediumall`): ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value HighSloppyPhrase0.38 (5.3%)0.37 (4.6%) -1.6% ( -10% -8%) 0.323 MedTerm 226.03 (4.8%) 223.08 (5.2%) -1.3% ( -10% -9%) 0.409 HighIntervalsOrdered2.19 (6.4%)2.17 (6.5%) -0.9% ( -12% - 12%) 0.676 MedSloppyPhrase 18.52 (2.7%) 18.39 (2.5%) -0.7% ( -5% -4%) 0.402 Fuzzy2 36.10 (1.8%) 35.86 (1.6%) -0.7% ( -3% -2%) 0.219 Fuzzy1 43.40 (1.6%) 43.16 (1.7%) -0.6% ( -3% -2%) 0.276 Respell 21.69 (1.8%) 21.58 (1.8%) -0.5% ( -4% -3%) 0.375 LowTerm 232.03 (3.0%) 231.08 (2.9%) -0.4% ( -6% -5%) 0.659 LowSloppyPhrase 18.26 (2.0%) 18.20 (2.1%) -0.3% ( -4% -3%) 0.660 HighTerm 267.11 (5.2%) 266.50 (5.6%) -0.2% ( -10% - 11%) 0.893 HighSpanNear1.85 (5.7%)1.84 (6.7%) -0.2% ( -11% - 12%) 0.935 OrHighNotLow 167.52 (5.7%) 167.26 (5.6%) -0.2% ( -10% - 11%) 0.931 HighTermTitleBDVSort1.90 (3.7%)1.90 (4.5%) -0.1% ( -7% -8%) 0.915 MedIntervalsOrdered7.07 (3.4%)7.06 (3.8%) -0.1% ( -7% -7%) 0.910 MedSpanNear 24.97 (2.1%) 24.94 (2.7%) -0.1% ( -4% -4%) 0.874 HighPhrase 10.67 (6.0%) 10.66 (5.7%) -0.1% ( -11% - 12%) 0.950 LowPhrase4.70 (4.0%)4.70 (3.8%) -0.0% ( -7% -8%) 0.979 OrHighNotMed 130.98 (6.2%) 131.01 (6.1%)0.0% ( -11% - 13%) 0.989 OrNotHighHigh 171.61 (5.4%) 171.67 (5.3%)0.0% ( -10% - 11%) 0.984 LowIntervalsOrdered 28.65 (4.3%) 28.68 (4.3%)0.1% ( -8% -9%) 0.947 OrHighHigh 18.94 (2.9%) 19.00 (3.6%)0.3% ( -5% -6%) 0.766 OrHighNotHigh 125.97 (5.5%) 126.41 (6.0%)0.3% ( -10% - 12%) 0.848 OrNotHighMed 181.48 (4.0%) 182.38 (3.6%)0.5% ( -6% -8%) 0.679 LowSpanNear6.89 (2.5%)6.93 (3.2%)0.6% ( -5% -6%) 0.516 MedPhrase 110.79 (2.8%) 111.45 (2.9%)0.6% ( -5% -6%) 0.515 OrHighMed 38.51 (2.4%) 38.79 (2.1%)0.7% ( -3% -5%) 0.311 AndHighMed 40.73 (2.4%) 41.06 (2.5%)0.8% ( -4% -5%) 0.304 TermDTSort 74.72 (4.1%) 75.32 (2.6%)0.8% ( -5% -7%) 0.460 AndHighHigh 10.24 (5.6%) 10.33 (3.9%)0.8% ( -8% - 11%) 0.600 HighTermMonthSort 1071.18 (2.9%) 1079.84 (4.5%)0.8% ( -6% -8%) 0.499 AndHighLow 167.91 (5.2%) 170.10 (5.6%)1.3% ( -9% - 12%) 0.446 IntNRQ 13.84 (4.0%) 14.05 (3.3%)1.5% ( -5% -9%) 0.208 OrNotHighLow 241.06 (4.5%) 244.91 (4.8%)1.6% ( -7% - 11%) 0.276 OrHighLow 175.82
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
easyice commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1862859903 ohhh.. You said makes sense, i will check it. Thank you Adrien! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]
jpountz commented on PR #12954: URL: https://github.com/apache/lucene/pull/12954#issuecomment-1862482539 I wonder if there is a performance impact, since this is moving a condition from something that runs once per block of 128 docs to something that map run on every doc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org