[GitHub] [lucene-jira-archive] mocobeta closed issue #13: Test issue
mocobeta closed issue #13: Test issue URL: https://github.com/apache/lucene-jira-archive/issues/13 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #14: Investigate import failure of LUCENE-1498
mocobeta opened a new issue, #14: URL: https://github.com/apache/lucene-jira-archive/issues/14

https://issues.apache.org/jira/browse/LUCENE-1498 won't be imported.

```
[2022-06-26 18:38:25,394] ERROR:import_github_issues: Import GitHub issue /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-1498.json was failed. status=failed, errors=[{'location': '/issue', 'resource': 'Issue', 'field': None, 'value': None, 'code': 'error'}]
```

Maybe the body contains character sequences that are not acceptable to GitHub?
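If unacceptable character sequences are indeed the cause, one way to narrow it down is to scan the exported issue body for raw control characters, which JSON importers commonly reject. The sketch below is my own illustration; the class and method names are hypothetical and not part of the migration scripts:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the migration tooling: lists control
// characters (other than tab / newline / carriage return) in an issue body,
// since such characters are a common reason for import rejections.
public class BodyScanner {
    public static List<Integer> suspiciousCodePoints(String body) {
        List<Integer> found = new ArrayList<>();
        body.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp) && cp != '\t' && cp != '\n' && cp != '\r') {
                found.add(cp); // record the offending code point
            }
        });
        return found;
    }
}
```

Running this over GH-LUCENE-1498.json's body field would at least confirm or rule out this particular failure mode.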
[GitHub] [lucene-jira-archive] mocobeta commented on issue #7: Make a detailed migration plan
mocobeta commented on issue #7: URL: https://github.com/apache/lucene-jira-archive/issues/7#issuecomment-1173010491 #13 confirmed that my account was able to update issues in an ASF repo without modifying the author. We can do step 7 (second pass - the most time-consuming part) ourselves.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #13: Test issue
mocobeta commented on issue #13: URL: https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173010229 OK, we can update issues ourselves via scripts.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #13: Test issue
mocobeta commented on issue #13: URL: https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173007101 Thanks!
[GitHub] [lucene-jira-archive] ikawaha commented on issue #13: Test issue
ikawaha commented on issue #13: URL: https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173006810 It's a cat. There's a cat here. Nice to meet you.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #13: Test issue
mocobeta commented on issue #13: URL: https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173006761 Thank you @ikawaha. Could you also add a fake comment, please?
[GitHub] [lucene-jira-archive] ikawaha opened a new issue, #13: Test issue
ikawaha opened a new issue, #13: URL: https://github.com/apache/lucene-jira-archive/issues/13 This is a test issue to check that issue operations work.
[jira] [Commented] (LUCENE-10577) Quantize vector values
[ https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561790#comment-17561790 ]

ASF subversion and git services commented on LUCENE-10577:

Commit 359b495129c68403e7aa36b0a1455e75a3a033e1 in lucene's branch refs/heads/branch_9x from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=359b495129c ]

LUCENE-10577: Add vectors format unit test and fix toString (#998)

We forgot to add this unit test when introducing the new 9.3 vectors format. This commit adds the test and fixes issues it uncovered in toString.

> Quantize vector values
> Key: LUCENE-10577
> URL: https://issues.apache.org/jira/browse/LUCENE-10577
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Michael Sokolov
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The {{KnnVectorField}} API handles vectors with 4-byte floating-point values. These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values.
> 2. Approximate nearest-neighbor search.
> The main point of this addition was to provide the search capability, and to support that it is not really necessary to store vectors in full precision. Perhaps users may also be willing to retrieve values in lower precision for whatever purpose those serve, if they are able to store more samples. We know that 8 bits is enough to provide a very near approximation to the same recall/performance tradeoff that is achieved with the full-precision vectors. I'd like to explore how we could enable 4:1 compression of these fields by reducing their precision.
> A few ways I can imagine this would be done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide their data in reduced-precision format and give them control over the quantization. It would have a major impact on the Lucene API surface, though, essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would require no, or perhaps very limited, change to the existing API to enable the feature.
> I've been exploring (2), and what I find is that we can achieve very good recall results using dot-product similarity scoring by simple linear scaling plus quantization of the vector values, so long as we choose the scale that minimizes the quantization error. Dot-product is amenable to this treatment since vectors are required to be unit-length when used with that similarity function.
> Even still, there is variability in the ideal scale over different data sets. A good choice seems to be max(abs(min-value), abs(max-value)), but of course this assumes that the data set doesn't have a few outlier data points. A theoretical range can be obtained by 1/sqrt(dimension), but this is only useful when the samples are normally distributed. We could in theory determine the ideal scale when flushing a segment and manage this quantization per-segment, but then numerical error could creep in when merging.
> I'll post a patch/PR with an experimental setup I've been using for evaluation purposes. It is pretty self-contained and simple, but has some drawbacks that need to be addressed:
> 1. No automated mechanism for determining the quantization scale (it's a constant that I have been playing with).
> 2. Converts from byte/float when computing the dot-product instead of computing directly on byte values.
> I'd like to get people's feedback on the approach and whether in general we should think about doing this compression under the hood, or expose a byte-oriented API. Whatever we do, I think a 4:1 compression ratio is pretty compelling and we should pursue something.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
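The linear scale-and-quantize idea described in the issue can be sketched in a few lines. This is a minimal illustration of the approach under the stated assumptions (symmetric mapping of [-scale, scale] onto signed bytes), not Lucene's implementation; the class and method names are my own:

```java
// Sketch of linear scaling + quantization as described in LUCENE-10577;
// not Lucene code. Maps [-scale, scale] onto signed bytes in [-127, 127].
public class VectorQuantizer {
    /** Heuristic scale from the issue: max(abs(min-value), abs(max-value)). */
    public static float scaleOf(float[] vector) {
        float max = 0f;
        for (float v : vector) {
            max = Math.max(max, Math.abs(v));
        }
        return max;
    }

    /** Quantize a float vector to bytes, clamping outliers to the byte range. */
    public static byte[] quantize(float[] vector, float scale) {
        byte[] out = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            int q = Math.round(vector[i] / scale * 127f);
            out[i] = (byte) Math.max(-127, Math.min(127, q));
        }
        return out;
    }

    /** Dot product computed directly on the quantized bytes, avoiding the
     *  byte-to-float round trip the issue lists as a drawback of its patch. */
    public static int dotProduct(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Scores from the byte dot product differ from the float dot product only by the constant factor (scale/127)^2, so rankings under dot-product similarity are preserved up to quantization error.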
[jira] [Commented] (LUCENE-10635) Ensure test coverage for WANDScorer after additional scorers get added
[ https://issues.apache.org/jira/browse/LUCENE-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561789#comment-17561789 ]

Zach Chen commented on LUCENE-10635:

I like this idea! This approach should also be able to preserve most of the assertions in the test utilities. I can give it a try and see how things might look.

> Ensure test coverage for WANDScorer after additional scorers get added
> Key: LUCENE-10635
> URL: https://issues.apache.org/jira/browse/LUCENE-10635
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Zach Chen
> Priority: Major
>
> This is a follow-up issue from the discussions at https://github.com/apache/lucene/pull/972#issuecomment-1170684358 and https://github.com/apache/lucene/pull/972#pullrequestreview-1024377641.
> As additional scorers such as BlockMaxMaxscoreScorer get added, some tests in TestWANDScorer that used to test WANDScorer now test BlockMaxMaxscoreScorer instead, reducing test coverage for WANDScorer. We would like to see how we can ensure TestWANDScorer reliably tests WANDScorer, perhaps by instantiating the scorer directly inside the tests?
[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on PR #972: URL: https://github.com/apache/lucene/pull/972#issuecomment-1172956347 Thanks again @jpountz ! I've created the above PR to backport these changes to `branch_9x`.
[GitHub] [lucene] zacharymorn opened a new pull request, #1002: LUCENE-10480: (Backporting) Use BMM scorer for 2 clauses disjunction
zacharymorn opened a new pull request, #1002: URL: https://github.com/apache/lucene/pull/1002 This PR backports PR https://github.com/apache/lucene/pull/972 to `branch_9x`.
[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions
[ https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561787#comment-17561787 ]

ASF subversion and git services commented on LUCENE-10480:

Commit 503ec5597331454bf8b6af79b9701cfdccf5 in lucene's branch refs/heads/main from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=503ec559733 ]

LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972)

> Specialize 2-clauses disjunctions
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its invariants: one linked list for the current candidates, one priority queue of scorers that are behind, and another for scorers that are ahead. All of this could be simplified in the 2-clause case, which feels worth specializing for, as it's very common that end users enter queries that only have two terms.
[GitHub] [lucene] zacharymorn merged pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn merged PR #972: URL: https://github.com/apache/lucene/pull/972
[jira] [Commented] (LUCENE-10577) Quantize vector values
[ https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561773#comment-17561773 ]

ASF subversion and git services commented on LUCENE-10577:

Commit 187f843e2a49f37f5fa1d50107f32be895146e21 in lucene's branch refs/heads/main from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=187f843e2a4 ]

LUCENE-10577: Add vectors format unit test and fix toString (#998)

We forgot to add this unit test when introducing the new 9.3 vectors format. This commit adds the test and fixes issues it uncovered in toString.
[GitHub] [lucene] jtibshirani merged pull request #998: LUCENE-10577: Add vectors format unit test and fix toString
jtibshirani merged PR #998: URL: https://github.com/apache/lucene/pull/998
[GitHub] [lucene] jtibshirani commented on pull request #998: LUCENE-10577: Add vectors format unit test and fix toString
jtibshirani commented on PR #998: URL: https://github.com/apache/lucene/pull/998#issuecomment-1172928401 No problem!
[GitHub] [lucene] jtibshirani commented on a diff in pull request #932: LUCENE-10559: Add Prefilter Option to KnnGraphTester
jtibshirani commented on code in PR #932: URL: https://github.com/apache/lucene/pull/932#discussion_r912381221

## lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java:

@@ -480,11 +537,15 @@ private int[][] getNN(Path docPath, Path queryPath) throws IOException {
     String hash = Integer.toString(Objects.hash(docPath, queryPath, numDocs, numIters, topK), 36);
     String nnFileName = "nn-" + hash + ".bin";
     Path nnPath = Paths.get(nnFileName);
-    if (Files.exists(nnPath) && isNewer(nnPath, docPath, queryPath)) {
+    if (Files.exists(nnPath)

Review Comment: Oops, I read the logic incorrectly, thanks for clarifying!
[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase
[ https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561772#comment-17561772 ]

Greg Miller commented on LUCENE-10639:

{quote}I suspected there was some overhead to two-phase iteration but not as much as this.{quote}
Right. I guess I was so surprised by the performance shift that I assumed there must be an interesting second phase happening. But from what you're saying, it sounds like these {{OrHighLow/Med/High}} tasks aren't doing that, and the performance change is purely a side effect of running the two phases instead of doing all the checks in the first phase. I should have dug into what these tasks are doing.
{quote}Hotspot was not always able to optimize "if (liveDocs == null)" checks{quote}
Interesting. Seems worth a shot. Thanks for the quick thoughts!

> WANDScorer performs better without two-phase
> Key: LUCENE-10639
> URL: https://issues.apache.org/jira/browse/LUCENE-10639
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Greg Miller
> Priority: Major
>
> After looking at the recent improvement [~jpountz] made to WAND scoring in LUCENE-10634, which does additional work during match confirmation to avoid confirming a match whose score wouldn't be competitive, I wanted to see how performance would shift if we squashed the two-phase iteration completely and only returned true matches (that were also known to be competitive by score) in the "approximation" phase. I was a bit surprised to find that luceneutil benchmarks (run with {{wikimediumall}}) improve significantly on some disjunction tasks and don't show significant regressions anywhere else. Note that I used LUCENE-10634 as a baseline, and built my candidate change on top of that. The diff can be seen here: [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]
> A simple conclusion here might be that we shouldn't do two-phase iteration in WANDScorer, but I'm pretty sure that's not right. I wonder if what's really going on is that we're under-estimating the cost of confirming a match? Right now we just return the tail size as the cost. While the cost of confirming a match is proportional to the tail size, the actual work involved can be quite significant (having to advance tail iterators to new blocks and decompress them). I wonder if the WAND second phase is being run too early on approximate candidates, and if less expensive (and even possibly more restrictive?) second phases could/should be running first?
> I'm raising this here as more of a curiosity to see if it sparks ideas on how to move forward. Again, I'm not proposing we do away with two-phase iteration, but it seems we might be able to improve things. Maybe I'll explore changing the cost heuristic next. Also, maybe there's some different benchmarking that would be useful here that I may not be familiar with?
> Benchmark results on wikimediumall:
> {code:java}
> Task                        QPS baseline          QPS candidate         Pct diff        p-value
> HighTermTitleBDVSort        22.52  (18.9%)        21.66  (15.6%)  -3.8% ( -32% - 37%)   0.485
> Prefix3                      9.38   (9.2%)         9.09  (10.6%)  -3.1% ( -20% - 18%)   0.326
> HighTermMonthSort           25.37  (16.0%)        24.87  (17.1%)  -2.0% ( -30% - 37%)   0.710
> MedTermDayTaxoFacets         9.62   (4.2%)         9.51   (4.1%)  -1.2% (  -9% -  7%)   0.368
> TermDTSort                  74.69  (18.0%)        74.13  (18.2%)  -0.7% ( -31% - 43%)   0.897
> HighTermDayOfYearSort       52.64  (16.1%)        52.32  (15.4%)  -0.6% ( -27% - 36%)   0.903
> BrowseMonthTaxoFacets        8.64  (19.1%)         8.59  (19.8%)  -0.6% ( -33% - 47%)   0.926
> BrowseDateSSDVFacets         0.86   (9.5%)         0.86  (13.1%)  -0.4% ( -20% - 24%)   0.914
> PKLookup                   147.18   (3.9%)       146.66   (3.3%)  -0.3% (  -7% -  7%)   0.759
> BrowseDayOfYearSSDVFacets    3.47   (4.5%)         3.45   (4.8%)  -0.3% (  -9% -  9%)   0.822
> Wildcard                    36.36   (4.4%)        36.26   (5.2%)  -0.3% (  -9% -  9%)   0.866
> BrowseMonthSSDVFacets        4.15  (12.7%)         4.13  (12.8%)  -0.3% ( -22% - 28%)   0.950
> AndHighMedDayTaxoFacets     15.21   (2.7%)        15.18   (2.9%)  -0.2% (  -5% -  5%)   0.819
> Fuzzy1                      68.33   (1.8%)        68.22
[jira] [Commented] (LUCENE-10635) Ensure test coverage for WANDScorer after additional scorers get added
[ https://issues.apache.org/jira/browse/LUCENE-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561754#comment-17561754 ]

Adrien Grand commented on LUCENE-10635:

Thinking out loud, maybe one way to do this would be to have a specialized WANDQuery in the test folder that is guaranteed to produce a WANDScorer?
[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase
[ https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561753#comment-17561753 ]

Adrien Grand commented on LUCENE-10639:

On a recent PR [~ChrisHegarty] found out that Hotspot was not always able to optimize "if (liveDocs == null)" checks within for loops (https://github.com/apache/lucene/pull/812#discussion_r851301618). Since then I've been wondering if DefaultBulkScorer is affected by this. If it is, we could look into the performance benefit of moving the {{if (liveDocs == null)}} check out of the for loop here: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/Weight.java#L311-L317. This might also help the compiler figure out that the approximation and the TwoPhaseIterator's matches run in sequence?
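The loop transformation Adrien suggests can be illustrated in isolation. This is a self-contained sketch of the hoisting idea, not Lucene's actual DefaultBulkScorer code; the SimpleCollector interface and the boolean[] liveDocs are stand-ins for Lucene's LeafCollector and Bits:

```java
// Sketch of hoisting the liveDocs == null check out of the hot loop: the JIT
// then sees two simple loops instead of one loop with a branch per iteration.
// Not Lucene code; names here are simplified stand-ins.
public class BulkScoreSketch {
    interface SimpleCollector { void collect(int doc); }

    static void scoreAll(SimpleCollector collector, int maxDoc, boolean[] liveDocs) {
        if (liveDocs == null) {
            // no deletions: branch-free inner loop
            for (int doc = 0; doc < maxDoc; doc++) {
                collector.collect(doc);
            }
        } else {
            // deletions present: filter on the liveness bit
            for (int doc = 0; doc < maxDoc; doc++) {
                if (liveDocs[doc]) {
                    collector.collect(doc);
                }
            }
        }
    }
}
```

Both branches visit the same documents as the original single loop; only the placement of the null check changes, which is why it is a plausible JIT-friendliness fix rather than a behavioral one.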
[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase
[ https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561751#comment-17561751 ] Adrien Grand commented on LUCENE-10639: --- I suspected there was some overhead to two-phase iteration but not as much as this. Two-phase iteration doesn't aim at improving the performance of queries on their own, but when combined with other queries through conjunctions: conjunctions make sure to reach agreement across approximations before they proceed with the match phase. This is the feature that makes Lucene perform better than other search libraries on the query `+"the who" +uk` at https://tantivy-search.github.io/bench/, because Lucene makes sure that documents contain all of "the", "who" and "uk" before it starts checking positions. I would also expect two-phase iteration to help on [AndMedOrHighHigh on nightly benchmarks|https://home.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html] since WANDScorer will do less work to return the next candidate on or beyond the lead doc ID produced by the "Med" term. > WANDScorer performs better without two-phase > > > Key: LUCENE-10639 > URL: https://issues.apache.org/jira/browse/LUCENE-10639 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Greg Miller >Priority: Major > > After looking at the recent improvement [~jpountz] made to WAND scoring in > LUCENE-10634, which does additional work during match confirmation to not > confirm a match who's score wouldn't be competitive, I wanted to see how > performance would shift if we squashed the two-phase iteration completely and > only returned true matches (that were also known to be competitive by score) > in the "approximation" phase. I was a bit surprised to find that luceneutil > benchmarks (run with {{{}wikimediumall{}}}), improves significantly on some > disjunction tasks and doesn't show significant regressions anywhere else. 
> Note that I used LUCENE-10634 as a baseline, and built my candidate change on
> top of that. The diff can be seen here:
> [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]
>
> A simple conclusion here might be that we shouldn't do two-phase iteration in
> WANDScorer, but I'm pretty sure that's not right. I wonder if what's really
> going on is that we're under-estimating the cost of confirming a match. Right
> now we just return the tail size as the cost. While the cost of confirming a
> match is proportional to the tail size, the actual work involved can be quite
> significant (having to advance tail iterators to new blocks and decompress
> them). I wonder if the WAND second phase is being run too early on
> approximate candidates, and if less-expensive (and possibly even more
> restrictive?) second phases could/should run first.
>
> I'm raising this here as more of a curiosity to see if it sparks ideas on how
> to move forward. Again, I'm not proposing we do away with two-phase
> iteration, but it seems we might be able to improve things. Maybe I'll
> explore changing the cost heuristic next. Also, maybe there's some different
> benchmarking that would be useful here that I may not be familiar with?
> Benchmark results on wikimediumall:
> {code:java}
>                          Task   QPS baseline  StdDev   QPS candidate  StdDev              Pct diff  p-value
>          HighTermTitleBDVSort          22.52 (18.9%)           21.66 (15.6%)   -3.8% ( -32% - 37%)    0.485
>                       Prefix3           9.38  (9.2%)            9.09 (10.6%)   -3.1% ( -20% - 18%)    0.326
>             HighTermMonthSort          25.37 (16.0%)           24.87 (17.1%)   -2.0% ( -30% - 37%)    0.710
>          MedTermDayTaxoFacets           9.62  (4.2%)            9.51  (4.1%)   -1.2% (  -9% -  7%)    0.368
>                    TermDTSort          74.69 (18.0%)           74.13 (18.2%)   -0.7% ( -31% - 43%)    0.897
>         HighTermDayOfYearSort          52.64 (16.1%)           52.32 (15.4%)   -0.6% ( -27% - 36%)    0.903
>         BrowseMonthTaxoFacets           8.64 (19.1%)            8.59 (19.8%)   -0.6% ( -33% - 47%)    0.926
>          BrowseDateSSDVFacets           0.86  (9.5%)            0.86 (13.1%)   -0.4% ( -20% - 24%)    0.914
>                      PKLookup         147.18  (3.9%)          146.66  (3.3%)   -0.3% (  -7% -  7%)    0.759
>     BrowseDayOfYearSSDVFacets           3.47  (4.5%)            3.45  (4.8%)   -0.3% (  -9% -  9%)    0.822
>                      Wildcard          36.36  (4.4%)           36.26  (5.2%)   -0.3% (  -9% -  9%)    0.866
>         BrowseMonthSSDVFacets           4.15 (12.7%)            4.13
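The conjunction behavior described in the comment above (reach agreement across cheap approximations first, run the expensive match phase only on the survivors) can be sketched as a toy Python model. This is illustrative only, not Lucene's actual TwoPhaseIterator API; the document sets and the phrase-check table are made up.

```python
# Toy model of two-phase conjunction: intersect cheap approximations first,
# then run the expensive match check only on doc IDs where all clauses agree.

def two_phase_conjunction(clauses):
    """Each clause is (approx_docs: set[int], matches: callable)."""
    # Phase 1: agreement across approximations (cheap doc-ID intersection).
    candidates = set.intersection(*(approx for approx, _ in clauses))
    # Phase 2: expensive confirmation (e.g. position checks) on survivors only.
    return sorted(d for d in candidates
                  if all(matches(d) for _, matches in clauses))

# A phrase like "the who" is approximated by docs containing both terms; the
# expensive check would verify adjacency, modeled here by a lookup table.
phrase_ok = {3: True, 7: False}
clauses = [
    ({1, 3, 7}, lambda d: phrase_ok.get(d, False)),  # +"the who" (phrase)
    ({3, 5, 7}, lambda d: True),                     # +uk (plain term)
]
print(two_phase_conjunction(clauses))  # prints [3]
```

Only docs 3 and 7 survive the approximation intersection, and only doc 3 passes the positional check, so the expensive phase never runs on docs 1 or 5 at all.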
[jira] [Created] (LUCENE-10639) WANDScorer performs better without two-phase
Greg Miller created LUCENE-10639:
------------------------------------

             Summary: WANDScorer performs better without two-phase
                 Key: LUCENE-10639
                 URL: https://issues.apache.org/jira/browse/LUCENE-10639
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
            Reporter: Greg Miller

After looking at the recent improvement [~jpountz] made to WAND scoring in LUCENE-10634, which does additional work during match confirmation to not confirm a match whose score wouldn't be competitive, I wanted to see how performance would shift if we squashed the two-phase iteration completely and only returned true matches (that were also known to be competitive by score) in the "approximation" phase. I was a bit surprised to find that luceneutil benchmarks (run with {{wikimediumall}}) improve significantly on some disjunction tasks and don't show significant regressions anywhere else.

Note that I used LUCENE-10634 as a baseline, and built my candidate change on top of that. The diff can be seen here: [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]

A simple conclusion here might be that we shouldn't do two-phase iteration in WANDScorer, but I'm pretty sure that's not right. I wonder if what's really going on is that we're under-estimating the cost of confirming a match. Right now we just return the tail size as the cost. While the cost of confirming a match is proportional to the tail size, the actual work involved can be quite significant (having to advance tail iterators to new blocks and decompress them). I wonder if the WAND second phase is being run too early on approximate candidates, and if less-expensive (and possibly even more restrictive?) second phases could/should run first.

I'm raising this here as more of a curiosity to see if it sparks ideas on how to move forward. Again, I'm not proposing we do away with two-phase iteration, but it seems we might be able to improve things.
Maybe I'll explore changing the cost heuristic next. Also, maybe there's some different benchmarking that would be useful here that I may not be familiar with?

Benchmark results on wikimediumall:

{code:java}
                         Task   QPS baseline  StdDev   QPS candidate  StdDev              Pct diff  p-value
         HighTermTitleBDVSort          22.52 (18.9%)           21.66 (15.6%)   -3.8% ( -32% - 37%)    0.485
                      Prefix3           9.38  (9.2%)            9.09 (10.6%)   -3.1% ( -20% - 18%)    0.326
            HighTermMonthSort          25.37 (16.0%)           24.87 (17.1%)   -2.0% ( -30% - 37%)    0.710
         MedTermDayTaxoFacets           9.62  (4.2%)            9.51  (4.1%)   -1.2% (  -9% -  7%)    0.368
                   TermDTSort          74.69 (18.0%)           74.13 (18.2%)   -0.7% ( -31% - 43%)    0.897
        HighTermDayOfYearSort          52.64 (16.1%)           52.32 (15.4%)   -0.6% ( -27% - 36%)    0.903
        BrowseMonthTaxoFacets           8.64 (19.1%)            8.59 (19.8%)   -0.6% ( -33% - 47%)    0.926
         BrowseDateSSDVFacets           0.86  (9.5%)            0.86 (13.1%)   -0.4% ( -20% - 24%)    0.914
                     PKLookup         147.18  (3.9%)          146.66  (3.3%)   -0.3% (  -7% -  7%)    0.759
    BrowseDayOfYearSSDVFacets           3.47  (4.5%)            3.45  (4.8%)   -0.3% (  -9% -  9%)    0.822
                     Wildcard          36.36  (4.4%)           36.26  (5.2%)   -0.3% (  -9% -  9%)    0.866
        BrowseMonthSSDVFacets           4.15 (12.7%)            4.13 (12.8%)   -0.3% ( -22% - 28%)    0.950
      AndHighMedDayTaxoFacets          15.21  (2.7%)           15.18  (2.9%)   -0.2% (  -5% -  5%)    0.819
                       Fuzzy1          68.33  (1.8%)           68.22  (2.0%)   -0.2% (  -3% -  3%)    0.783
       OrHighMedDayTaxoFacets           2.90  (4.1%)            2.89  (4.0%)   -0.1% (  -7% -  8%)    0.930
                    MedPhrase          52.81  (2.3%)           52.76  (1.8%)   -0.1% (  -4% -  4%)    0.878
                      Respell          36.80  (1.9%)           36.78  (1.9%)   -0.1% (  -3% -  3%)    0.933
                       Fuzzy2          63.06  (1.9%)           63.05  (2.1%)   -0.0% (  -3% -  4%)    0.971
                    LowPhrase          74.60  (1.9%)           74.61  (1.8%)    0.0% (  -3% -  3%)    0.987
     AndHighHighDayTaxoFacets           4.54  (2.3%)            4.55  (2.0%)    0.0% (  -4% -  4%)    0.960
                   HighPhrase         353.13  (2.6%)          353.28  (2.5%)    0.0% (  -4% -  5%)    0.958
                OrNotHighHigh         761.72  (4.0%)          762.48  (3.6%)    0.1% (  -7% -  8%)    0.935
                 OrHighNotLow        1129.94  (4.1%)         1131.56
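The cost-heuristic idea raised in the issue can be illustrated with a toy model (hypothetical numbers, nothing from Lucene's actual implementation): if each second phase advertises an estimated match cost, running the cheaper checks first lets a failed cheap check veto a candidate before the expensive positional work ever happens.

```python
# Toy sketch: confirm a candidate against several second phases, cheapest
# first, so a cheap failing check short-circuits the expensive ones.

def confirm(candidate, phases):
    """phases: list of (match_cost, matches) pairs.
    Returns (matched, cost_spent)."""
    spent = 0.0
    # Sort so cheap checks run first and can veto before costly ones execute.
    for cost, matches in sorted(phases, key=lambda p: p[0]):
        spent += cost
        if not matches(candidate):
            return False, spent
    return True, spent

phases = [
    (50.0, lambda d: d % 2 == 0),  # expensive positional check (hypothetical)
    (1.0, lambda d: d > 10),       # cheap range check (hypothetical)
]
print(confirm(5, phases))   # cheap check fails first: (False, 1.0)
print(confirm(12, phases))  # both pass: (True, 51.0)
```

The point of the sketch is that ordering by estimated cost changes how much work a rejected candidate wastes, which is exactly why an under-estimated cost (tail size alone) could make the second phase look cheaper than it is.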
[GitHub] [lucene-jira-archive] mocobeta commented on issue #12: Make a test set for improving markup conversion quality
mocobeta commented on issue #12:
URL: https://github.com/apache/lucene-jira-archive/issues/12#issuecomment-1172897263

Supported block elements in [Jira Text Formatting](https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all)

- `-`: not included
- `o`: correctly rendered
- `x`: broken

|Issue|Quote|Bullet list|Numbered list|Noformat/Code|Table|
|-|-|-|-|-|-|
|[LUCENE-10560](https://github.com/mocobeta/sandbox-lucene-10557/issues/10793)|-|-|-|o|-|
|[LUCENE-10559](https://github.com/mocobeta/sandbox-lucene-10557/issues/10792)|-|-|-|-|x|
|[LUCENE-10557](https://github.com/mocobeta/migration-test-2/issues/155)|x|x|x|o|x|
|[LUCENE-10544](https://github.com/mocobeta/sandbox-lucene-10557/issues/10777)|x|x|-|o|-|
[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r912362078

##########
lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java:
##########
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }
 
   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {

Review Comment:
   Thanks! I don't think this is quite what I was suggesting, but maybe it's OK? Or maybe I haven't had enough coffee yet and am misreading your change. The idea I was proposing would be to facet on one-hour buckets, not a trailing window as you have. So each range would only be one hour "wide", but we'd cover an entire week's worth of them. This type of log analysis is something you might want to do in the real world, where you want to understand how many errors occurred in each hour over the last week. You're essentially creating a histogram of error counts over a week-long time period (at one-hour granularity), and then answering the question, "what are the top-5 hours that contained the most errors?" Does this make sense? I think what you have here is a "trailing window" sort of thing that grows, but whose endpoint is always "now"? I'm not sure that's all that interesting an example, since the larger ranges will _always_ contain more errors, right? Sorry if I'm misreading your new change.
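The one-hour-bucket histogram described in the review comment can be sketched in a few lines of Python (hypothetical timestamps; a real version of the demo would build one range per hour and use Lucene's range faceting): bucket each error timestamp into a fixed one-hour range across the week, then take the top-N buckets by count.

```python
# Toy histogram of error counts per hour over a week, answering
# "which hours contained the most errors?"
from collections import Counter

HOUR = 3600          # seconds per bucket
WEEK_HOURS = 7 * 24  # number of one-hour buckets covering a week

def top_error_hours(timestamps, week_start, n=5):
    # Assign each in-window timestamp to its one-hour bucket index.
    buckets = Counter((t - week_start) // HOUR
                      for t in timestamps
                      if week_start <= t < week_start + WEEK_HOURS * HOUR)
    return buckets.most_common(n)  # [(hour_index, count), ...]

week_start = 0
errors = [10, 20, 3700, 3800, 3900, 7300]  # hypothetical log timestamps
print(top_error_hours(errors, week_start))  # prints [(1, 3), (0, 2), (2, 1)]
```

Because every bucket is the same fixed width, the counts are directly comparable, unlike a trailing window whose larger ranges always contain more errors by construction.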
[GitHub] [lucene-jira-archive] mocobeta commented on issue #2: Archive all Jira attachments
mocobeta commented on issue #2:
URL: https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172890744

> This is the first time I learned of this cool `dust` command! What a great name too.

Yes, it's one of my favorite CLI tools (a bit slower than `du`, but fine for my usage).
[GitHub] [lucene-jira-archive] mikemccand commented on issue #2: Archive all Jira attachments
mikemccand commented on issue #2:
URL: https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172888349

Thanks @mocobeta! This is the first time I learned of this cool `dust` command! What a great name too.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #2: Archive all Jira attachments
mocobeta commented on issue #2:
URL: https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172884299

Links that point to the files in the `attachments` branch are also fine.

![Screenshot from 2022-07-02 20-39-13](https://user-images.githubusercontent.com/1825333/176999240-b77f9e0f-ebeb-43b2-85d2-93ec2333e5e7.png)
[GitHub] [lucene-jira-archive] mocobeta commented on issue #2: Archive all Jira attachments
mocobeta commented on issue #2:
URL: https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172880262

Instead of `main`, I committed all attachments (the latest snapshot) to the `attachments` branch. The download script seems to work fine. I'm closing this.
[GitHub] [lucene-jira-archive] mocobeta closed issue #2: Archive all Jira attachments
mocobeta closed issue #2: Archive all Jira attachments
URL: https://github.com/apache/lucene-jira-archive/issues/2
[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #12: Make a test set for #1
mocobeta opened a new issue, #12:
URL: https://github.com/apache/lucene-jira-archive/issues/12

There are too many patterns where markup can be broken. We can't be sure that an ad-hoc fix for one problem doesn't introduce another bug. Instead of choosing examples arbitrarily, it'd be good to have a carefully selected test set that includes various markup combinations.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #11: Selectively create github cross-issue link
mocobeta commented on issue #11:
URL: https://github.com/apache/lucene-jira-archive/issues/11#issuecomment-1172875774

It's awkward, but may not be a blocker. At least we need to check that this does not break hyperlinks.
[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #11: Selectively create github cross-issue link
mocobeta opened a new issue, #11:
URL: https://github.com/apache/lucene-jira-archive/issues/11

Currently, we blindly create cross-issue links - for example, "LUCENE-.patch" is incorrectly replaced with "LUCENE- (#YYY).patch". Ideally, only Jira issue keys should be replaced. As an ad-hoc solution, maybe check whether there are spaces before/after issue keys... (this can produce false negatives - if an issue key is in a table cell, there may not be spaces but `|` before/after the issue key.)
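A minimal sketch of the boundary check proposed above, in Python like the rest of the migration scripts (the regex, the key pattern, and the mapping are illustrative, not the repo's actual code): treat whitespace, line boundaries, and table-cell pipes as valid delimiters, and refuse to rewrite keys embedded in file names.

```python
# Only rewrite a Jira key when it is NOT adjacent to a word character, dot,
# or hyphen, so "LUCENE-1234.patch" stays untouched while "LUCENE-1234" and
# "|LUCENE-1234|" (table cell) are linked.
import re

JIRA_KEY = re.compile(r'(?<![\w.-])(LUCENE-\d+)(?![\w.-])')

def link_issue_keys(text, mapping):
    def repl(m):
        key = m.group(1)
        # Append the mapped GitHub issue number when we know it.
        return f"{key} (#{mapping[key]})" if key in mapping else key
    return JIRA_KEY.sub(repl, text)

mapping = {"LUCENE-1234": 999}  # hypothetical Jira-to-GitHub issue mapping
print(link_issue_keys("See LUCENE-1234 and LUCENE-1234.patch", mapping))
# prints: See LUCENE-1234 (#999) and LUCENE-1234.patch
```

Using lookarounds instead of requiring literal spaces avoids the false negatives mentioned above, since `|` is not in the excluded character class and so still counts as a boundary.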