[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()
rmuir commented on PR #12087: URL: https://github.com/apache/lucene/pull/12087#issuecomment-1383064967

The benchmark above uses queries such as `"la|21,22,23", // 2226 hits`. In this case we form a boolean query of TermQuery:"la" AND admin2code in (21,22,23). The admin2 codes are typically county-level in most countries, and each of these numbers matches many documents, e.g. 100,000+. I ran the benchmark on full geonames (11M+ docs).

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
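To make the shape of such a benchmark query concrete, here is a minimal plain-Java sketch. It does not use Lucene: the class `BenchmarkQuerySketch`, the `Spec` holder, and the toy "index" (an array of names plus an array of admin2 codes) are all hypothetical, and the spec format is inferred only from the example `"la|21,22,23"` quoted above. It parses the spec into a term and an integer set, then counts documents that match both.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy model of "term AND code in (set)"; hypothetical, not the attached benchmark. */
final class BenchmarkQuerySketch {

  /** Parsed form of a spec such as "la|21,22,23": one term plus a set of integer codes. */
  static final class Spec {
    final String term;
    final Set<Integer> codes;

    Spec(String term, Set<Integer> codes) {
      this.term = term;
      this.codes = codes;
    }
  }

  /** Splits "term|c1,c2,..." at the first '|' and parses the codes as integers. */
  static Spec parse(String spec) {
    String[] parts = spec.split("\\|", 2);
    Set<Integer> codes =
        Arrays.stream(parts[1].split(","))
            .map(String::trim)
            .map(Integer::valueOf)
            .collect(Collectors.toSet());
    return new Spec(parts[0], codes);
  }

  /** Counts documents whose name contains the term as a token AND whose code is in the set. */
  static int count(String[] names, int[] admin2Codes, Spec spec) {
    int hits = 0;
    for (int doc = 0; doc < names.length; doc++) {
      String[] tokens = names[doc].toLowerCase(Locale.ROOT).split("\\s+");
      boolean termMatches = Arrays.asList(tokens).contains(spec.term);
      if (termMatches && spec.codes.contains(admin2Codes[doc])) {
        hits++;
      }
    }
    return hits;
  }
}
```

In a real index the term side would come from a postings list and the set side from points or doc values; the linear scan here only illustrates the boolean semantics.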
[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()
rmuir commented on PR #12087: URL: https://github.com/apache/lucene/pull/12087#issuecomment-1383064386

Here are my benchmarks, with the Java program attached: [NumSetBenchmark.java.txt](https://github.com/apache/lucene/files/10419558/NumSetBenchmark.java.txt)

* `main` uses `IntPoint.newSetQuery` on the main branch
* `patch` uses `IntField.newSetQuery` on this branch

The purpose was to run different batches of "hard" queries to look for performance regressions (not using numeric IDs, but terms of various density intersecting integer sets of various density). The reported time, in ms, is the time it takes to run the batch.

I don't see any problems:

| Query Set | main (IndexSearcher.count) | patch (IndexSearcher.count) | main (IndexSearcher.search) | patch (IndexSearcher.search) |
| - | - | - | - | - |
| BIG_BIG | 14.43ms | 11.30ms | 11.75ms | 5.98ms |
| MEDIUM_BIG | 16.45ms | 6.25ms | 17.08ms | 5.66ms |
| SMALL_BIG | 17.54ms | 2.00ms | 18.43ms | 2.52ms |
| BIG_MEDIUM | 5.50ms | 4.54ms | 5.90ms | 5.00ms |
| MEDIUM_MEDIUM | 6.39ms | 3.70ms | 7.13ms | 4.70ms |
| SMALL_MEDIUM | 6.64ms | 1.43ms | 6.98ms | 1.70ms |
[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()
rmuir commented on PR #12087: URL: https://github.com/apache/lucene/pull/12087#issuecomment-1382954253

Intended as follow-ups:

* look into PointRangeQuery and implement necessary estimation for IndexOrDocValuesQuery to do the right thing
* Add newSetQuery() to IntField/LongField/DoubleField/FloatField, that uses IndexOrDocValuesQuery(PointRangeQuery, ThisQuery)
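The follow-ups above depend on cost estimates being good enough to pick between an index-based (points) query and a doc-values query. The following toy model sketches that kind of decision in plain Java. Everything in it is an assumption for illustration: the class name, the `penalty` parameter, and the comparison rule are hypothetical and are not taken from Lucene's `IndexOrDocValuesQuery` source.

```java
/** Toy model of a cost-based choice between a points (index) query and a doc-values query. */
final class IndexOrDocValuesSketch {

  enum Strategy {
    INDEX,
    DOC_VALUES
  }

  /**
   * @param indexCost estimated number of documents the points query would visit
   * @param leadCost estimated number of candidates produced by the most selective other clause
   * @param penalty hypothetical factor by which a per-document doc-values check is assumed
   *     to be more expensive than visiting a document through the points index
   */
  static Strategy choose(long indexCost, long leadCost, long penalty) {
    // Doc values only verify the lead clause's candidates, so they pay off when there are
    // few candidates relative to what the points query would have to visit on its own.
    return indexCost / penalty <= leadCost ? Strategy.INDEX : Strategy.DOC_VALUES;
  }
}
```

With `penalty = 8`, a points query estimated at 1,000 documents is verified via doc values when the lead clause yields 10 candidates, and run through the index when the lead clause yields 500. If the estimate for the index side is wrong, the choice is wrong too, which is why the estimation work is listed first.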
[GitHub] [lucene] rmuir commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
rmuir commented on issue #12028: URL: https://github.com/apache/lucene/issues/12028#issuecomment-1382953573

I don't think it is good to degrade to `BooleanQuery` when using points or doc-values; it will only hurt performance.

Let's add `NumericDocValuesField.newSlowSetQuery()` and `SortedNumericDocValuesField.newSlowSetQuery()` to complement the doc-values based range queries? The query in fact already exists, but needs to be cleaned up since it has been "hiding" in `lucene/sandbox`. See PR.
[GitHub] [lucene] rmuir opened a new pull request, #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()
rmuir opened a new pull request, #12087: URL: https://github.com/apache/lucene/pull/12087

Clean up this query a bit, and move it around to support:

* NumericDocValuesField.newSlowSetQuery()
* SortedNumericDocValuesField.newSlowSetQuery()

This complements the existing docvalues-based range queries with a set query. Later we can hook this into IntField/LongField/FloatField/DoubleField via IndexOrDocValuesQuery.

In general cleanup was not a big deal, involves:

* fix code to use e.g. DocValues.isCacheable rather than assuming docvalues can't be updated
* implement optimized codepath for single-valued fields
* in general, try to be consistent with SortedNumericDocValuesRangeQuery as much as possible

Relates to #12028
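As a rough illustration of what a doc-values set query does, and of why a separate single-valued path can be simpler, here is a plain-Java model. It is a sketch under stated assumptions, not the PR's code: doc values are modeled as plain arrays (`long[]` for one value per document, `long[][]` for several), and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Simplified model of a doc-values set query; not Lucene's implementation. */
final class SlowSetQuerySketch {

  /** Sorts and de-duplicates the query values once so each lookup is a binary search. */
  static long[] prepare(long... values) {
    return Arrays.stream(values).sorted().distinct().toArray();
  }

  static boolean contains(long[] sortedSet, long value) {
    return Arrays.binarySearch(sortedSet, value) >= 0;
  }

  /** Single-valued path: exactly one value per document, so one lookup per document. */
  static int countSingleValued(long[] valuePerDoc, long[] sortedSet) {
    int hits = 0;
    for (long v : valuePerDoc) {
      if (contains(sortedSet, v)) {
        hits++;
      }
    }
    return hits;
  }

  /** Multi-valued path: a document matches as soon as any of its values is in the set. */
  static int countMultiValued(long[][] valuesPerDoc, long[] sortedSet) {
    int hits = 0;
    for (long[] docValues : valuesPerDoc) {
      for (long v : docValues) {
        if (contains(sortedSet, v)) {
          hits++;
          break; // count each document at most once
        }
      }
    }
    return hits;
  }
}
```

Both paths touch every document, which is the sense in which such a query is "slow" when used on its own; it is better suited to verifying candidates produced by a more selective clause.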
[GitHub] [lucene] rmuir merged pull request #12086: Upgrade to errorprone 2.18
rmuir merged PR #12086: URL: https://github.com/apache/lucene/pull/12086
[GitHub] [lucene] rmuir closed issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
rmuir closed issue #12057: ban finalizers in the build somehow (worst-case: use error-prone) URL: https://github.com/apache/lucene/issues/12057
[GitHub] [lucene] rmuir opened a new pull request, #12086: Upgrade to errorprone 2.18
rmuir opened a new pull request, #12086: URL: https://github.com/apache/lucene/pull/12086 Went thru the new checks as usual. Now that `Finalize` has our bugfix, I enabled it. Closes #12057
[GitHub] [lucene] rmuir merged pull request #12056: Update to error-prone 2.17
rmuir merged PR #12056: URL: https://github.com/apache/lucene/pull/12056
[GitHub] [lucene] rmuir merged pull request #12038: remove non-NRT replication support
rmuir merged PR #12038: URL: https://github.com/apache/lucene/pull/12038
[GitHub] [lucene] rmuir closed issue #11381: remove non-NRT replication support [LUCENE-10345]
rmuir closed issue #11381: remove non-NRT replication support [LUCENE-10345] URL: https://github.com/apache/lucene/issues/11381
[GitHub] [lucene] benwtrent commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections
benwtrent commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382728572 This for sure has to do with reading for the memory offsets and then reading the neighbors. I can dig into this a little bit next week unless somebody else has a really good idea.
[GitHub] [lucene] jpountz commented on pull request #12079: Speed up 1D BKD merging.
jpountz commented on PR #12079: URL: https://github.com/apache/lucene/pull/12079#issuecomment-1382690674 The last data point at https://people.apache.org/~mikemccand/lucenebench/sparseResults.html#tot_merge_times has a drop for overall merging that I expect to be mostly contributed by this change.
[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382689973 For reference, there seems to be a 6-7% QPS drop on nightly benchmarks associated with this change: https://people.apache.org/~mikemccand/lucenebench/VectorSearch.html I think it's fine; I'm just noting it in case someone wants to double-check whether there's something obvious that can be improved. Overall, the big gains in space efficiency are worth this small slowdown, in my opinion.