[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir commented on PR #12087:
URL: https://github.com/apache/lucene/pull/12087#issuecomment-1383064967

   The benchmark above uses queries such as `"la|21,22,23",// 2226 hits`.
   
   In this case we form a boolean query of TermQuery:"la" AND admin2code in (21,22,23). The admin2 codes are typically county-level in most countries, and each one of these numbers matches many documents: e.g. 100,000+.
   
   I ran the benchmark on the full geonames dataset (11M+ docs).
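To make the query shape concrete, here is a toy, plain-Java model of `TermQuery:"la" AND admin2code in (21,22,23)` with no Lucene dependency. It assumes the term's postings are a sorted array of doc IDs and that each document has exactly one admin2 code in a per-document array standing in for doc values; the class name, method name, and data are hypothetical. It illustrates one point: when the term leads, the set is only probed for the term's candidates, so the work is bounded by the term's postings rather than by how many documents each code matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (no Lucene dependency) of: term AND (per-document value in a set).
final class TermAndSetConjunction {

  /**
   * Returns the docs from {@code postings} whose value in {@code docValues} is in {@code sortedSet}.
   * Assumes one value per doc and that {@code sortedSet} is sorted ascending.
   */
  static int[] intersect(int[] postings, long[] docValues, long[] sortedSet) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : postings) {
      // The term leads; the set is only probed for the term's candidate docs.
      if (Arrays.binarySearch(sortedSet, docValues[doc]) >= 0) {
        hits.add(doc);
      }
    }
    return hits.stream().mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    long[] admin2 = {21, 5, 22, 23, 21, 40, 23, 7}; // hypothetical code per doc 0..7
    int[] laPostings = {0, 1, 3, 5, 6};             // hypothetical docs containing "la"
    long[] codes = {21, 22, 23};
    System.out.println(Arrays.toString(intersect(laPostings, admin2, codes))); // [0, 3, 6]
  }
}
```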


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir commented on PR #12087:
URL: https://github.com/apache/lucene/pull/12087#issuecomment-1383064386

   Here are my benchmarks, with the attached Java program: [NumSetBenchmark.java.txt](https://github.com/apache/lucene/files/10419558/NumSetBenchmark.java.txt)
   * `main` uses `IntPoint.newSetQuery` on main branch
   * `patch` uses `IntField.newSetQuery` on this branch.
   
   The purpose was to run different batches of "hard" queries to look for performance regressions (not using numeric IDs, but terms of various density intersecting integer sets of various density). The reported time (in ms) is the time it takes to run the whole batch.
   
   I don't see any problems:
   
   | Query Set | main (IndexSearcher.count) | patch (IndexSearcher.count) | main (IndexSearcher.search) | patch (IndexSearcher.search) |
   | - | - | - | - | - |
   | BIG_BIG | 14.43ms  | 11.30ms  | 11.75ms  | 5.98ms  |
   | MEDIUM_BIG  | 16.45ms  | 6.25ms  | 17.08ms  | 5.66ms  |
   | SMALL_BIG  | 17.54ms  | 2.00ms  | 18.43ms  | 2.52ms  |
   | BIG_MEDIUM | 5.50ms  | 4.54ms  | 5.90ms | 5.00ms  |
   | MEDIUM_MEDIUM | 6.39ms  | 3.70ms  | 7.13ms  | 4.70ms  |
   | SMALL_MEDIUM | 6.64ms  | 1.43ms  | 6.98ms  | 1.70ms  |
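The table is easier to compare as ratios. This small sketch just divides each `main` time by the corresponding `patch` time (a value above 1 means the patch was faster). The numbers are copied from the table; since these are single batch timings with no variance reported, the ratios are only rough indicators.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Speedup ratios (main / patch) computed from the batch timings in the table, in ms.
final class SpeedupFromTable {

  // {main count, patch count, main search, patch search}
  static final Map<String, double[]> ROWS = new LinkedHashMap<>();

  static {
    ROWS.put("BIG_BIG", new double[] {14.43, 11.30, 11.75, 5.98});
    ROWS.put("MEDIUM_BIG", new double[] {16.45, 6.25, 17.08, 5.66});
    ROWS.put("SMALL_BIG", new double[] {17.54, 2.00, 18.43, 2.52});
    ROWS.put("BIG_MEDIUM", new double[] {5.50, 4.54, 5.90, 5.00});
    ROWS.put("MEDIUM_MEDIUM", new double[] {6.39, 3.70, 7.13, 4.70});
    ROWS.put("SMALL_MEDIUM", new double[] {6.64, 1.43, 6.98, 1.70});
  }

  /** A ratio above 1 means the patch ran the batch faster than main. */
  static double speedup(double mainMs, double patchMs) {
    return mainMs / patchMs;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, double[]> e : ROWS.entrySet()) {
      double[] t = e.getValue();
      System.out.printf(
          "%-14s count: %.2fx  search: %.2fx%n", e.getKey(), speedup(t[0], t[1]), speedup(t[2], t[3]));
    }
  }
}
```

Every ratio comes out above 1, from roughly 1.2x (BIG_MEDIUM) up to roughly 8.8x (SMALL_BIG, count).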
   
   





[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir commented on PR #12087:
URL: https://github.com/apache/lucene/pull/12087#issuecomment-1382954253

   Intended as follow-ups:
   * look into PointRangeQuery and implement the necessary estimation for IndexOrDocValuesQuery to do the right thing
   * add newSetQuery() to IntField/LongField/DoubleField/FloatField that uses IndexOrDocValuesQuery(PointRangeQuery, ThisQuery)
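The first follow-up depends on how `IndexOrDocValuesQuery` chooses between its two sub-queries. As a rough, hypothetical model (not Lucene's code): compare the estimated cost of the index-based query with the cost of the leading clause, and fall back to doc values when the lead is much sparser. The factor of 8 and all names below are assumptions for illustration only.

```java
// Toy model of choosing between an index-based and a doc-values-based execution.
final class IndexOrDocValuesChoice {

  enum Strategy { INDEX, DOC_VALUES }

  // Assumed cost ratio, for illustration only.
  static final long COST_RATIO = 8;

  /**
   * @param indexCost estimated number of docs the index-based (points) query would visit
   * @param leadCost number of candidate docs produced by the leading clause, e.g. a term
   */
  static Strategy choose(long indexCost, long leadCost) {
    // When the lead is much sparser than the index query, checking each lead candidate
    // against doc values is assumed cheaper than enumerating all index matches.
    return indexCost / COST_RATIO <= leadCost ? Strategy.INDEX : Strategy.DOC_VALUES;
  }

  public static void main(String[] args) {
    System.out.println(choose(300_000, 2_000)); // DOC_VALUES
    System.out.println(choose(10_000, 50_000)); // INDEX
  }
}
```

With an inaccurate `indexCost` (for example, an estimate far below the true cost), the same rule would pick the index path even when the lead is tiny, which is why the estimation work matters.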





[GitHub] [lucene] rmuir commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField

2023-01-14 Thread GitBox


rmuir commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1382953573

   I don't think it is good to degrade to `BooleanQuery` when using points or doc-values; it will only hurt performance.
   
   Let's add `NumericDocValuesField.newSlowSetQuery()` and `SortedNumericDocValuesField.newSlowSetQuery()` to complement the doc-values-based range queries?
   
   The query in fact already exists, but it needs to be cleaned up since it has been "hiding" in `lucene/sandbox`. See PR.





[GitHub] [lucene] rmuir opened a new pull request, #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir opened a new pull request, #12087:
URL: https://github.com/apache/lucene/pull/12087

   Clean up this query a bit, and move it around to support:
   
   * NumericDocValuesField.newSlowSetQuery()
   * SortedNumericDocValuesField.newSlowSetQuery()
   
   This complements the existing doc-values-based range queries with a set query.
   
   Later we can hook this into IntField/LongField/FloatField/DoubleField via IndexOrDocValuesQuery.
   
   In general, the cleanup was not a big deal; it involves:
   * fix code to use e.g. DocValues.isCacheable rather than assuming doc values can't be updated
   * implement an optimized codepath for single-valued fields
   * in general, try to be consistent with SortedNumericDocValuesRangeQuery as much as possible
   
   Relates to #12028
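The optimized codepath for single-valued fields can be illustrated with a toy model that is not Lucene's implementation: a multi-valued field is stored as one `long[]` per document, and if every document turns out to have exactly one value, the query switches to a flat per-document array with a single lookup per document. In Lucene, a single-valued sorted-numeric field can be unwrapped with `DocValues.unwrapSingleton`; this sketch instead detects the case by scanning, and requiring exactly one value per document is a simplification.

```java
import java.util.Arrays;

// Toy model (not Lucene's implementation) of a single-valued fast path for a set query.
final class SingleValuedFastPath {

  /** Returns one value per doc if every doc has exactly one value, otherwise null. */
  static long[] unwrapSingleton(long[][] valuesPerDoc) {
    long[] flat = new long[valuesPerDoc.length];
    for (int doc = 0; doc < valuesPerDoc.length; doc++) {
      if (valuesPerDoc[doc].length != 1) {
        return null;
      }
      flat[doc] = valuesPerDoc[doc][0];
    }
    return flat;
  }

  /** Counts docs with at least one value in {@code sortedSet} (must be sorted ascending). */
  static int count(long[][] valuesPerDoc, long[] sortedSet) {
    int count = 0;
    long[] single = unwrapSingleton(valuesPerDoc);
    if (single != null) {
      // Fast path: exactly one lookup per doc, no inner loop over a doc's values.
      for (long v : single) {
        if (Arrays.binarySearch(sortedSet, v) >= 0) {
          count++;
        }
      }
      return count;
    }
    // General path: stop at the first matching value of each doc.
    for (long[] values : valuesPerDoc) {
      for (long v : values) {
        if (Arrays.binarySearch(sortedSet, v) >= 0) {
          count++;
          break;
        }
      }
    }
    return count;
  }
}
```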





[GitHub] [lucene] rmuir merged pull request #12086: Upgrade to errorprone 2.18

2023-01-14 Thread GitBox


rmuir merged PR #12086:
URL: https://github.com/apache/lucene/pull/12086





[GitHub] [lucene] rmuir closed issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)

2023-01-14 Thread GitBox


rmuir closed issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
URL: https://github.com/apache/lucene/issues/12057





[GitHub] [lucene] rmuir opened a new pull request, #12086: Upgrade to errorprone 2.18

2023-01-14 Thread GitBox


rmuir opened a new pull request, #12086:
URL: https://github.com/apache/lucene/pull/12086

   Went through the new checks as usual. Now that the `Finalize` check has our bugfix, I enabled it.
   
   Closes #12057
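For context on what banning finalizers pushes code toward: below is a sketch of releasing a resource with `java.lang.ref.Cleaner` (Java 9+) plus an explicit `close()`, instead of overriding `Object.finalize()`. This is a common JDK alternative, not necessarily what Lucene does; the `NativeBuffer` class and its `RELEASED` counter are hypothetical stand-ins for a real native resource.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource holder that never overrides Object.finalize().
final class NativeBuffer implements AutoCloseable {
  private static final Cleaner CLEANER = Cleaner.create();

  // Counts releases; stands in for actually freeing native memory.
  static final AtomicInteger RELEASED = new AtomicInteger();

  // The cleanup action must not capture the NativeBuffer itself,
  // otherwise the buffer would stay reachable and never be cleaned.
  private static final class Release implements Runnable {
    @Override
    public void run() {
      RELEASED.incrementAndGet();
    }
  }

  private final Cleaner.Cleanable cleanable;

  NativeBuffer() {
    this.cleanable = CLEANER.register(this, new Release());
  }

  @Override
  public void close() {
    // Deterministic release; Cleanable.clean() runs the action at most once,
    // and the Cleaner only runs it later if close() was never called.
    cleanable.clean();
  }
}
```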





[GitHub] [lucene] rmuir merged pull request #12056: Update to error-prone 2.17

2023-01-14 Thread GitBox


rmuir merged PR #12056:
URL: https://github.com/apache/lucene/pull/12056





[GitHub] [lucene] rmuir merged pull request #12038: remove non-NRT replication support

2023-01-14 Thread GitBox


rmuir merged PR #12038:
URL: https://github.com/apache/lucene/pull/12038





[GitHub] [lucene] rmuir closed issue #11381: remove non-NRT replication support [LUCENE-10345]

2023-01-14 Thread GitBox


rmuir closed issue #11381: remove non-NRT replication support [LUCENE-10345]
URL: https://github.com/apache/lucene/issues/11381





[GitHub] [lucene] benwtrent commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2023-01-14 Thread GitBox


benwtrent commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382728572

   This for sure has to do with reading the memory offsets and then reading the neighbors.
   
   I can dig into this a little bit next week unless somebody else has a really good idea.
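One way to picture the trade-off being discussed, as a hypothetical toy model rather than the format in PR #11860: store each node's variable-length neighbor list contiguously and keep a separate offsets array. Compared with fixed-size slots, where a node's neighbors start at `node * maxConn` and no lookup is needed, this avoids reserving unused slots but adds a dependent read of the offsets before the neighbors can be read.

```java
import java.util.Arrays;

// Hypothetical compact adjacency storage; not the on-disk format of any Lucene codec.
final class PackedNeighbors {
  private final int[] offsets;   // node i's neighbors are neighbors[offsets[i] .. offsets[i + 1])
  private final int[] neighbors; // all neighbor lists, concatenated

  PackedNeighbors(int[][] adjacency) {
    offsets = new int[adjacency.length + 1];
    for (int i = 0; i < adjacency.length; i++) {
      offsets[i + 1] = offsets[i] + adjacency[i].length;
    }
    neighbors = new int[offsets[adjacency.length]];
    for (int i = 0; i < adjacency.length; i++) {
      System.arraycopy(adjacency[i], 0, neighbors, offsets[i], adjacency[i].length);
    }
  }

  /** Two dependent reads: first the node's offsets, then its slice of neighbors. */
  int[] neighborsOf(int node) {
    int start = offsets[node];
    int end = offsets[node + 1];
    return Arrays.copyOfRange(neighbors, start, end);
  }
}
```

An extra, dependent offset lookup per visited node is the kind of cost that could plausibly show up as a modest QPS drop during graph search.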





[GitHub] [lucene] jpountz commented on pull request #12079: Speed up 1D BKD merging.

2023-01-14 Thread GitBox


jpountz commented on PR #12079:
URL: https://github.com/apache/lucene/pull/12079#issuecomment-1382690674

   The last data point at https://people.apache.org/~mikemccand/lucenebench/sparseResults.html#tot_merge_times shows a drop in overall merge time that I expect to be mostly due to this change.





[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2023-01-14 Thread GitBox


jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382689973

   For reference, there seems to be a 6-7% QPS drop in nightly benchmarks associated with this change: https://people.apache.org/~mikemccand/lucenebench/VectorSearch.html. I think it's fine; I'm just noting it in case someone wants to double-check whether there's something obvious that can be improved. Overall, in my opinion, the big gains in space efficiency are worth this small slowdown.

