[GitHub] [lucene-solr] itygh commented on pull request #2676: SOLR-16626: Upgrade to Netty 4.1.87.Final

2023-01-20 Thread GitBox


itygh commented on PR #2676:
URL: https://github.com/apache/lucene-solr/pull/2676#issuecomment-1398560219

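   This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will get back to you as soon as possible after my vacation ends.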


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy opened a new pull request, #2676: SOLR-16626: Upgrade to Netty 4.1.87.Final

2023-01-20 Thread GitBox


janhoy opened a new pull request, #2676:
URL: https://github.com/apache/lucene-solr/pull/2676

   Backport from 9.x





[GitHub] [lucene] risdenk commented on pull request #12029: introduce support in KnnVectorQuery for getters/setters

2023-01-20 Thread GitBox


risdenk commented on PR #12029:
URL: https://github.com/apache/lucene/pull/12029#issuecomment-1398549982

   Re: immutable Query in Solr - see https://issues.apache.org/jira/browse/SOLR-16509 and https://github.com/apache/solr/pull/1146





[GitHub] [lucene] alessandrobenedetti commented on pull request #12029: introduce support in KnnVectorQuery for getters/setters

2023-01-20 Thread GitBox


alessandrobenedetti commented on PR #12029:
URL: https://github.com/apache/lucene/pull/12029#issuecomment-1398524390

   waiting for the checks and then I'll merge tonight!
   





[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12029: introduce support in KnnVectorQuery for getters/setters

2023-01-20 Thread GitBox


alessandrobenedetti commented on code in PR #12029:
URL: https://github.com/apache/lucene/pull/12029#discussion_r1082676520


##
lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java:
##
@@ -33,6 +33,7 @@
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.util.TestVectorUtil;
 import org.apache.lucene.util.VectorUtil;
+import org.junit.Assert;

Review Comment:
   done!






[GitHub] [lucene] mkhludnev commented on issue #11218: GraphTokenStreamFiniteStrings#articulationPointsRecurse can run into stack overflows [LUCENE-10181]

2023-01-20 Thread GitBox


mkhludnev commented on issue #11218:
URL: https://github.com/apache/lucene/issues/11218#issuecomment-1398122320

   @hassenome, can you share the versions, stack trace, and invocation arguments?





[GitHub] [lucene] hassenome commented on issue #11218: GraphTokenStreamFiniteStrings#articulationPointsRecurse can run into stack overflows [LUCENE-10181]

2023-01-20 Thread GitBox


hassenome commented on issue #11218:
URL: https://github.com/apache/lucene/issues/11218#issuecomment-1398087851

   Hello,
   We are hitting this error as the root cause of a failure in a feature used by Elasticsearch. Is there any update on this issue?





[GitHub] [lucene] vigyasharma closed issue #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure

2023-01-19 Thread GitBox


vigyasharma closed issue #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure
URL: https://github.com/apache/lucene/issues/12097





[GitHub] [lucene] vigyasharma merged pull request #12098: Fix failure in TestIndexSortSortedNumericDocValuesRangeQuery

2023-01-19 Thread GitBox


vigyasharma merged PR #12098:
URL: https://github.com/apache/lucene/pull/12098





[GitHub] [lucene] rmuir commented on pull request #12098: Fix failure in TestIndexSortSortedNumericDocValuesRangeQuery

2023-01-19 Thread GitBox


rmuir commented on PR #12098:
URL: https://github.com/apache/lucene/pull/12098#issuecomment-1397749519

   If a test needs to enforce that the index has only one segment, it should call `forceMerge(1)`, use `LuceneTestCase.getOnlyLeafReader()`, etc.
   
   Otherwise, the number of segments can vary with flushing and merging, especially when using `LuceneTestCase.newIndexWriterConfig()`, `RandomIndexWriter`, etc. So in general, we want to avoid assertions that rely on a particular segment structure.
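As a toy model only (plain Java, not Lucene's API), the sketch below splits the same hypothetical documents into different "segments" and counts matches per segment and in total. Per-segment counts change with the split, while the sum does not, which is why an assertion should either sum across leaves or force a single segment first. `SegmentCountSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;

final class SegmentCountSketch {
  // Splits docs into a random number of contiguous "segments",
  // loosely mimicking how flushes and merges can vary between test runs.
  static List<int[]> randomSegments(int[] docs, Random random) {
    List<int[]> segments = new ArrayList<>();
    int start = 0;
    while (start < docs.length) {
      int size = 1 + random.nextInt(docs.length - start);
      int[] segment = new int[size];
      System.arraycopy(docs, start, segment, 0, size);
      segments.add(segment);
      start += size;
    }
    return segments;
  }

  // Counts matching docs within a single segment.
  static int countInSegment(int[] segment, IntPredicate matches) {
    int count = 0;
    for (int value : segment) {
      if (matches.test(value)) {
        count++;
      }
    }
    return count;
  }

  // Sums per-segment counts; the result is independent of how docs were split.
  static int totalCount(List<int[]> segments, IntPredicate matches) {
    int total = 0;
    for (int[] segment : segments) {
      total += countInSegment(segment, matches);
    }
    return total;
  }
}
```

In a real Lucene test, the corresponding choices are summing `weight.count(context)` over all leaves, or calling `forceMerge(1)` before relying on a single leaf.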





[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for initialization during merge

2023-01-19 Thread GitBox


jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1081896861


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -94,36 +93,83 @@ public int size() {
   }
 
   /**
-   * Add node on the given level
+   * Add node on the given level. Nodes can be inserted out of order, but it requires that the nodes

Review Comment:
   Updated the structure to use a TreeMap to represent the upper levels of the graph.






[GitHub] [lucene] jmazanec15 commented on pull request #12050: Reuse HNSW graph for initialization during merge

2023-01-19 Thread GitBox


jmazanec15 commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1397643952

   Per [this discussion](https://github.com/apache/lucene/pull/12050#discussion_r1061034056), I refactored OnHeapHnswGraph to use a TreeMap to represent the graph structure for levels greater than 0. I ran performance tests with the same setup as https://github.com/apache/lucene/issues/11354#issuecomment-1239961308. The results did not show a significant difference in indexing time between my previous implementation, the implementation using the map, and the current implementation with no merge optimization. Likewise, they did not show a difference in merge time between my previous implementation and the implementation using the map.
   
   Here are the results:
   
   ### Segment Size 10K
   
   Experiment | Total indexing time | Total time to merge numeric vectors (ms) | Recall
   -- | -- | -- | --
   Control-1 | 189s | 697280 | 0.979
   Control-2 | 190s | 722042 | 0.979
   Control-3 | 191s | 713402 | 0.979
   Test-array 1 | 190s | 683966 | 0.98
   Test-array 2 | 187s | 683584 | 0.98
   Test-array 3 | 190s | 702458 | 0.98
   Test-map 1 | 189s | 723582 | 0.98
   Test-map 2 | 187s | 658196 | 0.98
   Test-map 3 | 190s | 66 | 0.98
   
   ### Segment Size 100K
   
   Experiment | Total indexing time | Total time to merge numeric vectors (ms) | Recall
   -- | -- | -- | --
   Control-1 | 366s | 675361 | 0.981
   Control-2 | 370s | 695974 | 0.981
   Control-3 | 367s | 684418 | 0.981
   Test-array 1 | 368s | 651814 | 0.981
   Test-array 2 | 368s | 654862 | 0.981
   Test-array 3 | 368s | 656062 | 0.981
   Test-map 1  | 364s | 637257 | 0.981
   Test-map 2  | 370s | 628755 | 0.981
   Test-map 3 | 366s | 647569 | 0.981
   
   ### Segment Size 500K
   
   Experiment | Total indexing time | Total time to merge numeric vectors (ms) | Recall
   -- | -- | -- | --
   Control-1 | 633s | 655538 | 0.98
   Control-2 | 631s | 664622 | 0.98
   Control-3 | 627s | 635919 | 0.98
   Test-array 1 | 639s | 376139 | 0.98
   Test-array 2 | 636s | 378071 | 0.98
   Test-array 3 | 638s | 352633 | 0.98
   Test-map 1  | 645s | 373572 | 0.98
   Test-map 2  | 635s | 374309 | 0.98
   Test-map 3 | 633s | 381212 | 0.98
   
   Given that the results do not show a significant difference, I switched to the TreeMap to avoid multiple large array copies.
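A minimal sketch of the idea, assuming level 0 contains every node (so a dense list indexed by node ordinal works) while upper levels contain only a sparse subset of nodes that may be added in any order. A `TreeMap` keyed by node ordinal keeps those nodes sorted without growing and copying large arrays. `SparseLevelGraph` and its methods are hypothetical and do not mirror `OnHeapHnswGraph`'s actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;

final class SparseLevelGraph {
  // Level 0: neighbor lists indexed directly by node ordinal.
  private final List<List<Integer>> level0 = new ArrayList<>();
  // Levels >= 1: only the nodes present on that level, keyed by ordinal.
  private final List<TreeMap<Integer, List<Integer>>> upperLevels = new ArrayList<>();

  // Adds a node on the given level; upper-level nodes may arrive out of order.
  void addNode(int level, int node) {
    if (level == 0) {
      // Grow the dense level-0 list up to this ordinal.
      while (level0.size() <= node) {
        level0.add(new ArrayList<>());
      }
      return;
    }
    // Create any missing upper levels up to and including this one.
    while (upperLevels.size() < level) {
      upperLevels.add(new TreeMap<>());
    }
    upperLevels.get(level - 1).putIfAbsent(node, new ArrayList<>());
  }

  // Returns the mutable neighbor list of a node on a level.
  List<Integer> neighbors(int level, int node) {
    return level == 0 ? level0.get(node) : upperLevels.get(level - 1).get(node);
  }

  // Nodes on an upper level in ascending ordinal order, regardless of insertion order.
  NavigableSet<Integer> nodesOnLevel(int level) {
    if (level < 1) {
      throw new IllegalArgumentException("only upper levels are sparse");
    }
    return upperLevels.get(level - 1).navigableKeySet();
  }
}
```

The trade-off is a per-entry map overhead on the upper levels, which stay small because each level holds only a fraction of the nodes.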





[GitHub] [lucene] vigyasharma opened a new pull request, #12098: Fix failure in TestIndexSortSortedNumericDocValuesRangeQuery

2023-01-19 Thread GitBox


vigyasharma opened a new pull request, #12098:
URL: https://github.com/apache/lucene/pull/12098

   Fixes a bug in `TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary`.
   Addresses #12097
   





[GitHub] [lucene] vigyasharma commented on issue #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure

2023-01-19 Thread GitBox


vigyasharma commented on issue #12097:
URL: https://github.com/apache/lucene/issues/12097#issuecomment-1397542712

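   Wait... I think the assert should simply be on the total number of documents, not documents per leaf. Something like:
   ```java
   // Sum the per-leaf counts so the assertion does not depend on how many
   // segments (leaves) the randomized test config produced.
   // Note: Weight#count may return -1 when it cannot count cheaply.
   int count = 0;
   for (LeafReaderContext context : searcher.getLeafContexts()) {
     count += weight.count(context);
   }
   assertEquals(2, count);
   ```
   This passes in my workspace. I'll raise a PR.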





[GitHub] [lucene] vigyasharma opened a new issue, #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure

2023-01-19 Thread GitBox


vigyasharma opened a new issue, #12097:
URL: https://github.com/apache/lucene/issues/12097

   ### Description
   
   Found this test failing in Lucene-Check-9.x - Build # 4239. 
   
   **Steps to repro:**
   ```shell
   gradlew test --tests TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary -Dtests.seed=A11A06AE642497F1 -Dtests.multiplier=2 -Dtests.locale=en-KE -Dtests.timezone=Singapore -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```
   
   I was able to reproduce this failure on both `branch_9x` and `main`. Here is what I found with a debugger:
   
   It seems that with this test seed, the document with the missing value [(created here)](https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestIndexSortSortedNumericDocValuesRangeQuery.java#L589) goes into a separate leaf. For that leaf context, [the IteratorAndCount](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java#L200) has `count = -1`, so the flow goes to the fallback query.
   Since the document has no points value, the fallback query gets `PointValues == null`, which makes it return 0 [here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java#L381).
   
   I'm not very familiar with this part of Lucene, but I'd guess it is likely for the missing document to land in a leaf of its own. Would it help if the missing doc had a points value, but for a different field?
   
   ### Gradle command to reproduce
   
   ```shell
   gradlew test --tests TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary -Dtests.seed=A11A06AE642497F1 -Dtests.multiplier=2 -Dtests.locale=en-KE -Dtests.timezone=Singapore -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```





[GitHub] [lucene] uschindler commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


uschindler commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1397386789

   I am fine with both PRs; both are technically correct. I don't care about the username. If I did a release, I would insert "policeman" into the artifacts.





[GitHub] [lucene] magibney commented on a diff in pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


magibney commented on code in PR #12094:
URL: https://github.com/apache/lucene/pull/12094#discussion_r1081614175


##
gradle/java/jar-manifest.gradle:
##
@@ -46,7 +46,9 @@ subprojects {
 if (snapshotBuild) {
   return "${project.version} ${gitRev} [snapshot build, details omitted]"
 } else {
-  return "${project.version} ${gitRev} - ${System.properties['user.name']} - ${buildDate} ${buildTime}"
+  def sysProps = System.properties

Review Comment:
   I've adjusted this PR accordingly, but following up on Robert's feedback, I also opened a PR that simply removes the username from MANIFEST.MF. If we want to go in that direction, we can just close this PR.
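For illustration only, a plain-JDK sketch of what a release manifest entry could look like once the username is dropped, using the string format from the diff above (`version gitRev - buildDate buildTime`). It assumes the string ends up in the standard `Implementation-Version` main attribute; the class name and any values passed in are hypothetical, and this is not Lucene's Gradle code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestVersionSketch {
  // Assumed release format from the diff, with the "- ${user.name}" part removed.
  static String implementationVersion(
      String version, String gitRev, String buildDate, String buildTime) {
    return version + " " + gitRev + " - " + buildDate + " " + buildTime;
  }

  // Writes a manifest holding the version string, then parses it back the way
  // a consumer reading a jar's MANIFEST.MF would.
  static Manifest roundTrip(String implementationVersion) {
    Manifest manifest = new Manifest();
    Attributes main = manifest.getMainAttributes();
    // Manifest-Version must be set, otherwise write() emits no main attributes.
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    main.put(Attributes.Name.IMPLEMENTATION_VERSION, implementationVersion);
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      manifest.write(out);
      return new Manifest(new ByteArrayInputStream(out.toByteArray()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Whatever the final format, the RM's identity is then carried only by the GPG signature on the release artifacts.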






[GitHub] [lucene] magibney opened a new pull request, #12096: remove username from MANIFEST.MF in build artifacts

2023-01-19 Thread GitBox


magibney opened a new pull request, #12096:
URL: https://github.com/apache/lucene/pull/12096

   Following on discussion from #12094





[GitHub] [lucene] magibney commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


magibney commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1397358626

   > Should we just remove the username from the manifest? This doesn't make sense to me, we don't put usernames anywhere else (e.g. no `@author` at apache)...
   
   This seems fine to me. The RM is associated with the release via the GPG signature, which is really the only meaningful association with the release. That would substantially change the nature of this PR, though. If we go that route, should I close this one and open a new PR that removes the username from the manifest? Is any further discussion needed before going ahead with removing the username from the manifest?
   





[GitHub] [lucene] magibney commented on pull request #12095: buildAndPushRelease should optionally pause before assembleRelease

2023-01-19 Thread GitBox


magibney commented on PR #12095:
URL: https://github.com/apache/lucene/pull/12095#issuecomment-1397344248

   The main reason I didn't make this the default is that I'm not sure whether running this through the releaseWizard would support user input. I'm using the releaseWizard to guide me through the steps but running them all manually, so user input definitely works in my setup. To be honest, I'm not sure; maybe releaseWizard _would_ support user input to subcommands?





[GitHub] [lucene] javanna commented on pull request #12085: update releaseWizard.py to support offline gpg key

2023-01-19 Thread GitBox


javanna commented on PR #12085:
URL: https://github.com/apache/lucene/pull/12085#issuecomment-1396993570

   Thanks @magibney !





[GitHub] [lucene] javanna merged pull request #12085: update releaseWizard.py to support offline gpg key

2023-01-19 Thread GitBox


javanna merged PR #12085:
URL: https://github.com/apache/lucene/pull/12085





[GitHub] [lucene] rmuir commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


rmuir commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1396942514

   I have also witnessed harassment from Solr users toward the person whose name happens to be in there. Please, let's remove the username.
   
   If I am ignored and this option is kept, I will use the option to put something extremely offensive into the field instead of my real username, and create a release candidate.





[GitHub] [lucene] rmuir commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


rmuir commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1396882440

   Should we just remove the username from the manifest? This doesn't make sense to me; we don't put usernames anywhere else (e.g. no `@author` tags at Apache)...





[GitHub] [lucene] romseygeek commented on pull request #12095: buildAndPushRelease should optionally pause before assembleRelease

2023-01-19 Thread GitBox


romseygeek commented on PR #12095:
URL: https://github.com/apache/lucene/pull/12095#issuecomment-1396781927

   +1, this has caught me multiple times! I'd personally make it the default, but I don't know whether others have things set up so that they don't need to type in their GPG PIN.





[GitHub] [lucene] vigyasharma commented on issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak

2023-01-18 Thread GitBox


vigyasharma commented on issue #12000:
URL: https://github.com/apache/lucene/issues/12000#issuecomment-1396551522

   Removed UTF8TaxonomyWriterCache from main and deprecated it in 9.x. We now default to LruTaxonomyWriterCache. The PRs have been merged. Closing this issue.
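As a generic illustration of a bounded LRU cache that keeps no per-thread state (a `ThreadLocal`-based cache, by contrast, holds its values for as long as each long-lived pooled thread survives), the JDK idiom of a `LinkedHashMap` in access order with `removeEldestEntry` is shown below. This is a sketch only; it is not how `LruTaxonomyWriterCache` is implemented, and `LruCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> {
  private final Map<K, V> map;

  LruCache(int maxSize) {
    // accessOrder = true: iteration runs from least to most recently accessed.
    this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the bound is exceeded.
        return size() > maxSize;
      }
    };
  }

  // One instance shared by all threads. Methods are synchronized because
  // LinkedHashMap is not thread-safe, and in access-order mode even get()
  // reorders entries.
  synchronized V get(K key) {
    return map.get(key);
  }

  synchronized void put(K key, V value) {
    map.put(key, value);
  }

  synchronized int size() {
    return map.size();
  }
}
```

Because the cache is an ordinary object, its memory is released as soon as the owner drops the reference, independent of which threads used it.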





[GitHub] [lucene] vigyasharma closed issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak

2023-01-18 Thread GitBox


vigyasharma closed issue #12000: Lucene-facet leaves ThreadLocal that creates a memory leak
URL: https://github.com/apache/lucene/issues/12000





[GitHub] [lucene] vigyasharma commented on issue #12082: LeafFieldComparator setBottom not being called before compareBottom

2023-01-18 Thread GitBox


vigyasharma commented on issue #12082:
URL: https://github.com/apache/lucene/issues/12082#issuecomment-1396549638

   I think you're right that `bottom` should be scoped outside the `LeafFieldComparator`. It stores the bottom slot value for competitive hits and should survive across leaf contexts.
   
   However, I checked a few FieldComparator implementations, and I do see it scoped outside the LeafFieldComparator, e.g. [DoubleComparator](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/DoubleComparator.java#L32) and [DocComparator](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java#L31).
   
   This also seems to be the case in Lucene 8.11.2 ([[1]](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/comparators/DoubleComparator.java#L33), [[2]](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java#L34)).
   
   Can you share code references/links to comparators where you see this issue? Or perhaps a test that reproduces it?
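A simplified, hypothetical model (not Lucene's `FieldComparator`/`LeafFieldComparator` API) of why scoping `bottom` in the top-level comparator matters: a bottom value set through one leaf's comparator stays visible to the comparator created for the next leaf, even if `setBottom` is not called again before `compareBottom`. It assumes an ascending sort on a `long` value, with the sign convention that a positive result means the doc is competitive.

```java
final class TopLongComparator {
  private final long[] slotValues; // values of the current top hits, by slot
  private long bottom; // lives here, so it is shared by all leaf comparators

  TopLongComparator(int numHits) {
    slotValues = new long[numHits];
  }

  // Creates a per-leaf comparator over that leaf's doc values.
  LeafComparator getLeafComparator(long[] leafValues) {
    return new LeafComparator(leafValues);
  }

  final class LeafComparator {
    private final long[] leafValues;

    LeafComparator(long[] leafValues) {
      this.leafValues = leafValues;
    }

    // Copies a doc's value into a hit slot.
    void copy(int slot, int doc) {
      slotValues[slot] = leafValues[doc];
    }

    // Records the weakest current hit; writes the outer, shared field.
    void setBottom(int slot) {
      bottom = slotValues[slot];
    }

    // Positive if the doc sorts before the bottom (competitive), negative if after.
    int compareBottom(int doc) {
      return Long.compare(bottom, leafValues[doc]);
    }
  }
}
```

If `bottom` were instead a field of `LeafComparator`, the second leaf's comparator would start from the default `0` and misjudge competitiveness until `setBottom` is called on it.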





[GitHub] [lucene] LuXugang merged pull request #12084: Same bound with fallbackQuery

2023-01-18 Thread GitBox


LuXugang merged PR #12084:
URL: https://github.com/apache/lucene/pull/12084





[GitHub] [lucene] uschindler commented on a diff in pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-18 Thread GitBox


uschindler commented on code in PR #12094:
URL: https://github.com/apache/lucene/pull/12094#discussion_r1080680435


##
dev-tools/scripts/buildAndPushRelease.py:
##
@@ -120,6 +120,8 @@ def prepare(root, version, gpg_key_id, gpg_password, gpg_home=None, sign_gradle=
   print('  prepare-release')
   cmd = './gradlew --no-daemon assembleRelease' \
 ' -Dversion.release=%s' % version

Review Comment:
   Actually, this should also be `-P`. It won't break as is, but it would be more consistent.






[GitHub] [lucene] uschindler commented on a diff in pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-18 Thread GitBox


uschindler commented on code in PR #12094:
URL: https://github.com/apache/lucene/pull/12094#discussion_r1080679818


##
dev-tools/scripts/buildAndPushRelease.py:
##
@@ -120,6 +120,8 @@ def prepare(root, version, gpg_key_id, gpg_password, gpg_home=None, sign_gradle=
   print('  prepare-release')
   cmd = './gradlew --no-daemon assembleRelease' \
 ' -Dversion.release=%s' % version
+  if mf_username is not None:
+cmd += ' -Dmanifest.username=%s' % mf_username

Review Comment:
   This should be `-Pmanifest.username=%s`






[GitHub] [lucene] uschindler commented on a diff in pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-18 Thread GitBox


uschindler commented on code in PR #12094:
URL: https://github.com/apache/lucene/pull/12094#discussion_r1080678559


##
gradle/java/jar-manifest.gradle:
##
@@ -46,7 +46,9 @@ subprojects {
 if (snapshotBuild) {
   return "${project.version} ${gitRev} [snapshot build, details omitted]"
 } else {
-  return "${project.version} ${gitRev} - ${System.properties['user.name']} - ${buildDate} ${buildTime}"
+  def sysProps = System.properties

Review Comment:
   Please don't use system properties directly for build properties; with gradle they should be project properties. Our build system has a method to get project properties which also falls back to sysprops. In short: use `propertyOrDefault('manifest.username', System.properties['user.name'])`
   
   "user.name" is a real system property, so it is correct to use it here (otherwise you could fake it). But the project property should be given by gradle. This also allows you to set it in your local gradle.properties.






[GitHub] [lucene] uschindler commented on a diff in pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-18 Thread GitBox


uschindler commented on code in PR #12094:
URL: https://github.com/apache/lucene/pull/12094#discussion_r1080678559


##
gradle/java/jar-manifest.gradle:
##
@@ -46,7 +46,9 @@ subprojects {
 if (snapshotBuild) {
   return "${project.version} ${gitRev} [snapshot build, details omitted]"
 } else {
-  return "${project.version} ${gitRev} - ${System.properties['user.name']} - ${buildDate} ${buildTime}"
+  def sysProps = System.properties

Review Comment:
   Please don't use system properties directly; with gradle they should be project properties. Our build system has a method to get project properties which also falls back to sysprops. In short: use `propertyOrDefault('manifest.username', System.properties['user.name'])`






[GitHub] [lucene] vigyasharma merged pull request #12093: Deprecate support for UTF8TaxonomyWriterCache

2023-01-18 Thread GitBox


vigyasharma merged PR #12093:
URL: https://github.com/apache/lucene/pull/12093





[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-18 Thread GitBox


jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1080646383


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -94,36 +93,83 @@ public int size() {
   }
 
   /**
-   * Add node on the given level
-   * Add node on the given level
+   * Add node on the given level. Nodes can be inserted out of order, but it requires that the nodes

Review Comment:
   Added a commit for it here: https://github.com/jmazanec15/lucene/commit/9c54de56fa37a35bdff241abd9ebe3a6f1d8ba3a. Running some performance tests to compare results.






[GitHub] [lucene] vigyasharma merged pull request #12092: Remove UTF8TaxonomyWriterCache

2023-01-18 Thread GitBox


vigyasharma merged PR #12092:
URL: https://github.com/apache/lucene/pull/12092





[GitHub] [lucene] vigyasharma commented on pull request #12093: Deprecate support for UTF8TaxonomyWriterCache

2023-01-18 Thread GitBox


vigyasharma commented on PR #12093:
URL: https://github.com/apache/lucene/pull/12093#issuecomment-1387643504

   > change the default implementation in branch_9x to LRU as well? (either here on this issue or via #12092). I think it would be good to not default to the deprecated impl.
   
   Ah, good point. I'll update this PR to change the default here.





[GitHub] [lucene] rmuir commented on pull request #12093: Deprecate support for UTF8TaxonomyWriterCache

2023-01-18 Thread GitBox


rmuir commented on PR #12093:
URL: https://github.com/apache/lucene/pull/12093#issuecomment-1387508863

   @vigyasharma do you intend to change the default implementation in branch_9x 
to LRU as well? (either here on this issue or via #12092). I think it would be 
good to not default to the deprecated impl.





[GitHub] [lucene] magibney opened a new pull request, #12095: buildAndPushRelease should optionally pause before assembleRelease

2023-01-18 Thread GitBox


magibney opened a new pull request, #12095:
URL: https://github.com/apache/lucene/pull/12095

   buildAndPushRelease currently proceeds directly from running tests to assembling the release (and signing jars). Since assembleRelease prompts for the GPG key PIN, the RM can easily step away while tests are running and return to find that the tests have completed but the script has failed after timing out while waiting for GPG pinentry in the `assembleRelease` step. To address this issue, this PR adds an (optional, non-default) pause for user confirmation before proceeding to the `assembleRelease` phase.





[GitHub] [lucene] magibney opened a new pull request, #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-18 Thread GitBox


magibney opened a new pull request, #12094:
URL: https://github.com/apache/lucene/pull/12094

   buildAndPushRelease (the release script) currently sets the username portion of the `ImplementationVersion` entry in MANIFEST.MF for built jars according to the local machine username of the active user. It is straightforward to support setting this value explicitly, allowing official Apache release artifacts to consistently indicate the Apache ID of the release manager.
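To check which username ended up in a built jar, the manifest can be read with the JDK's `java.util.jar.Manifest`. The sketch below assumes the release-build value format from `gradle/java/jar-manifest.gradle` ("<version> <gitRev> - <username> - <buildDate> <buildTime>"); `ManifestUser` is a hypothetical helper, and the sample values in any test are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: reads Implementation-Version from a MANIFEST.MF stream
// and extracts the username portion. Assumes the release-build format
// "<version> <gitRev> - <username> - <buildDate> <buildTime>"; snapshot
// builds omit these details and yield null.
final class ManifestUser {
  private ManifestUser() {}

  static String implementationVersion(InputStream manifestStream) {
    try {
      Manifest manifest = new Manifest(manifestStream);
      return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Convenience overload: parse a manifest given as text.
  static String implementationVersion(String manifestText) {
    return implementationVersion(
        new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
  }

  // Returns the username field, or null if the value doesn't match the assumed format.
  static String username(String implementationVersion) {
    if (implementationVersion == null) {
      return null;
    }
    String[] parts = implementationVersion.split(" - ");
    return parts.length == 3 ? parts[1] : null;
  }
}
```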





[GitHub] [lucene] rmuir commented on issue #12091: Speeding up Lucene Vector Similarity through the Java Vector API

2023-01-18 Thread GitBox


rmuir commented on issue #12091:
URL: https://github.com/apache/lucene/issues/12091#issuecomment-1386986370

   There is nothing we can do about it here. Convince OpenJDK to stop holding the Vector API hostage in incubating status, as they have done for years.
   
   When it is at least in "Preview" status then we can use it.





[GitHub] [lucene] rmuir commented on issue #12090: Building a Lucene posting format that leverages the Java Vector API

2023-01-18 Thread GitBox


rmuir commented on issue #12090:
URL: https://github.com/apache/lucene/issues/12090#issuecomment-1386986113

   There is nothing we can do about it here. Convince OpenJDK to stop holding the Vector API hostage in incubating status, as they have done for years.
   
   When it is at least in "Preview" status then we can use it.





[GitHub] [lucene] rmuir commented on issue #11902: Customization of Edit distance costs for different operations

2023-01-18 Thread GitBox


rmuir commented on issue #11902:
URL: https://github.com/apache/lucene/issues/11902#issuecomment-1386981136

   This would be far too trappy and entirely too slow. Use toy Python libraries like the one referenced if you want to build toys, but this is a library for building search engines.
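To illustrate what #11902 asks for, and why it is considered too slow, here is a minimal sketch of a Levenshtein distance with separate insertion, deletion, and substitution costs. It uses the textbook dynamic program, which costs O(m*n) per pair of strings. Lucene's fuzzy matching instead builds a Levenshtein automaton (`LevenshteinAutomata`) and intersects it with the terms dictionary rather than scoring each term like this. `WeightedEditDistance` is a hypothetical name, not Lucene API.

```java
// Hypothetical sketch of weighted edit distance (not Lucene API): classic
// two-row dynamic program with per-operation costs.
final class WeightedEditDistance {
  private WeightedEditDistance() {}

  static double distance(
      String a, String b, double insertCost, double deleteCost, double substituteCost) {
    int m = a.length();
    int n = b.length();
    double[] prev = new double[n + 1];
    double[] curr = new double[n + 1];
    // Turning the empty prefix of a into b[0..j) takes j insertions.
    for (int j = 0; j <= n; j++) {
      prev[j] = j * insertCost;
    }
    for (int i = 1; i <= m; i++) {
      // Turning a[0..i) into the empty string takes i deletions.
      curr[0] = i * deleteCost;
      for (int j = 1; j <= n; j++) {
        double sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0.0 : substituteCost);
        double del = prev[j] + deleteCost; // drop a[i-1]
        double ins = curr[j - 1] + insertCost; // add b[j-1]
        curr[j] = Math.min(sub, Math.min(del, ins));
      }
      double[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[n];
  }
}
```

With unit costs this reduces to plain Levenshtein distance; raising `substituteCost` above `insertCost + deleteCost` makes a delete-plus-insert pair cheaper than a substitution.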





[GitHub] [lucene] rmuir closed issue #11902: Customization of Edit distance costs for different operations

2023-01-18 Thread GitBox


rmuir closed issue #11902: Customization of Edit distance costs for different operations
URL: https://github.com/apache/lucene/issues/11902





[GitHub] [lucene] mohamedniyaz1996 commented on issue #11902: Customization of Edit distance costs for different operations

2023-01-18 Thread GitBox


mohamedniyaz1996 commented on issue #11902:
URL: https://github.com/apache/lucene/issues/11902#issuecomment-1386830082

   @tang-hi, I agree there will be a dip in performance, but it could still be provided as a feature with a warning about the performance drop.





[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2023-01-17 Thread GitBox


vigyasharma commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1386565076

   PR - https://github.com/apache/lucene/pull/12093 to deprecate 
`UTF8TaxonomyWriterCache` in 9.x
   





[GitHub] [lucene] vigyasharma opened a new pull request, #12093: Deprecate support for UTF8TaxonomyWriterCache

2023-01-17 Thread GitBox


vigyasharma opened a new pull request, #12093:
URL: https://github.com/apache/lucene/pull/12093

   As discussed in PR #12013, deprecating support for `UTF8TaxonomyWriterCache` in branch_9x.
   Addresses #12000 





[GitHub] [lucene] vigyasharma merged pull request #12045: fix typo in KoreanNumberFilter

2023-01-17 Thread GitBox


vigyasharma merged PR #12045:
URL: https://github.com/apache/lucene/pull/12045





[GitHub] [lucene] vigyasharma closed pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2023-01-17 Thread GitBox


vigyasharma closed pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()
URL: https://github.com/apache/lucene/pull/12013





[GitHub] [lucene] vigyasharma commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2023-01-17 Thread GitBox


vigyasharma commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1386545577

   Created a separate PR - #12092 to remove support for 
`UTF8TaxonomyWriterCache` from main. Will close this PR.





[GitHub] [lucene] vigyasharma opened a new pull request, #12092: Remove UTF8TaxonomyWriterCache

2023-01-17 Thread GitBox


vigyasharma opened a new pull request, #12092:
URL: https://github.com/apache/lucene/pull/12092

   As per the discussion in PR #12013, this change removes the never-evicting `UTF8TaxonomyWriterCache` and uses `LruTaxonomyWriterCache` as the default taxonomy writer cache implementation.
   
   Addresses #12000 
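The practical difference between a never-evicting cache and an LRU one is that the latter bounds memory by dropping the least recently used entries once a size limit is reached. A minimal sketch of that eviction policy using `LinkedHashMap` in access order follows; it is not Lucene's `LruTaxonomyWriterCache`, and `LruCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU map illustrating the general eviction technique; NOT Lucene's
// LruTaxonomyWriterCache.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  LruCache(int maxSize) {
    // accessOrder = true: iteration runs from least to most recently accessed.
    super(16, 0.75f, true);
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Evict the least recently used entry once the size limit is exceeded.
    return size() > maxSize;
  }
}
```

A cache without `removeEldestEntry` (or an equivalent policy) keeps growing with every new key, which is the behavior the PR moves away from.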





[GitHub] [lucene] jebnix commented on issue #11870: Create a Markdown based documentation

2023-01-17 Thread GitBox


jebnix commented on issue #11870:
URL: https://github.com/apache/lucene/issues/11870#issuecomment-1386297416

   @uschindler That's nice, but I personally miss two things in the Lucene repo:
   1. The ability to find the documentation in a central place (which makes contributing much easier). That's how most repositories manage project documentation. In my opinion, Javadoc is good for code-related notes, while standalone docs (currently inside `package-info.java`) belong in a `docs/` dir.
   2. A generated, **good-looking** documentation site. I suggest using Docusaurus, so the docs would be written in MD and look like [this, for example](https://redux.js.org/).





[GitHub] [lucene] mulugetam opened a new issue, #12091: Speeding up Lucene Vector Similarity through the Java Vector API

2023-01-17 Thread GitBox


mulugetam opened a new issue, #12091:
URL: https://github.com/apache/lucene/issues/12091

   ### Description
   
   Lucene's implementation of ANN relies on scalar implementations of the vector similarity functions [dot product](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d261545d3182c22fa/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java#L53), [Euclidean distance](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d261545d3182c22fa/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java#L34), and [cosine](https://github.com/apache/lucene/blob/4fe8424925ca404d335fa41d261545d3182c22fa/lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java#L71). Implementing these functions with the Java Vector API is quite straightforward.
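For reference, here is a minimal sketch of the kind of scalar baseline with loop unrolling that the benchmarks compare against. It is an illustration, not Lucene's actual `VectorUtil` code; `ScalarSimilarity` is a hypothetical name.

```java
// Hypothetical sketch (not Lucene's VectorUtil code): scalar similarity
// functions with 4-way loop unrolling. Independent accumulators shorten the
// dependency chain between additions, which can help the JIT.
final class ScalarSimilarity {
  private ScalarSimilarity() {}

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
  }

  static float dotProduct(float[] a, float[] b) {
    checkSameLength(a, b);
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upperBound = a.length - (a.length % 4);
    // Unrolled main loop: four elements per iteration.
    for (; i < upperBound; i += 4) {
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
      acc2 += a[i + 2] * b[i + 2];
      acc3 += a[i + 3] * b[i + 3];
    }
    // Tail: the remaining 0-3 elements.
    for (; i < a.length; i++) {
      acc0 += a[i] * b[i];
    }
    return acc0 + acc1 + acc2 + acc3;
  }

  static float squareDistance(float[] a, float[] b) {
    checkSameLength(a, b);
    float acc0 = 0f, acc1 = 0f, acc2 = 0f, acc3 = 0f;
    int i = 0;
    int upperBound = a.length - (a.length % 4);
    for (; i < upperBound; i += 4) {
      float d0 = a[i] - b[i];
      float d1 = a[i + 1] - b[i + 1];
      float d2 = a[i + 2] - b[i + 2];
      float d3 = a[i + 3] - b[i + 3];
      acc0 += d0 * d0;
      acc1 += d1 * d1;
      acc2 += d2 * d2;
      acc3 += d3 * d3;
    }
    for (; i < a.length; i++) {
      float d = a[i] - b[i];
      acc0 += d * d;
    }
    return acc0 + acc1 + acc2 + acc3;
  }
}
```

A Vector API variant would replace the unrolled loop with `FloatVector` operations from the incubating `jdk.incubator.vector` module, which must be enabled explicitly with `--add-modules jdk.incubator.vector`. Because the unrolled version sums in a different order than a naive loop, results can differ in the last few bits.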
   
   Below are JMH results comparing the Vector API implementations of `dot product` and `Euclidean` distance against the equivalent default (scalar, loop-unrolled) implementations.
   
   `dim` is the dimension (length) of the `float[]` arrays in the test, and `score` is the number of dot-product/Euclidean-distance operations done per second.
   
   ```
   Benchmark            dim  Mode   Cnt          Score         Error  Units    Gain
   --------------------------------------------------------------------------------
   scalarDotProduct      60  thrpt   12   32031825.541 ±    6151.580  ops/s    1.00
   scalarDotProduct     120  thrpt   12   17120537.911 ±    5793.505  ops/s    1.00
   scalarDotProduct     480  thrpt   12    4506350.215 ±    1677.755  ops/s    1.00
   vectorDotProduct      60  thrpt   12   98862701.038 ±   85554.695  ops/s    3.09
   vectorDotProduct     120  thrpt   12   99059913.888 ±   20609.182  ops/s    5.79
   vectorDotProduct     480  thrpt   12  220320941.436 ±  173467.603  ops/s   48.89
   ```
   
   ```
   Benchmark            dim  Mode   Cnt          Score         Error  Units    Gain
   --------------------------------------------------------------------------------
   scalarSquareDistance  60  thrpt   12   25890614.822 ±    7071.413  ops/s    1.00
   scalarSquareDistance 120  thrpt   12   12524294.760 ±    3435.882  ops/s    1.00
   scalarSquareDistance 480  thrpt   12    3145045.026 ±     409.361  ops/s    1.00
   vectorSquareDistance  60  thrpt   12  104317302.765 ±   36895.474  ops/s    4.03
   vectorSquareDistance 120  thrpt   12  122083614.889 ±   11821.642  ops/s    9.75
   vectorSquareDistance 480  thrpt   12  362229408.898 ±   85439.065  ops/s  115.17
   ```
   
   I have also tested the same with [msokolov's ANN benchmark suite](https://github.com/msokolov/ann-benchmarks) and saw a speedup of more than 2x in both indexing (docs/sec) and search (QPS) performance. I will open a PR for it soon.
   
   Let's discuss this :-)





[GitHub] [lucene] mulugetam opened a new issue, #12090: Building a Lucene posting format that leverages the Java Vector API

2023-01-17 Thread GitBox


mulugetam opened a new issue, #12090:
URL: https://github.com/apache/lucene/issues/12090

   ### Description
   
   This issue is to start a conversation on implementing a vectorized encoding 
and decoding scheme for postings. 
   
   A few months ago, we implemented vectorized integer compression based on the 
[JavaFastPFOR](https://github.com/lemire/JavaFastPFOR) library. That code has 
since been [merged](https://github.com/lemire/JavaFastPFOR/pull/51). 
Performance results, based on [JMH](https://github.com/openjdk/jmh), show 
[significant gains](https://github.com/mulugetam/VectorJavaFastPFOR) in 
performance compared to the default JavaFastPFOR. 
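To make the discussion concrete, here is the basic idea behind block-based postings compression in its simplest scalar form: delta-encode sorted doc IDs, then bit-pack every gap with one shared bit width (frame of reference). This is not FastPFOR (there is no exception patching), it is not vectorized, and it is not Lucene's postings format; `BitPackedDeltas` is a hypothetical name.

```java
// Simplified, hypothetical sketch: delta encoding + fixed-width bit packing.
// Not FastPFOR, not vectorized, not Lucene's postings format.
final class BitPackedDeltas {
  private BitPackedDeltas() {}

  // Converts strictly increasing doc IDs into gaps (the first entry is kept as-is).
  static int[] toDeltas(int[] docIds) {
    int[] deltas = new int[docIds.length];
    int prev = 0;
    for (int i = 0; i < docIds.length; i++) {
      deltas[i] = docIds[i] - prev;
      prev = docIds[i];
    }
    return deltas;
  }

  // Bits needed for the largest value (values are non-negative, so OR-ing works).
  static int bitsRequired(int[] values) {
    int or = 0;
    for (int v : values) {
      or |= v;
    }
    return 32 - Integer.numberOfLeadingZeros(or);
  }

  // Packs each value into exactly `bits` bits; one spare word avoids bounds checks.
  static long[] pack(int[] values, int bits) {
    long[] out = new long[(values.length * bits + 63) / 64 + 1];
    long mask = (1L << bits) - 1;
    for (int i = 0; i < values.length; i++) {
      long v = values[i] & mask;
      int bitPos = i * bits;
      int word = bitPos >>> 6;
      int shift = bitPos & 63;
      out[word] |= v << shift;
      if (shift + bits > 64) {
        // The value straddles two words: spill its high bits into the next one.
        out[word + 1] |= v >>> (64 - shift);
      }
    }
    return out;
  }

  // Unpacks the gaps and applies a prefix sum to recover the doc IDs.
  static int[] unpackDocIds(long[] packed, int count, int bits) {
    int[] docIds = new int[count];
    long mask = (1L << bits) - 1;
    int prev = 0;
    for (int i = 0; i < count; i++) {
      int bitPos = i * bits;
      int word = bitPos >>> 6;
      int shift = bitPos & 63;
      long v = packed[word] >>> shift;
      if (shift + bits > 64) {
        v |= packed[word + 1] << (64 - shift);
      }
      prev += (int) (v & mask);
      docIds[i] = prev;
    }
    return docIds;
  }
}
```

The inner loops of `pack` and `unpackDocIds` process one value at a time; vectorized schemes aim to decode many lanes of such a block in parallel.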
   
   We would, of course, need to benchmark the vectorized PostingsFormat against 
the existing implementation.
   
   @jpountz What's your take on it, and how should we go about it?





[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


gsmiller commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072874867


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java:
##
@@ -0,0 +1,527 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.queries;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.Objects;
+import java.util.Set;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.TermState;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.ConstantScoreScorer;
+import org.apache.lucene.search.ConstantScoreWeight;
+import org.apache.lucene.search.DisiPriorityQueue;
+import org.apache.lucene.search.DisiWrapper;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.MatchNoDocsQuery;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.QueryVisitor;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.Scorer;
+import org.apache.lucene.search.ScorerSupplier;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.TwoPhaseIterator;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.DocIdSetBuilder;
+import org.apache.lucene.util.LongBitSet;
+import org.apache.lucene.util.PriorityQueue;
+
+public class TermInSetQuery extends Query {
+  // TODO: tunable coefficients. need to actually tune them (or maybe these are too complex and not
+  // useful)
+  private static final double J = 1.0;
+  private static final double K = 1.0;
+  // L: postings lists under this threshold will always be "pre-processed" into a bitset
+  private static final int L = 512;
+  // M: max number of clauses we'll manage/check during scoring (these remain "unprocessed")
+  private static final int M = Math.min(IndexSearcher.getMaxClauseCount(), 64);
+
+  private final String field;
+  // TODO: Not particularly memory-efficient; could use prefix-coding here but sorting isn't free
+  private final BytesRef[] terms;
+  private final int termsHashCode;
+
+  public TermInSetQuery(String field, Collection<BytesRef> terms) {
+    this.field = field;
+
+    final Set<BytesRef> uniqueTerms;
+    if (terms instanceof Set) {
+      uniqueTerms = (Set<BytesRef>) terms;
+    } else {
+      uniqueTerms = new HashSet<>(terms);
+    }
+    this.terms = new BytesRef[uniqueTerms.size()];
+    Iterator<BytesRef> it = uniqueTerms.iterator();
+    for (int i = 0; i < uniqueTerms.size(); i++) {
+      assert it.hasNext();
+      this.terms[i] = it.next();
+    }
+    // TODO: compute lazily?
+    termsHashCode = Arrays.hashCode(this.terms);
+  }
+
+  @Override
+  public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
+      throws IOException {
+
+    return new ConstantScoreWeight(this, boost) {
+
+      @Override
+      public Scorer scorer(LeafReaderContext context) throws IOException {
+        ScorerSupplier supplier = scorerSupplier(context);
+        if (supplier == null) {
+          return null;
+        } else {
+          return supplier.get(Long.MAX_VALUE);
+        }
+      }
+
+      @Override
+      public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOException {
+        if (terms.length <= 1) {
+          throw new IllegalStateException("Must call IndexSearcher#rewrite");
+        }
+
+        // If the field doesn't exist in the segment, return null:
+        LeafReader reader = context.reader();
+        FieldInfo fi = reader.getFieldInfos().fieldInfo(field);
+        if (fi == null) {
+          return null;
+        }
+
+        return 

[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


gsmiller commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072872477


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java:
##
@@ -0,0 +1,527 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.queries;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.Objects;
+import java.util.Set;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.TermState;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.ConstantScoreScorer;
+import org.apache.lucene.search.ConstantScoreWeight;
+import org.apache.lucene.search.DisiPriorityQueue;
+import org.apache.lucene.search.DisiWrapper;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.MatchNoDocsQuery;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.QueryVisitor;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.Scorer;
+import org.apache.lucene.search.ScorerSupplier;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.TwoPhaseIterator;
+import org.apache.lucene.search.Weight;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.DocIdSetBuilder;
+import org.apache.lucene.util.LongBitSet;
+import org.apache.lucene.util.PriorityQueue;
+
+public class TermInSetQuery extends Query {
+  // TODO: tunable coefficients. need to actually tune them (or maybe these are too complex and not
+  // useful)
+  private static final double J = 1.0;
+  private static final double K = 1.0;
+  // L: postings lists under this threshold will always be "pre-processed" into a bitset
+  private static final int L = 512;
+  // M: max number of clauses we'll manage/check during scoring (these remain "unprocessed")
+  private static final int M = Math.min(IndexSearcher.getMaxClauseCount(), 64);
+
+  private final String field;
+  // TODO: Not particularly memory-efficient; could use prefix-coding here but sorting isn't free
+  private final BytesRef[] terms;

Review Comment:
   That's a good point/perspective. I'm convinced. It's easy enough to borrow 
those ideas from our existing `TermInSetQuery`, so I'll do that. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


gsmiller commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072871141


##
lucene/core/src/java/org/apache/lucene/search/DisiWrapper.java:
##
@@ -57,4 +57,14 @@ public DisiWrapper(Scorer scorer) {
   matchCost = 0f;
 }
   }
+
+  public DisiWrapper(DocIdSetIterator iterator) {

Review Comment:
   This change is common to #12055, so I'm hoping we'd actually land it as part 
of that work rather than needing it just for this.






[GitHub] [lucene] rmuir commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


rmuir commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072855208


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java:
##
@@ -0,0 +1,527 @@
+public class TermInSetQuery extends Query {
+  // TODO: tunable coefficients. need to actually tune them (or maybe these are too complex and not
+  // useful)
+  private static final double J = 1.0;
+  private static final double K = 1.0;
+  // L: postings lists under this threshold will always be "pre-processed" into a bitset
+  private static final int L = 512;
+  // M: max number of clauses we'll manage/check during scoring (these remain "unprocessed")
+  private static final int M = Math.min(IndexSearcher.getMaxClauseCount(), 64);
+
+  private final String field;
+  // TODO: Not particularly memory-efficient; could use prefix-coding here but sorting isn't free
+  private final BytesRef[] terms;

Review Comment:
   you can use this tool when benchmarking to help make sure the index no longer 
fits in RAM: 
https://github.com/mikemccand/luceneutil/blob/b48e7f49b19c27367737436214cc1ce7e67ad32c/src/python/ramhog.c
   
   or you can open up the computer and remove DIMMs






[GitHub] [lucene] rmuir commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


rmuir commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072841550


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java:
##
@@ -0,0 +1,527 @@
+public class TermInSetQuery extends Query {
+  // TODO: tunable coefficients. need to actually tune them (or maybe these are too complex and not
+  // useful)
+  private static final double J = 1.0;
+  private static final double K = 1.0;
+  // L: postings lists under this threshold will always be "pre-processed" into a bitset
+  private static final int L = 512;
+  // M: max number of clauses we'll manage/check during scoring (these remain "unprocessed")
+  private static final int M = Math.min(IndexSearcher.getMaxClauseCount(), 64);
+
+  private final String field;
+  // TODO: Not particularly memory-efficient; could use prefix-coding here but sorting isn't free
+  private final BytesRef[] terms;

Review Comment:
   When I think of the worst case, I'm assuming a scenario where these term 
dictionaries don't even fit in RAM. We should really always assume this stuff doesn't fit in RAM :)






[GitHub] [lucene] gsmiller commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


gsmiller commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1386068784

   @rmuir 
   > I was naively thinking to try the same approach with the 
DocValuesTermsQuery that is in sandbox...
   
   I think that's probably a good place to start, honestly. I was thinking of 
introducing this in the sandbox module initially and not actually hooking it 
into `KeywordField`, then maybe following up by graduating it and hooking it 
in. But maybe it's worth doing initially if we can get it right? I dunno.





[GitHub] [lucene] gsmiller commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


gsmiller commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072835306


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java:
##
@@ -0,0 +1,527 @@
+public class TermInSetQuery extends Query {
+  // TODO: tunable coefficients. need to actually tune them (or maybe these are too complex and not
+  // useful)
+  private static final double J = 1.0;
+  private static final double K = 1.0;
+  // L: postings lists under this threshold will always be "pre-processed" into a bitset
+  private static final int L = 512;
+  // M: max number of clauses we'll manage/check during scoring (these remain "unprocessed")
+  private static final int M = Math.min(IndexSearcher.getMaxClauseCount(), 64);
+
+  private final String field;
+  // TODO: Not particularly memory-efficient; could use prefix-coding here but sorting isn't free
+  private final BytesRef[] terms;

Review Comment:
   I generally agree. I was testing some interesting use cases, though, with our 
current `TermInSetQuery` where we have 10,000+ terms (PK-type field), and the 
sorting it does actually came with a significant cost (we got a pretty good 
win by removing it). But now that I think about it, that may be a somewhat 
different use case. We have a bloom filter in place as well, and a good share of 
the terms aren't actually in the index. So there's that...
   
   OK, yeah, we probably ought to sort here in a general implementation :)






[GitHub] [lucene] rmuir commented on a diff in pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


rmuir commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072830614


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/TermInSetQuery.java:
##
@@ -0,0 +1,527 @@
+public class TermInSetQuery extends Query {
+  // TODO: tunable coefficients. need to actually tune them (or maybe these are too complex and not
+  // useful)
+  private static final double J = 1.0;
+  private static final double K = 1.0;
+  // L: postings lists under this threshold will always be "pre-processed" into a bitset
+  private static final int L = 512;
+  // M: max number of clauses we'll manage/check during scoring (these remain "unprocessed")
+  private static final int M = Math.min(IndexSearcher.getMaxClauseCount(), 64);
+
+  private final String field;
+  // TODO: Not particularly memory-efficient; could use prefix-coding here but sorting isn't free
+  private final BytesRef[] terms;

Review Comment:
   but leaving these unsorted is really bad for performance: it means we do a 
bunch of random-access lookups in the terms dictionaries. Looping over these 
unsorted terms and doing seekExact, or looping over them and doing lookupOrd, 
could instead easily be sequential/more friendly if the terms were sorted.
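   A minimal sketch of the access pattern being described, using plain JDK types rather than Lucene's `TermsEnum`: a sorted `String[]` stands in for a segment's terms dictionary (an assumption for illustration), and `lookupSorted` is a hypothetical helper. After one `Arrays.sort` of the query terms, a single cursor only ever moves forward through the dictionary. Lucene orders terms by unsigned UTF-8 bytes; `String.compareTo` agrees with that for ASCII terms, which is all this sketch relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of sorted, forward-only term lookups (not Lucene code). */
final class SortedTermLookup {

  /** Returns the distinct query terms present in {@code sortedDict}, in sorted order. */
  static List<String> lookupSorted(String[] sortedDict, String[] queryTerms) {
    String[] sorted = queryTerms.clone();
    Arrays.sort(sorted); // one sort up front...
    List<String> found = new ArrayList<>();
    int cursor = 0; // ...so this position in the dictionary never moves backwards
    for (String term : sorted) {
      while (cursor < sortedDict.length && sortedDict[cursor].compareTo(term) < 0) {
        cursor++; // advance to the first dictionary entry >= term
      }
      if (cursor == sortedDict.length) {
        break; // all remaining query terms sort after the last dictionary entry
      }
      boolean duplicate = !found.isEmpty() && found.get(found.size() - 1).equals(term);
      if (sortedDict[cursor].equals(term) && !duplicate) {
        found.add(term);
      }
    }
    return found;
  }
}
```

   A real terms dictionary would skip ahead through its index rather than step one entry at a time; the point is only that sorted input turns scattered random probes into a forward scan.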
   
   Given that looking up all the terms is sometimes a very heavy cost for this 
thing, calling `Arrays.sort()` in the ctor seems like an easy win/no-brainer. 
So does prefix-coding, to keep RAM usage lower in the worst case. 
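   A hypothetical sketch of the prefix-coding idea, assuming plain `byte[]` terms and a toy format of one length byte each for the shared prefix and the suffix (this is not the format of Lucene's `PrefixCodedTerms`). Sorting first is what makes it pay off: neighboring terms in sorted order share the longest prefixes, so only the differing suffix bytes need to be stored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical prefix-coding sketch (toy format, not Lucene's PrefixCodedTerms). */
final class PrefixCodedSketch {

  /** Sorts terms in unsigned byte order, the same order BytesRef uses. */
  static byte[][] sortedUtf8(String... terms) {
    byte[][] bytes = new byte[terms.length][];
    for (int i = 0; i < terms.length; i++) {
      bytes[i] = terms[i].getBytes(StandardCharsets.UTF_8);
    }
    Arrays.sort(bytes, Arrays::compareUnsigned);
    return bytes;
  }

  /** Writes [sharedPrefixLen][suffixLen][suffix bytes] per term; lengths are one byte each. */
  static byte[] encode(byte[][] sortedTerms) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] prev = new byte[0];
    for (byte[] term : sortedTerms) {
      int shared = 0;
      int max = Math.min(prev.length, term.length);
      while (shared < max && prev[shared] == term[shared]) {
        shared++;
      }
      int suffixLen = term.length - shared;
      if (shared > 255 || suffixLen > 255) {
        throw new IllegalArgumentException("sketch only supports terms up to 255 bytes");
      }
      out.write(shared);
      out.write(suffixLen);
      out.write(term, shared, suffixLen);
      prev = term;
    }
    return out.toByteArray();
  }

  /** Rebuilds each term from the previous term's prefix plus the stored suffix. */
  static List<byte[]> decode(byte[] encoded) {
    List<byte[]> terms = new ArrayList<>();
    byte[] prev = new byte[0];
    int pos = 0;
    while (pos < encoded.length) {
      int shared = encoded[pos++] & 0xFF;
      int suffixLen = encoded[pos++] & 0xFF;
      byte[] term = Arrays.copyOf(prev, shared + suffixLen); // keeps prev's first `shared` bytes
      System.arraycopy(encoded, pos, term, shared, suffixLen);
      pos += suffixLen;
      terms.add(term);
      prev = term;
    }
    return terms;
  }
}
```

   For `term010, term011, term100, term101` this toy encoding takes 20 bytes versus 28 bytes of raw term data.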






[GitHub] [lucene] rmuir commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1385982331

   I modified the benchmark from #12087 to just use StringField instead of 
IntField. The queries are supposed to be "hard": I'm not trying to benchmark 
what is typical, but instead targeting "hard", more worst-case stuff (e.g. we 
shouldn't cause regressions vs `new PointInSetQuery()` in the main branch today): 
[StringSetBenchmark.java.txt](https://github.com/apache/lucene/files/10438949/StringSetBenchmark.java.txt)
   





[GitHub] [lucene] rmuir commented on pull request #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1385954675

   Thanks for looking at this. I can alter the benchmark from #12087 to test this 
case. Honestly, we could even just take the benchmark and index the numeric 
field as a string as a start :)
   
   In the case of numeric fields, we just had a crazy query in the sandbox, which 
is better exposed as e.g. NumericDocValuesField.newSlowSetQuery. And then we 
hooked newSlowSetQuery into IntField etc. with IndexOrDocValuesQuery. For that 
one, we had to fix PointInSetQuery to support ScorerSupplier etc. (but 
TermInSetQuery already has this cost estimation).
   
   I was naively thinking to try the same approach with the 
DocValuesTermsQuery that is in sandbox... though I anticipated maybe more 
trickiness with the inverted index as opposed to points. But maybe 
IndexOrDocValuesQuery would surprise me again; of course it's probably worth 
exploring anyway, and we could compare the approaches. I do like 
IndexOrDocValuesQuery for solving these problems, and if we can improve it while 
keeping it generic, I'd definitely be in favor of that. But whatever is fastest 
wins :)
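   To make the trade-off concrete, here is a deliberately simplified, hypothetical per-segment cost rule of the kind either approach needs. The formula and the `factor` parameter are assumptions for illustration, not what `IndexOrDocValuesQuery` or the draft actually computes. Postings can lead iteration, and their cost grows with the summed doc freqs of the matching terms; doc values can only verify candidates, so their cost tracks the lead cost.

```java
/** Hypothetical per-segment cost rule (illustration only; not Lucene's actual heuristic). */
final class SegmentStrategy {
  enum Choice {
    POSTINGS, // lead iteration from the terms' postings lists
    DOC_VALUES // only verify the lead clause's candidates via doc values
  }

  /**
   * @param termDocFreqs doc freqs of the query terms found in this segment
   * @param leadCost estimated number of candidates produced by the other (lead) clauses
   * @param factor assumed relative per-doc cost of postings vs. a doc-values check
   */
  static Choice choose(long[] termDocFreqs, long leadCost, double factor) {
    long postingsCost = 0;
    for (long docFreq : termDocFreqs) {
      postingsCost += docFreq; // upper bound on docs visited through postings
    }
    // Doc values cannot skip, so their work is proportional to the lead's candidates.
    return postingsCost * factor <= leadCost ? Choice.POSTINGS : Choice.DOC_VALUES;
  }
}
```

   A real implementation would also have to account for the cost of looking up every term in the terms dictionary, which is exactly what the sorting discussion on this PR is about.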
   
   I do think it's important to add fields such as KeywordField and put 
this best-practice logic behind simple methods so that it is easier on the user.





[GitHub] [lucene] gsmiller commented on pull request #12054: Introduce a new `KeywordField`.

2023-01-17 Thread GitBox


gsmiller commented on PR #12054:
URL: https://github.com/apache/lucene/pull/12054#issuecomment-1385952712

   Somewhat related to this PR, I've been experimenting with the idea of a 
"self optimizing" `TermInSetQuery` implementation that toggles between using 
postings and doc values based on index statistics, etc. I wanted to link that 
idea here as it's a bit related (it requires indexing both postings and doc 
values, which this PR makes easy). This is just an early idea, but I'll link a 
draft here in case anyone is curious or has thoughts: #12089





[GitHub] [lucene] gsmiller opened a new pull request, #12089: [DRAFT] Explore TermInSet Query that "self optimizes"

2023-01-17 Thread GitBox


gsmiller opened a new pull request, #12089:
URL: https://github.com/apache/lucene/pull/12089

   ### Description
   
   This is a DRAFT PR to sketch out the idea of a "self optimizing" 
TermInSetQuery. The idea is to build on the new `KeywordField` being proposed 
in #12054, which indexes both postings and DV data. It takes a somewhat 
different approach from `IndexOrDocValuesQuery`, though, by "internally" 
deciding whether to use postings or doc values (at segment 
granularity).
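   The draft's `L` and `M` constants (quoted in the review diffs: postings lists under `L = 512` docs are always "pre-processed" into a bitset, and at most `M` clauses stay "unprocessed" during scoring) suggest a partitioning step like the hypothetical sketch below. Keeping the largest lists as iterators and folding any overflow into the bitset is an assumption about how `M` is used, not something the PR description states.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical partition of a segment's matching postings lists (illustration only). */
final class PostingsPartition {
  final List<Integer> bitsetTerms = new ArrayList<>(); // folded into a bitset up front
  final List<Integer> iteratorTerms = new ArrayList<>(); // left "unprocessed", checked during scoring

  /**
   * @param docFreqs doc freq of each matching term, indexed by term
   * @param alwaysBitsetBelow lists shorter than this always go to the bitset (the draft's L)
   * @param maxIterators cap on lists kept as iterators (the draft's M)
   */
  static PostingsPartition partition(int[] docFreqs, int alwaysBitsetBelow, int maxIterators) {
    PostingsPartition p = new PostingsPartition();
    // Min-heap on doc freq: holds the largest lists seen so far, smallest at the head.
    Comparator<Integer> byDocFreq = Comparator.comparingInt(t -> docFreqs[t]);
    PriorityQueue<Integer> largest = new PriorityQueue<>(byDocFreq);
    for (int term = 0; term < docFreqs.length; term++) {
      if (docFreqs[term] < alwaysBitsetBelow) {
        p.bitsetTerms.add(term); // short list: cheap to pre-process
        continue;
      }
      largest.add(term);
      if (largest.size() > maxIterators) {
        p.bitsetTerms.add(largest.poll()); // overflow: evict the smallest kept list (assumption)
      }
    }
    p.iteratorTerms.addAll(largest);
    p.bitsetTerms.sort(null); // natural order, just for readable output
    p.iteratorTerms.sort(null);
    return p;
  }
}
```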
   
   Please note that there are many TODOs in here and I haven't done any 
benchmarking, etc. I've written light tests to convince myself it works (I've 
made sure all branches have been exercised), but it's highly likely there are 
bugs.
   
   I'm putting this out there for discussion only. My plan is to benchmark this 
approach as a next step, but I wanted to float the idea early to see if anyone 
has feedback or other ideas. Also, if someone loves the idea and wants to run 
with it, please go for it. I'm pretty busy for the next couple of weeks and I'm 
not sure when I'll come back to this.
   





[GitHub] [lucene] rmuir closed issue #11869: Add RangeOnRangeFacetCounts

2023-01-17 Thread GitBox


rmuir closed issue #11869: Add RangeOnRangeFacetCounts
URL: https://github.com/apache/lucene/issues/11869





[GitHub] [lucene] rmuir commented on issue #11795: Add FilterDirectory to track write amplification factor

2023-01-17 Thread GitBox


rmuir commented on issue #11795:
URL: https://github.com/apache/lucene/issues/11795#issuecomment-1385823162

   Closing as the PR has been merged and is in the 9.5.0 section of CHANGES.txt





[GitHub] [lucene] rmuir closed issue #11795: Add FilterDirectory to track write amplification factor

2023-01-17 Thread GitBox


rmuir closed issue #11795: Add FilterDirectory to track write amplification 
factor
URL: https://github.com/apache/lucene/issues/11795





[GitHub] [lucene] rmuir commented on issue #11869: Add RangeOnRangeFacetCounts

2023-01-17 Thread GitBox


rmuir commented on issue #11869:
URL: https://github.com/apache/lucene/issues/11869#issuecomment-1385822481

   Closing as the PR has been merged and is in the 9.5.0 section of CHANGES.txt





[GitHub] [lucene] rmuir merged pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-16 Thread GitBox


rmuir merged PR #12087:
URL: https://github.com/apache/lucene/pull/12087





[GitHub] [lucene] rmuir commented on a diff in pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-16 Thread GitBox


rmuir commented on code in PR #12087:
URL: https://github.com/apache/lucene/pull/12087#discussion_r1071265859


##
lucene/core/src/java/org/apache/lucene/document/NumericDocValuesField.java:
##
@@ -97,6 +97,27 @@ SortedNumericDocValues getValues(LeafReader reader, String field) throws IOExcep
 };
   }
 
+  /**
+   * Create a query matching any of the specified values.
+   *
+   * NOTE: Such queries cannot efficiently advance to the next match, which makes them
+   * slow if they are not ANDed with a selective query. As a consequence, they are best used wrapped
+   * in an {@link IndexOrDocValuesQuery}, alongside a set query that executes on points, such as
+   * {@link LongPoint#newSetQuery}.

Review Comment:
   oh, duh, thank you. will fix.
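   The NOTE above can be pictured with a small, self-contained Java model. This is not Lucene code: `SlowSetMatcher`, `scanAll` and `verify` are hypothetical names, and plain arrays stand in for doc values. Because doc values store one value per document, a set query used on its own has to test every document; when it is ANDed with a selective query, only that query's candidate documents need checking, which is the situation `IndexOrDocValuesQuery` is meant to exploit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not Lucene's implementation: docValues[doc] is the
// single value stored for document `doc`.
final class SlowSetMatcher {
  private final long[] sortedValues;

  SlowSetMatcher(long... values) {
    // Sort and de-duplicate once so each membership test is a binary search.
    this.sortedValues = Arrays.stream(values).sorted().distinct().toArray();
  }

  boolean matches(long value) {
    return Arrays.binarySearch(sortedValues, value) >= 0;
  }

  // Used on its own: there is no index to jump to the next match,
  // so every document has to be visited.
  List<Integer> scanAll(long[] docValues) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < docValues.length; doc++) {
      if (matches(docValues[doc])) {
        hits.add(doc);
      }
    }
    return hits;
  }

  // ANDed with a selective query: only that query's candidates are checked.
  List<Integer> verify(int[] candidateDocs, long[] docValues) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : candidateDocs) {
      if (matches(docValues[doc])) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

   The cost of `scanAll` grows with the number of documents, while `verify` only grows with the number of candidates, which is why the javadoc recommends pairing the slow query with a points-based one.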






[GitHub] [lucene] jpountz commented on a diff in pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-16 Thread GitBox


jpountz commented on code in PR #12087:
URL: https://github.com/apache/lucene/pull/12087#discussion_r1071254421


##
lucene/core/src/java/org/apache/lucene/document/NumericDocValuesField.java:
##
@@ -97,6 +97,27 @@ SortedNumericDocValues getValues(LeafReader reader, String field) throws IOExcep
 };
   }
 
+  /**
+   * Create a query matching any of the specified values.
+   *
+   * NOTE: Such queries cannot efficiently advance to the next match, which makes them
+   * slow if they are not ANDed with a selective query. As a consequence, they are best used wrapped
+   * in an {@link IndexOrDocValuesQuery}, alongside a set query that executes on points, such as
+   * {@link LongPoint#newSetQuery}.

Review Comment:
   Maybe link to LongField and other similar fields?






[GitHub] [lucene] romseygeek commented on pull request #12088: Don't throw UOE when highlighting FieldExistsQuery

2023-01-16 Thread GitBox


romseygeek commented on PR #12088:
URL: https://github.com/apache/lucene/pull/12088#issuecomment-1383932158

   Thanks for the review @mkhludnev! 





[GitHub] [lucene] romseygeek merged pull request #12088: Don't throw UOE when highlighting FieldExistsQuery

2023-01-16 Thread GitBox


romseygeek merged PR #12088:
URL: https://github.com/apache/lucene/pull/12088





[GitHub] [lucene] romseygeek opened a new pull request, #12088: Don't throw UOE when highlighting FieldExistsQuery

2023-01-16 Thread GitBox


romseygeek opened a new pull request, #12088:
URL: https://github.com/apache/lucene/pull/12088

   WeightedSpanTermExtractor will try to rewrite queries that it doesn't
   know about, to see if they end up as something it does know about and
   that it can extract terms from. To support field merging, it rewrites against
   a delegating leaf reader that does not support `getFieldInfos()`.
   
   FieldExistsQuery uses `getFieldInfos()` in its rewrite, which means that 
   if one is passed to WeightedSpanTermExtractor, we get an
   UnsupportedOperationException thrown.
   
   This commit makes WeightedSpanTermExtractor aware of FieldExistsQuery,
   so that it can just ignore it and avoid throwing an exception.
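   A toy model of the failure and the fix described above, assuming simplified stand-in types: `ExtractorSketch`, `TermQuery`, `FieldExistsQuery` and `UnknownQuery` below are hypothetical classes, not Lucene's. Unknown queries are rewritten and retried, the rewrite of the field-exists stand-in throws the way a reader without `getFieldInfos()` would, and recognizing that query before rewriting avoids the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of term extraction for highlighting; none of these
// types are Lucene classes.
final class ExtractorSketch {
  interface Query {}

  static final class TermQuery implements Query {
    final String term;
    TermQuery(String term) { this.term = term; }
  }

  // Stand-in for FieldExistsQuery: its rewrite needs field infos.
  static final class FieldExistsQuery implements Query {}

  // Stand-in for a query the extractor doesn't know, which rewrites to another query.
  static final class UnknownQuery implements Query {
    final Query rewritten;
    UnknownQuery(Query rewritten) { this.rewritten = rewritten; }
  }

  // Models rewriting against a delegating reader that lacks getFieldInfos().
  static Query rewrite(Query q) {
    if (q instanceof FieldExistsQuery) {
      throw new UnsupportedOperationException("getFieldInfos() not supported");
    }
    return q instanceof UnknownQuery ? ((UnknownQuery) q).rewritten : q;
  }

  static List<String> extractTerms(Query q) {
    List<String> terms = new ArrayList<>();
    if (q instanceof TermQuery) {
      terms.add(((TermQuery) q).term);
    } else if (q instanceof FieldExistsQuery) {
      // The fix: recognize this query and skip it, since it has no terms to highlight.
    } else {
      // Unknown query: rewrite it and retry, stopping if the rewrite is a no-op.
      Query rewritten = rewrite(q);
      if (rewritten != q) {
        terms.addAll(extractTerms(rewritten));
      }
    }
    return terms;
  }
}
```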





[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir commented on PR #12087:
URL: https://github.com/apache/lucene/pull/12087#issuecomment-1383064967

   the benchmark above uses queries such as `"la|21,22,23",// 2226 hits`
   
   In this case we form a boolean query of TermQuery:"la" AND admin2code in (21,22,23). The admin2 codes are typically county-level in most countries, and each one of these numbers matches many documents (e.g. 100,000+).
   
   I ran the benchmark on full geonames (11M+ docs)





[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir commented on PR #12087:
URL: https://github.com/apache/lucene/pull/12087#issuecomment-1383064386

   Here are my benchmarks with the attached Java program: [NumSetBenchmark.java.txt](https://github.com/apache/lucene/files/10419558/NumSetBenchmark.java.txt)
   * `main` uses `IntPoint.newSetQuery` on the main branch
   * `patch` uses `IntField.newSetQuery` on this branch
   
   The purpose was to run different batches of "hard" queries to look for performance regressions (not using numeric IDs, but terms of various densities intersecting integer sets of various densities). The reported time (in ms) is the time it takes to run each batch.
   
   I don't see any problems:
   
   | Query Set | main (IndexSearcher.count) | patch (IndexSearcher.count) | main (IndexSearcher.search) | patch (IndexSearcher.search) |
   | - | - | - | - | - |
   | BIG_BIG | 14.43ms | 11.30ms | 11.75ms | 5.98ms |
   | MEDIUM_BIG | 16.45ms | 6.25ms | 17.08ms | 5.66ms |
   | SMALL_BIG | 17.54ms | 2.00ms | 18.43ms | 2.52ms |
   | BIG_MEDIUM | 5.50ms | 4.54ms | 5.90ms | 5.00ms |
   | MEDIUM_MEDIUM | 6.39ms | 3.70ms | 7.13ms | 4.70ms |
   | SMALL_MEDIUM | 6.64ms | 1.43ms | 6.98ms | 1.70ms |
   
   





[GitHub] [lucene] rmuir commented on pull request #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir commented on PR #12087:
URL: https://github.com/apache/lucene/pull/12087#issuecomment-1382954253

   intended as followups:
   * look into PointRangeQuery and implement the necessary estimation for IndexOrDocValuesQuery to do the right thing
   * Add newSetQuery() to IntField/LongField/DoubleField/FloatField, that uses IndexOrDocValuesQuery(PointRangeQuery, ThisQuery)

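   A hypothetical sketch of the kind of cost-based decision the first follow-up refers to. The enum, the method name and the factor of 8 are assumptions for illustration, not Lucene's actual `IndexOrDocValuesQuery` logic; the point is that the choice can only be as good as the cost estimate reported by the points query.

```java
// Hypothetical illustration only: Strategy, choose() and the factor of 8
// are assumptions, not Lucene's IndexOrDocValuesQuery heuristic.
final class IndexOrDocValuesChoice {
  enum Strategy { INDEX, DOC_VALUES }

  // indexQueryCost: estimated number of docs the points query would match.
  // leadCost: number of candidate docs produced by the leading (other) clauses.
  static Strategy choose(long indexQueryCost, long leadCost) {
    // If the lead produces far fewer candidates than the points query would,
    // checking just those candidates against doc values is cheaper.
    if (leadCost < indexQueryCost / 8) {
      return Strategy.DOC_VALUES;
    }
    return Strategy.INDEX;
  }
}
```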




[GitHub] [lucene] rmuir commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField

2023-01-14 Thread GitBox


rmuir commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1382953573

   I don't think it is good to degrade to `BooleanQuery` when using points or doc-values; it will only hurt performance.
   
   Let's add `NumericDocValuesField.newSlowSetQuery()` and `SortedNumericDocValuesField.newSlowSetQuery()` to complement the doc-values based range queries?
   
   The query in fact already exists, but needs to be cleaned up since it has been "hiding" in `lucene/sandbox`. See PR.





[GitHub] [lucene] rmuir opened a new pull request, #12087: Graduate DocValuesNumbersQuery from lucene/sandbox to newSlowSetQuery()

2023-01-14 Thread GitBox


rmuir opened a new pull request, #12087:
URL: https://github.com/apache/lucene/pull/12087

   Clean up this query a bit, and move it around to support:
   
   * NumericDocValuesField.newSlowSetQuery()
   * SortedNumericDocValuesField.newSlowSetQuery()
   
   This complements the existing docvalues-based range queries with a set query.
   
   Later we can hook this into IntField/LongField/FloatField/DoubleField via IndexOrDocValuesQuery.
   
   In general, the cleanup was not a big deal. It involves:
   * fixing code to use e.g. DocValues.isCacheable rather than assuming docvalues can't be updated
   * implementing an optimized codepath for single-valued fields
   * in general, trying to be consistent with SortedNumericDocValuesRangeQuery as much as possible
   
   Relates to #12028
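   A minimal sketch of why a single-valued codepath is simpler than the multi-valued one, assuming plain arrays stand in for doc values. `SetQueryPaths` and its methods are hypothetical, and the min/max bounds check is an illustrative choice rather than necessarily what this PR does; it relies on `SortedNumericDocValues` returning each document's values in ascending order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the code in this PR.
final class SetQueryPaths {
  private final Set<Long> values = new HashSet<>();
  private final long min;
  private final long max;

  SetQueryPaths(long... queryValues) {
    if (queryValues.length == 0) {
      throw new IllegalArgumentException("empty value set");
    }
    for (long v : queryValues) {
      values.add(v);
    }
    this.min = Arrays.stream(queryValues).min().getAsLong();
    this.max = Arrays.stream(queryValues).max().getAsLong();
  }

  // Single-valued field: exactly one value per document, so one check.
  boolean matchesSingle(long docValue) {
    return docValue >= min && docValue <= max && values.contains(docValue);
  }

  // Multi-valued field: values arrive in ascending order, so skip values
  // below min and stop as soon as one exceeds max.
  boolean matchesMulti(long[] sortedDocValues) {
    for (long v : sortedDocValues) {
      if (v < min) {
        continue;
      }
      if (v > max) {
        break;
      }
      if (values.contains(v)) {
        return true;
      }
    }
    return false;
  }
}
```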





[GitHub] [lucene] rmuir merged pull request #12086: Upgrade to errorprone 2.18

2023-01-14 Thread GitBox


rmuir merged PR #12086:
URL: https://github.com/apache/lucene/pull/12086





[GitHub] [lucene] rmuir closed issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)

2023-01-14 Thread GitBox


rmuir closed issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
URL: https://github.com/apache/lucene/issues/12057





[GitHub] [lucene] rmuir opened a new pull request, #12086: Upgrade to errorprone 2.18

2023-01-14 Thread GitBox


rmuir opened a new pull request, #12086:
URL: https://github.com/apache/lucene/pull/12086

   Went through the new checks as usual. Now that `Finalize` has our bugfix, I enabled it.
   
   Closes #12057





[GitHub] [lucene] rmuir merged pull request #12056: Update to error-prone 2.17

2023-01-14 Thread GitBox


rmuir merged PR #12056:
URL: https://github.com/apache/lucene/pull/12056





[GitHub] [lucene] rmuir merged pull request #12038: remove non-NRT replication support

2023-01-14 Thread GitBox


rmuir merged PR #12038:
URL: https://github.com/apache/lucene/pull/12038





[GitHub] [lucene] rmuir closed issue #11381: remove non-NRT replication support [LUCENE-10345]

2023-01-14 Thread GitBox


rmuir closed issue #11381: remove non-NRT replication support [LUCENE-10345]
URL: https://github.com/apache/lucene/issues/11381





[GitHub] [lucene] benwtrent commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2023-01-14 Thread GitBox


benwtrent commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382728572

   This surely has to do with reading the memory offsets and then reading the neighbors.
   
   I can dig into this a little next week unless somebody else has a really good idea.





[GitHub] [lucene] jpountz commented on pull request #12079: Speed up 1D BKD merging.

2023-01-14 Thread GitBox


jpountz commented on PR #12079:
URL: https://github.com/apache/lucene/pull/12079#issuecomment-1382690674

   The last data point at https://people.apache.org/~mikemccand/lucenebench/sparseResults.html#tot_merge_times shows a drop in overall merge time that I expect is mostly due to this change.





[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2023-01-14 Thread GitBox


jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382689973

   For reference, there seems to be a 6-7% QPS drop on nightly benchmarks associated with this change: https://people.apache.org/~mikemccand/lucenebench/VectorSearch.html. I think it's fine; I'm just noting it in case someone wants to double-check whether there's something obvious that can be improved. Overall, the big gains in space efficiency are worth this small slowdown in my opinion.





[GitHub] [lucene] zhaih commented on pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-13 Thread GitBox


zhaih commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1382268387

   +1, That sounds good!
   
   On Fri, Jan 13, 2023, 11:10 John Mazanec ***@***.***> wrote:
   
   > ***@***. commented on this pull request.
   > --
   >
   > In lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
   >
   > > @@ -94,36 +93,83 @@ public int size() {
   >}
   >
   >/**
   > -   * Add node on the given level
   > +   * Add node on the given level. Nodes can be inserted out of order, but it requires that the nodes
   >
   > Oh I see what you mean. Yes, that makes sense.
   >
   > I think for level 0, we will still want to use a List because all nodes
   > will eventually be present in this level. However, for levels > 0, we can
   > use a TreeMap and then add an iterator over the keys of that map.
   >
   





[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-13 Thread GitBox


jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1069901702


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -94,36 +93,83 @@ public int size() {
   }
 
   /**
-   * Add node on the given level
+   * Add node on the given level. Nodes can be inserted out of order, but it requires that the nodes

Review Comment:
   Oh I see what you mean. Yes, that makes sense. 
   
   I think for level 0, we will still want to use a List because all nodes will eventually be present in this level. However, for levels > 0, we can use a TreeMap and then add an iterator over the keys of that map.

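   One way to picture the storage layout proposed above, as a hypothetical sketch rather than the `OnHeapHnswGraph` implementation: `LevelStorageSketch` and its methods are made up, and neighbor lists are boxed `List<Integer>` for readability. Level 0 uses a dense list indexed by node id, and each upper level uses a `TreeMap` whose key set provides the ordered node iterator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.IntStream;

// Hypothetical sketch, not Lucene code. Callers are expected to add a node to
// every level up to its top level; this sketch does not enforce that.
final class LevelStorageSketch {
  // Level 0: every node is eventually present, so a dense list indexed by node id.
  private final List<List<Integer>> level0 = new ArrayList<>();
  // Levels > 0: sparse, so one sorted map per level keyed by node id.
  private final List<TreeMap<Integer, List<Integer>>> upperLevels = new ArrayList<>();

  void addNode(int level, int node) {
    if (level == 0) {
      // Out-of-order inserts leave null slots for nodes not added yet.
      while (level0.size() <= node) {
        level0.add(null);
      }
      level0.set(node, new ArrayList<>());
    } else {
      while (upperLevels.size() < level) {
        upperLevels.add(new TreeMap<>());
      }
      upperLevels.get(level - 1).put(node, new ArrayList<>());
    }
  }

  // Iterates the nodes present on a level in ascending node-id order.
  Iterator<Integer> nodesOnLevel(int level) {
    if (level == 0) {
      return IntStream.range(0, level0.size())
          .filter(i -> level0.get(i) != null)
          .boxed()
          .iterator();
    }
    if (level > upperLevels.size()) {
      return Collections.emptyIterator();
    }
    return upperLevels.get(level - 1).keySet().iterator();
  }
}
```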





[GitHub] [lucene] magibney opened a new pull request, #12085: update releaseWizard.py to support offline gpg key

2023-01-13 Thread GitBox


magibney opened a new pull request, #12085:
URL: https://github.com/apache/lucene/pull/12085

   porting analogous change from solr: https://github.com/apache/solr/pull/1288





[GitHub] [lucene] LuXugang opened a new pull request, #12084: Same bound with fallbackQuery

2023-01-13 Thread GitBox


LuXugang opened a new pull request, #12084:
URL: https://github.com/apache/lucene/pull/12084

   
   ## Description
   
   IndexSortSortedNumericDocValuesRangeQuery should use the same bounds as its fallbackQuery.
   
   Judging by the comment, I guess this is a typo?





[GitHub] [lucene] jpountz commented on pull request #12083: MultiCollector shouldn't report that scores are needed when they're not.

2023-01-13 Thread GitBox


jpountz commented on PR #12083:
URL: https://github.com/apache/lucene/pull/12083#issuecomment-1381870866

   Thanks Luca!





[GitHub] [lucene] jpountz merged pull request #12083: MultiCollector shouldn't report that scores are needed when they're not.

2023-01-13 Thread GitBox


jpountz merged PR #12083:
URL: https://github.com/apache/lucene/pull/12083





[GitHub] [lucene] jpountz opened a new pull request, #12083: MultiCollector shouldn't report that scores are needed when they're not.

2023-01-13 Thread GitBox


jpountz opened a new pull request, #12083:
URL: https://github.com/apache/lucene/pull/12083

   When sub collectors don't agree on their `ScoreMode`, `MultiCollector` currently returns `COMPLETE`. This makes sense under the assumption that one collector is likely computing top hits (`TOP_SCORES`) and another one is computing facets (`COMPLETE_NO_SCORES`). However, it is also possible to have one collector computing top hits by field (`TOP_DOCS`) and another one doing facets (`COMPLETE_NO_SCORES`), and `MultiCollector` shouldn't report that scores are needed in that case.
   
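   The rule described above can be sketched as follows. The enum is a local mirror of the `ScoreMode` constants named in the PR (its `needsScores` flags follow Lucene's documented semantics), and `combine` is a hypothetical helper, not the actual `MultiCollector` patch: identical modes pass through unchanged, and disagreeing modes fall back to `COMPLETE` only when some sub-collector needs scores.

```java
import java.util.List;

// Hypothetical sketch; this enum mirrors, but is not, org.apache.lucene.search.ScoreMode.
final class ScoreModeMergeSketch {
  enum ScoreMode {
    COMPLETE(true),
    COMPLETE_NO_SCORES(false),
    TOP_SCORES(true),
    TOP_DOCS(false),
    TOP_DOCS_WITH_SCORES(true);

    final boolean needsScores;

    ScoreMode(boolean needsScores) {
      this.needsScores = needsScores;
    }
  }

  static ScoreMode combine(List<ScoreMode> modes) {
    ScoreMode first = modes.get(0);
    if (modes.stream().allMatch(m -> m == first)) {
      return first; // all sub-collectors agree
    }
    // Disagreement: only ask for scores if some sub-collector actually needs them.
    boolean anyNeedsScores = modes.stream().anyMatch(m -> m.needsScores);
    return anyNeedsScores ? ScoreMode.COMPLETE : ScoreMode.COMPLETE_NO_SCORES;
  }
}
```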





[GitHub] [lucene] LuXugang merged pull request #12078: Enhance XXXField#newRangeQuery

2023-01-13 Thread GitBox


LuXugang merged PR #12078:
URL: https://github.com/apache/lucene/pull/12078





[GitHub] [lucene] LuXugang closed issue #12074: Enhance XXXField#newRangeQuery

2023-01-13 Thread GitBox


LuXugang closed issue #12074: Enhance XXXField#newRangeQuery
URL: https://github.com/apache/lucene/issues/12074





[GitHub] [lucene] romseygeek commented on pull request #11807: No need to rewrite queries in unified highlighter

2023-01-13 Thread GitBox


romseygeek commented on PR #11807:
URL: https://github.com/apache/lucene/pull/11807#issuecomment-1381532407

   Oops, yes, I should have backported it at the time.  Will do that now!




