[GitHub] [lucene] vigyasharma closed issue #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure

2023-01-19 Thread GitBox


vigyasharma closed issue #12097: 
TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure
URL: https://github.com/apache/lucene/issues/12097


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] vigyasharma merged pull request #12098: Fix failure in TestIndexSortSortedNumericDocValuesRangeQuery

2023-01-19 Thread GitBox


vigyasharma merged PR #12098:
URL: https://github.com/apache/lucene/pull/12098


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12098: Fix failure in TestIndexSortSortedNumericDocValuesRangeQuery

2023-01-19 Thread GitBox


rmuir commented on PR #12098:
URL: https://github.com/apache/lucene/pull/12098#issuecomment-1397749519

   if a test wants to enforce it only has one segment, it should 
`forceMerge()`, make use of `LuceneTestCase.getOnlyLeafReader()`, etc. 
   
   Otherwise the number of segments can vary based on flushing/merging, 
especially when using `LuceneTestCase.newIndexWriterConfig()`, 
`RandomIndexWriter`, etc. so in general, we want to avoid assertions that rely 
upon certain segment structure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-19 Thread GitBox


jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1081896861


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -94,36 +93,83 @@ public int size() {
   }
 
   /**
-   * Add node on the given level
+   * Add node on the given level. Nodes can be inserted out of order, but it 
requires that the nodes

Review Comment:
   Updated structure to use treemap to represent upper levels of graph.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jmazanec15 commented on pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-19 Thread GitBox


jmazanec15 commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1397643952

   Per [this 
discussion](https://github.com/apache/lucene/pull/12050#discussion_r1061034056),
 I refactored OnHeapHnswGraph to use a TreeMap to represent the graph structure 
for levels greater than 0. I ran performance tests with the same setup as 
https://github.com/apache/lucene/issues/11354#issuecomment-1239961308, and the 
results did not show a significant difference in indexing time between my 
previous implementation, the implementation using the map, and the current 
implementation with no merge optimization. Additionally, the results did not 
show a difference in merge time between by previous implementation and the 
implementation using the map.
   
   Here are the results:
   
   ###  Segment Size 10K
   
   
   Exper. | Total indexing time (s) | Total time to merge numeric vectors (ms) 
| Recall
   -- | -- | -- | --
   Control-1 | 189s | 697280 | 0.979
   Control-2 | 190s | 722042 | 0.979
   Control-3 | 191s | 713402 | 0.979
   Test-array 1 | 190s | 683966 | 0.98
   Test-array 2 | 187s | 683584 | 0.98
   Test-array 3 | 190s | 702458 | 0.98
   Test-map 1 | 189s | 723582 | 0.98
   Test-map 2 | 187s | 658196 | 0.98
   Test-map 3 | 190s | 66 | 0.98
   
   ###  Segment Size 100K
   
   Exper. | Total indexing time (s) | Total time to merge numeric vectors (ms) 
| Recall
   -- | -- | -- | --
   Control-1 | 366s | 675361 | 0.981
   Control-2 | 370s | 695974 | 0.981
   Control-3 | 367s | 684418 | 0.981
   Test-array 1 | 368s | 651814 | 0.981
   Test-array 2 | 368s | 654862 | 0.981
   Test-array 3 | 368s | 656062 | 0.981
   Test-map 1  | 364s | 637257 | 0.981
   Test-map 2  | 370s | 628755 | 0.981
   Test-map 3 | 366s | 647569 | 0.981
   
   ###  Segment Size 500K
   
   Exper. | Total indexing time (s) | Total time to merge numeric vectors (ms) 
| Recall
   -- | -- | -- | --
   Control-1 | 633s | 655538 | 0.98
   Control-2 | 631s | 664622 | 0.98
   Control-3 | 627s | 635919 | 0.98
   Test-array 1 | 639s | 376139 | 0.98
   Test-array 2 | 636s | 378071 | 0.98
   Test-array 3 | 638s | 352633 | 0.98
   Test-map 1  | 645s | 373572 | 0.98
   Test-map 2  | 635s | 374309 | 0.98
   Test-map 3 | 633s | 381212 | 0.98
   
   Given that the results do not show a significant difference, I switched to 
use the treemap to avoid multiple large array copies.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] vigyasharma opened a new pull request, #12098: Fix failure in TestIndexSortSortedNumericDocValuesRangeQuery

2023-01-19 Thread GitBox


vigyasharma opened a new pull request, #12098:
URL: https://github.com/apache/lucene/pull/12098

   Fixes bug in `TestIndexSortSortedNumericDocValuesRangeQuery. 
testCountBoundary`. 
   Addresses #12097 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] vigyasharma commented on issue #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure

2023-01-19 Thread GitBox


vigyasharma commented on issue #12097:
URL: https://github.com/apache/lucene/issues/12097#issuecomment-1397542712

   Wait.. I think the assert should simply be on total no. of documents, not 
documents per leaf. Something like:
   ```java
   int count = 0;
   for (LeafReaderContext context : searcher.getLeafContexts()) {
 count += weight.count(context);
   }
   assertEquals(2, count);
   ```
   This passes on my workspace. I'll raise a PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] vigyasharma opened a new issue, #12097: TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary failure

2023-01-19 Thread GitBox


vigyasharma opened a new issue, #12097:
URL: https://github.com/apache/lucene/issues/12097

   ### Description
   
   Found this test failing in Lucene-Check-9.x - Build # 4239. 
   
   **Steps to repro:**
   ```ruby
   gradlew test --tests 
TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary 
-Dtests.seed=A11A06AE642497F1 -Dtests.multiplier=2 -Dtests.locale=en-KE 
-Dtests.timezone=Singapore -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```
   
   I've been able to repro this failure in both `branch_9x` as well as `main`. 
This is what I found with a debugger - 
   
   It seems that in this test seed, the missing document [(created 
here)](https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestIndexSortSortedNumericDocValuesRangeQuery.java#L589),
 goes into a separate leaf. Then for that leaf context, [the 
IteratorAndCount](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java#L200)
 has `count = -1`, taking the flow to the fallback query. 
   And since the document does not have a points value, the fallback query gets 
`PointValues == null`, leading it to return a 0 
[here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java#L381).
   
   I'm not very familiar with this part of Lucene, but I guess it is likely for 
the missing document to land in a leaf of its own. Would it help if the missing 
doc had a points value but for a different field?
   
   ### Gradle command to reproduce
   
   ```ruby
   gradlew test --tests 
TestIndexSortSortedNumericDocValuesRangeQuery.testCountBoundary 
-Dtests.seed=A11A06AE642497F1 -Dtests.multiplier=2 -Dtests.locale=en-KE 
-Dtests.timezone=Singapore -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


uschindler commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1397386789

   I am fine with both PRs, both technically correct. I don't care about 
username. If I would do a relaese I would insert "policeman" into the artifacts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] magibney commented on a diff in pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


magibney commented on code in PR #12094:
URL: https://github.com/apache/lucene/pull/12094#discussion_r1081614175


##
gradle/java/jar-manifest.gradle:
##
@@ -46,7 +46,9 @@ subprojects {
 if (snapshotBuild) {
   return "${project.version} ${gitRev} [snapshot build, details 
omitted]"
 } else {
-  return "${project.version} ${gitRev} - 
${System.properties['user.name']} - ${buildDate} ${buildTime}"
+  def sysProps = System.properties

Review Comment:
   I've adjusted this PR accordingly, but following on Robert's feedback also 
opened a PR that simply removes the username from MANIFEST.MF. Assuming we want 
to go that direction, we can just close this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] magibney opened a new pull request, #12096: remove username from MANIFEST.MF in build artifacts

2023-01-19 Thread GitBox


magibney opened a new pull request, #12096:
URL: https://github.com/apache/lucene/pull/12096

   Following on discussion from #12094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] magibney commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


magibney commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1397358626

   > Should we just remove the username from the manifest? This doesn't make 
sense to me, we don't put usernames anywhere else (e.g. no @author at apache)...
   
   This seems fine to me. The RM is associated with the release via the GPG 
signature, which really is the only meaningful association with the release. 
That would substantially change the nature of this PR though. If we go that 
route I'll close this and open a new PR for removing the username from 
manifest? Is any further discussion necessary to go ahead with removing the 
username from the manifest?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] magibney commented on pull request #12095: buildAndPushRelease should optionally pause before assembleRelease

2023-01-19 Thread GitBox


magibney commented on PR #12095:
URL: https://github.com/apache/lucene/pull/12095#issuecomment-1397344248

   The main reason I didn't make this the default is because I'm not sure 
whether running this through the releaseWizard would support user input. I'm 
using the releaseWizard to guide me through the steps but running them all 
manually, so user input is definitely supported. TBH I'm not sure -- maybe 
releaseWizard _would_ support user input to subcommands?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on pull request #12085: update releaseWizard.py to support offline gpg key

2023-01-19 Thread GitBox


javanna commented on PR #12085:
URL: https://github.com/apache/lucene/pull/12085#issuecomment-1396993570

   Thanks @magibney !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna merged pull request #12085: update releaseWizard.py to support offline gpg key

2023-01-19 Thread GitBox


javanna merged PR #12085:
URL: https://github.com/apache/lucene/pull/12085


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


rmuir commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1396942514

   I have also witnessed harassment from solr users towards the person whose 
name happens to be in there. Please, lets remove the username.
   
   If I am ignored and this option is kept, I will use the option to populate 
something extremely offensive into the field rather than my real username, and 
create a release candidate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12094: releaseWizard: allow explicitly setting MANIFEST.MF userid (e.g., to apache id)

2023-01-19 Thread GitBox


rmuir commented on PR #12094:
URL: https://github.com/apache/lucene/pull/12094#issuecomment-1396882440

   Should we just remove the username from the manifest? This doesn't make 
sense to me, we don't put usernames anywhere else (e.g. no `@author` at 
apache)...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on pull request #12095: buildAndPushRelease should optionally pause before assembleRelease

2023-01-19 Thread GitBox


romseygeek commented on PR #12095:
URL: https://github.com/apache/lucene/pull/12095#issuecomment-1396781927

   +1, this has caught me multiple times!  I think I'd personally make it the 
default but I don't know if others have things set up so that they don't need 
to type in their GPG pin.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org