Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842566686


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   How "sneaky" would it be to just update the read advice and return the same 
instance. Then restore the random read advice in finishMerge.   The reason I 
suggest this is that so far all options and discussions lead to sharing the 
same underlying memory, so regardless on the actual instance here, then the 
advice is going to change on the memory.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842566686


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   How "sneaky" would it be to just update the read advice and return the same 
instance. Then restore the random read advice on finishMerge. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842587936


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   > How "sneaky" would it be to just update the read advice and return the 
same instance. Then restore the random read advice in finishMerge. The reason I 
suggest this is that so far all options and discussions lead to sharing the 
same underlying memory, so regardless on the actual instance here, then the 
advice is going to change on the memory.
   
   Makes sense to me, and if we switch to `close` we can switch to the approach 
mentioned by @jpountz in 
https://github.com/apache/lucene/pull/13985#discussion_r1835532510



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842587936


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   > How "sneaky" would it be to just update the read advice and return the 
same instance. Then restore the random read advice in finishMerge. The reason I 
suggest this is that so far all options and discussions lead to sharing the 
same underlying memory, so regardless on the actual instance here, then the 
advice is going to change on the memory.
   
   Makes sense to me, and if we switch to `close` we can switch to the approach 
mentioned by @jpountz in 
https://github.com/apache/lucene/pull/13985#discussion_r1835532510. I will make 
the change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842566686


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   How "sneaky" would it be to just update the read advice and return the same 
instance. Then restore the random read advice on finishMerge.   The reasonI 
suggest this is that so far all options and discussions lead to sharing the 
same underlying memory, so regardless on the actual instance type here, then 
the advice is going to change on the memory.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842566686


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   How "sneaky" would it be to just update the read advice and return the same 
instance. Then restore the random read advice in finishMerge.   The reasonI 
suggest this is that so far all options and discussions lead to sharing the 
same underlying memory, so regardless on the actual instance type here, then 
the advice is going to change on the memory.
   



##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   How "sneaky" would it be to just update the read advice and return the same 
instance. Then restore the random read advice in finishMerge.   The reason I 
suggest this is that so far all options and discussions lead to sharing the 
same underlying memory, so regardless on the actual instance type here, then 
the advice is going to change on the memory.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


shatejas commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2476919288

   > Generally, I think that the direction in this PR is good. I wanna help get 
it moved forward. I'll do some local testing and perf runs to verify the 
impact. I can also commit some tests and improvements directly to the PR 
branch. @shatejas ?
   
   Feel free to commit changes, added you as a collaborator to the 
shatejas/lucene repo. Do let me know if you need any other access
   
   I will be working on the benchmarks within next few days. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2476880937

   Generally, I think that the direction in this PR is good. I wanna help get 
it moved forward. I'll do some local testing and perf runs to verify the 
impact. I can also commit some tests and improvements directly to the PR 
branch. @shatejas ?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2476285629

   Thanks @shatejas 
   
   For clarity, the bottleneck that is being fixed here is with the reading of 
all vector data from the to-be-merged segments, when copying that data to the 
new segment - all non-deleted vectors are accessed sequential, just once, as 
they are copied.  We can then switch back to random access when building the 
graph.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub


ChrisHegarty commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1842167895


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   > I wonder if we should reuse the close() method of merge instances for 
this, what do you think ... ?
   
   It might be possible to use `close` here, if we override it to restore the 
previous read advice, and ensure that the merge calls close.  Additionally, 
either avoid closing the underlying index input or slice the input upon 
creation of the new instance. (closing the slice will not close the underlying 
input). 
   
   I see that this is all per-thread, so this may be fine. But just to say it 
explicitly here, changing the read advice, whether it be on a slice or not, 
will affect all instances that share the underlying index.  This should be fine 
- since the reader is just being used for merging at this point. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-13 Thread via GitHub


shatejas commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2474147590

   > I'm curious how much this actually helps, and I know that you said that 
benchmark results would be posted.
   
   @ChrisHegarty Preliminary results showed approximately 40mins (~13%) 
reduction in total force merge time for 10m dataset. You should be able to find 
details here 
https://github.com/opensearch-project/k-NN/issues/2134#issuecomment-2420793541. 
   Though the setup does not use LuceneHNSW it does use 
Lucene99FlatVectorsReader.
   
   > What is unclear is how much the sequential advise here helps over 
something like a load (similar to MemorySegment::load), that touches every page.
   
   I did try preload which does `MemorySegment::load` and there was an 
improvement in force merge time. The advantage I see with updateAdvice is being 
able to delay the loading of vectors till needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-12 Thread via GitHub


uschindler commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2471068437

   Hi,
   I am currently on travel, so I can't review this. Will look into it posisbly 
later this week. Greetings from Costa Rica!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-12 Thread via GitHub


ChrisHegarty commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2470311965

   @shatejas I'm curious how much this actually helps, and I know that you said 
that benchmark results would be posted.
   
   I do like that we can update the ReadAdvice on an index input 👍  What is 
unclear is how much the sequential advise here helps over something like a 
`load` (similar to `MemorySegment::load`), that touches every page. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-11 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836913408


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, 
FlatVectorsReader flatV
 }
   }
 
+  private Lucene99HnswVectorsReader(
+  Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   > maybe we don't even need a cast if we make getMergeInstance() return a 
FlatVectorsReader
   
   Actually figured out a way to do this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836001654


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java:
##
@@ -1554,6 +1555,42 @@ public void testPrefetchOnSlice() throws IOException {
 doTestPrefetch(TestUtil.nextInt(random(), 1, 1024));
   }
 
+  public void testUpdateReadAdvice() throws IOException {
+try (Directory dir = getDirectory(createTempDir("testUpdateReadAdvice"))) {
+  final int totalLength = TestUtil.nextInt(random(), 16384, 65536);
+  byte[] arr = new byte[totalLength];
+  random().nextBytes(arr);
+  try (IndexOutput out = dir.createOutput("temp.bin", IOContext.DEFAULT)) {
+out.writeBytes(arr, arr.length);
+  }
+
+  try (IndexInput orig = dir.openInput("temp.bin", IOContext.DEFAULT)) {
+IndexInput in = random().nextBoolean() ? orig.clone() : orig;
+// Read advice updated at start
+orig.updateReadAdvice(randomReadAdvice());
+for (int i = 0; i < totalLength; i++) {
+  int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+  in.seek(offset);
+  assertEquals(arr[offset], in.readByte());
+}
+
+// Updating readAdvice in the middle
+for (int i = 0; i < 10_000; ++i) {
+  int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+  in.seek(offset);
+  assertEquals(arr[offset], in.readByte());
+  if (random().nextBoolean()) {
+orig.updateReadAdvice(randomReadAdvice());
+  }
+}
+  }
+}
+  }
+
+  private ReadAdvice randomReadAdvice() {
+return ReadAdvice.values()[TestUtil.nextInt(random(), 0, 
ReadAdvice.values().length - 1)];

Review Comment:
   Will switch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836000686


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java:
##
@@ -327,4 +336,61 @@ static FieldEntry create(IndexInput input, FieldInfo info) 
throws IOException {
   info);
 }
   }
+
+  private static final class MergeLucene99FlatVectorsReader extends 
FlatVectorsReader {
+
+private final Lucene99FlatVectorsReader delegate;
+
+MergeLucene99FlatVectorsReader(final Lucene99FlatVectorsReader 
flatVectorsReader)

Review Comment:
   This seemed a bit more clean, Open to the approach if there is a strong 
preference



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999858


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, 
FlatVectorsReader flatV
 }
   }
 
+  private Lucene99HnswVectorsReader(
+  Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   > I would rather make this ctor take a FlatVectorsReader and push the 
responsibility to callers to make the cast
   
   This makes sense to me, I will change it
   
   > maybe we don't even need a cast if we make getMergeInstance() return a 
FlatVectorsReader
   
   This seems difficult as of now, 
[MergeState](https://github.com/shatejas/lucene/blob/12ca4779b962c96367f3e6a8b06523837e5e6434/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L157)
 expects KNNVectorsReader and FlatVectorsReader is wrapped inside of 
KNNVectorsReader



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999756


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   > I wonder if we should reuse the close() method of merge instances for 
this, what do you think @uschindler ?
   
   I went with that solution at first, `close` is called way to late, to be 
able to benefit the search requests going on as per my understanding. I was 
looking for something to switch back to random access earlier and thats how I 
landed with this approach. Happy to move it to `close()` if there is a 
preference
   
   > Separately, it would be nice to improve the asserting framework 
(AssertingKnnVectorsReader in this case) to validate that finishMerge() is 
always called on merge instances.
   
   Wasn't aware of AssertingKnnVectorsReader. I can take a look



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999756


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   > I wonder if we should reuse the close() method of merge instances for 
this, what do you think @uschindler ?
   
   I went with that solution at first, `close` is called way to late, to be 
able to benefit the search requests going on as per my understanding. I was 
looking for something to switch back to random access earlier and thats how I 
landed with this approach. Happy to move it to `close()` if there is a 
preference
   
   >> Separately, it would be nice to improve the asserting framework 
(AssertingKnnVectorsReader in this case) to validate that finishMerge() is 
always called on merge instances.
   
   Wasn't aware of AssertingKnnVectorsReader. I can take a look



##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, 
FlatVectorsReader flatV
 }
   }
 
+  private Lucene99HnswVectorsReader(
+  Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   >> I would rather make this ctor take a FlatVectorsReader and push the 
responsibility to callers to make the cast
   
   This makes sense to me, I will change it
   
   >> maybe we don't even need a cast if we make getMergeInstance() return a 
FlatVectorsReader
   
   This seems difficult as of now, 
[MergeState](https://github.com/shatejas/lucene/blob/12ca4779b962c96367f3e6a8b06523837e5e6434/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L157)
 expects KNNVectorsReader and FlatVectorsReader is wrapped inside of 
KNNVectorsReader



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-09 Thread via GitHub


jpountz commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835530675


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java:
##
@@ -170,6 +170,15 @@ public void checkIntegrity() throws IOException {
 CodecUtil.checksumEntireFile(vectorData);
   }
 
+  @Override
+  public FlatVectorsReader getMergeInstance() {
+try {
+  return new MergeLucene99FlatVectorsReader(this);
+} catch (IOException exception) {
+  throw new RuntimeException(exception);

Review Comment:
   ```suggestion
 throw new UncheckedIOException(exception);
   ```



##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, 
FlatVectorsReader flatV
 }
   }
 
+  private Lucene99HnswVectorsReader(
+  Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   I would rather make this ctor take a `FlatVectorsReader` and push the 
responsibility to callers to make the cast (or maybe we don't even need a cast 
if we make getMergeInstance() return a `FlatVectorsReader`?)



##
lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java:
##
@@ -1554,6 +1555,42 @@ public void testPrefetchOnSlice() throws IOException {
 doTestPrefetch(TestUtil.nextInt(random(), 1, 1024));
   }
 
+  public void testUpdateReadAdvice() throws IOException {
+try (Directory dir = getDirectory(createTempDir("testUpdateReadAdvice"))) {
+  final int totalLength = TestUtil.nextInt(random(), 16384, 65536);
+  byte[] arr = new byte[totalLength];
+  random().nextBytes(arr);
+  try (IndexOutput out = dir.createOutput("temp.bin", IOContext.DEFAULT)) {
+out.writeBytes(arr, arr.length);
+  }
+
+  try (IndexInput orig = dir.openInput("temp.bin", IOContext.DEFAULT)) {
+IndexInput in = random().nextBoolean() ? orig.clone() : orig;
+// Read advice updated at start
+orig.updateReadAdvice(randomReadAdvice());
+for (int i = 0; i < totalLength; i++) {
+  int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+  in.seek(offset);
+  assertEquals(arr[offset], in.readByte());
+}
+
+// Updating readAdvice in the middle
+for (int i = 0; i < 10_000; ++i) {
+  int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+  in.seek(offset);
+  assertEquals(arr[offset], in.readByte());
+  if (random().nextBoolean()) {
+orig.updateReadAdvice(randomReadAdvice());
+  }
+}
+  }
+}
+  }
+
+  private ReadAdvice randomReadAdvice() {
+return ReadAdvice.values()[TestUtil.nextInt(random(), 0, 
ReadAdvice.values().length - 1)];

Review Comment:
   You could use `RandomPicks#randomFrom`.



##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   Separately, it would be nice to improve the asserting framework 
(`AssertingKnnVectorsReader` in this case) to validate that `finishMerge()` is 
always called on merge instances.



##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
 return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   I wonder if we should reuse the close() method of merge instances for this, 
what do you think @uschindler ?



##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java:
##
@@ -327,4 +336,61 @@ static FieldEntry create(IndexInput input, FieldInfo info) 
throws IOException {
   info);
 }
   }
+
+  private static final class MergeLucene99FlatVectorsReader extends 
FlatVectorsReader {
+
+private final Lucene99FlatVectorsReader delegate;
+
+MergeLucene99FlatVectorsReader(final Lucene99FlatVectorsReader 
flatVectorsReader)

Review Comment:
   Nit: I'm not sure I would create a wrapper, maybe we should just add a copy 
constructor and have a `boolean merging` instance like we do e.g. on stored 
fields (see `Lucene90CompressingStoredFieldsReader`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above 

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-09 Thread via GitHub


shatejas commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2466151296

   cc: @uschindler @jpountz @navneet1v 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-09 Thread via GitHub


shatejas opened a new pull request, #13985:
URL: https://github.com/apache/lucene/pull/13985

   reading from IndexInput
   
   The change is needed to be able to reduce the force merge time. 
Lucene99FlatVectorsReader is opened with IOContext.RANDOM, this optimizes 
searches with madvise as RANDOM. For merges we need sequential access and 
ability to preload pages to be able to shorten the merge time.
   
   The change updates the ReadAdvice.SEQUENTIAL before the merge starts and 
reverts it to ReadAdvice.RANDOM at the end of the merge for
   Lucene99FlatVectorsReader.
   
   ### Description
   
   Benchmarking results coming up. Opening as a draft to get initial feedback
   
   ### Related issues
   https://github.com/apache/lucene/issues/13920
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org