Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17773 )

Change subject: KUDU-1260: Fix prefetching bug on Java scanner
......................................................................


Patch Set 7:

(2 comments)

Thank you very much for iterating on this patch!

Just curious -- is it possible to simplify things a bit, switching to trivial 
case of queue of size 1 and using blocking put() method instead of add() when 
adding next pre-fetch deferred into the queue?

http://gerrit.cloudera.org:8080/#/c/17773/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17773/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@290
PS4, Line 290:    * Every time the client calls nextRows(), a new RPC together 
with a prefetching RPC
             :    * are sent to the server side, so the queue will not have 
more than 2 items.
> Hmm, there is no explicitly code to guarantee "the callback for the added D
I'm just trying to understand how we could make this code bullet-proof in the 
sense of not trying to put too many pre-fetch RPCs into the queue, so we are 
sure the code doesn't unexpectedly throw an exception when pre-fetching RPCs 
return fast.

What if we make the queue to be just size 1 and call 
cachedPrefetcherDeferred.put(prefetcherDeferred) instead of 
cachedPrefetcherDeferred.add(prefetcherDeferred), basically switching to 
blocking inserts into the queue?  Could it work as expected?


http://gerrit.cloudera.org:8080/#/c/17773/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@642
PS4, Line 642:       Deferred<RowResultIterator> prefetcherDeferred = 
cachedPrefetcherDeferred.poll();
             :       prefetcherDeferred.chain(new 
Deferred<RowResultIterator>().addCallback(prefetch));
> It is risky to chain prefetch for the Deferred still in the queue, which ma
Yep, I think ideally we don't want to schedule many prefetches to run in 
parallel.  If we can get to some reliable way to have no more than one active 
prefetch at any time, that would be great.

What if the queue were of size 1 and instead of add() the code would call 
put()?  Could it work?



--
To view, visit http://gerrit.cloudera.org:8080/17773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
Gerrit-Change-Number: 17773
Gerrit-PatchSet: 7
Gerrit-Owner: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 24 Aug 2021 05:39:16 +0000
Gerrit-HasComments: Yes

Reply via email to