Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17773 )

Change subject: KUDU-1260: Fix prefetching bug on Java scanner
......................................................................


Patch Set 11:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@10
PS11, Line 10: The writing thread records the timestamp of
             : its write, and the scanner thread creates two scanners (w and 
w/o prefetching),
             : by comparing the scan result of the two scanners, we can verify 
the
             : prefetching result.
It's great that this test does a lot of things, but in order to drill into the 
specific behavior of scan prefetching, do we actually have to do all of these 
things? Wouldn't it be simpler and more easily debuggable to just have scanners 
with prefetching, while still maintaining test coverage for this feature?


http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@24
PS11, Line 24: This error may cause the spark task to fail sometimes.
nit: I haven't gone through the entire patch yet, but is this addressed in this 
patch? Could you clarify whether this is existing behavior or even with this 
patch spark tasks fail?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@292
PS11, Line 292:
nit: we typically indent at four spaces for line wraps


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@299
PS11, Line 299:    * This error may cause the spark task to fail sometimes.
nit: this detail doesn't seem important -- it seems like this would affect all 
Java applications, not just Spark.


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@449
PS11, Line 449:     return this.hasMore || cachedPrefetcherDeferred.get() != 
null;
nit: now it's a bit confusing to have both the concept of "having more" and 
"having more rows". Perhaps we can rename 'hasMore' to something more specific 
so it's clearer what it's referring to? Perhaps canRequestMore or something?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@654
PS11, Line 654:
nit: sam here with regards to spacing


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@671
PS11, Line 671:         Deferred<RowResultIterator> prefetcherDeferred = 
client.scanNextRows(AsyncKuduScanner.this)
              :             .addCallbacks(gotNextRow, nextRowErrback());
              :         if (!cachedPrefetcherDeferred.compareAndSet(null, 
prefetcherDeferred)) {
              :           LOG.info("Skip one prefetching because two 
consecutive prefetching scan occurs");
              :         }
> If cachedPrepetcherDeffered's value is null, which means no prefetching, th
Rephrasing Alexey's question, once we call client.scanNextRows(), does it start 
immediately sending a scan request? In which case, if we drop that request's 
deferred, won't we be missing some rows?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1212
PS11, Line 1212: if (!(prefetching && closed)) {
nit: before entering the switch, would it make sense to check canBeIgnored() 
instead? That way we ensure the ignoring is consistent since it all uses the 
same canBeIgnored() method


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1252
PS11, Line 1252:       if (error != null) {
               :         if (canBeIgnored(resp.getError().getCode())) {
               :           LOG.info("Ignore false alert of scanner not found 
for scan request");
               :           error = null;
               :         }
               :       }
Would it make sense to put this up by the switch statement so we don't have to 
have additional handling at L1210?


http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

PS11:
nit: please move the Spark portion of this into a separate changelist rather 
than lumping everything into one big patch



--
To view, visit http://gerrit.cloudera.org:8080/17773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd
Gerrit-Change-Number: 17773
Gerrit-PatchSet: 11
Gerrit-Owner: Hongjiang Zhang <hongjizh...@ebay.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Hongjiang Zhang <hongjizh...@ebay.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 29 Sep 2021 06:26:05 +0000
Gerrit-HasComments: Yes

Reply via email to