Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17773 )
Change subject: KUDU-1260: Fix prefetching bug on Java scanner ...................................................................... Patch Set 11: (10 comments) http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@10 PS11, Line 10: The writing thread records the timestamp of : its write, and the scanner thread creates two scanners (w and w/o prefetching), : by comparing the scan result of the two scanners, we can verify the : prefetching result. It's great that this test does a lot of things, but in order to drill into the specific behavior of scan prefetching, do we actually have to do all of these things? Wouldn't it be simpler and more easily debuggable to just have scanners with prefetching, while still maintaining test coverage for this feature? http://gerrit.cloudera.org:8080/#/c/17773/11//COMMIT_MSG@24 PS11, Line 24: This error may cause the spark task to fail sometimes. nit: I haven't gone through the entire patch yet, but is this addressed in this patch? Could you clarify whether this is existing behavior or even with this patch spark tasks fail? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@292 PS11, Line 292: nit: we typically indent at four spaces for line wraps http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@299 PS11, Line 299: * This error may cause the spark task to fail sometimes. nit: this detail doesn't seem important -- it seems like this would affect all Java applications, not just Spark. http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@449 PS11, Line 449: return this.hasMore || cachedPrefetcherDeferred.get() != null; nit: now it's a bit confusing to have both the concept of "having more" and "having more rows". Perhaps we can rename 'hasMore' to something more specific so it's clearer what it's referring to? Perhaps canRequestMore or something? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@654 PS11, Line 654: nit: sam here with regards to spacing http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@671 PS11, Line 671: Deferred<RowResultIterator> prefetcherDeferred = client.scanNextRows(AsyncKuduScanner.this) : .addCallbacks(gotNextRow, nextRowErrback()); : if (!cachedPrefetcherDeferred.compareAndSet(null, prefetcherDeferred)) { : LOG.info("Skip one prefetching because two consecutive prefetching scan occurs"); : } > If cachedPrepetcherDeffered's value is null, which means no prefetching, th Rephrasing Alexey's question, once we call client.scanNextRows(), does it start immediately sending a scan request? In which case, if we drop that request's deferred, won't we be missing some rows? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1212 PS11, Line 1212: if (!(prefetching && closed)) { nit: before entering the switch, would it make sense to check canBeIgnored() instead? That way we ensure the ignoring is consistent since it all uses the same canBeIgnored() method http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1252 PS11, Line 1252: if (error != null) { : if (canBeIgnored(resp.getError().getCode())) { : LOG.info("Ignore false alert of scanner not found for scan request"); : error = null; : } : } Would it make sense to put this up by the switch statement so we don't have to have additional handling at L1210? http://gerrit.cloudera.org:8080/#/c/17773/11/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: PS11: nit: please move the Spark portion of this into a separate changelist rather than lumping everything into one big patch -- To view, visit http://gerrit.cloudera.org:8080/17773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I853a041d86c75ec196d7d4ff45af4673c5c5f5cd Gerrit-Change-Number: 17773 Gerrit-PatchSet: 11 Gerrit-Owner: Hongjiang Zhang <hongjizh...@ebay.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Hongjiang Zhang <hongjizh...@ebay.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 29 Sep 2021 06:26:05 +0000 Gerrit-HasComments: Yes