Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12715 )
Change subject: [java] Make the KuduScanner iterable ...................................................................... Patch Set 3: (8 comments) Would be good to make sure that we still have test coverage over the "old way" of iterating too. http://gerrit.cloudera.org:8080/#/c/12715/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12715/2//COMMIT_MSG@20 PS2, Line 20: while still sharing the underlying : data. > The row data returned from a Scan RPC is passed by reference to the RowResu Grant and I discussed this offline. I had assumed that the semantics for the Java Slice class mirrored that of the C++ Slice class, where it's the caller's responsibility to ensure that data pointed to by a Slice remains live for the lifetime of the Slice. But in Java, arrays are objects, not primitives, and so when a Java Slice points to an array of data, that array is considered live (from the GC's perspective). http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java: http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@365 PS2, Line 365: scanners do not time out Nit: we can be more specific ("to ensure that this scanner will not time out"). http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1030 PS2, Line 1030: // TODO: Find a clean way to plumb in reuseRowResult. > The challenge here is that I would need to add this property to the scanTok I recall having a similar discussion with Dan in the past. Like you, I made the case that scan tokens should contain just enough information to describe "what" is being scanned, and the rest should be set by whomever is hydrating the token. Dan's counter-argument is that even if that separation makes sense from an abstraction standpoint, it's not particularly practical because often times the "how" to scan is only known by the driver; if we force the "how" to be set by executors, the driver now needs to send that information via side channel to the executors. Practicality carried the day: scan tokens are certainly richer than they need to be, but easier to use. http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java: http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java@169 PS2, Line 169: return new KuduScannerIterator(this, asyncScanner.getKeepAlivePeriodMs() ); Nit: extra space towards the end. http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScannerIterator.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScannerIterator.java: http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScannerIterator.java@27 PS2, Line 27: public class KuduScannerIterator implements Iterator<RowResult> { Doc the class too? http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java File java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java: http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@49 PS2, Line 49: private Slice rowData; I know my original confusion stemmed from lack of Java knowledge, but I think it'd be useful to doc the reference semantics that tripped me here, to explain how RowResult effectively has a reference to the data. Maybe that's better served by documenting the surprising (to a C++ developer) semantics in Slice itself. http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java File java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java: http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java@72 PS2, Line 72: if (reuseRowResult && numRows != 0) { : this.sharedRowResult = new RowResult(this.schema, this.bs, this.indirectBs, -1); : } else { : this.sharedRowResult = null; : } Nit: use a ternary? http://gerrit.cloudera.org:8080/#/c/12715/2/src/kudu/client/client.proto File src/kudu/client/client.proto: http://gerrit.cloudera.org:8080/#/c/12715/2/src/kudu/client/client.proto@111 PS2, Line 111: scanners do not time out. Nit: "this scanner won't time out" -- To view, visit http://gerrit.cloudera.org:8080/12715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e4ac59e30d0562c0a381d5e304af1dcfdcf5a1a Gerrit-Change-Number: 12715 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Tue, 12 Mar 2019 18:30:45 +0000 Gerrit-HasComments: Yes
