Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12715 )

Change subject: [java] Make the KuduScanner iterable
......................................................................


Patch Set 3:

(8 comments)

Would be good to make sure that we still have test coverage over the "old way" 
of iterating too.

http://gerrit.cloudera.org:8080/#/c/12715/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12715/2//COMMIT_MSG@20
PS2, Line 20:  while still sharing the underlying
            : data.
> The row data returned from a Scan RPC is passed by reference to the RowResu
Grant and I discussed this offline.

I had assumed that the semantics for the Java Slice class mirrored that of the 
C++ Slice class, where it's the caller's responsibility to ensure that data 
pointed to by a Slice remains live for the lifetime of the Slice. But in Java, 
arrays are objects, not primitives, and so when a Java Slice points to an array 
of data, that array is considered live (from the GC's perspective).


http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@365
PS2, Line 365: scanners do not time out
Nit: we can be more specific ("to ensure that this scanner will not time out").


http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1030
PS2, Line 1030:       // TODO: Find a clean way to plumb in reuseRowResult.
> The challenge here is that I would need to add this property to the scanTok
I recall having a similar discussion with Dan in the past. Like you, I made the 
case that scan tokens should contain just enough information to describe "what" 
is being scanned, and the rest should be set by whomever is hydrating the 
token. Dan's counter-argument is that even if that separation makes sense from 
an abstraction standpoint, it's not particularly practical because often times 
the "how" to scan is only known by the driver; if we force the "how" to be set 
by executors, the driver now needs to send that information via side channel to 
the executors.

Practicality carried the day: scan tokens are certainly richer than they need 
to be, but easier to use.


http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java@169
PS2, Line 169:     return new KuduScannerIterator(this, 
asyncScanner.getKeepAlivePeriodMs() );
Nit: extra space towards the end.


http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScannerIterator.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/KuduScannerIterator.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScannerIterator.java@27
PS2, Line 27: public class KuduScannerIterator implements Iterator<RowResult> {
Doc the class too?


http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@49
PS2, Line 49:   private Slice rowData;
I know my original confusion stemmed from lack of Java knowledge, but I think 
it'd be useful to doc the reference semantics that tripped me here, to explain 
how RowResult effectively has a reference to the data. Maybe that's better 
served by documenting the surprising (to a C++ developer) semantics in Slice 
itself.


http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java@72
PS2, Line 72:     if (reuseRowResult && numRows != 0) {
            :       this.sharedRowResult = new RowResult(this.schema, this.bs, 
this.indirectBs, -1);
            :     } else {
            :       this.sharedRowResult = null;
            :     }
Nit: use a ternary?


http://gerrit.cloudera.org:8080/#/c/12715/2/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/12715/2/src/kudu/client/client.proto@111
PS2, Line 111: scanners do not time out.
Nit: "this scanner won't time out"



--
To view, visit http://gerrit.cloudera.org:8080/12715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e4ac59e30d0562c0a381d5e304af1dcfdcf5a1a
Gerrit-Change-Number: 12715
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 12 Mar 2019 18:30:45 +0000
Gerrit-HasComments: Yes

Reply via email to