Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12323 )

Change subject: KUDU-2670: Part 1: Build ScanToken by splitSizeBytes
......................................................................


Patch Set 6:

(9 comments)

Thank you for the update Yao, it's looking good. I posted a few small nits and 
a couple high level suggestions.

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1204
PS6, Line 1204:    * Returns a deferred containing key ranges of the tablet 
which covers the partition key in the table.
Nit: Line too long.


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@305
PS6, Line 305:      * If unset, don't split the tablet.
If unset or <= 0 is more accurate I suppose.


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@437
PS6, Line 437:                 tablet.getPartition().partitionKeyStart, 
splitSizeBytes, timeout));
I think we need to use a DedlineTracker to be sure the timeout represents the 
entire process of getting the scan tokens. As is I think the 
SplitKeyRangeRequest can take the full timeout along with the  
table.getTabletsLocations call. There was some refactoring recently related to 
this so you may want to rebase on master.


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java:

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@36
PS6, Line 36: final class SplitKeyRangeRequest extends 
KuduRpc<SplitKeyRangeResponse> {
I don't think this needs to be final since it's internal preventing extension 
isn't a big deal.


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java:

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java@19
PS6, Line 19: public class TestSplitKeyRangeRequest {
I don't mind a test of the request/response. More coverage is a great thing. 
But I think a test of the public API usage is most important.

In this case I think something that uses the ScanToken API with splits and then 
reading the data to verify the data was split.


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java@34
PS6, Line 34:       "--tablet_transaction_memory_limit_mb=1",
Why do these flags need to be set?


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java
File java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java:

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java@16
PS6, Line 16: public class SplitTestUtil {
Could this be moved into the ClientTestUtil class for things that are generally 
useful? There is also an existing DataGenerator class that can be used to 
generate the rows.

Things that are specific to the TestSplitKeyRangeRequest only can be moved to 
the test class.


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

PS6:
Could you write a spark test that uses this new feature?


http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@37
PS6, Line 37:  * @param splitSizeBytes Set the number of bytes used to split 
tablet.
Perhaps make these docs a bit more spark centric. Something along the lines of 
"Sets the target number of bytes per spark task. If set, tablets will be split 
to generate uniform task sizes instead of the default of 1 task per tablet."



--
To view, visit http://gerrit.cloudera.org:8080/12323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0502f5d64569e8b1d45e88de3cb36aa2e01234d0
Gerrit-Change-Number: 12323
Gerrit-PatchSet: 6
Gerrit-Owner: yangz <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu <[email protected]>
Gerrit-Reviewer: yangz <[email protected]>
Gerrit-Comment-Date: Mon, 04 Mar 2019 16:50:11 +0000
Gerrit-HasComments: Yes

Reply via email to