Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12323 )
Change subject: KUDU-2670: Part 1: Build ScanToken by splitSizeBytes ...................................................................... Patch Set 6: (9 comments) Thank you for the update Yao, it's looking good. I posted a few small nits and a couple high level suggestions. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1204 PS6, Line 1204: * Returns a deferred containing key ranges of the tablet which covers the partition key in the table. Nit: Line too long. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@305 PS6, Line 305: * If unset, don't split the tablet. If unset or <= 0 is more accurate I suppose. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@437 PS6, Line 437: tablet.getPartition().partitionKeyStart, splitSizeBytes, timeout)); I think we need to use a DedlineTracker to be sure the timeout represents the entire process of getting the scan tokens. As is I think the SplitKeyRangeRequest can take the full timeout along with the table.getTabletsLocations call. There was some refactoring recently related to this so you may want to rebase on master. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java File java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@36 PS6, Line 36: final class SplitKeyRangeRequest extends KuduRpc<SplitKeyRangeResponse> { I don't think this needs to be final since it's internal preventing extension isn't a big deal. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java@19 PS6, Line 19: public class TestSplitKeyRangeRequest { I don't mind a test of the request/response. More coverage is a great thing. But I think a test of the public API usage is most important. In this case I think something that uses the ScanToken API with splits and then reading the data to verify the data was split. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestSplitKeyRangeRequest.java@34 PS6, Line 34: "--tablet_transaction_memory_limit_mb=1", Why do these flags need to be set? http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java File java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java@16 PS6, Line 16: public class SplitTestUtil { Could this be moved into the ClientTestUtil class for things that are generally useful? There is also an existing DataGenerator class that can be used to generate the rows. Things that are specific to the TestSplitKeyRangeRequest only can be moved to the test class. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala: PS6: Could you write a spark test that uses this new feature? http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@37 PS6, Line 37: * @param splitSizeBytes Set the number of bytes used to split tablet. Perhaps make these docs a bit more spark centric. Something along the lines of "Sets the target number of bytes per spark task. If set, tablets will be split to generate uniform task sizes instead of the default of 1 task per tablet." -- To view, visit http://gerrit.cloudera.org:8080/12323 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0502f5d64569e8b1d45e88de3cb36aa2e01234d0 Gerrit-Change-Number: 12323 Gerrit-PatchSet: 6 Gerrit-Owner: yangz <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yao Xu <[email protected]> Gerrit-Reviewer: yangz <[email protected]> Gerrit-Comment-Date: Mon, 04 Mar 2019 16:50:11 +0000 Gerrit-HasComments: Yes
