Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12323 )
Change subject: KUDU-2670: split more scanner and add concurrent ...................................................................... Patch Set 2: (15 comments) Thank you for this contribution. I did a quick pass and added some high level feedback. I think it would be useful to break this into 2 patches. One for the tablet splitting and another for the concurrent scanner. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@341 PS2, Line 341: public S lowerBoundPartitionKeyRaw(byte[] partitionKey) { I don't think these need to be public. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@32 PS2, Line 32: public class ConcurrentKuduScanner extends KuduScanner { Can you add javadoc? An example of when and how this should be used would be helpful. Does this need to extend KuduScanner? Given it wrap/contains many KuduScanners that doesn't seem right. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@46 PS2, Line 46: ConcurrentKuduScanner(List<KuduScanner> scanners, Naming it Concurrent makes it sound like this class is thread safe. Instead it looks like this implementation wraps multiple scanners. Maybe we could name it MultiKuduScanner? http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@52 PS2, Line 52: this.queue = new LinkedBlockingQueue<>(threadNumer * 4); Why multiply by 4? http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@78 PS2, Line 78: log.error("scanner error", ex); Maybe info or debug log level? http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java@46 PS2, Line 46: public class KeyEncoder { I don't think this needs to be public. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java File java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java@20 PS2, Line 20: public class KeyRange { Could you add java doc? Does this need to be public? http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@286 PS2, Line 286: private long targetChunkSize = -1; Could this name indicate that it will cause a tablet to be split? It would be good to also include the unit in the name. ex: splitSizeBytes http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@420 PS2, Line 420: static class DefaultTokenBuilder implements TokenBuilder { Since this is private API we can use methods to separate this logic instead of a builder. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@460 PS2, Line 460: SplitKeyRangeRequest request = new SplitKeyRangeRequest(table, startPrimaryKey, I think the overall timeout needs to be considered here. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java@a247 PS2, Line 247: Why was this removed? http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java File java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@30 PS2, Line 30: final class SplitKeyRangeRequest extends KuduRpc<SplitKeyRangeResponse> { Can you add javadoc? http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeResponse.java File java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeResponse.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeResponse.java@26 PS2, Line 26: * Constructor with information common to all RPCs. Think this is a copy paste from KuduRpcResponse. Can you update the javadoc. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConcurrentScanner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestConcurrentScanner.java: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConcurrentScanner.java@20 PS2, Line 20: nit: All these red blocks show extra spaces. http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala: http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@51 PS2, Line 51: val targetChunkSize = sc.getConf.getLong("spark.kudu.scanner.targetChunkSize", 1024 * 1024 * 512L) This configuration can be added to KuduReadOptions and set in DefaultSource like "kudu.scanLocality". -- To view, visit http://gerrit.cloudera.org:8080/12323 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0502f5d64569e8b1d45e88de3cb36aa2e01234d0 Gerrit-Change-Number: 12323 Gerrit-PatchSet: 2 Gerrit-Owner: yangz <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 08 Feb 2019 22:11:25 +0000 Gerrit-HasComments: Yes
