Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12323 )

Change subject: KUDU-2670: split more scanner and add concurrent
......................................................................


Patch Set 2:

(15 comments)

Thank you for this contribution. I did a quick pass and added some high level 
feedback.

I think it would be useful to break this into 2 patches. One for the tablet 
splitting and another for the concurrent scanner.

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@341
PS2, Line 341:   public S lowerBoundPartitionKeyRaw(byte[] partitionKey) {
I don't think these need to be public.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@32
PS2, Line 32: public class ConcurrentKuduScanner extends KuduScanner {
Can you add javadoc? An example of when and how this should be used would be 
helpful.

Does this need to extend KuduScanner? Given it wrap/contains many KuduScanners 
that doesn't seem right.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@46
PS2, Line 46:   ConcurrentKuduScanner(List<KuduScanner> scanners,
Naming it Concurrent makes it sound like this class is thread safe. Instead it 
looks like this implementation wraps multiple scanners. Maybe we could name it 
MultiKuduScanner?


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@52
PS2, Line 52:     this.queue = new LinkedBlockingQueue<>(threadNumer * 4);
Why multiply by 4?


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConcurrentKuduScanner.java@78
PS2, Line 78:               log.error("scanner error", ex);
Maybe info or debug log level?


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyEncoder.java@46
PS2, Line 46: public class KeyEncoder {
I don't think this needs to be public.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java@20
PS2, Line 20: public class KeyRange {
Could you add java doc?

Does this need to be public?


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@286
PS2, Line 286:     private long targetChunkSize = -1;
Could this name indicate that it will cause a tablet to be split? It would be 
good to also include the unit in the name.
ex: splitSizeBytes


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@420
PS2, Line 420:     static class DefaultTokenBuilder implements TokenBuilder {
Since this is private API we can use methods to separate this logic instead of 
a builder.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@460
PS2, Line 460:           SplitKeyRangeRequest request = new 
SplitKeyRangeRequest(table, startPrimaryKey,
I think the overall timeout needs to be considered here.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java@a247
PS2, Line 247:
Why was this removed?


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@30
PS2, Line 30: final class SplitKeyRangeRequest extends 
KuduRpc<SplitKeyRangeResponse> {
Can you add javadoc?


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeResponse.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeResponse.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeResponse.java@26
PS2, Line 26:    * Constructor with information common to all RPCs.
Think this is a copy paste from KuduRpcResponse. Can you update the javadoc.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConcurrentScanner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestConcurrentScanner.java:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConcurrentScanner.java@20
PS2, Line 20:
nit: All these red blocks show extra spaces.


http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12323/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@51
PS2, Line 51:   val targetChunkSize = 
sc.getConf.getLong("spark.kudu.scanner.targetChunkSize", 1024 * 1024 * 512L)
This configuration can be added to KuduReadOptions and set in DefaultSource 
like "kudu.scanLocality".



--
To view, visit http://gerrit.cloudera.org:8080/12323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0502f5d64569e8b1d45e88de3cb36aa2e01234d0
Gerrit-Change-Number: 12323
Gerrit-PatchSet: 2
Gerrit-Owner: yangz <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 08 Feb 2019 22:11:25 +0000
Gerrit-HasComments: Yes

Reply via email to