[kudu-CR] Create parallelized loader Spark job

Adar Dembo (Code Review) Mon, 17 Dec 2018 16:36:01 -0800

Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12101 )


Change subject: Create parallelized loader Spark job
......................................................................


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala:

PS1:
License header.


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@24
PS1, Line 24: case class TableOptions(
            :     numPartitions: Int,
            :     replicationFactor: Int,
            :     numColumns: Int,
            :     intColumnPercentage: Float)
Unused?


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@46
PS1, Line 46:     val kuduClient = new KuduClientBuilder(options.masterAddresses).build()
Why does this use the KuduClient directly instead of using KuduContext or 
something like that? Will this work with a Kerberized cluster?


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@75
PS1, Line 75: isServiceUnavailable
Is this really indicative of a collision?
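
A sketch of a more explicit check, under assumptions: in the Kudu Java client, inserting a row whose primary key already exists should come back as a row error whose status satisfies `isAlreadyPresent()`, while `isServiceUnavailable()` typically signals a transient, retriable condition (for example a full tablet server queue) rather than a key collision. The `RowStatus` and `Action` types and `CollisionCheck.classify` below are hypothetical stand-ins, not Kudu classes.

```scala
// Hypothetical stand-in for the status carried by a Kudu row error.
// The real client exposes checks such as isAlreadyPresent() and
// isServiceUnavailable() on RowError.getErrorStatus().
sealed trait RowStatus
case object AlreadyPresent extends RowStatus
case object ServiceUnavailable extends RowStatus
case object OtherFailure extends RowStatus

// What the loader should do next for a failed row.
sealed trait Action
case object RegenerateKey extends Action // key collision: try a new random key
case object RetrySameRow extends Action  // transient overload: back off, retry the same row
case object Fail extends Action          // anything else is a real error

object CollisionCheck {
  def classify(status: RowStatus): Action = status match {
    case AlreadyPresent     => RegenerateKey
    case ServiceUnavailable => RetrySameRow
    case OtherFailure       => Fail
  }
}
```

Treating only "already present" as a collision keeps overload errors from silently turning into extra random keys.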


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@83
PS1, Line 83:       rowsWritten += 1
The subtraction and addition are a little weird. Maybe you can handle the happy
path in the if/else and increment rowsWritten there?
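
A minimal sketch of that restructuring, assuming each write attempt resolves to either a success or a key collision. `WriteResult`, `LoaderLoop.load`, and its `nextKey`/`write` parameters are hypothetical and stand in for the job's real key generation and Kudu session calls.

```scala
// Hypothetical outcome of writing one generated row.
sealed trait WriteResult
case object Written extends WriteResult
case object Collision extends WriteResult

object LoaderLoop {
  /** Keeps writing until `target` rows succeed.
    * `nextKey` and `write` are hypothetical stand-ins for key generation and
    * applying an insert through a Kudu session.
    * Returns (rowsWritten, collisions). */
  def load(target: Long, nextKey: () => Long, write: Long => WriteResult): (Long, Long) = {
    var rowsWritten = 0L
    var collisions = 0L
    while (rowsWritten < target) {
      write(nextKey()) match {
        case Written   => rowsWritten += 1 // happy path: the only place the counter grows
        case Collision => collisions += 1  // nothing to undo; just try another key
      }
    }
    (rowsWritten, collisions)
  }
}
```

Because the counter only moves on the success branch, there is no optimistic increment to subtract back out when a collision is detected.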


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@219
PS1, Line 219: Defaults to
Nit: Default: ...

(to be consistent with the other options here.)


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala
File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala:

PS1:
License header.


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@19
PS1, Line 19:   private val TABLE_SCHEMA: Schema = {
Can you use your fancy schema generator to improve coverage?


http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@32
PS1, Line 32:     .setNumReplicas(1)
Why this?



--
To view, visit http://gerrit.cloudera.org:8080/12101

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdfd41a21a7f80d22125c7f4e5ca4ed62c31709d
Gerrit-Change-Number: 12101
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Tue, 18 Dec 2018 00:35:47 +0000
Gerrit-HasComments: Yes
