Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12101 )
Change subject: Create parallelized loader Spark job
......................................................................

Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala:

PS1:
License header.

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@24
PS1, Line 24: case class TableOptions(
            :     numPartitions: Int,
            :     replicationFactor: Int,
            :     numColumns: Int,
            :     intColumnPercentage: Float)
Unused?

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@46
PS1, Line 46: val kuduClient = new KuduClientBuilder(options.masterAddresses).build()
Why does this use the KuduClient directly instead of using KuduContext or something like that? Will this work with a Kerberized cluster?

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@75
PS1, Line 75: isServiceUnavailable
Is this really indicative of a collision?

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@83
PS1, Line 83: rowsWritten += 1
The subtraction and addition are a little weird. Maybe you can add the happy path into the if/else and increment rowsWritten there?

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@219
PS1, Line 219: Defaults to
Nit: Default: ... (to be consistent with the other options here.)

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala
File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala:

PS1:
License header.

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@19
PS1, Line 19: private val TABLE_SCHEMA: Schema = {
Can you use your fancy schema generator to improve coverage?

http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@32
PS1, Line 32: .setNumReplicas(1)
Why this?

--
To view, visit http://gerrit.cloudera.org:8080/12101
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdfd41a21a7f80d22125c7f4e5ca4ed62c31709d
Gerrit-Change-Number: 12101
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Tue, 18 Dec 2018 00:35:47 +0000
Gerrit-HasComments: Yes
