Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11147 )
Change subject: Create parallelized loader Spark job ...................................................................... Patch Set 4: (10 comments) Not sure if you meant to abandon this or not; seemed like that was an accident. http://gerrit.cloudera.org:8080/#/c/11147/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11147/4//COMMIT_MSG@14 PS4, Line 14: marke marked http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala: PS4: License header. http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@24 PS4, Line 24: case class TableOptions( : numPartitions: Int, : replicationFactor: Int, : numColumns: Int, : intColumnPercentage: Float) Unused? http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@46 PS4, Line 46: val kuduClient = new KuduClientBuilder(options.masterAddresses).build() Why does this use the KuduClient directly instead of using KuduContext or something like that? Will this work with a Kerberized cluster? http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@75 PS4, Line 75: isServiceUnavailable Is this really indicative of a collision? http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@83 PS4, Line 83: rowsWritten += 1 The subtraction and addition is a little weird. Maybe you can add the happy path into the if/else and increment rowsWritten there? http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@218 PS4, Line 218: Defaults to Nit: Default: ... (to be consistent with the other options here.) http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala: PS4: License header. http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@19 PS4, Line 19: private val TABLE_SCHEMA: Schema = { Can't you use your fancy schema generator here to get better coverage? http://gerrit.cloudera.org:8080/#/c/11147/4/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@32 PS4, Line 32: .setNumReplicas(1) Why this? -- To view, visit http://gerrit.cloudera.org:8080/11147 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4434c9f5d709154037386b4c7be94045df162267 Gerrit-Change-Number: 11147 Gerrit-PatchSet: 4 Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 18 Dec 2018 00:32:59 +0000 Gerrit-HasComments: Yes
