Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12101 )
Change subject: Create parallelized loader Spark job ...................................................................... Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala: PS1: > License header. Done http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@24 PS1, Line 24: import org.apache.kudu.client.PartialRow : import org.apache.kudu.client.SessionConfiguration : import org.apache.kudu.spark.kudu.KuduContext : import org.apache.kudu.spark.tools.DistributedDataGeneratorOptions._ : import org.apache.kudu.util.Dat > Unused? Done http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@46 PS1, Line 46: GeneratorMetrics(sc.longAccumulator("rows_written"), sc.longAccumulator("row_collisions")) > Why does this use the KuduClient directly instead of using KuduContext or s I am using the client API directly because we are inserting PartialRows instead of Spark rows. I can however use the context to get the client. I will do that. http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@75 PS1, Line 75: > Is this really indicative of a collision? Perhaps not. This is a carry over from mikes patch. I suspect the idea is to handle retires. I will remove it for now and add back if needed. http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@83 PS1, Line 83: // Synchronously flush on potentially the last iteration of the > It's tough to do that because we only get those errors on auto- or manual f Done http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@219 PS1, Line 219: ional() > Nit: Default: ... Done http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala: PS1: > License header. Done http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@32 PS1, Line 32: val log: Logger = LoggerFactory.getLogger(getClass) > I probably originally wrote this part of the test before handing this off t I am happy to remove it to make things more "default". -- To view, visit http://gerrit.cloudera.org:8080/12101 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibdfd41a21a7f80d22125c7f4e5ca4ed62c31709d Gerrit-Change-Number: 12101 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Thu, 20 Dec 2018 17:58:39 +0000 Gerrit-HasComments: Yes
