Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12101 )
Change subject: Create parallelized loader Spark job ...................................................................... Patch Set 1: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala: http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@83 PS1, Line 83: rowsWritten += 1 > The subtraction and addition is a little weird. Maybe you can add the happy It's tough to do that because we only get those errors on auto- or manual flush, not in every iteration of the loop. So we increment in the happy path and subtract in the (occasional) sad path. Perhaps this is worth a comment. http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala: http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@19 PS1, Line 19: private val TABLE_SCHEMA: Schema = { > Can you use your fancy schema generator to improve coverage? Can we leave that as a TODO? This is in service of a specific task which is measuring backup performance and this implementation and test seem good enough as-is for that purpose. http://gerrit.cloudera.org:8080/#/c/12101/1/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala@32 PS1, Line 32: .setNumReplicas(1) > Why this? I probably originally wrote this part of the test before handing this off to Grant, after which he did a bunch of refactoring of the generator class. The way I would justify this is that we don't really care about replication for the purposes of correctness testing on this data generator. The leader will do all the validation we need for the scope of this test and adding replicas just makes it more likely to have a noisier / flakier test. -- To view, visit http://gerrit.cloudera.org:8080/12101 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibdfd41a21a7f80d22125c7f4e5ca4ed62c31709d Gerrit-Change-Number: 12101 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Wed, 19 Dec 2018 19:45:28 +0000 Gerrit-HasComments: Yes
