Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12211 )
Change subject: [spark-tools] Fix DistributedDataGenerator num-tasks
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12211/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12211/1//COMMIT_MSG@9
PS1, Line 9: bum-tasks
> bum tasks indeed.

Done. Lol.

http://gerrit.cloudera.org:8080/#/c/12211/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala:

http://gerrit.cloudera.org:8080/#/c/12211/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@151
PS1, Line 151: sc.parallelize(0 until options.numTasks, numSlices = options.numTasks)
             :   .foreachPartition(taskNum => generateRows(context, options, taskNum.next(), metrics))
> Could you explain what the old code did, what the new code does, and why it

The old code didn't attempt to set the number of tasks at all. It just parallelized the range of integers (0 until options.numTasks) using Spark's default parallelism (2 here), which resulted in 2 tasks every time.

The new code sets numSlices = options.numTasks, which sets the number of partitions (and therefore tasks) that the range is split across.

I also changed foreach to foreachPartition. Even though each partition holds only one value, I find that foreachPartition makes it much clearer that generateRows is called once per partition.
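To illustrate the partition layout described above, here is a minimal, Spark-free sketch. The object `SliceModel` and its `slice` method are hypothetical; the start/end arithmetic is modeled on how Spark's ParallelCollectionRDD splits a collection into contiguous chunks, but this is a simplified illustration, not Spark's code. The slice count of 2 for the old call is assumed from the behavior reported above; the real default depends on `spark.default.parallelism` and the cluster.

```scala
// Simplified model of how sc.parallelize(seq, numSlices) splits `seq` into
// contiguous partitions. Hypothetical helper for illustration only.
object SliceModel {
  /** Splits `seq` into `numSlices` contiguous chunks of near-equal size. */
  def slice[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val length = seq.length.toLong
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      seq.slice(start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    val numTasks = 8
    val taskNums = 0 until numTasks
    def show(parts: Seq[Seq[Int]]): String =
      parts.map(_.mkString("[", ",", "]")).mkString(" ")

    // Old call: no numSlices, so the default applies (assumed to be 2, as reported above).
    println(s"old: ${show(slice(taskNums, 2))}") // old: [0,1,2,3] [4,5,6,7]
    // New call: numSlices = numTasks, so each partition holds exactly one task number.
    println(s"new: ${show(slice(taskNums, numTasks))}") // new: [0] [1] [2] [3] [4] [5] [6] [7]
  }
}
```

Because each partition in the new layout holds exactly one integer, calling `next()` once on the partition's iterator inside `foreachPartition` retrieves that partition's task number.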
--
To view, visit http://gerrit.cloudera.org:8080/12211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa560e71b2f84a75002e9e776011e1a11c5a1ff
Gerrit-Change-Number: 12211
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Thu, 10 Jan 2019 22:26:11 +0000
Gerrit-HasComments: Yes
