Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12211 )

Change subject: [spark-tools] Fix DistributedDataGenerator num-tasks
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12211/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12211/1//COMMIT_MSG@9
PS1, Line 9: bum-tasks
> bum tasks indeed.
Done. Lol.


http://gerrit.cloudera.org:8080/#/c/12211/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala:

http://gerrit.cloudera.org:8080/#/c/12211/1/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala@151
PS1, Line 151:     sc.parallelize(0 until options.numTasks, numSlices = options.numTasks)
             :       .foreachPartition(taskNum => generateRows(context, options, taskNum.next(), metrics))
> Could you explain what the old code did, what the new code does, and why it
The old code didn't attempt to set the number of tasks at all. It just
parallelized the range of integers (0 until options.numTasks) using Spark's
default number of slices (the default parallelism, which was 2 here), so it
always ran exactly 2 tasks no matter what num-tasks was set to.

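The new code sets numSlices = options.numTasks, which sets the number of
partitions (and therefore tasks) the range is split across. Because the range
has exactly options.numTasks elements, each partition gets exactly one task
number.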
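To make the slicing concrete, here is a minimal standalone Scala sketch (no
Spark dependency) of how a sequence can be split into numSlices contiguous,
near-equal partitions. The even-split arithmetic reflects my understanding of
what Spark does for parallelized collections; it is a model, not Spark's code.
`SliceSketch` and `slice` are hypothetical names, and numTasks = 8 is just an
example value.

```scala
// Standalone model of how parallelize splits a sequence into numSlices
// partitions (even-split arithmetic only; no Spark dependency).
object SliceSketch {
  // Split `data` into `numSlices` contiguous chunks of near-equal size.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val length = data.length.toLong
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      data.slice(start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    val numTasks = 8
    // Old behaviour: default slice count (2 in the runs described above),
    // so each of the 2 partitions holds several task numbers.
    println(slice(0 until numTasks, 2))
    // New behaviour: numSlices = numTasks, one task number per partition.
    println(slice(0 until numTasks, numTasks))
  }
}
```

With 2 slices each partition holds 4 task numbers; with numSlices equal to the
range length every partition holds exactly one.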

I also switched from foreach to foreachPartition. Even though each partition
holds only one value, foreachPartition makes it much clearer that generateRows
is called exactly once per partition.
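A small standalone model of the foreachPartition call may also help: the
function runs once per partition and receives that partition's values as an
Iterator, so taskNum.next() reads the partition's single task number.
`PartitionSketch` and `callsFor` are hypothetical names, and the partitions are
written by hand rather than produced by Spark. The second call shows why
numSlices = options.numTasks matters for this pattern: if a partition held two
values, only the first would ever be passed on.

```scala
import scala.collection.mutable.ArrayBuffer

// Standalone model of foreachPartition: `f` is invoked once per partition
// and receives that partition's elements as an Iterator.
object PartitionSketch {
  def foreachPartition[T](partitions: Seq[Seq[T]])(f: Iterator[T] => Unit): Unit =
    partitions.foreach(p => f(p.iterator))

  // Hypothetical stand-in for generateRows: records the task number each
  // call would have received.
  def callsFor(partitions: Seq[Seq[Int]]): List[Int] = {
    val calls = ArrayBuffer.empty[Int]
    foreachPartition(partitions) { taskNum =>
      // Mirrors `taskNum.next()` in the patch: only the first element is read.
      calls += taskNum.next()
    }
    calls.toList
  }

  def main(args: Array[String]): Unit = {
    // One task number per partition: every task number is used once.
    println(callsFor(Seq(Seq(0), Seq(1), Seq(2), Seq(3))))
    // Two task numbers per partition: task numbers 1 and 3 are never used.
    println(callsFor(Seq(Seq(0, 1), Seq(2, 3))))
  }
}
```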



--
To view, visit http://gerrit.cloudera.org:8080/12211

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa560e71b2f84a75002e9e776011e1a11c5a1ff
Gerrit-Change-Number: 12211
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Thu, 10 Jan 2019 22:26:11 +0000
Gerrit-HasComments: Yes
