Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12411 )

Change subject: [spark-tools] DistributedDataGenerator repartition support
......................................................................

[spark-tools] DistributedDataGenerator repartition support

This patch adds support to the DistributedDataGenerator
to repartition the data to match the Kudu partitioning
while still respecting the num-tasks parameter.

Because data generation is now decoupled from data loading,
this patch changes the collision handling behavior. Instead
of generating new data on collision, now the collision
is only tracked in the metrics.

Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
---
M java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
M java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala
2 files changed, 218 insertions(+), 65 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/12411/1
--
To view, visit http://gerrit.cloudera.org:8080/12411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
Gerrit-Change-Number: 12411
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
