Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12411 )


Change subject: [spark-tools] DistributedDataGenerator repartition support
......................................................................

[spark-tools] DistributedDataGenerator repartition support

This patch adds support to the DistributedDataGenerator
to repartition the data to match the Kudu partitioning
while still respecting the num-tasks parameter.
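
As a rough illustration of the idea (not the code from this change), the
sketch below maps each row's Kudu partition index onto a fixed number of
tasks. The object name, the helper names, and the modulo assignment are all
assumptions made for the example; the actual patch may distribute
partitions differently.

```scala
// Hypothetical sketch: assigns rows to tasks so that every row belonging to
// the same Kudu partition is handled by the same task, while the number of
// tasks stays bounded by numTasks. Names here are illustrative only.
object RepartitionSketch {

  // Maps a Kudu partition index to one of numTasks tasks (assumed modulo scheme).
  def taskFor(kuduPartitionIndex: Int, numTasks: Int): Int = {
    require(numTasks > 0, "numTasks must be positive")
    require(kuduPartitionIndex >= 0, "partition index must be non-negative")
    kuduPartitionIndex % numTasks
  }

  // Groups rows by the task they would be sent to. kuduPartitionOf is a
  // caller-supplied stand-in for whatever computes a row's Kudu partition.
  def assign[A](rows: Seq[A], numTasks: Int)(kuduPartitionOf: A => Int): Map[Int, Seq[A]] =
    rows.groupBy(row => taskFor(kuduPartitionOf(row), numTasks))
}
```

With this scheme, if there are fewer Kudu partitions than tasks, some tasks
receive no rows; if there are more, each task handles several partitions.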

Because data generation is now decoupled from data
loading, this patch also changes the collision handling
behavior. Instead of generating replacement data when a
collision occurs, the tool now only records the collision
in the metrics.
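
The behavior described above can be pictured with a small standalone
sketch. It uses no Kudu or Spark APIs; LoadMetrics and load are hypothetical
names, and an in-memory set stands in for the table's primary-key check.
Duplicate keys are counted rather than replaced with newly generated rows.

```scala
import scala.collection.mutable

// Hypothetical sketch of "track collisions, do not regenerate".
object CollisionSketch {

  // Illustrative stand-in for the tool's metrics; not the real metric names.
  final case class LoadMetrics(rowsWritten: Long, collisions: Long)

  // Loads pre-generated keys. A key that was already seen counts as a
  // collision and is skipped; no replacement row is generated for it.
  def load(keys: Seq[Long]): LoadMetrics = {
    val seen = mutable.HashSet.empty[Long]
    var written = 0L
    var collisions = 0L
    keys.foreach { key =>
      if (seen.add(key)) written += 1 // add returns true only for new keys
      else collisions += 1
    }
    LoadMetrics(written, collisions)
  }
}
```

One consequence of this design is that the number of rows written can be
lower than the number requested, by exactly the collision count.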

Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
---
M java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
M java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala
2 files changed, 218 insertions(+), 65 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/12411/1
--
To view, visit http://gerrit.cloudera.org:8080/12411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
Gerrit-Change-Number: 12411
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
