Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12411 )

Change subject: [spark-tools] DistributedDataGenerator repartition support
......................................................................

[spark-tools] DistributedDataGenerator repartition support

This patch adds support to the DistributedDataGenerator
to repartition the data to match the Kudu partitioning.

Because data generation is now decoupled from data loading,
this patch changes the collision handling behavior. Instead of
generating new data on collision, now the collision is only
tracked in the metrics.

Additionally this patch changes the default generation type
from random to sequential given that has been shown to be the
more common option and the type of workload Kudu is better
suited for.

Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
Reviewed-on: http://gerrit.cloudera.org:8080/12411
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
---
M java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/DistributedDataGenerator.scala
M java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/DistributedDataGeneratorTest.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
3 files changed, 206 insertions(+), 68 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/12411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I57bcc68d645c52b429ac6cf8bcdf0551a8244995
Gerrit-Change-Number: 12411
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
