Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12484 )

Change subject: KUDU-2672: [spark] Optionally repartition to match Kudu partitions
......................................................................

KUDU-2672: [spark] Optionally repartition to match Kudu partitions

Adds a write option to repartition the data to match the
target Kudu partitions. Additionally provides the option
to sort while repartitioning.

Repartitioning ensures that each task/client writes to only
a single tablet. This improves throughput by improving
batching, especially for tables with a large number of
partitions. Additionally, sorting before writing to Kudu
reduces the number of compactions needed and can improve
sustained throughput.

Change-Id: I8763615997bccc08901235841149fc3bacb321e7
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
5 files changed, 183 insertions(+), 22 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/12484/1
--
To view, visit http://gerrit.cloudera.org:8080/12484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8763615997bccc08901235841149fc3bacb321e7
Gerrit-Change-Number: 12484
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
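
For readers unfamiliar with the idea in the change description, here is a rough, dependency-free sketch of "repartition to match the target partitions, optionally sorting within each". It is not the code in this change. `Row`, `RepartitionSketch`, and `tabletFor` are hypothetical names, and the hash-modulo mapping is only a stand-in for asking Kudu which tablet owns a key (real tables may also be range partitioned). In Spark, the analogous step would presumably be built on a custom `Partitioner` and `repartitionAndSortWithinPartitions`.

```scala
// Hypothetical row type: a primary key plus a payload.
final case class Row(key: Int, value: String)

object RepartitionSketch {
  // Stand-in (assumption) for looking up the tablet that owns a key.
  // Here: simple hash partitioning into `numTablets` buckets.
  def tabletFor(key: Int, numTablets: Int): Int =
    Math.floorMod(key.hashCode, numTablets)

  // Regroup rows so that output partition i holds only rows destined for
  // tablet i. When `sort` is true, each partition is also ordered by key.
  def repartition(rows: Seq[Row], numTablets: Int, sort: Boolean): Vector[Vector[Row]] = {
    val byTablet = rows.groupBy(r => tabletFor(r.key, numTablets))
    Vector.tabulate(numTablets) { i =>
      val part = byTablet.getOrElse(i, Seq.empty).toVector
      if (sort) part.sortBy(_.key) else part
    }
  }
}
```

With this layout, a writer that processes one output partition only ever sends rows to one tablet, which is the batching benefit the description refers to; the per-partition sort models the "sort while repartitioning" option.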
