Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/12484
to look at the new patch set (#3).
Change subject: KUDU-2672: [spark] Optionally repartition to match Kudu partitions
......................................................................
KUDU-2672: [spark] Optionally repartition to match Kudu partitions
Adds a write option to repartition the data to match
the partitions of the target Kudu table. It also
provides an option to sort the data while repartitioning.
Repartitioning ensures that each task/client writes
to only a single tablet. This improves throughput
by improving batching, especially for tables with a
large number of partitions.
Additionally, sorting before writing to Kudu reduces the
number of compactions needed and can improve
sustained throughput.
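To illustrate the idea, here is a minimal plain-Scala sketch that uses neither Spark nor the Kudu client, and is not the kudu-spark code in this patch. The `Row` type and the `tabletFor` function are hypothetical: `tabletFor` stands in for Kudu's real partitioner with a simple key-modulo rule (real Kudu tables can combine hash and range partitioning). The sketch groups rows so that each output group holds rows for exactly one tablet and, optionally, sorts each group by key.

```scala
// Hypothetical sketch of "repartition to match tablets, optionally sort".
// Not the kudu-spark implementation; names here are illustrative only.
object RepartitionSketch {
  // Hypothetical row: an integer primary key plus a payload.
  final case class Row(key: Int, value: String)

  // Hypothetical stand-in for Kudu's partitioner: key modulo tablet count.
  def tabletFor(row: Row, numTablets: Int): Int =
    Math.floorMod(row.key, numTablets)

  // Group rows so every group targets exactly one tablet; when `sort` is
  // true, order each group by primary key before it would be written.
  def repartition(rows: Seq[Row], numTablets: Int, sort: Boolean): Map[Int, Seq[Row]] = {
    val grouped: Map[Int, Seq[Row]] = rows.groupBy(r => tabletFor(r, numTablets))
    if (sort) grouped.map { case (tablet, rs) => tablet -> rs.sortBy(_.key) }
    else grouped
  }
}
```

Sorting within each group means every tablet receives its keys in order, which is the property the commit message credits with reducing compactions.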
Change-Id: I8763615997bccc08901235841149fc3bacb321e7
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
5 files changed, 184 insertions(+), 22 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/12484/3
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8763615997bccc08901235841149fc3bacb321e7
Gerrit-Change-Number: 12484
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Grant Henke
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Will Berkeley