Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables
......................................................................
PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables

Bulk inserts into Kudu are currently painful because we send rows randomly. This creates a lot of work for Kudu, which partitions and sorts data before writing, and it causes writes to be slow. We can alleviate this by sending the rows to Kudu already partitioned and sorted.

This patch partitions the rows to insert according to Kudu's partitioning scheme. A follow-up patch will deal with sorting.

It accomplishes this by inserting an exchange node into the plan before the insert. The DataStreamSender then uses a new abstraction, DataStreamPartitioner, which calls into the Kudu client to determine the partition for each row. In the future, DataStreamPartitioner can be extended to other partitioning types.

This patch is a PREVIEW so we can decide whether we're happy with the partitioning API Kudu has proposed and get that in on the Kudu side. It does not have any tests and has not been tested for performance.

Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator.cc
A be/src/runtime/data-stream-partitioner.cc
A be/src/runtime/data-stream-partitioner.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
17 files changed, 343 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/3
--
To view, visit http://gerrit.cloudera.org:8080/6037

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Matthew Jacobs
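The routing step the commit message describes, where the sender asks a partitioner which destination each row belongs to before transmitting, can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the patch's code: `Row`, `RowPartitioner`, `HashKeyPartitioner`, `RangeKeyPartitioner`, and `PartitionBatch` are hypothetical names, and the real DataStreamPartitioner asks the Kudu client for each row's partition, which is replaced here by plain hash and range functions over a single key column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type; only the key column `id` is used for routing.
struct Row {
  int64_t id;
  std::string name;
};

// Hypothetical stand-in for a partitioner abstraction: maps a row to the
// index of the destination channel that should receive it.
class RowPartitioner {
 public:
  virtual ~RowPartitioner() = default;
  virtual int GetPartition(const Row& row) const = 0;
  virtual int NumPartitions() const = 0;
};

// Assumption: hash partitioning on the key column. The patch instead asks the
// Kudu client for the partition; that call is not reproduced here.
class HashKeyPartitioner : public RowPartitioner {
 public:
  explicit HashKeyPartitioner(int num_partitions) : num_partitions_(num_partitions) {
    assert(num_partitions > 0);
  }
  int GetPartition(const Row& row) const override {
    return static_cast<int>(std::hash<int64_t>{}(row.id) %
                            static_cast<std::size_t>(num_partitions_));
  }
  int NumPartitions() const override { return num_partitions_; }

 private:
  int num_partitions_;
};

// Assumption: range partitioning with sorted split points. n split points
// yield n + 1 ranges; range i covers [split_points[i-1], split_points[i]).
class RangeKeyPartitioner : public RowPartitioner {
 public:
  explicit RangeKeyPartitioner(std::vector<int64_t> split_points)
      : split_points_(std::move(split_points)) {
    assert(std::is_sorted(split_points_.begin(), split_points_.end()));
  }
  int GetPartition(const Row& row) const override {
    // upper_bound finds the first split point strictly greater than the key.
    auto it = std::upper_bound(split_points_.begin(), split_points_.end(), row.id);
    return static_cast<int>(it - split_points_.begin());
  }
  int NumPartitions() const override {
    return static_cast<int>(split_points_.size()) + 1;
  }

 private:
  std::vector<int64_t> split_points_;
};

// Buckets a batch so that each destination channel receives only the rows of
// its own partition, as a partitioning sender would before transmitting.
std::vector<std::vector<Row>> PartitionBatch(const std::vector<Row>& batch,
                                             const RowPartitioner& partitioner) {
  std::vector<std::vector<Row>> channels(partitioner.NumPartitions());
  for (const Row& row : batch) {
    channels[partitioner.GetPartition(row)].push_back(row);
  }
  return channels;
}
```

For example, `PartitionBatch(batch, RangeKeyPartitioner({100, 200}))` sends keys below 100 to channel 0, keys in [100, 200) to channel 1, and the rest to channel 2. Keeping the lookup behind a virtual interface is one way to allow other partitioning types later, in the spirit of the extensibility the commit message mentions; the actual interface in the patch may differ.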
