Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/6037
Change subject: PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables
......................................................................

PREVIEW: IMPALA-3742: partitions INSERTs into Kudu tables

Bulk inserts into Kudu are currently painful because we send rows to Kudu in arbitrary order. Kudu partitions and sorts data before writing it, so unordered input creates a lot of extra work for Kudu and makes writes slow. We can alleviate this by sending the rows to Kudu already partitioned and sorted.

This patch partitions the rows to insert according to Kudu's partitioning scheme. A follow-up patch will deal with sorting. It does this by inserting an exchange node into the plan before the insert and passing the TableId of the target table down to the DataStreamSender, which can then call into the Kudu client to determine the partition for each row.

This patch is a PREVIEW so that we can decide whether we're happy with the partitioning API Kudu has proposed and get that API committed on the Kudu side. It does not have any tests yet and has not been tested for performance.

It's been suggested that, rather than adding another special-case partitioning type to DataStreamSender, we could make it more general by passing in a partitioning function. I'm currently investigating this.
Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/simple-scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
11 files changed, 149 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/1
--
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
