Thomas Tauber-Marshall has uploaded a new patch set (#10).

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
......................................................................

IMPALA-3742: partitions DMLs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow.
We can alleviate this by sending the rows to Kudu already
partitioned and sorted.

This patch accomplishes this by inserting an exchange node into
the plan before the DML operation. The DataStreamSender then uses
a new abstraction, DataStreamPartitioner, that calls into the
Kudu client to determine the partition for each row. The partition
number is then appended to the row that is sent by the
DataStreamSender.

It also adds a sort node after the exchange node, which sorts on
the new tuple containing the partition number and then on the
key columns of the table.

Testing:
- Updated planner tests.
- Manually verified the partitioning works as expected.

Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/coordinator.cc
A be/src/runtime/data-stream-partitioner.cc
A be/src/runtime/data-stream-partitioner.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
26 files changed, 665 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>

Reply via email to