Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables ......................................................................
IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Reviewed-on: http://gerrit.cloudera.org:8080/6559 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 26 files changed, 616 insertions(+), 170 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
