[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 13: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Reviewed-on: http://gerrit.cloudera.org:8080/4745 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 418 insertions(+), 44 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 13: Alex, can you +2 this again? The tests got renamed and I didn't catch the syntax error when running locally. :( -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Marcel Kornacker, Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#13). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 418 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/13 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 12: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/360/ -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 12: Rebase done in PS12 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Marcel Kornacker, Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#12). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 417 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/12 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 11: Fixed AnalyzeStmtsTest, also needs another rebase due to more Kudu related changes making it into master before this one. Will push another PS. -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Marcel Kornacker, Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#11). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 418 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/11 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 11: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/357/ -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/356/ -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/355/ -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: Code-Review+1 Thanks! -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 10: (1 comment) I added a test. I also had to rebase onto Dimitris' change, so now the diff between PS 9 and 10 is a bit messy. Sorry for that. http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 635: |--02:AGGREGATE [FINALIZE] > Sorry just thought of this. Can you also add a test like this that has a WH Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#10). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 417 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/10 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 635: # IMPALA-2521: clustered insert into non-partitioned table does not add sort node. Sorry just thought of this. Can you also add a test like this that has a WHERE-clause subquery? We should test the reset() + analyze() path when subqueries are rewritten. -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 9: (6 comments) Thanks for the comments, please see PS9. http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 602: } > Thanks. The reason why it works without is that SortNode.init() during plan Ok Line 604: if (matchFound && table_ instanceof KuduTable) { > I vote for keep Ok http://gerrit.cloudera.org:8080/#/c/4745/8/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 603: // Store exprs for Kudu key columns. > add line breaks between if blocks, it's harder to navigate without those (' Done http://gerrit.cloudera.org:8080/#/c/4745/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 237:* outputs of aggregation. > something missing there. (i know, not your fault.) Yes, I couldn't figure out in git blame what that sentence originally meant. Since I don't understand enough of the code to complete it, I removed it. Let me know if I should replace it with something else. http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 247: # IMPALA-2521: clustered insert into table adds sort node, correctly substituting exprs. > you can't always see the correct substitution until you run the query (beca Done Line 249: select id, name, maxzip as zip > use fewer columns, add an analytic function in the inline view Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4745 to look at the new patch set (#9). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test 10 files changed, 367 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/9 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 8: (3 comments) i won't look at the substitution logic in detail because it gives me a headache. :) but please add that test. http://gerrit.cloudera.org:8080/#/c/4745/8/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 603: // Store exprs for Kudu key columns. add line breaks between if blocks, it's harder to navigate without those ('{' in vi let's you jump by a 'paragraph'). at least add one at l613 http://gerrit.cloudera.org:8080/#/c/4745/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 237:* the outputs of aggregation. Invoked for UnionStmt as well since something missing there. (i know, not your fault.) http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 247: # IMPALA-2521: clustered insert into table adds sort node, correctly substituting exprs. you can't always see the correct substitution until you run the query (because the column labels can be the same even when you're pointing at the wrong tuple). let's add query tests as well, what you have so far should run (even if you won't see any benefit in the hdfs case). -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 8: Code-Review+1 (3 comments) I'm happy with the patch after the remaining comments. I imagine Marcel will want to take another pass, so only giving +1. http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 602: } > I added a test for this, however it also seems to work without the substitu Thanks. The reason why it works without is that SortNode.init() during planning will substitute the ordering exprs appropriately. However, it's cleaner to leave your changes because the PK exprs do need to be substituted like the other ones, but it just happens to work in this case because of how the PK exprs are used exactly. Line 604: if (matchFound && table_ instanceof KuduTable) { > Done, though after finding that List supports contains I might as well inli I vote for keep http://gerrit.cloudera.org:8080/#/c/4745/8/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 249: select id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, use fewer columns, add an analytic function in the inline view -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 8: (7 comments) Thanks for the review, please see PS8, as well as my comment regarding expr substitution. http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 132: private ArrayList primaryKeyExprs_ = Lists.newArrayList(); > how about primaryKeyExprs_? Done Line 517:* expressions. > Mention that as point 4 Done Line 601: } > Kudu (it's a proper noun) Done Line 602: } > You also need to substitute the kuduExprs_ in InsertStmt.substituteResultEx I added a test for this, however it also seems to work without the substitution. What am I missing? Also, are c_i and k_j particular columns? I tried reordering them, but it still does not fail when omitting the substitution. Line 604: if (matchFound && table_ instanceof KuduTable) { > add helper KuduTable.isPrimaryKeyColumn() Done, though after finding that List supports contains I might as well inline it as it is just one line. Inline or Keep? http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 499:* (key columns for Kudu tables), so that partitions can be written sequentially in the > Kudu tables Done http://gerrit.cloudera.org:8080/#/c/4745/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 248: insert into table functional_kudu.alltypes /*+ clustered */ > shuffle/noshuffle don't do anything for Kudu tables, so I don't think this Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#8). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 8 files changed, 298 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/8 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#7). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 8 files changed, 295 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/7 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#6). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 8 files changed, 295 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/6 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 132: private ArrayList kuduKeyExprs_ = Lists.newArrayList(); how about primaryKeyExprs_? Line 517:* expressions. Result exprs for key columns of kudu tables are stored in kuduKeyExprs_. Mention that as point 4 Line 601: // Store exprs for kudu key columns. Kudu (it's a proper noun) Line 602: if (matchFound && table_ instanceof KuduTable) { You also need to substitute the kuduExprs_ in InsertStmt.substituteResultExprs(). You can try something like this (and add a test): INSERT INTO kudu_table /*+ clustered */ SELECT * FROM (SELECT c1, c2, k1, k2) FROM src_table Line 604: for (String kuduKeyColumnName: kuduTable.getKuduKeyColumnNames()) { add helper KuduTable.isPrimaryKeyColumn() http://gerrit.cloudera.org:8080/#/c/4745/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 499:* (key columns for kudu tables), so that partitions can be written sequentially in the Kudu tables http://gerrit.cloudera.org:8080/#/c/4745/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 248: insert into table functional_kudu.alltypes /*+ clustered,shuffle */ shuffle/noshuffle don't do anything for Kudu tables, so I don't think this is necessary they might do something in the future -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#5). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 7 files changed, 288 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/5 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 4: (8 comments) > (8 comments) > > Nice! Current PS looks good to me. Do you intent to add Kudu > support in a separate patch? Thanks for the comments. I had a look and it didn't add too much code, so I added kudu support and tests here. It also seemed to fit into the scope of the Jira. http://gerrit.cloudera.org:8080/#/c/4745/4/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 255: SlotRef origSlotRef = (SlotRef) smap.getLhs().get(i); > Seems clearer to call these inputSlotRef and outputSlotRef Done http://gerrit.cloudera.org:8080/#/c/4745/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 149:* output by the sort node. Done by materializing slot refs in the order-by and result > ... and given result expressions... Done Line 151:* slot refs into the new tuple. This simplifies sorting logic for total and top-n > the sorting logic Done Line 164: // The tuple descriptor for the sort output. It will contain the materialized tuples, > Not clear what "It" refers to in the second sentence, can you rephrase? Done Line 185: // ones that point to the slot refs the new, materialized input rows. > .. slot refs into the sort's output tuple. Done Line 188: // Update the tuple descriptor used to materialize the input tuple of the sort. > ... used to materialize the input of the sort. Done http://gerrit.cloudera.org:8080/#/c/4745/4/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 496:* Insert a sort node into the plan, depending on the clustered/noclustered plan hint. > Insert a sort node on top of the plan, ... Done Line 505: List partitionExprs = Lists.newArrayList(insertStmt.getPartitionKeyExprs()); > orderingExprs Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 4: (8 comments) Nice! Current PS looks good to me. Do you intent to add Kudu support in a separate patch? http://gerrit.cloudera.org:8080/#/c/4745/4/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 255: SlotRef origSlotRef = (SlotRef) smap.getLhs().get(i); Seems clearer to call these inputSlotRef and outputSlotRef http://gerrit.cloudera.org:8080/#/c/4745/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 149:* output by the sort node. Done by materializing slot refs in the order-by and result ... and given result expressions... Line 151:* slot refs into the new tuple. This simplifies sorting logic for total and top-n the sorting logic Line 164: // The tuple descriptor for the sort output. It will contain the materialized tuples, Not clear what "It" refers to in the second sentence, can you rephrase? Line 185: // ones that point to the slot refs the new, materialized input rows. .. slot refs into the sort's output tuple. Need to be careful with the terms 'tuple' and 'row' because they have a very specific and different meanings in Impala. Line 188: // Update the tuple descriptor used to materialize the input tuple of the sort. ... used to materialize the input of the sort. http://gerrit.cloudera.org:8080/#/c/4745/4/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 496:* Insert a sort node into the plan, depending on the clustered/noclustered plan hint. Insert a sort node on top of the plan, ... Line 505: List partitionExprs = Lists.newArrayList(insertStmt.getPartitionKeyExprs()); orderingExprs -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#4). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test 6 files changed, 216 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/4 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 4: (24 comments) Thanks for the comments, please see PS4. As pointed out in one of the replies I will look at Kudu support next. http://gerrit.cloudera.org:8080/#/c/4745/2//COMMIT_MSG Commit Message: Line 11: plan, just before the table sink. This has the effect that data will be > -distributed (also adds to the single-node plan) Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 121: private boolean hasClusteredHint_ = false; > but how do you capture 'noclustered'? that's not the same as the absence of Right now the default is 'noclustered' and is not captured explicitly. Once the default changes to 'clustered' we can change the code to reflect that. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 238:*/ > move this TODO into SortInfo.createSortTupleInfo() Done Line 253: Preconditions.checkState(smap.getLhs().get(i) instanceof SlotRef); > these few lines can also be moved into createSortTupleInfo() right? Excellent, I didn't see this. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 125: > refer to the new tuple materialized by this sort instead of the original in Done Line 145: public SortInfo clone() { return new SortInfo(this); } > shorter: by the sort node Done Line 146: > comment about result expressions only makes sense if we pass in the result True, now it's aligned with the resultExprs actually being passed in. Line 147: /** > we also do this for top-n Done Line 152:* sorts. The substitution map is returned. > I'd suggest removing the 'stmt' parameter and instead returning the ExprEub Done Line 153:* TODO: We could do something more sophisticated than simply copying input slot refs - > Isn't this the tuple descriptor for the sort output? The secnd sentence is I think it is both, since the sort node operates only on these. However output seems to be more intuitive, so I changed it. Line 163: > builds up the tuple operated on and returned by sort node Done Line 171: // substOrderBy is the mapping from slot refs in the sort node's input to slot refs in > I think the else case may not be needed/possible, but the explanation is th Yes, thanks for the explanation. Line 173: // the tuple operated on and returned by the sort node. > Can we also move this to the caller? The returned smap can be used to creat Done Line 181: sortTupleExprs.add(origSlotRef); > let's let the caller make this decision without passing a flag Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 275: private PlanFragment createScanFragment(PlanNode node) { > never mind, the structure looks good as it is. Ok. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 149: // Add optional sort node to the plan, based on clustered/noclustered plan hint. > Let's keep it concrete: we're adding a sort node Done Line 151: // set up table sink for root fragment > there's no ClusteringNode. something like 'clusterInsertInput'/ 'createClus Done Line 497:* This will sort the data produced by 'inputFragment' by the clustering columns, so > Mention the ordering of the sort Done Line 498:* that partitions can be written sequentially in the table sink. > Second sentence is kind of vague. I understand that sorting is just an inst Done Line 503: if (!insertStmt.hasClusteredHint()) return; > single line (and elsewhere) Done Line 508: > partitioning? I took that comment from DistributedPlanner::createInsertFragment() and I thought it made sense to remove all constants during the clustering. Changed the word. http://gerrit.cloudera.org:8080/#/c/4745/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 573: # IMPALA-2521: clustered insert into partitioned table adds sort node. > Let's test these combinations: Done Line 575: select * from functional.alltypes > Let's also test Kudu tables because there we want to sort on the PKs. Let's I'll try to address this in a subsequent patchset, so we can make progress on the rest. Line 576: PLAN > also add PLAN section Done -- To view, visit http://gerrit.cloude
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test 6 files changed, 217 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/3 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 121: private boolean hasClusteredHint_ = false; > We actually only need the ternary logic to detect conflicting hints, so I m but how do you capture 'noclustered'? that's not the same as the absence of 'clustered', because we'll want to use 'clustered' eventually by default. http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 275: private PlanFragment createScanFragment(PlanNode node) { > i don't think of the logic as orthogonal, because it modifies the insert pr never mind, the structure looks good as it is. http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 151: addClusteringNode(insertStmt, rootFragment, ctx_.getRootAnalyzer()); there's no ClusteringNode. something like 'clusterInsertInput'/ 'createClusteringSort'/... would be clearer. -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 275: public PlanFragment addClusteringFragment(PlanFragment inputFragment, > Done i don't think of the logic as orthogonal, because it modifies the insert process. also, i don't see how this is relevant for single-node plans - if you're inserting a lot of data with num_nodes = 1, there's something wrong. you can restructure createInsertFragment to use to two helper functions, for distribution and clustering. -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Alex Behm has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 2: (21 comments) http://gerrit.cloudera.org:8080/#/c/4745/2//COMMIT_MSG Commit Message: Line 11: distributed plan, just before the table sink. This has the effect that -distributed (also adds to the single-node plan) http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 238:* TODO: We could do something more sophisticated than simply copying input move this TODO into SortInfo.createSortTupleInfo() Line 253: Set sourceSlots = Sets.newHashSet(); these few lines can also be moved into createSortTupleInfo() right? http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 125:* the ordering exprs refer to the materialized tuples instead of the original input. refer to the new tuple materialized by this sort instead of the original input. Line 145:* output by the exec node implementing the sort. Done by materializing slot refs in the shorter: by the sort node Line 146:* order-by and result expressions. Those slot refs in the ordering and result exprs are comment about result expressions only makes sense if we pass in the result expressions to this function, otherwise it's not really clear where those come from Line 147:* substituted with slot refs into the new tuple. This simplifies sorting logic for we also do this for top-n Line 152: public void createSortTupleInfo(Set sourceSlots, boolean materializeSlots, StatementBase stmt, Analyzer analyzer) { I'd suggest removing the 'stmt' parameter and instead returning the ExprEubstitutionMap. Then the stmt callers can apply that smap to whatever exprs they need to change. Line 153: // The tuple descriptor for the sort input. It will contain the materialized tuples, Isn't this the tuple descriptor for the sort output? The secnd sentence is a little imprecise because ("it contains") because a tuple descriptor does not contain materialized tuples. Line 163: // the internal rows of the sort node. builds up the tuple operated on and returned by sort node Line 171: // TODO: I copied this from QueryStmt.java and would like to add an explanatory I think the else case may not be needed/possible, but the explanation is that it's ok to move predicates outside of this query block, but it's not ok to move predicates in if there is a limit. Consider this query: select count(*) from t1 inner join (select id, name from t2 where id < 10 order by limit 10) v on (t1.id = v.id and t1.name = v.name) where t1.name = 'test' In the example above it's ok to generate 't1.id < 10' and put it into the scan of t1, but it's not ok to generate 't2.name = 'test'' and move it into the scan of t2. Hope this makes sense. Line 173: if (stmt.hasLimit()) { Can we also move this to the caller? The returned smap can be used to create this. The reason is that it doesn't really make sense to add these value transfers during plan generation (for the clustered hint use case). Line 181: if (materializeSlots) { let's let the caller make this decision without passing a flag http://gerrit.cloudera.org:8080/#/c/4745/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 149: // Add optional clustering node to the plan, based on clustered/noclustered plan Let's keep it concrete: we're adding a sort node Line 497:* Insert a sort node into the plan, depending on the clustered/noclustered plan hint. Mention the ordering of the sort Line 498:* This will re-order the data produced by 'inputFragment' in a way that partitions can Second sentence is kind of vague. I understand that sorting is just an instance of a more general clustering concept, but let's stick to what we are actually doing here :) Line 503: if (!insertStmt.hasClusteredHint()) { single line (and elsewhere) Line 508: // Ignore constants for the sake of partitioning. partitioning? http://gerrit.cloudera.org:8080/#/c/4745/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test File testdata/workloads/functional-planner/queries/PlannerTest/insert.test: Line 573: # IMPALA-2521: Test to make sure that clustering adds a sort node into the plan. Let's test these combinations: 1. Partitioned tables 1.1. With clustered hint 1.2. With clustered and noshuffle hint 2. Unpartitioned tables 2.1. With clustered hint 2.2. With clustered and shuffle hint Line 575: select * from functional.alltypes Let's also test Kudu tables because there we want to sort on the PKs. Let's add those tests to a Kudu-specific .test file Line
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has posted comments on this change. Change subject: IMPALA-2521: Add clustered hint to insert statements .. Patch Set 1: (12 comments) Thanks for the reviews, please see PS2 http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 121: private boolean hasClusteredHint_ = false; > might be better to use ternary logic here with a single Boolean (which can We actually only need the ternary logic to detect conflicting hints, so I made it a boolean and used another local variable in analyzePlanHints() to detect that error. Line 631: if (hasNoShuffleHint_) { > did you mean noclustered? Yes, done. Line 632: throw new AnalysisException("Conflicting INSERT hint: " + hint); > list both conflicting hints (otherwise it's hard to tell how to fix the pro Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 68:* sorted. Its source exprs are update to those in tupleSlotExprs. > second sentence incomprehensible. Done Line 123:* the exprs used by the SortNode refer to the materialized tuples instead of the > don't refer to the sort node here, SortInfo shouldn't have to know about th Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 269:* clustered/noclustered plan hint. The clustering re-order the data in 'inputFragment' > Better to be specific: Say that it adds a sort node that orders on the part Done Line 272: // TODO: this function shares some code with QueryStmt::createSortTupleInfo(). However > You could move most of the code into SortInfo.createSortTupleInfo(List you should move the logic (minus the createSortTupleInfo part that alex bro createInsertFragment has several early exits (return inputFragment) and it isn't that clear to me, how adding this orthogonal functionality would fit in nicely. We need to support both cases, adding shuffling and/or clustering independently. Should I replace them with nested if-else-clauses? For now I moved it to the Planner as suggested by Alex and enabled it for single-node-plans, too. Line 275: public PlanFragment addClusteringFragment(PlanFragment inputFragment, > createInsertFragment() is only called for distributed plans. Should cluster Done Line 309: sortTupleDesc.setIsMaterialized(true); > To indicate that this tuple is a physical tuple needed during runtime execu Done Line 324: // TODO Why is this needed here but not in QueryStmt.java? If it is omitted, the query > Before plan generation we call QueryStmt.materializeRequiredSlots() to make Done http://gerrit.cloudera.org:8080/#/c/4745/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 144: rootFragment = distributedPlanner.addClusteringFragment( > 1. we should also do this for single-node plans, so adding the clustering d Done -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2521: Add clustered hint to insert statements
Lars Volker has uploaded a new patch set (#2). Change subject: IMPALA-2521: Add clustered hint to insert statements .. IMPALA-2521: Add clustered hint to insert statements This change introduces a clustered/noclustered hint for insert statements. Specifying this hint adds an additional sort node to the distributed plan, just before the table sink. This has the effect that data will be clustered by its partition prior to writing partitions, which therefore can be written sequentially. Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert.test 7 files changed, 154 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/4745/2 -- To view, visit http://gerrit.cloudera.org:8080/4745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I412153bd8435d792bd61dea268d7a3b884048f14 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker