[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. Testing: The parser tests mirror the tests of INSERT, while analysis and planner tests are minimal, as the underlying logic is the same as for INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Reviewed-on: http://gerrit.cloudera.org:8080/8400 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 211 insertions(+), 65 deletions(-) Approvals: Alex Behm: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 22 Feb 2018 20:43:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 17: Code-Review+2 Thanks for pointing out the conflicts. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 22 Feb 2018 17:04:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 17: (3 comments) I have marked the lines where the rebase (patch set 16) led to non-trivial conflicts/broken tests. Patch set 17 contains some typo fixes + changes in kudu.test, which I forgot to run after rebase. http://gerrit.cloudera.org:8080/#/c/8400/17/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: http://gerrit.cloudera.org:8080/#/c/8400/17/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@303 PS17, Line 303: /** :* Creates an insert into, an insert overwrite, and a CTAS statement with :* the given hints and checks that the parsed hints are the same as the expected hints. :*/ : private void TestInsertAndCtasHints(String insertPart, String ctasPart, : String[] hintStyle, String hints, String... expectedHints) { : String hintsPart = hintStyle[0] + hints + hintStyle[1]; : TestInsertStmtHints(String.format("insert %%s into %s %%s select * from t", : insertPart), hintsPart, expectedHints); : TestInsertStmtHints(String.format("insert %%s overwrite %s %%s select * from t", : insertPart), hintsPart, expectedHints); : TestCtasHints(String.format("create %s table %s as select * from t", : hintsPart, ctasPart), expectedHints); : } : : /** :* Injects hints into pattern and checks that the injected hints match the expected :* hints. This function covers both insert and upsert statements. :*/ : private void TestInsertStmtHints(String pattern, String hint, String... expectedHints) { : for (InsertStmt.HintLocation loc: InsertStmt.HintLocation.values()) { : InsertStmt insertStmt = (InsertStmt) ParsesOk(InjectInsertHint(pattern, hint, loc)); : assertEquals(expectedHints, HintsToStrings(insertStmt.getPlanHints())); : } : } : : /** :* Injects hints into pattern and expect parser error on the injected hints. :* It covers both insert and upsert statements. :*/ : private void ParserErrorOnInsertStmtHints(String pattern, String hint) { : for (InsertStmt.HintLocation loc: InsertStmt.HintLocation.values()) { : ParserError(InjectInsertHint(pattern, hint, loc)); : } : } IMPALA-4168: "Adds Oracle-style hint placement for INSERT/UPSERT" also contains similar changes to avoid test duplication ( https://gerrit.cloudera.org/#/c/8676/ ). http://gerrit.cloudera.org:8080/#/c/8400/17/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/17/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@210 PS17, Line 210: # noshuffle hint would lead to a different plan. : # Note that noclustered hint is added to ensure consistent plans on Impala 2.x and 3.x, : # because IMPALA-5293 changed clustered to be the default on 3.x. : create /*+ noclustered */table t partitioned by (year, month) as These tests were broken by IMPALA-5293: "Turn insert clustering on by default". IMPALA-5293 is not merged to Impala 2.x, while this change can be merged in my opinion. To avoid extra work, I have changed the tests to statements that should lead to the same plan on both branches. http://gerrit.cloudera.org:8080/#/c/8400/17/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: http://gerrit.cloudera.org:8080/#/c/8400/17/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test@431 PS17, Line 431: primary key(id) partition by hash(id) partitions 3 stored as kudu as : select * from functional.alltypes; : DISTRIBUTEDPLAN : INSERT INTO KUDU [default.t] : | : 01:EXCHANGE [KUDU(KuduPartition(functional.alltypes.id))] : | : 00:SCAN HDFS [functional.alltypes] :partitions=24/24 files=24 size=478.45KB : : create /* +noclustered,noshuffle */ table t : primary key(id) partition by hash(id) partitions 3 stored as kudu as These tests broke because of
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#17). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. Testing: The parser tests mirror the tests of INSERT, while analysis and planner tests are minimal, as the underlying logic is the same as for INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 211 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/17 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 17 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#16). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. Testing: The parser tests mirror the tests of INSERT, while analysis and planner tests are minimal, as the underlying logic is the same as for INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 210 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/16 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 16 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 15: Code-Review+2 If there are non-trivial conflicts in the rebase, please let me know and I'll take another look. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 21 Feb 2018 06:18:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 14: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@82 PS14, Line 82: Preconditions.checkNotNull(queryStmt); > combine with member assignment: Done http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1640 PS14, Line 1640: public void TestCreateTableAsSelectWithHints() throws AnalysisException { > To avoid duplication with AnalyzeStmts#TestInsertHints let's do the followi I kept 3 tests to cover ok/warning/error case. http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@304 PS14, Line 304:* Creates an insert, an upsert, and a CTAS statement with the given hints > Upsert isn't tested. Code or comment wrong? Wrong comment, thanks for spotting it. This remained from an attempt to merge UPSERT tests to this function, which turned out to be non practical, because UPSERT leads to parse failure for partition clauses. http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@336 PS14, Line 336: static private List ConvertHintsToStringList(List hints) { > shorter: hintsToStrings() Done http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@83 PS14, Line 83: new String[] { "/* +", "*/" }, // traditional commented hint > tabs Done http://gerrit.cloudera.org:8080/#/c/8400/14/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/14/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS14, Line 207: # IMPALA-4167: noclustered hint is ignored if the CTAS has partitions and sort columns. > Let's follow a procedure to trim down these tests similar to what we are do Done -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 20 Feb 2018 16:44:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#15). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. Testing: The parser tests mirror the tests of INSERT, while analysis and planner tests are minimal, as the underlying logic is the same as for INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 205 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/15 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 14: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@82 PS14, Line 82: Preconditions.checkNotNull(queryStmt); combine with member assignment: this.createStmt = Preconditions.checkNotNull(queryStmt); http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1640 PS14, Line 1640: public void TestCreateTableAsSelectWithHints() throws AnalysisException { To avoid duplication with AnalyzeStmts#TestInsertHints let's do the following: * Keep a single basic test, but remove all other tests that are basically the same as in AnalyzeStmts#TestInsertHints * Keep those tests that are different from the expected behavior in AnalyzeStmts#TestInsertHints * Add a comment that ParserTest ensures hints are properly set in the InsertStmt of a Ctas, and that AnalyzeStmts#TestInsertHints covers the analysis of those hints. Therefore, the tests here are minimal. http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@304 PS14, Line 304:* Creates an insert, an upsert, and a CTAS statement with the given hints Upsert isn't tested. Code or comment wrong? http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@336 PS14, Line 336: static private List ConvertHintsToStringList(List hints) { shorter: hintsToStrings() Easier to return them as a String[], then you don't need to convert expectedHints to an List in L323 http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: http://gerrit.cloudera.org:8080/#/c/8400/14/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@83 PS14, Line 83: new String[] { "/* +", "*/" }, // traditional commented hint tabs http://gerrit.cloudera.org:8080/#/c/8400/14/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/14/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS14, Line 207: # IMPALA-4167: noclustered hint is ignored if the CTAS has partitions and sort columns. Let's follow a procedure to trim down these tests similar to what we are doing for AnalyzeDDLTest, i.e.: * Keep 1-2 basic tests. * Comment the hints only affect the insert portion which is already tested more thoroughly elsewhere (add missing overage to the insert tests as necessary if you think coverage is missing) * Keep tests that are truly specific to Ctas (though I think there are none) -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Sat, 17 Feb 2018 17:58:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#14). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner / parser tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 333 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/14 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 11: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup@1065 PS11, Line 1065: create_tbl_as_select_stmt_inner ::= > let's call this create_tbl_as_select_params to match what is produced Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > I looked at this closely and both approaches seem fine to me. Here are my t Thanks! http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@76 PS11, Line 76: public CreateTableStmt createStmt_; > We don't use the "_" for public members. Done http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1642 PS11, Line 1642: private String[][] getHintStyles() { > Can you move this to FrontendTestBase, make it protected, and use it in the I have moved the member variable version found in other files instead of the function - I prefer the member, because no new arrays have to be created on every call. http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1651 PS11, Line 1651: public void TestCreateTableAsSelectWithHints() throws AnalysisException { > * I think the more important testing is in ParserTest#TestPlanHints similar I have added the CTAS version of insert tests to ParserTest#TestPlanHints. I think that some of these tests should move there, like the ones about the case insensitivity of hints, while the ones with errors or warnings should remain here. I am unsure about the rest, maybe they can be deleted, as planner tests already check most of these cases. About combining with TestInsertHints(): I have created combined insert/upsert/ctas tests in ParserTest#TestPlanHints, something similar could be done here. What makes this more tricky is that the actual tables are checked, so the source table must exist, and the target table in insert or the partitioning parameters in CTAS must match with the result of the select + there are some cases where insert is possible, but CTAS is not (HBase tables). http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS11, Line 207: # IMPALA-4167: noclustered hint is ignored if the CTAS has partitions and sort columns. > * Any ideas to trim down the tests? Seems like a lot of new tests and we sh Maybe create /*+ clustered,noshuffle */ + the default case without hints is enough to show that both type of hints can take effect. If the other tests are deleted, then some of them should be moved to the insert tests, because they cover some scenarios that are not tested there. What do you think about this solution? I agree that both plans are unnecessary - I kept the DISTRIBUTEDPLAN, because the effect of (no)shuffle can be only checked there. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 07 Feb 2018 17:50:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#13). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner / parser tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 333 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/13 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 13 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#12). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner / parser tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 9 files changed, 330 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/12 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 12 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 11: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/cup/sql-parser.cup@1065 PS11, Line 1065: create_tbl_as_select_stmt_inner ::= let's call this create_tbl_as_select_params to match what is produced http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > I don't feel strongly either, lets see what others think. I looked at this closely and both approaches seem fine to me. Here are my thoughts: * We follow the set() approach in a few places because it's more practical than the alternative * Relying on a post-construction set() can be harder to reason about, so I generally prefer the current approach where all info is available at construction time I suggest we proceed with the current version. Nice work on navigating the parser changes, Csaba! http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@76 PS11, Line 76: public CreateTableStmt createStmt_; We don't use the "_" for public members. http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1642 PS11, Line 1642: private String[][] getHintStyles() { Can you move this to FrontendTestBase, make it protected, and use it in the other appropriate places. I found this thing in: AnalyzeDDLTest.java AnalyzeStmtsTest.java ParserTest.java ToSqlTest.java http://gerrit.cloudera.org:8080/#/c/8400/11/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1651 PS11, Line 1651: public void TestCreateTableAsSelectWithHints() throws AnalysisException { * I think the more important testing is in ParserTest#TestPlanHints similar to what we do for INSERT. We should check that the generated InsertStmt has the hints properly set. * These tests are almost identical to AnalyzeStmts#TestInsertHints. The copy+paste worries me. * Combining this test with TestInsertHints() seems tricky, but might be doable. * Alternatively, we could shrink this test to the bare minimum and comment that coverage is provided by ParserTest#TestPlanHints (tests that hints are set in the CTAS' InsertStmt) and AnalyzeStmts#TestInsertHints() (tests analyzing an InsertStmt with hints) What do you think? http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/11/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS11, Line 207: # IMPALA-4167: noclustered hint is ignored if the CTAS has partitions and sort columns. * Any ideas to trim down the tests? Seems like a lot of new tests and we should have existing tests for INSERT that cover most parts. * Do we really need both PLAN and DISTRIBUTEDPLAN for these tests? -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 31 Jan 2018 01:38:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#11). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 330 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/11 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/8400/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/10/fe/src/main/cup/sql-parser.cup@1185 PS10, Line 1185: e > nit: indent 2 spaces Done -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 18 Jan 2018 20:22:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 10: Please see my previous comment and the comment here: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Thu, 18 Jan 2018 20:01:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG@20 PS8, Line 20: > nit: wrong word? Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055 PS6, Line 1055: plan_hints:hints > nit: please check that all review comments are marked as "Done" when sendin Done http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > Thanks for the explanation, I think I understand it now. Could we just have I have no problems with going that way, but I am not sure that it would make the change smaller: InsertStmt would also need a Set/AddPlanHints function, and it would be probably better to rewrite InsertStmt to always use the setter instead of constructor arguments. http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@71 PS8, Line 71: > nit: "rule that checks" or "rules that check" Done http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@75 PS8, Line 75: > partitio > nit: we generally seem to use camel case for acronyms, e.g. see createCtasT Done http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: 01:EXCHANGE [UNPARTITIONED] > Yes, I think it's intentional. Maybe the docs need to be more clear. Can yo I have created IMPALA-6408 to track the doc issue. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 17 Jan 2018 16:04:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#10). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. If only one partition is written (because all partitioning columns are constant or the target table is not partitioned), then the shuffle hint leads to a plan where all rows are merged at the coordinator where the table sink is executed. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 332 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/10 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#9). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce the number of files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 331 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/9 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 9 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/8//COMMIT_MSG@20 PS8, Line 20: to nit: wrong word? http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055 PS6, Line 1055: eate_tbl_as_sele > Does opt_plan_hints not work here? nit: please check that all review comments are marked as "Done" when sending out replies. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > It is still needed, as it is the result value of non-terminal create_tbl_as Thanks for the explanation, I think I understand it now. Could we just have a setter for the plan hints in CreateTableAsSelectStmt and then call stmt.AddHints(hints) inside the "create_tbl_as_select_stmt" rule? That way we don't have to add another class, the change would be smaller, and the parsing code easier to follow. http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@71 PS8, Line 71: rules that checks nit: "rule that checks" or "rules that check" http://gerrit.cloudera.org:8080/#/c/8400/8/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@75 PS8, Line 75: CTASParams nit: we generally seem to use camel case for acronyms, e.g. see createCtasTarget below. http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: 01:SORT > I should add a comment about this, but I am unsure whether it should be a s Yes, I think it's intentional. Maybe the docs need to be more clear. Can you open a docs JIRA? -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 16 Jan 2018 20:35:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 7: (7 comments) I have overwritten the wrong test in patch set 7, please ignore it and compare patch 6 to 8. http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@19 PS6, Line 19: nly > add "more memory efficient"? That makes it more obvious why it is more effi I hope it is correct and understandable enough this way. http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@27 PS6, Line 27: shuffle: > I don't think this is true, is it? For unpartitioned tables, what would the Done http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@35 PS6, Line 35: > Mention how you tested it in the commit message. I wrote about the current state of tests, but I became unsure during the process - should I add tests that actually execute the CTASs with hints? The reason why not to add them is that the non-DDL part of query execution is really the same as in INSERT tests and can be quite slow. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. > Is this class still needed or a left-over from earlier patch-sets? Can you It is still needed, as it is the result value of non-terminal create_tbl_as_select_stmt_inner. I am not sure that this is the correct place for this explanation, maybe it should be moved to the .cup. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1701 PS6, Line 1701: Multiple non-conflicting > Can you add one that has multiple distinct hints, like shuffle, clustered? Done http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233 PS6, Line 233: shuffle > Is that true? Without specifying a hint there should be cases where the cos Thanks for spotting this, I have relied too much on experimentation instead of checking the code... I have split this test into two to test both outcomes. http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: | order by: year ASC NULLS > At first I didn't understand what this one does (I saw that insert.test:574 I should add a comment about this, but I am unsure whether it should be a simple comment, or a todo about a strange behavior. This looks intentional (see https://gerrit.cloudera.org/#/c/4162/ ), but it is not mentioned explicitly in the documentation. It is true that this reduces the number of writes overall, but as a side effect, it causes the whole table to be written to one node. Writing the whole table to the coordinator may be useful in some cases (like a temp table?), but I think that shuffle has a different intention. I would ask Greg / Alex about their opinion. -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 15 Jan 2018 18:06:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#8). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce to files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 331 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/8 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#7). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition expression before insert to ensure that only one partition is written at a time. The goal is to reduce to files kept open / buffers kept in memory simultaneously. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavior. shuffle: Forces the planner to add an exchange node that repartitions by the partition expression of the output table. This means that a partition will be written only by a single node, which minimizes the global number of simultaneous writes. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modifications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. The tests mainly mirror the analysis / planner tests of INSERT. Query tests are not created, as the hints have no effect on the DDL part of CTAS, and the actual query ran is the same as in the insert case. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 330 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/7 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 6: (8 comments) Apologies for the delayed review. I had a few more questions. Let's discuss the parser changes in person if you feel I misunderstood the approach. http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@19 PS6, Line 19: more add "more memory efficient"? That makes it more obvious why it is more efficient to do additional work (sorting). http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@27 PS6, Line 27: Add exchange node before insert even in case of unpartitioned tables. I don't think this is true, is it? For unpartitioned tables, what would the shuffle / exchange do? Instead "shuffle" forces an exchange, "noshuffle" prevents adding one, and leaving it out results in the planner making a cost-based decision (DistributedPlanner.java:218). http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@35 PS6, Line 35: was moved from tbl_def_without_col_defs to statement rules. Mention how you tested it in the commit message. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055 PS6, Line 1055: plan_hints:hints Does opt_plan_hints not work here? http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. Is this class still needed or a left-over from earlier patch-sets? Can you expand the class comment to explain why we need it and what it does? http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1701 PS6, Line 1701: Multiple non-conflicting Can you add one that has multiple distinct hints, like shuffle, clustered? http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233 PS6, Line 233: shuffle Is that true? Without specifying a hint there should be cases where the cost based heuristic decides against adding an exchange node. http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: 01:EXCHANGE [UNPARTITIONED] At first I didn't understand what this one does (I saw that insert.test:574 has the same). On my machine this exchange collects all rows at the coordinator. Maybe you want to add that in a comment? -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 08 Jan 2018 14:04:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1167 PS3, Line 1167: tbl_def_without_col_defs ::= > It's important that it is not accepted: we should only accept supported opt Done in cup. I did not find another way (that does not add a lot of duplication) than moving KW_CREATE to high level rules. The problem was that the parser cannot differentiate between the CTAS and CREATE until it reaches the AS SELECT part, but it needs this information to decide whether to add an opt_plan_hints non-terminal or not if there is no hint after CREATE. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1633 PS3, Line 1633: Only > Did you find a good way? If not, please clean up this code with its own int No, I did not find a really good way. The duplication could be merged, but the resulting code would be quite hard to understand, mainly because the format + partitions of a table are given in inserts, but have to specified in CTAS. This could be done by mapping table names (used in insert) to sql options (used in CTAS), and substituting these strings in the statements, but this would ruin the readability of the statements. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1647 PS3, Line 1647: // HBase tables cannot be tested, as currently Impala cannot create HBase tables. > I see. I think it should go somewhere near the top. See my previous comment Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1653 PS3, Line 1653: > See above. Done http://gerrit.cloudera.org:8080/#/c/8400/4/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/4/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS4, Line 207: noclustered > nit: remove the "" or add them in the test below? Done http://gerrit.cloudera.org:8080/#/c/8400/4/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233 PS4, Line 233: +noclustered,shuffle > nit: spaces Done http://gerrit.cloudera.org:8080/#/c/8400/4/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@251 PS4, Line 251: # IMPALA-4167: noclustered hint in CTAS into partitioned HDFS table has no effect as it > I think it's better to not mention the default behavior here. The important Done -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 22 Dec 2017 15:14:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Hello Lars Volker, Gabor Kaszab, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8400 to look at the new patch set (#6). Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT This change adds support for "clustered", "noclustered", "shuffle" and "noshuffle" hints in CTAS statement. Example: create /*+ clustered,noshuffle */ table t partitioned by (year, month) as select * from functional.alltypes The effect of these hints are the same as in insert statements: clustered: Sort locally by partition columns before insert to make the insert more efficient. noclustered: Do not sort by primary key before insert to Kudu table. No effect on HDFS tables currently, as this is their default behavour. shuffle: Add exchange node before insert even in case of unpartitioned tables. noshuffle: Do not add exchange node before insert to partitioned tables. The parser needed some modications to be able to differentiate between CREATE statements that allow hints (CTAS), and CREATE statements that do not (every other type of CREATE statements). As a result, KW_CREATE was moved from tbl_def_without_col_defs to statement rules. Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 5 files changed, 302 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8400/6 -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@9 PS3, Line 9: Adding support for "clustered", "noclustered", "shuffle" and "noshuffle" > How about "This change adds support..." Done http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@12 PS3, Line 12: example: > Capitalize Done http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@19 PS3, Line 19: Sort by partition columns before insert to make the insert more efficient. > Mention where exactly the sort happens (locally vs distributed). Done http://gerrit.cloudera.org:8080/#/c/8400/3//COMMIT_MSG@26 PS3, Line 26: exchenge > spelling Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1054 PS3, Line 1054: RESULT = new CreateTableAsSelectStmt(new CreateTableStmt(tbl_def), > nit: Can you wrap the lines at 90 chars? Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1167 PS3, Line 1167: tbl_def_without_col_defs ::= > Does that now also allow hints like "CREATE /*+ clustered */ TABLE FOO LIKE It is also accepted in that case, but has no effect - should I disallow it? I think that it would be the best to enable hints in several cases (even if no hint is accepted there currently), and check the hints in normal java code and warn if one is not valid for a given statement. I am not sure though. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/cup/sql-parser.cup@1168 PS3, Line 1168: KW_CREATE external_val:external opt_plan_hints:hints KW_TABLE if_not_exists_val:if_not_exists > long line Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@155 PS3, Line 155: TableDef(TableName tableName, boolean isExternal, boolean ifNotExists, List hints) { > long line Done http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1633 PS3, Line 1633: Only > What does "only" mean here? This was copy pasted - I guess it means "not error, just warning". I did not want to deviate too much from the original (AnalyzeStmtsTest.TestInsertHints()), to make it easier to merge the two if I find a good way to do it. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1647 PS3, Line 1647: // HBase tables cannot be tested, as currently Impala cannot create HBase tables. > It's weird that this explanation occurs in the middle of the function. Can This is also copy paste legacy - AnalyzeStmtsTest.TestInsertHints() had an HBase test here. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1653 PS3, Line 1653: > Why is there a newline? Also copy paste legacy. http://gerrit.cloudera.org:8080/#/c/8400/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1677 PS3, Line 1677: "select * from functional.alltypes"); > Please also add tests that have additional hints in the select clause. Done http://gerrit.cloudera.org:8080/#/c/8400/3/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/3/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@207 PS3, Line 207: # IMPALA-4167: clustered CTAS into partitioned table adds sort node. > Can you add tests for noclustered and for sort by()? Done -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 17 Nov 2017 00:09:02 + Gerrit-HasComments: Yes