Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause ......................................................................
Patch Set 7: (20 comments) Thank you for your review! Please see PS7. http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 383: // any such columns of the source table. If unspecified, the destination table will > Otherwise? Will the destination table inherit the sort by columns of the so Yes, it'll inherit them. I expanded the comment. We have a test for this in QueryTest/create-table-like-table.test. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS5, Line 64: super.analyze(analyzer); > Comment not applicable. Remove? Done http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS5, Line 127: hasNoClusteredHint_ > Similar to the comment on hasShuffleHint, explain the allowed values for ha Done PS5, Line 763: > "if any" Done http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS5, Line 77: return Lists.newArrayList(); > This is an immutable list whereas the next function returns a mutable one. Done PS5, Line 83: : */ > remove Done PS5, Line 91: sortByC > nit: rename to sortByColName? since you also have tableCols it may be good Done Line 92: // Make sure it's not a primary key column or partition column. > nit: remove empty line Done PS5, Line 102: : for (int j = 0; j < tableCols.size(); ++j) { > Not sure I follow this part based on the code below. Done, updated the comment. Line 120: return colIdxs; > nit: remove empty line Done http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the : * table. > Can't we throw an exception instead? We could throw one if the number is wrong, but eventually the caller will have to provide k columns and then set this to some value 0 <= n < k. Should we check that? The comment was meant to indicate that the caller will be on themselves to use this. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() > I assume this works for HBase tables... Insert hints are not allowed for HBase tables and the analysis will fail. From looking at InsertStmt I think that this would return an empty list of exprs. However I think it could be more explicit. Should I add an "instanceof" check for HdfsTable here? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: PS5, Line 521: 'sort.by.columns'='id > How about 'ID' (check for case)? Indeed it did not work, I fixed it. PS5, Line 1642: "CREATE TABLE LIKE is not supported for Kudu tables"); > Also, add the negative case where sort by cols don't exist in the source ta Done PS5, Line 1913: // Partitioned HDFS table : AnalyzesOk("create table functional.new_table (i int) PARTITIONED BY (d decimal)" + : "SORT BY (i)"); : // Column in sortby hint must not be a Hdfs partition column. : AnalysisError("create table functional.new_table (i int) PARTITIONED BY (d decimal)" + : "SORT BY (d)", "SORT BY column list must not contain partition column: 'd'"); : : // Kudu table tests are in TestCreateManagedKuduTable(). : } : : @Test : public void TestAlterKuduTable() { : TestUtils.assumeKuduIsSupported(); : > For Kudu specific tests they need to be in a function that first calls Test I move the tests to TestCreateManagedKuduTable(). We don't support setting sort by columns for HBase tables (I added code to AlterTableSortByStmt to enforce this). During analysis, we don't parse the table property, either. http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS5, Line 162: create table t sort by > Add one with a partitioned table? Also, a CTAS with a more complex select ( Done http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS5, Line 10: ASC NULLS LAST > Out of curiosity? Can we change the default behavior? If no, are we ok with No, we cannot change it. I thought about this, but couldn't see why we'd want to have the nulls at the other end of the sorted data. Can you think of a case where this would be beneficial? PS5, Line 39: ---- DISTRIBUTEDPLAN : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)] : | partitions=24 : | : 01:SORT : | order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST, bool_col ASC NULLS LAST : | : 00:SCAN HDFS [functional.alltypes] : partitions=24/24 files=24 size=478.45KB > Do we need all these DISTRIBUTED_PLANs? They make sure that the shuffle/noshuffle hint has been observed, indicated be the presence/absence of an exchange node in the distributed plan. Should we remove them? PS5, Line 193: ==== > Can you add one or two more complex select statements feeding the insert? i Done http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: PS5, Line 1066: > where are these empty spaces coming from? They are part of the output of "describe formatted". Without them the comparison fails. I could imagine Hive has them and we add them to be compatible, but I'm not sure. Should we investigate and/or open a JIRA for them? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
