[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

Lars Volker (Code Review) Tue, 04 Apr 2017 13:23:04 -0700

Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
......................................................................



Patch Set 7:

(20 comments)

Thank you for your review! Please see PS7.

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table. If unspecified, the 
destination table will
> Otherwise? Will the destination table inherit the sort by columns of the so
Yes, it'll inherit them. I expanded the comment. We have a test for this in 
QueryTest/create-table-like-table.test.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: super.analyze(analyzer);
> Comment not applicable. Remove?
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
> Similar to the comment on hasShuffleHint, explain the allowed values for ha
Done


PS5, Line 763:  
> "if any"
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return Lists.newArrayList();
> This is an immutable list whereas the next function returns a mutable one. 
Done


PS5, Line 83: 
            :    */
> remove
Done


PS5, Line 91: sortByC
> nit: rename to sortByColName? since you also have tableCols it may be good 
Done


Line 92:       // Make sure it's not a primary key column or partition column.
> nit: remove empty line
Done


PS5, Line 102: 
             :       for (int j = 0; j < tableCols.size(); ++j) {
> Not sure I follow this part based on the code below.
Done, updated the comment.


Line 120:     return colIdxs;
> nit: remove empty line
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
             :    * table.
> Can't we throw an exception instead?
We could throw one if the number is wrong, but eventually the caller will have 
to provide k columns and then set this to some value 0 <= n < k. Should we 
check that? The comment was meant to indicate that the caller will be on 
themselves to use this.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
> I assume this works for HBase tables...
Insert hints are not allowed for HBase tables and the analysis will fail. From 
looking at InsertStmt I think that this would return an empty list of exprs. 
However I think it could be more explicit. Should I add an "instanceof" check 
for HdfsTable here?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
> How about 'ID' (check for case)?
Indeed it did not work, I fixed it.


PS5, Line 1642:     "CREATE TABLE LIKE is not supported for Kudu tables");
> Also, add the negative case where sort by cols don't exist in the source ta
Done


PS5, Line 1913: // Partitioned HDFS table
              :     AnalyzesOk("create table functional.new_table (i int) 
PARTITIONED BY (d decimal)" +
              :         "SORT BY (i)");
              :     // Column in sortby hint must not be a Hdfs partition 
column.
              :     AnalysisError("create table functional.new_table (i int) 
PARTITIONED BY (d decimal)" +
              :         "SORT BY (d)", "SORT BY column list must not contain 
partition column: 'd'");
              : 
              :     // Kudu table tests are in TestCreateManagedKuduTable().
              :   }
              : 
              :   @Test
              :   public void TestAlterKuduTable() {
              :     TestUtils.assumeKuduIsSupported();
              :    
> For Kudu specific tests they need to be in a function that first calls Test
I move the tests to TestCreateManagedKuduTable().

We don't support setting sort by columns for HBase tables (I added code to 
AlterTableSortByStmt to enforce this). During analysis, we don't parse the 
table property, either.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
> Add one with a partitioned table? Also, a CTAS with a more complex select (
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
> Out of curiosity? Can we change the default behavior? If no, are we ok with
No, we cannot change it. I thought about this, but couldn't see why we'd want 
to have the nulls at the other end of the sorted data. Can you think of a case 
where this would be beneficial?


PS5, Line 39: ---- DISTRIBUTEDPLAN
            : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, 
PARTITION-KEYS=(year,month)]
            : |  partitions=24
            : |
            : 01:SORT
            : |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col 
ASC NULLS LAST, bool_col ASC NULLS LAST
            : |
            : 00:SCAN HDFS [functional.alltypes]
            :    partitions=24/24 files=24 size=478.45KB
> Do we need all these DISTRIBUTED_PLANs?
They make sure that the shuffle/noshuffle hint has been observed, indicated be 
the presence/absence of an exchange node in the distributed plan. Should we 
remove them?


PS5, Line 193: ====
> Can you add one or two more complex select statements feeding the insert? i
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:                  
> where are these empty spaces coming from?
They are part of the output of "describe formatted". Without them the 
comparison fails. I could imagine Hive has them and we add them to be 
compatible, but I'm not sure. Should we investigate and/or open a JIRA for them?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

Reply via email to