Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
......................................................................


Patch Set 2:

(20 comments)

Thank you for your Review. Please see my comments and PS3.

http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table 
property.
             :     HashMap<String, String> properties = new HashMap<String, 
String>();
             :     String columnString = Joiner.on(",").join(col_names);
             :     
properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString);
             :     RESULT = new AlterTableSetTblProperties(table, null, 
TTablePropertyType.TBL_PROPERTY,
             :         properties);
             :   :}
> I don't see why you need to treat it as another SetTblProperties statement.
Done


PS2, Line 1143: opt_sort_by_cols:sort_by_cols
              :   KW_LIKE table_name:other_table
              :   opt_comment_val:comment
              :   file_format_create_table_val:file_format location_val:location
> nit: indentation
Done


PS2, Line 1182:   opt_sort_by_cols:sort_by_cols opt_comment_val:comment 
row_format_val:row_format serde_properties:serde_props
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS2, Line 162: /**
             :    * Analyzes the 'sort.by.columns' property to make sure the 
columns inside the property
             :    * occur in the target table and are neither partitioning nor 
primary key columns.
             :    */
             :   public void analyzeSortByColumns() throws AnalysisException {
             :     StringBuilder error = new StringBuilder();
             :     TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, 
getTargetTable(), error);
             :     if (error.length() > 0) throw new 
AnalysisException(error.toString());
             :   }
> Along the lines of my previous comment: a) I would move this completely out
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS2, Line 275: Boolean
> boolean? why class?
Done


PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved);
> This is kind of harsh :). Do we guarantee somewhere that no one will call a
We currently don't guarantee that, so we'd be adding that here. Should I remove 
it? Or add a comment to this method and getWarnings() to explain the behavior. 
I ran into this when trying to add warnings to the Analyzer after getWarnings() 
had been called and my intention was to prevent that error in the future. I 
don't feel strongly about keeping this.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS2, Line 119: sortByColumns_.size() > 0
> !sortByColumns_.isEmpty()
Done


PS2, Line 179: StringBuilder error = new StringBuilder();
             :       TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, 
srcTable, error);
             :       if (error.length() > 0) throw new 
AnalysisException(error.toString());
> Why don't you just throw the AnalysisException from the analyzeSortByColumn
Done. When I started with this I thought I may have to call the method from 
places after the analysis so throwing AnalysisExceptions wouldn't work. That 
turned out to be false, so I now throw in analyzeSortByColumns() directly.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS2, Line 382:  sortByExprs_.size() > 0
> !sortByExprs_.isEmpty()
Done


PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx));
> Does this work ok if you have column permutation?
I think it does. colIdx is the index in the target table, and resultExprs_[i] 
gets written into column[i]. Lines 703ff in prepareExpressions seem to 
guarantee this. I added tests to the PlannerTest/insert-sort-by.test file.


PS2, Line 768: StringBuilder error = new StringBuilder();
             :     sortByColumns_ = 
TablePropertyAnalyzer.analyzeSortByColumns(table_, error);
             :     if (error.length() > 0) throw new 
AnalysisException(error.toString());
> Same comment as before. Move the throw clause inside the analyzeSortByColum
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS2, Line 33: /**
            :  * This class collects various methods to analyze table 
properties.
            :  */
            : public class TablePropertyAnalyzer {
> You're only analyzing the sort by columns, so why this generic class? Can't
I discussed this with Alex before starting to implement this change and the 
idea was to have a class that collects methods to analyze table properties 
(sort.by.columns, skip.header.line.count, parquet.mr.int96.write.zone). These 
are often called from various places throughout the code so we want to collect 
them in a single file.


PS2, Line 41: Analyzes the 'sort.by.columns' property of 'table'.
> What is the return value?
I am not really happy how the methods here are ordered and commented, but also 
couldn't think of a much better way. Assuming we keep the class, do you have a 
tip how to better order it? Should each method have the full comment of inputs, 
output, and what it does?


PS2, Line 86: List<String> partitionCols, List<String> primaryKeyCols
> You can simply pass one list which includes the list of column names to exc
Alex and I discussed passing an error message template and decided to go for 
the two-list approach. Looking at the code node I don't feel strongly about it. 
Can you decide with Alex what you prefer?


PS2, Line 90: for (int i = 0; i < sortByCols.size(); ++i) {
            :       String colName = sortByCols.get(i);
> for (String colName: sortByCols) ?
Done


PS2, Line 101: return ImmutableList.<Integer>of();
> If I understand correctly, the callers of this function don't do anything w
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/catalog/Column.java
File fe/src/main/java/org/apache/impala/catalog/Column.java:

PS2, Line 134: <String>
> You don't need that.
Done


PS2, Line 133: public static List<String> toColumnNames(List<Column> columns) {
             :     List<String> colNames = Lists.<String>newArrayList();
             :     for (Column col: columns) { colNames.add(col.getName()); }
             :     return colNames;
             :   }
> Alternate implementation may use the Lists.transform function.
As discussed in person, the current implementation seems to be clearer and more 
concise.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS2, Line 379: public List<String> getColumnNames() {
             :     return Column.toColumnNames(colsByPos_);
             :   }
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS2, Line 539: size() > 0
> !isEmpty
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-HasComments: Yes

Reply via email to