Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause ......................................................................
Patch Set 2: (20 comments) Thank you for your Review. Please see my comments and PS3. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table property. : HashMap<String, String> properties = new HashMap<String, String>(); : String columnString = Joiner.on(",").join(col_names); : properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString); : RESULT = new AlterTableSetTblProperties(table, null, TTablePropertyType.TBL_PROPERTY, : properties); : :} > I don't see why you need to treat it as another SetTblProperties statement. Done PS2, Line 1143: opt_sort_by_cols:sort_by_cols : KW_LIKE table_name:other_table : opt_comment_val:comment : file_format_create_table_val:file_format location_val:location > nit: indentation Done PS2, Line 1182: opt_sort_by_cols:sort_by_cols opt_comment_val:comment row_format_val:row_format serde_properties:serde_props > nit: long line Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS2, Line 162: /** : * Analyzes the 'sort.by.columns' property to make sure the columns inside the property : * occur in the target table and are neither partitioning nor primary key columns. : */ : public void analyzeSortByColumns() throws AnalysisException { : StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, getTargetTable(), error); : if (error.length() > 0) throw new AnalysisException(error.toString()); : } > Along the lines of my previous comment: a) I would move this completely out Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS2, Line 275: Boolean > boolean? why class? Done PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved); > This is kind of harsh :). Do we guarantee somewhere that no one will call a We currently don't guarantee that, so we'd be adding that here. Should I remove it? Or add a comment to this method and getWarnings() to explain the behavior. I ran into this when trying to add warnings to the Analyzer after getWarnings() had been called and my intention was to prevent that error in the future. I don't feel strongly about keeping this. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS2, Line 119: sortByColumns_.size() > 0 > !sortByColumns_.isEmpty() Done PS2, Line 179: StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, srcTable, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); > Why don't you just throw the AnalysisException from the analyzeSortByColumn Done. When I started with this I thought I may have to call the method from places after the analysis so throwing AnalysisExceptions wouldn't work. That turned out to be false, so I now throw in analyzeSortByColumns() directly. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS2, Line 382: sortByExprs_.size() > 0 > !sortByExprs_.isEmpty() Done PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx)); > Does this work ok if you have column permutation? I think it does. colIdx is the index in the target table, and resultExprs_[i] gets written into column[i]. Lines 703ff in prepareExpressions seem to guarantee this. I added tests to the PlannerTest/insert-sort-by.test file. PS2, Line 768: StringBuilder error = new StringBuilder(); : sortByColumns_ = TablePropertyAnalyzer.analyzeSortByColumns(table_, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); > Same comment as before. Move the throw clause inside the analyzeSortByColum Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS2, Line 33: /** : * This class collects various methods to analyze table properties. : */ : public class TablePropertyAnalyzer { > You're only analyzing the sort by columns, so why this generic class? Can't I discussed this with Alex before starting to implement this change and the idea was to have a class that collects methods to analyze table properties (sort.by.columns, skip.header.line.count, parquet.mr.int96.write.zone). These are often called from various places throughout the code so we want to collect them in a single file. PS2, Line 41: Analyzes the 'sort.by.columns' property of 'table'. > What is the return value? I am not really happy how the methods here are ordered and commented, but also couldn't think of a much better way. Assuming we keep the class, do you have a tip how to better order it? Should each method have the full comment of inputs, output, and what it does? PS2, Line 86: List<String> partitionCols, List<String> primaryKeyCols > You can simply pass one list which includes the list of column names to exc Alex and I discussed passing an error message template and decided to go for the two-list approach. Looking at the code node I don't feel strongly about it. Can you decide with Alex what you prefer? PS2, Line 90: for (int i = 0; i < sortByCols.size(); ++i) { : String colName = sortByCols.get(i); > for (String colName: sortByCols) ? Done PS2, Line 101: return ImmutableList.<Integer>of(); > If I understand correctly, the callers of this function don't do anything w Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/catalog/Column.java File fe/src/main/java/org/apache/impala/catalog/Column.java: PS2, Line 134: <String> > You don't need that. Done PS2, Line 133: public static List<String> toColumnNames(List<Column> columns) { : List<String> colNames = Lists.<String>newArrayList(); : for (Column col: columns) { colNames.add(col.getName()); } : return colNames; : } > Alternate implementation may use the Lists.transform function. As discussed in person, the current implementation seems to be clearer and more concise. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS2, Line 379: public List<String> getColumnNames() { : return Column.toColumnNames(colsByPos_); : } > nit: single line? Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS2, Line 539: size() > 0 > !isEmpty Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-HasComments: Yes
