Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/24105 )
Change subject: IMPALA-2083: Add PIVOT clause ...................................................................... Patch Set 9: (9 comments) I still need some more time to look into the details. Left some comments for discussion first. http://gerrit.cloudera.org:8080/#/c/24105/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/24105/3/fe/src/main/cup/sql-parser.cup@3525 PS3, Line 3525: ident_or_unreserved > Thanks! I think nested fields can be extracted in a subquery. If we don't support this, let's mention it in the commit message. http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@625 PS9, Line 625: if (stmt_ instanceof SelectStmt) { It'd be more tidy to use a method like requiresSubqueryRewrite(). Then we can also remove the changed_ flag used here. http://gerrit.cloudera.org:8080/#/c/24105/3/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java File fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java: http://gerrit.cloudera.org:8080/#/c/24105/3/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@96 PS3, Line 96: // Creates an unanalyzed clone. > Thanks! Changed to cloning the whole list. Can we add a test for this? http://gerrit.cloudera.org:8080/#/c/24105/3/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@147 PS3, Line 147: throw new AnalysisException("Duplicate column name: " + fieldName); > Thanks! I think they are supported in the new Patch Set. The new patch set still cast the Expr to LiteralExpr which doesn't work for FunctionalCallExpr. It'd be nice to support statements like SELECT v1, v2 FROM functional.alltypes PIVOT ( count(id) for month in (month(now()) AS m1, month(now())+1 AS m2) ) AS t; month(now()) is evaluated to the current month. This needs a better error message instead of hitting ClassCastException: SELECT v1, v2 FROM functional.alltypes PIVOT ( count(id) for month in (int_col AS v1, 2 AS v2) ) AS t; http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java File fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java: http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@21 PS9, Line 21: import java.util.HashMap; nit: unused import http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@73 PS9, Line 73: private final Map<String, FunctionCallExpr> analyzedAggList_ = new LinkedHashMap<>(); : private final Map<String, LiteralExpr> headerValueMap_ = new LinkedHashMap<>(); : private final Map<String, SlotDescriptor> groupingSlotMap_ = new LinkedHashMap<>(); nit: Let's add a comment mentioning what are the keys of these maps, e.g. analyzedAggList_ maps alias to the analyzed agg exprs. http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@277 PS9, Line 277: uniqueAggAlias nit: at first glance, it seems all items use the same alias. But later I realized we require writing alias for multiple aggs on L184. Maybe it's more clear by adding a comment above 'uniqueAggAlias' mentioning it's only used for single agg scenario. http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@291 PS9, Line 291: Expr headerSlotRef = transposeGroupingExprs.get(headerSlotRefIndex); nit: not used http://gerrit.cloudera.org:8080/#/c/24105/9/fe/src/main/java/org/apache/impala/analysis/PivotTableRef.java@298 PS9, Line 298: for (SelectListItem aggItem : aggList_.getItems()) { : String aggAlias = aggItem.getAlias() == null ? uniqueAggAlias : aggItem.getAlias(); : for (Map.Entry<String, LiteralExpr> entry : headerValueMap_.entrySet()) { Oracle iterates the header values before the agg alias: https://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#sthref1224 Same for SparkSQL: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-pivot.html It'd be better to be consistent with them. SELECT * FROM person PIVOT ( SUM(age) AS a, AVG(class) AS c FOR name IN ('John' AS john, 'Mike' AS mike) ); +------+-----------+---------+---------+---------+---------+ | id | address | john_a | john_c | mike_a | mike_c | +------+-----------+---------+---------+---------+---------+ | 200 | Street 2 | NULL | NULL | NULL | NULL | | 100 | Street 1 | 30 | 1.0 | NULL | NULL | | 300 | Street 3 | NULL | NULL | 80 | 3.0 | | 400 | Street 4 | NULL | NULL | NULL | NULL | +------+-----------+---------+---------+---------+---------+ -- To view, visit http://gerrit.cloudera.org:8080/24105 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib799b772be18d2a3db8983b24e1068e7dfdf1ca9 Gerrit-Change-Number: 24105 Gerrit-PatchSet: 9 Gerrit-Owner: Xuebin Su <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Xuebin Su <[email protected]> Gerrit-Comment-Date: Wed, 13 May 2026 09:29:20 +0000 Gerrit-HasComments: Yes
