Ryan Foss created SPARK-24390: --------------------------------- Summary: confusion of columns in projection after WITH ROLLUP Key: SPARK-24390 URL: https://issues.apache.org/jira/browse/SPARK-24390 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Environment: Databricks runtime 4.0 Reporter: Ryan Foss
Using two CTEs, where the first issues a WITH ROLLUP and the second is a projection of the first, when attempting to join the two CTEs, spark seems to consider the key column in each CTE to be the same column, resulting in a "Cannot resolve column" error. {noformat} CREATE TABLE IF NOT EXISTS test_rollup (key varchar(3), code varchar(3), stuff int); EXPLAIN WITH cte1 AS ( SELECT key, code, struct(code, avg(stuff)) AS stuff FROM test_rollup GROUP BY key, code WITH ROLLUP ), cte2 AS ( SELECT key, collect_list(stuff) AS stuff_details FROM cte1 WHERE code IS NOT NULL GROUP BY key ) -- join summary record from cte1 to cte2 SELECT c1.key, c1.stuff AS summary_stuff, c2.stuff_details AS detail_stuff FROM cte1 c1 JOIN cte2 c2 ON c2.key = c1.key WHERE c1.code IS NULL == Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve '`c2.key`' given input columns: [c1.key, c1.code, c1.stuff, c2.stuff_details]; line 22 pos 5; 'Project ['c1.key, 'c1.stuff AS summary_stuff#5415, 'c2.stuff_details AS detail_stuff#5416] +- 'Filter isnull('c1.code) +- 'Join Inner, ('c2.key = 'c1.key) :- SubqueryAlias c1 : +- SubqueryAlias cte1 : +- Aggregate [key#5429, code#5430, spark_grouping_id#5426], [key#5429, code#5430, named_struct(code, code#5430, col2, avg(cast(stuff#5424 as bigint))) AS stuff#5417] : +- Expand [List(key#5422, code#5423, stuff#5424, key#5427, code#5428, 0), List(key#5422, code#5423, stuff#5424, key#5427, null, 1), List(key#5422, code#5423, stuff#5424, null, null, 3)], [key#5422, code#5423, stuff#5424, key#5429, code#5430, spark_grouping_id#5426] : +- Project [key#5422, code#5423, stuff#5424, key#5422 AS key#5427, code#5423 AS code#5428]{noformat} Changing the cte2 query and adding a column alias "key AS key", will cause the columns to be considered unique, resolving the join issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org