Ryan Foss created SPARK-24390:
---------------------------------

             Summary: confusion of columns in projection after WITH ROLLUP
                 Key: SPARK-24390
                 URL: https://issues.apache.org/jira/browse/SPARK-24390
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
         Environment: Databricks runtime 4.0
            Reporter: Ryan Foss


Using two CTEs, where the first issues a WITH ROLLUP and the second is a 
projection of the first, when attempting to join the two CTEs, spark seems to 
consider the key column in each CTE to be the same column, resulting in a 
"Cannot resolve column" error.

 
{noformat}
CREATE TABLE IF NOT EXISTS test_rollup (key varchar(3), code varchar(3), stuff 
int);

EXPLAIN

WITH
cte1 AS (
  SELECT
    key,
    code,
    struct(code, avg(stuff)) AS stuff
  FROM test_rollup
  GROUP BY key, code WITH ROLLUP
),

cte2 AS (
  SELECT
    key,
    collect_list(stuff) AS stuff_details
  FROM cte1
  WHERE code IS NOT NULL
  GROUP BY key
)

-- join summary record from cte1 to cte2
SELECT c1.key, c1.stuff AS summary_stuff, c2.stuff_details AS detail_stuff
FROM cte1 c1
JOIN cte2 c2
  ON c2.key = c1.key
WHERE c1.code IS NULL


== Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve 
'`c2.key`' given input columns: [c1.key, c1.code, c1.stuff, c2.stuff_details]; 
line 22 pos 5; 'Project ['c1.key, 'c1.stuff AS summary_stuff#5415, 
'c2.stuff_details AS detail_stuff#5416] +- 'Filter isnull('c1.code) +- 'Join 
Inner, ('c2.key = 'c1.key) :- SubqueryAlias c1 : +- SubqueryAlias cte1 : +- 
Aggregate [key#5429, code#5430, spark_grouping_id#5426], [key#5429, code#5430, 
named_struct(code, code#5430, col2, avg(cast(stuff#5424 as bigint))) AS 
stuff#5417] : +- Expand [List(key#5422, code#5423, stuff#5424, key#5427, 
code#5428, 0), List(key#5422, code#5423, stuff#5424, key#5427, null, 1), 
List(key#5422, code#5423, stuff#5424, null, null, 3)], [key#5422, code#5423, 
stuff#5424, key#5429, code#5430, spark_grouping_id#5426] : +- Project 
[key#5422, code#5423, stuff#5424, key#5422 AS key#5427, code#5423 AS 
code#5428]{noformat}
Changing the cte2 query and adding a column alias "key AS key", will cause the 
columns to be considered unique, resolving the join issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to