Bruce Robbins created SPARK-46779:
-------------------------------------

             Summary: Grouping by subquery with a cached relation can fail
                 Key: SPARK-46779
                 URL: https://issues.apache.org/jira/browse/SPARK-46779
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Bruce Robbins


Example:
{noformat}
create or replace temp view data(c1, c2) as values
(1, 2),
(1, 3),
(3, 7),
(4, 5);

cache table data;

select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2)
from data d2
group by all;
{noformat}
The final query fails with the following error:
{noformat}
[INTERNAL_ERROR] Couldn't find count(1)#163L in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
{noformat}
If you don't cache the view, the query succeeds.
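A quick way to check this in the same session is to drop the cache entry and rerun the query. This is only a sketch: it assumes the temp view {{data}} from the example above still exists, and it uses Spark SQL's {{UNCACHE TABLE}} statement, which does not appear in the original report.
{noformat}
-- Assumption: same session as the example above, so the temp view still exists.
-- Remove the cached relation but keep the view itself.
uncache table data;

-- Same query as in the example; per the observation above, it is expected to
-- succeed once the view is no longer cached.
select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2)
from data d2
group by all;
{noformat}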


