Github user nsyca commented on the issue:

    https://github.com/apache/spark/pull/16246
  
    This problem can be reproduced with a simple script now.
    
    ````scala
    Seq((1,1)).toDF("pk","pv").createOrReplaceTempView("p")
    Seq((1,1)).toDF("ck","cv").createOrReplaceTempView("c")
    sql("select * from p,c where p.pk=c.ck and c.cv = (select avg(c1.cv) from c c1 where c1.ck = p.pk)").show
    ````
    
    The requirements to reproduce it are:
    1. The same table must be referenced in both the parent query and the subquery. Here it is table `c`.
    2. There must be a correlated predicate, but one that references a different table. Here it is from `c` (aliased as `c1`) in the subquery to `p` in the parent.
    3. Deduplication then rewrites `c1.ck` in the subquery to `ck#<n1>#<n2>` in the `Project` above the `Aggregate` of `avg`. When `ck#<n1>#<n2>` is compared to the original group-by column `ck#<n1>` by their canonicalized forms, `#<n2> != #<n1>`, which triggers the exception I added in SPARK-18504.
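
    As a rough illustration of why the canonicalized comparison fails (this is a toy model, not Spark's actual `AttributeReference` or canonicalization code): each attribute reference carries an expression ID, canonicalization normalizes away the name but keeps the ID, and deduplication mints a fresh ID for the subquery's copy, so the two attributes can never compare equal.

    ```scala
    // Hypothetical minimal model of an attribute reference with an expression ID.
    case class Attr(name: String, exprId: Int) {
      // Canonicalization erases the name but preserves the exprId,
      // so two Attrs are semantically equal only if their IDs match.
      def canonicalized: Attr = Attr("none", exprId)
    }

    val groupByCol = Attr("ck", 1) // original group-by column, ck#1
    val dedupedCol = Attr("ck", 2) // deduplicated copy ck#1#2 gets a fresh ID

    // The comparison that fails and triggers the SPARK-18504 exception:
    println(groupByCol.canonicalized == dedupedCol.canonicalized) // false
    ```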

