[ 
https://issues.apache.org/jira/browse/SPARK-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nattavut Sutyanyong updated SPARK-17348:
----------------------------------------
    Comment: was deleted

(was: The same problem, surfacing with different symptoms, was discussed in 
SPARK-13801, SPARK-14040, and SPARK-17154. The problem reported here is one 
specific pattern of it. We should find a solution that addresses the root 
cause. I am considering closing this JIRA as a duplicate.)

> Incorrect results from subquery transformation
> ----------------------------------------------
>
>                 Key: SPARK-17348
>                 URL: https://issues.apache.org/jira/browse/SPARK-17348
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>              Labels: correctness
>
> {noformat}
> Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
> Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= t2.c2)").show
> +---+
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result of the above query is an empty set. Here is the 
> explanation:
> Both rows of T2 satisfy the correlated predicate T1.C2 >= T2.C2 for the 
> single row of T1 (C1 = 1, C2 = 1), so both rows must be processed in the 
> same group of the subquery's aggregation. The aggregation therefore yields 
> MAX(T2.C1) = 2. The predicate T1.C1 (which is 1) IN MAX(T2.C1) (which is 2) 
> then evaluates to false, so the query should return an empty set.
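
The expected result described above can be checked without Spark. The following is a minimal sketch in plain Scala collections that evaluates the correlated subquery row by row, the way the explanation reasons about it. The names `Rec`, `CorrelatedInSubquery`, `subqueryMax`, and `expected` are hypothetical and introduced only for illustration; this is not Spark's implementation, and it models an aggregate over no rows as `None` (SQL NULL).

```scala
// Hypothetical stand-in for a two-column row; not a Spark type.
case class Rec(c1: Int, c2: Int)

object CorrelatedInSubquery {
  // Same data as the temp views t1 and t2 in the report.
  val t1: Seq[Rec] = Seq(Rec(1, 1))
  val t2: Seq[Rec] = Seq(Rec(1, 1), Rec(2, 0))

  // For one outer row, evaluate:
  //   select max(t2.c1) from t2 where t1.c2 >= t2.c2
  // An aggregate over no matching rows is modelled as None (SQL NULL).
  def subqueryMax(outer: Rec): Option[Int] = {
    val matching = t2.filter(inner => outer.c2 >= inner.c2)
    if (matching.isEmpty) None else Some(matching.map(_.c1).max)
  }

  // Outer query: keep t1.c1 only when it equals the subquery's single value.
  def expected: Seq[Int] =
    t1.filter(r => subqueryMax(r).contains(r.c1)).map(_.c1)

  def main(args: Array[String]): Unit = {
    println(subqueryMax(t1.head)) // Some(2): both t2 rows fall in one group
    println(expected)             // List(): 1 is not in {2}
  }
}
```

Evaluating the subquery once per outer row is slow but makes the grouping explicit: both T2 rows contribute to the same MAX, so the IN predicate cannot match.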



