Nattavut Sutyanyong created SPARK-19017:
-------------------------------------------

             Summary: NOT IN subquery with more than one column may return incorrect results
                 Key: SPARK-19017
                 URL: https://issues.apache.org/jira/browse/SPARK-19017
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0
            Reporter: Nattavut Sutyanyong


When a NOT IN predicate compares more than one column against a subquery, the query may return an incorrect result if the subquery output contains null values. The problem can be demonstrated with the following data set and query, which returns no rows. Under SQL three-valued logic the row (2, 1) should be returned: a1 = a2 is false, so the tuple comparison is false regardless of the null in b2.

{code}
Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1")
Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2")

sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show
+---+---+
| a1| b1|
+---+---+
+---+---+
{code}
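
For reference, the expected result of the multi-column NOT IN can be worked through in plain Scala, without Spark. The sketch below is not Spark's implementation. It only models standard SQL three-valued logic, using Option[Boolean] with None standing for UNKNOWN, and assumes the usual expansion of (a, b) NOT IN (subquery) into NOT (OR over the subquery rows of (a = a2 AND b = b2)). The names NotInSemantics, sqlEq, and3, or3, not3 and notIn are hypothetical and were introduced here for illustration.

```scala
// Sketch of SQL three-valued logic for a two-column NOT IN (no Spark needed).
// Some(true) = TRUE, Some(false) = FALSE, None = UNKNOWN.
object NotInSemantics {
  type Tri = Option[Boolean]
  type Row = (Option[Int], Option[Int]) // None models a SQL NULL

  // Column equality: UNKNOWN if either operand is NULL.
  def sqlEq(x: Option[Int], y: Option[Int]): Tri =
    for (a <- x; b <- y) yield a == b

  // Three-valued AND: FALSE dominates UNKNOWN.
  def and3(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // Three-valued OR: TRUE dominates UNKNOWN.
  def or3(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // Three-valued NOT: UNKNOWN stays UNKNOWN.
  def not3(p: Tri): Tri = p.map(!_)

  // (a, b) NOT IN rows == NOT (OR over rows of (a = a2 AND b = b2)).
  def notIn(left: Row, rows: Seq[Row]): Tri = {
    val matches = rows.map(r => and3(sqlEq(left._1, r._1), sqlEq(left._2, r._2)))
    not3(matches.foldLeft(Some(false): Tri)(or3))
  }

  def main(args: Array[String]): Unit = {
    val t1Row: Row = (Some(2), Some(1))     // the single row of t1
    val t2: Seq[Row] = Seq((Some(1), None)) // the single row of t2: (1, null)
    println(notIn(t1Row, t2))               // prints Some(true)
  }
}
```

A WHERE clause keeps only rows whose predicate evaluates to TRUE, so under these semantics the row (2, 1) qualifies. The predicate would be UNKNOWN, and the row filtered out, only if every non-null column pair matched, for example with (1, 1) on the left-hand side.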