AngersZhuuuu commented on issue #26437: [SPARK-29800][SQL] Plan non-correlated Exists 's subquery in PlanSubqueries
URL: https://github.com/apache/spark/pull/26437#issuecomment-557636283

cc @cloud-fan

Looking only at the computation itself, evaluating a non-correlated EXISTS subquery is very fast, and this change also removes one shuffle. In the single spark-shell run shown here for each build, the same query took 13820 ms with this PR and 21240 ms on current master. I will also try this in our environment with a real production case.

**With this PR:**

```
scala> (1 to 10000).toDF("id").createOrReplaceTempView("s1")

scala> (0 to 50000).toDF("id").createOrReplaceTempView("s2")

scala> (0 to 1000000).map(_ * 2).toDF("id").createOrReplaceTempView("s3")

scala> val df = sql(
     |   """
     |     | SELECT s1.id FROM s1
     |     | WHERE EXISTS (SELECT * from s3)
     |   """.stripMargin)
df: org.apache.spark.sql.DataFrame = [id: int]

scala> var start = System.currentTimeMillis()
start: Long = 1574445595283

scala> df.show(5)
+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
|  5|
+---+
only showing top 5 rows

scala> var end = System.currentTimeMillis()
end: Long = 1574445609103

scala> println(s"duration = ${end - start}")
duration = 13820
```

**Without this PR (current master):**

```
scala> (1 to 10000).toDF("id").createOrReplaceTempView("s1")

scala> (0 to 50000).toDF("id").createOrReplaceTempView("s2")

scala> (0 to 1000000).map(_ * 2).toDF("id").createOrReplaceTempView("s3")

scala> val df = sql(
     |   """
     |     | SELECT s1.id FROM s1
     |     | WHERE EXISTS (SELECT * from s3)
     |   """.stripMargin)
df: org.apache.spark.sql.DataFrame = [id: int]

scala> var start = System.currentTimeMillis()
start: Long = 1574445708886

scala> df.show(5)
+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
|  5|
+---+
only showing top 5 rows

scala> var end = System.currentTimeMillis()
end: Long = 1574445730126

scala> println(s"duration = ${end - start}")
duration = 21240
```
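
For repeated measurements, the manual `start`/`end` bookkeeping in the sessions above could be wrapped in a small helper. This is only a minimal sketch in plain Scala; `timeMs` is a hypothetical name introduced here for illustration, not a Spark API.

```scala
// Hypothetical helper (not part of Spark): evaluates `body` once and
// returns its result together with the elapsed wall-clock time in ms.
def timeMs[T](body: => T): (T, Long) = {
  val start = System.nanoTime()            // monotonic clock, suited to intervals
  val result = body                        // by-name parameter is evaluated here
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  (result, elapsedMs)
}

// Possible usage in spark-shell, assuming `df` is defined as in the sessions above:
//   val (_, ms) = timeMs(df.show(5))
//   println(s"duration = $ms")
```

Running the measurement a few times per build, rather than once, would likely make the comparison less sensitive to JVM warm-up.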
