AngersZhuuuu commented on issue #26437: [SPARK-29800][SQL] Plan non-correlated Exists 's subquery in PlanSubqueries
URL: https://github.com/apache/spark/pull/26437#issuecomment-570445224
 
 
   **With the current PR**
   
   ```
   scala> (1 to 10000).toDF("id").createOrReplaceTempView("s1")
   scala> (0 to 50000).toDF("id").createOrReplaceTempView("s2")
   scala> (0 to 1000000).map(_ * 2).toDF("id").createOrReplaceTempView("s3")
   scala> val df = sql("SELECT s1.id  FROM s1 WHERE EXISTS (SELECT * from s3)")
   df: org.apache.spark.sql.DataFrame = [id: int]
   scala> var start = System.currentTimeMillis()
   start: Long = 1578018739056
   scala> df.show(5)
   +---+
   | id|
   +---+
   |  1|
   |  2|
   |  3|
   |  4|
   |  5|
   +---+
   only showing top 5 rows
   scala> var end = System.currentTimeMillis()
   end: Long = 1578018740882
   scala> println(s"duration = ${end - start}")
   duration = 1826
   scala>
   ```
   
![image](https://user-images.githubusercontent.com/46485123/71704320-a3c45c80-2e14-11ea-83d1-decddffdefb3.png)
   
   
   
   **Without the PR**
   ```
   scala> (1 to 10000).toDF("id").createOrReplaceTempView("s1")
   scala> (0 to 50000).toDF("id").createOrReplaceTempView("s2")
   scala> (0 to 1000000).map(_ * 2).toDF("id").createOrReplaceTempView("s3")
   scala> val df = sql("SELECT s1.id  FROM s1 WHERE EXISTS (SELECT * from s3)")
   df: org.apache.spark.sql.DataFrame = [id: int]
   scala> var start = System.currentTimeMillis()
   start: Long = 1578020812055
   scala> df.show(5)
   20/01/03 11:07:00 dispatcher-event-loop-4 WARN TaskSetManager: Stage 0 contains a task of very large size (4035 KiB). The maximum recommended task size is 1000 KiB.
   +---+
   | id|
   +---+
   |  1|
   |  2|
   |  3|
   |  4|
   |  5|
   +---+
   only showing top 5 rows
   scala> var end = System.currentTimeMillis()
   end: Long = 1578020823600
   scala> println(s"duration = ${end - start}")
   duration = 11545
   ```
   
   
![image](https://user-images.githubusercontent.com/46485123/71705211-71692e00-2e19-11ea-8e79-f8469a9bcffc.png)
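
Both runs above time `df.show(5)` by hand with `System.currentTimeMillis()` (1826 ms with the PR, 11545 ms without). To repeat the comparison with less typing, the start/end pattern could be wrapped in a small helper. This is only a sketch: `TimingSketch` and `timed` are hypothetical names, not part of Spark or of this PR, and a single-run wall-clock figure still includes JVM warm-up and scheduling noise.

```scala
// Hypothetical helper (not part of Spark or this PR): runs a block once
// and reports the elapsed wall-clock time in milliseconds.
object TimingSketch {
  def timed[A](label: String)(block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block                                   // evaluate the by-name block exactly once
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$label duration = $elapsedMs")
    (result, elapsedMs)
  }
}
```

In `spark-shell`, after pasting the object, the measurement from the transcripts would become `TimingSketch.timed("exists") { df.show(5) }`.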
   
