[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...

cloud-fan Mon, 12 Nov 2018 08:58:57 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22518#discussion_r232737458
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
    @@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
           assert(getNumSortsInQuery(query5) == 1)
         }
       }
    +
    +  test("SPARK-25482: Reuse same Subquery in order to execute it only once") {
    +    withTempView("t1", "t2") {
    +      sql("create temporary view t1(a int) using parquet")
    +      sql("create temporary view t2(b int) using parquet")
    +      val plan = sql("select * from t2 where b > (select max(a) from t1)")
    --- End diff --
    
    Ah, sorry, I misread the code. Unless the subquery is rewritten into a join, we must wait for all subqueries to finish before executing the plan.
    
    We can rewrite a scalar subquery in data source filters into a literal, so that it works with the filter pushdown API.
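
To illustrate the idea, here is a minimal sketch of replacing a scalar subquery with a literal before translating the predicate into a source filter. It uses a toy expression tree, not Spark's Catalyst classes: `Expr`, `ScalarSubquery`, `SourceGreaterThan`, `subqueriesToLiterals` and `toSourceFilter` are all hypothetical names made up for this example, and the real change would need to hook into wherever the subquery results become available.

```scala
object ScalarSubqueryLiteralSketch {
  // Toy expression tree. These are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  // A scalar subquery: its value is only known once `run` has finished.
  final case class ScalarSubquery(run: () => Int) extends Expr

  // Hypothetical stand-in for a filter a data source can evaluate itself.
  sealed trait SourceFilter
  final case class SourceGreaterThan(attribute: String, value: Int) extends SourceFilter

  // Run every scalar subquery and replace it with a literal holding its result.
  def subqueriesToLiterals(e: Expr): Expr = e match {
    case ScalarSubquery(run) => Literal(run())
    case GreaterThan(l, r)   => GreaterThan(subqueriesToLiterals(l), subqueriesToLiterals(r))
    case other               => other
  }

  // Only "column > literal" can be translated; anything else is not pushed down.
  def toSourceFilter(e: Expr): Option[SourceFilter] = e match {
    case GreaterThan(Column(name), Literal(v)) => Some(SourceGreaterThan(name, v))
    case _                                     => None
  }
}
```

With a predicate shaped like the one in the test above (`b > (select max(a) from t1)`), `toSourceFilter` returns `None` while the subquery is still in the tree, and returns a pushable filter once `subqueriesToLiterals` has been applied.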



