[GitHub] spark pull request #22518: [SPARK-25482][SQL] ReuseSubquery can be useless w...

mgaido91 Mon, 12 Nov 2018 08:28:58 -0800

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22518#discussion_r232725686
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
    @@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
           assert(getNumSortsInQuery(query5) == 1)
         }
       }
    +
    +  test("SPARK-25482: Reuse same Subquery in order to execute it only once") {
    +    withTempView("t1", "t2") {
    +      sql("create temporary view t1(a int) using parquet")
    +      sql("create temporary view t2(b int) using parquet")
    +      val plan = sql("select * from t2 where b > (select max(a) from t1)")
    --- End diff --
    
    > it also means the data source scan must wait until the subquery is finished
    
    The subquery has to be executed sooner or later anyway, right? So I don't see the problem here. Am I missing something?
    
    OK, thanks. I'll follow your suggestion: forbid it here and create a new ticket about pushing it down to data sources.



