GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22778

    [SPARK-25784][SQL] Infer filters from constraints after rewriting predicate subquery

    ## What changes were proposed in this pull request?
    
    Infer filters from constraints after rewriting predicate subquery.
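    
    As a rough illustration (not part of the patch), the sketch below shows one way to inspect the optimized plan for an `IN` predicate subquery. The optimizer rewrites such a subquery into a left semi join; whether an inferred `isnotnull` filter on the join key also appears depends on the Spark version and on whether this change is present. The session settings, view contents and the `optimized` name are assumptions made for the example.
    
    ```scala
    import org.apache.spark.sql.SparkSession

    // Local session for illustration only; app name and master are arbitrary.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("infer-filters-sketch")
      .getOrCreate()

    // Two small views whose join key c2 is null for every even id.
    Seq("t1", "t2").foreach { name =>
      spark.range(10)
        .selectExpr("if(id % 2 = 0, null, id) as c2", "id as c3")
        .createOrReplaceTempView(name)
    }

    val df = spark.sql("select t1.* from t1 where t1.c2 in (select c2 from t2)")

    // The IN subquery shows up as a left semi join in the optimized plan.
    val optimized = df.queryExecution.optimizedPlan.toString
    println(optimized)
    ```
    
    Comparing the printed plan with and without `SQLConf.CONSTRAINT_PROPAGATION_ENABLED` set should show whether filters on the join key were inferred.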
    
    ## How was this patch tested?
    Unit tests and benchmark tests:
    ```scala
    withTempView("t1", "t2") {
      withTempDir { dir =>
        spark.range(3000000)
              .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
          .coalesce(1)
          .orderBy("c2")
          .write
          .mode("overwrite")
          .option("parquet.block.size", 10485760)
          .parquet(dir.getCanonicalPath)
    
        spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
        spark.read.parquet(dir.getCanonicalPath).createTempView("t2")
    
        Seq("c1", "c2", "c3").foreach { column =>
          val benchmark = new Benchmark(s"join key $column", 10)
          Seq(false, true).foreach { inferFilters =>
                benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
                  withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
                    sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
              }
            }
          }
          benchmark.run()
        }
      }
    }
    ```
    
    ```
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
    Is infer filters true                          190 /  207          0.0    18962935.7      10.6X
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
    Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
    Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25784

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22778
    
----
commit c8d1b91b93e7ad05ca0bd17984fad1c30062d504
Author: Yuming Wang <yumwang@...>
Date:   2018-10-20T01:39:51Z

    Infer filters from constraints after rewriting predicate subquery

----

