GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/14117

    [SPARK-16461][SQL] Support partition batch pruning with `<=>` predicate in InMemoryTableScanExec

    ## What changes were proposed in this pull request?
    
    The `EqualNullSafe` (`<=>`) predicate was not handled when pruning partition batches
    of cached tables, so a `<=>` filter could not be used to skip batches.
    
    Supporting it makes the query below roughly 5 times faster.
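    
    To illustrate the idea, here is a minimal sketch of min/max-based batch pruning for `<=>`.
    It is not the code in this patch: `BatchStats`, `NullSafeEqualPruning`, `mightMatch` and
    `batchesToScan` are hypothetical names, and the sketch assumes each batch records a lower
    bound, an upper bound (both absent when the batch holds only NULLs) and a null count for a
    single `Long` column. A batch is skipped only when no row in it could satisfy `col <=> value`.
    
    ```scala
    // Hypothetical per-batch statistics for one Long column (illustrative names only,
    // not Spark's internal column statistics classes). Bounds are None iff the batch
    // contains no non-null values.
    final case class BatchStats(lowerBound: Option[Long], upperBound: Option[Long], nullCount: Int)
    
    object NullSafeEqualPruning {
      // Returns false only when no row in the batch can satisfy `col <=> value`,
      // meaning the batch can be skipped; true means it must still be scanned.
      def mightMatch(stats: BatchStats, value: Option[Long]): Boolean = value match {
        // `col <=> NULL` is true exactly for NULL rows.
        case None => stats.nullCount > 0
        // `col <=> v` with a non-null v behaves like `col = v` on non-null rows.
        case Some(v) =>
          (stats.lowerBound, stats.upperBound) match {
            case (Some(lo), Some(hi)) => lo <= v && v <= hi
            case _                    => false // only NULLs in this batch
          }
      }
    
      // Indices of the batches that still have to be read.
      def batchesToScan(batches: Seq[BatchStats], value: Option[Long]): Seq[Int] =
        batches.zipWithIndex.collect { case (s, i) if mightMatch(s, value) => i }
    }
    ```
    
    For example, with batches covering `[0, 999]`, `[1000, 1999]` and one all-NULL batch,
    `batchesToScan(batches, Some(1L))` keeps only the first batch, while
    `batchesToScan(batches, None)` keeps only the all-NULL batch. Handling the NULL literal
    separately is what distinguishes `<=>` from `=`: `col = NULL` is never true, whereas
    `col <=> NULL` is true for NULL rows.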
    
    Running the code below:
    
    ```scala
    test("Null-safe equal comparison") {
      val N = 20000000
      val df = spark.range(N).repartition(20)
      val benchmark = new Benchmark("Null-safe equal comparison", N)
      df.createOrReplaceTempView("t")
      // cacheTable is lazy: the warm-up query below materializes the cached batches
      spark.catalog.cacheTable("t")
      sql("select id from t where id <=> 1").collect()
    
      benchmark.addCase("Null-safe equal comparison", 10) { _ =>
        sql("select id from t where id <=> 1").collect()
      }
      benchmark.run()
    }
    ```
    
    
    produces the results below:
    
    **Before:**
    
    ```
    Running benchmark: Null-safe equal comparison
      Running case: Null-safe equal comparison
      Stopped after 10 iterations, 2098 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
    Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
    
    Null-safe equal comparison:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Null-safe equal comparison                     204 /  210         98.1          10.2       1.0X
    ```
    
    **After:**
    
    ```
    Running benchmark: Null-safe equal comparison
      Running case: Null-safe equal comparison
      Stopped after 10 iterations, 478 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
    Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
    
    Null-safe equal comparison:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Null-safe equal comparison                      42 /   48        474.1           2.1       1.0X
    ```
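    
    The "roughly 5 times" figure can be checked against the two tables. The snippet below only
    divides the reported numbers (the object and value names are illustrative): best time gives
    about 4.9x, average time about 4.4x, and throughput about 4.8x.
    
    ```scala
    // Values copied from the Before/After benchmark tables above.
    object SpeedupFromBenchmark {
      val bestBeforeMs = 204.0
      val avgBeforeMs  = 210.0
      val rateBefore   = 98.1  // M rows/s
      val bestAfterMs  = 42.0
      val avgAfterMs   = 48.0
      val rateAfter    = 474.1 // M rows/s
    
      // Lower time is better, so divide before by after;
      // higher throughput is better, so divide after by before.
      val byBestTime: Double = bestBeforeMs / bestAfterMs // about 4.86
      val byAvgTime: Double  = avgBeforeMs / avgAfterMs   // 4.375
      val byRate: Double     = rateAfter / rateBefore     // about 4.83
    }
    ```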
    
    ## How was this patch tested?
    
    Unit tests in `PartitionBatchPruningSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-16461

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14117
    
----
commit a7c750ec236973bec29380c39f9b7e1627979d04
Author: hyukjinkwon <[email protected]>
Date:   2016-07-09T13:41:01Z

    Support partition batch pruning with `<=>` (EqualNullSafe) predicate in InMemoryTableScanExec

----

