[GitHub] spark pull request #22698: [SPARK-25710][SQL] range should report metrics co...

cloud-fan Thu, 11 Oct 2018 08:02:22 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22698#discussion_r224484333
  
    --- Diff: sql/core/benchmarks/RangeBenchmark-results.txt ---
    @@ -0,0 +1,16 @@
    +================================================================================================
    +range
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.13.6
    +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
    +
    +range:                                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +full scan                                   12674 / 12840         41.4          24.2       1.0X
    +limit after range                               33 /   37      15900.2           0.1     384.4X
    +filter after range                             969 /  985        541.0           1.8      13.1X
    +count after range                               42 /   42      12510.5           0.1     302.4X
    +count after limit after range                   32 /   33      16337.0           0.1     394.9X
    --- End diff --
    
    Several learnings:
    1. Limit does help: "limit after range" is 384.4X faster than the full scan.
    2. Performance is bad if we interrupt the data processing loop too often.
       Full scan is the worst case, because we interrupt the loop for every record.
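
To illustrate the second point, here is a toy Python model. It is not Spark's generated code, and `run_range` and `rows_per_interrupt` are made-up names for this sketch. It sums a range while leaving the tight inner loop every `rows_per_interrupt` rows, and counts how often that happens. With `rows_per_interrupt=1` the loop is left once per record, which is the pattern described above for the full scan.

```python
def run_range(total_rows, rows_per_interrupt):
    """Sum 0..total_rows-1, leaving the inner loop every rows_per_interrupt rows.

    Hypothetical helper for illustration only.
    Returns (checksum, number_of_interruptions).
    """
    if rows_per_interrupt <= 0:
        raise ValueError("rows_per_interrupt must be positive")

    checksum = 0
    interruptions = 0
    start = 0
    while start < total_rows:
        end = min(start + rows_per_interrupt, total_rows)
        # Tight inner loop: runs without handing control back to the driver.
        for value in range(start, end):
            checksum += value
        # Control returns to the outer driver here; this is one "interruption".
        interruptions += 1
        start = end
    return checksum, interruptions


if __name__ == "__main__":
    for batch in (1, 4, 1000):
        total, stops = run_range(10, batch)
        print(f"rows_per_interrupt={batch}: checksum={total}, interruptions={stops}")
```

For 10 rows, leaving the loop on every row gives 10 interruptions, while batches of 4 give 3; the checksum is the same either way. The sketch only counts interruptions and makes no claim about how much each one costs in Spark.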


