[GitHub] spark pull request: [SPARK-13361][SQL] Add benchmark codes for Enc...

maropu Tue, 23 Feb 2016 07:10:53 -0800

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11236#issuecomment-187733938
  
    I tried implementing `IntDeltaBinaryPacking` in `compressionSchemes`. It is
    a simplified version of `IntDeltaBinaryPackingReader/Writer` in
    `parquet-column`, simplified so that the compressed size can be calculated
    easily in `gatherCompressibilityStats`. The benchmark results are as follows:
    
    ```
    Running benchmark: INT Decode(Lower Skew)
      Running case: PassThrough(1.000)
      Running case: RunLengthEncoding(1.002)
      Running case: DictionaryEncoding(0.500)
      Running case: IntDelta(0.250)
      Running case: IntDeltaBinaryPacking(0.068)
    
    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    INT Decode(Lower Skew):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    PassThrough(1.000)                        285 /  360        235.7           4.2       1.0X
    RunLengthEncoding(1.002)                  700 /  715         95.8          10.4       0.4X
    DictionaryEncoding(0.500)                 763 /  782         88.0          11.4       0.4X
    IntDelta(0.250)                           684 /  702         98.1          10.2       0.4X
    IntDeltaBinaryPacking(0.068)              805 /  811         83.4          12.0       0.4X
    
    Running benchmark: INT Decode(Higher Skew)
      Running case: PassThrough(1.000)
      Running case: RunLengthEncoding(1.337)
      Running case: DictionaryEncoding(0.501)
      Running case: IntDelta(0.250)
      Running case: IntDeltaBinaryPacking(0.182)
    
    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    INT Decode(Higher Skew):            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    PassThrough(1.000)                        690 /  716         97.3          10.3       1.0X
    RunLengthEncoding(1.337)                 1127 / 1148         59.5          16.8       0.6X
    DictionaryEncoding(0.501)                 836 /  856         80.2          12.5       0.8X
    IntDelta(0.250)                           763 /  778         88.0          11.4       0.9X
    IntDeltaBinaryPacking(0.182)              873 /  884         76.9          13.0       0.8X
    ```
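
The columns in the tables above appear to be related by simple arithmetic. The sketch below is inferred from the printed numbers only, not taken from the benchmark harness, and the function names are hypothetical: `Per Row(ns)` matches `1000 / Rate(M/s)`, and `Relative` matches the baseline (`PassThrough`) best time divided by each case's best time.

```python
def per_row_ns(rate_m_per_s: float) -> float:
    """Nanoseconds per row implied by a rate given in million rows per second."""
    return 1000.0 / rate_m_per_s


def relative(baseline_best_ms: float, best_ms: float) -> float:
    """Speed relative to the baseline case (values below 1.0 mean slower)."""
    return baseline_best_ms / best_ms


# Lower-skew IntDeltaBinaryPacking row, using the figures printed in the table.
print(round(per_row_ns(83.4), 1))    # 12.0
print(round(relative(285, 805), 1))  # 0.4
```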
    
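    Encoding/decoding gets a little slower (e.g. a best decode time of 805 ms
    vs. 684 ms for `IntDelta` on the lower-skew data), but the compression
    ratios get much better (0.068 and 0.182 vs. 0.250 for `IntDelta`).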
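
To illustrate why delta encoding followed by bit packing can compress integer columns with small, regular gaps so well, here is a minimal sketch of a compressed-size estimate. It is not the code in this pull request or in `parquet-column`: the block size, the 5-byte block header, and the names `estimate_packed_size` and `compression_ratio` are all assumptions made for illustration.

```python
from typing import Sequence

BLOCK_SIZE = 128  # hypothetical block size, chosen only for this example


def estimate_packed_size(values: Sequence[int], block_size: int = BLOCK_SIZE) -> int:
    """Estimate the size in bytes of a simplified delta + bit-packing encoding."""
    if not values:
        return 0
    size = 4  # the first value is stored verbatim as a 32-bit int
    deltas = [b - a for a, b in zip(values, values[1:])]
    for start in range(0, len(deltas), block_size):
        block = deltas[start:start + block_size]
        min_delta = min(block)
        # Bits needed for the largest (delta - min_delta) in this block.
        width = max((d - min_delta).bit_length() for d in block)
        # Assumed header: 4 bytes for min_delta + 1 byte for the bit width,
        # followed by the packed deltas rounded up to whole bytes.
        size += 5 + (len(block) * width + 7) // 8
    return size


def compression_ratio(values: Sequence[int]) -> float:
    """Estimated size divided by the raw size at 4 bytes per int."""
    if not values:
        raise ValueError("values must be non-empty")
    return estimate_packed_size(values) / (4 * len(values))


# A strictly increasing column with constant gaps packs to headers only.
print(estimate_packed_size(list(range(1000))))  # 44 (vs. 4000 bytes raw)
```

Because the size depends only on per-block minimum deltas and bit widths, an estimate like this can be computed in one pass over the column without actually encoding it.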



