Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17303
  
    @Cyan4973 I quickly checked again:
    ```
    scaleFactor: 4
    AWS instance: c4.4xlarge    
    
    // In this bench, I used `local-cluster` (`local` was used in the benchmark above)
    ./bin/spark-shell --master local-cluster[4,4,7500] \
      --conf spark.driver.memory=1g \
      --conf spark.executor.memory=7g \
      --conf spark.io.compression.codec=xxx
    
    --- zstd (level=3)
    Running execution q4-v1.4 iteration: 1, StandardRun=true
    Execution time: 36.517211838s
    Running execution q4-v1.4 iteration: 2, StandardRun=true
    Execution time: 25.026869575s
    Running execution q4-v1.4 iteration: 3, StandardRun=true
    Execution time: 24.370711575s
    
    --- zstd (level=1)
    Running execution q4-v1.4 iteration: 1, StandardRun=true
    Execution time: 29.654705815s
    Running execution q4-v1.4 iteration: 2, StandardRun=true
    Execution time: 20.638918335s
    Running execution q4-v1.4 iteration: 3, StandardRun=true
    Execution time: 19.928730758999997s
    
    --- lz4
    Running execution q4-v1.4 iteration: 1, StandardRun=true
    Execution time: 27.422360631s
    Running execution q4-v1.4 iteration: 2, StandardRun=true
    Execution time: 17.38519278s
    Running execution q4-v1.4 iteration: 3, StandardRun=true
    Execution time: 15.779084563s
    
    --- snappy
    Running execution q4-v1.4 iteration: 1, StandardRun=true
    Execution time: 27.476569521000002s
    Running execution q4-v1.4 iteration: 2, StandardRun=true
    Execution time: 16.438640631s
    Running execution q4-v1.4 iteration: 3, StandardRun=true
    Execution time: 14.949329456s
    
    --- lzf
    Running execution q4-v1.4 iteration: 1, StandardRun=true
    Execution time: 27.853010073s
    Running execution q4-v1.4 iteration: 2, StandardRun=true
    Execution time: 17.431232532000003s
    Running execution q4-v1.4 iteration: 3, StandardRun=true
    Execution time: 15.916569896999999s
    ```
    `zstd` was still slower than the other codecs in this run.
    I'm not sure, though; there might be cases where `zstd` beats the others on larger data sets.
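
To make the comparison in the log above easier to read, here is a minimal sketch that averages the timings per codec and expresses each as a ratio against `snappy`. The numbers are copied from the log; treating iteration 1 as a warm-up run and using `snappy` as the baseline are assumptions made for illustration, not part of the original benchmark, and the helper names `warm_mean` and `summarize` are hypothetical.

```python
from statistics import mean

# Execution times in seconds, copied from the log above (iterations 1-3).
timings = {
    "zstd (level=3)": [36.517211838, 25.026869575, 24.370711575],
    "zstd (level=1)": [29.654705815, 20.638918335, 19.928730758999997],
    "lz4": [27.422360631, 17.38519278, 15.779084563],
    "snappy": [27.476569521000002, 16.438640631, 14.949329456],
    "lzf": [27.853010073, 17.431232532000003, 15.916569896999999],
}


def warm_mean(runs):
    """Mean of every iteration except the first (assumed to be warm-up)."""
    return mean(runs[1:])


def summarize(timings, baseline="snappy"):
    """Map each codec to (warm mean in seconds, ratio against the baseline)."""
    base = warm_mean(timings[baseline])
    return {
        codec: (warm_mean(runs), warm_mean(runs) / base)
        for codec, runs in timings.items()
    }


if __name__ == "__main__":
    # Print codecs from fastest to slowest warm mean.
    rows = sorted(summarize(timings).items(), key=lambda kv: kv[1][0])
    for codec, (secs, ratio) in rows:
        print(f"{codec:<15} {secs:6.2f}s  {ratio:4.2f}x vs snappy")
```

On these numbers, the warm runs put `zstd` at level 3 at roughly 1.57x the `snappy` time and level 1 at roughly 1.29x, while `lz4` and `lzf` stay within about 6% of `snappy`. With only two warm iterations per codec, these ratios are indicative rather than statistically solid.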

