Re: Spark 3.0 and ORC 1.6

2020-01-29 Thread Dongjoon Hyun
Hi, David. Thank you for sharing your opinion. I'm also a supporter of ZStandard. Apache Spark 3.0 is starting to make heavy use of ZStd: 1) Switch the default codec for MapOutputStatus from GZip to ZStd. 2) Add spark.eventLog.compression.codec to allow ZStd. 3) Use Parquet+ZStd
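
For readers who want to try the settings mentioned in the message above, here is a minimal sketch that collects them as spark-submit flags. Only spark.eventLog.compression.codec is named in the message itself. The other keys (spark.shuffle.mapStatus.compression.codec, spark.eventLog.compress, spark.sql.parquet.compression.codec) are my assumption about which Spark 3.0 settings correspond to items 1 and 3, and the third item is cut off in the archive, so the Parquet key in particular is a guess at a related setting. The helper name to_submit_args is hypothetical. Whether Parquet+ZStd actually works also depends on the Hadoop/Parquet build in use.

```python
# Assumed mapping from the items in the message to Spark 3.0 config keys.
# Only spark.eventLog.compression.codec appears verbatim in the thread.
ZSTD_CONF = {
    # 1) map status codec (assumed key for the "MapOutputStatus" item)
    "spark.shuffle.mapStatus.compression.codec": "zstd",
    # 2) event log compression must be enabled for the codec to matter
    "spark.eventLog.compress": "true",
    "spark.eventLog.compression.codec": "zstd",
    # 3) Parquet output codec (assumed; the original item is truncated)
    "spark.sql.parquet.compression.codec": "zstd",
}


def to_submit_args(conf):
    """Turn a dict of Spark settings into a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a command line that could be completed with an application jar or script.
    print(" ".join(["spark-submit"] + to_submit_args(ZSTD_CONF)))
```

The same keys could instead go into spark-defaults.conf or be set on a SparkSession builder; the flat dict is just the simplest form to inspect and adjust.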

Spark 3.0 and ORC 1.6

2020-01-28 Thread David Christle
Hi all, I am a heavy user of Spark at LinkedIn, and am excited about the ZStandard compression option recently incorporated into ORC 1.6. I would love to explore using it for storing and querying large (>10 TB) tables for my own disk-I/O-intensive workloads, and other users & companies may be