GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21902
[SPARK-24952][SQL] Support LZMA2 compression by Avro datasource

## What changes were proposed in this pull request?

In this PR, I propose supporting `LZMA2` (`XZ`) and `BZIP2` compression in the `AVRO` datasource on write, since these codecs have a much better compression ratio than the already supported `deflate` and `snappy` codecs. To tune the compression level of `XZ`, the PR introduces a new SQL config, `spark.sql.avro.xz.level`, with a default value of `6`. The allowed range of levels is `[0, 9]`.

## How was this patch tested?

It was tested manually and by an existing test, which was extended to check the `xz` and `bzip2` compressions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 avro-xz-bzip2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21902

----
commit e3b8856c6f8769cf1c2646e7cf5ae41fb3c8d626
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T20:15:04Z

    Support bzip2

commit 7b9dd253e313fb7b5f674672f8bd5447812522a3
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T20:40:18Z

    Support xz

commit d4dbeb10656283d957c9c52327da97170f9ad080
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T21:12:54Z

    Refactoring

commit 3e1139af293cb2e06e125edfd443a5b5a0265b84
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T21:30:30Z

    Fix comments

----

---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
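
For readers who want a feel for the `[0, 9]` level range and the compression-ratio comparison mentioned in the PR description, here is a minimal sketch using only the Python standard library. It is not the PR's code and does not touch Spark or Avro: `check_xz_level` and `compressed_sizes` are hypothetical helpers, the sample payload is synthetic, and the only values taken from the PR are the default level `6` and the allowed range `[0, 9]`. Actual ratios depend heavily on the data.

```python
import bz2
import lzma
import zlib

XZ_LEVEL_DEFAULT = 6           # default proposed in the PR for spark.sql.avro.xz.level
XZ_LEVEL_RANGE = range(0, 10)  # allowed levels [0, 9]


def check_xz_level(level: int = XZ_LEVEL_DEFAULT) -> int:
    """Hypothetical validator mirroring the allowed range described in the PR."""
    if level not in XZ_LEVEL_RANGE:
        raise ValueError(f"xz level must be in [0, 9], got {level}")
    return level


def compressed_sizes(payload: bytes, xz_level: int = XZ_LEVEL_DEFAULT) -> dict:
    """Return the compressed size in bytes of `payload` for each codec."""
    level = check_xz_level(xz_level)
    return {
        "deflate": len(zlib.compress(payload)),
        "bzip2": len(bz2.compress(payload)),
        "xz": len(lzma.compress(payload, format=lzma.FORMAT_XZ, preset=level)),
    }


if __name__ == "__main__":
    # Synthetic, repetitive records; real Avro data will behave differently.
    sample = b"".join(
        b"id=%d,name=user%d,active=true\n" % (i, i % 50) for i in range(20000)
    )
    for level in (0, 6, 9):
        print(level, compressed_sizes(sample, level))
```

Running the script prints the per-codec sizes at XZ levels 0, 6, and 9, which is one way to sanity-check whether a higher level is worth the extra CPU time for a given dataset.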