GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21902

    [SPARK-24952][SQL] Support LZMA2 compression by Avro datasource

    ## What changes were proposed in this pull request?
    
    In the PR, I propose to support `LZMA2` (`XZ`) and `BZIP2` compression in
    writes by the `AVRO` datasource, since these codecs have a much better
    compression ratio than the already supported `deflate` and `snappy` codecs.
    To tune the compression level of `XZ`, the PR introduces a new SQL config,
    `spark.sql.avro.xz.level`, with a default value of `6`. The allowed range of
    levels is `[0, 9]`.
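Below is a rough illustration of what the proposed `XZ` level controls. It is a Python sketch that uses only the standard library and is not Spark's or Avro's code. The `lzma` module writes LZMA2 data in the `.xz` format and accepts a `preset` from 0 to 9 with 6 as its default, which matches the range and default of the proposed config. `bz2` and raw `zlib` deflate stand in for the `bzip2` and `deflate` codecs. `snappy` is omitted because it is not in the standard library. The helpers `validate_xz_level` and `compress` and the sample payload are hypothetical and exist only for this illustration.

```python
import bz2
import lzma
import zlib

# Hypothetical constants mirroring the proposed `spark.sql.avro.xz.level`
# config: allowed range [0, 9], default 6.
XZ_MIN_LEVEL, XZ_MAX_LEVEL, XZ_DEFAULT_LEVEL = 0, 9, 6


def validate_xz_level(level: int) -> int:
    """Reject levels outside [0, 9], as the proposed config would."""
    if not XZ_MIN_LEVEL <= level <= XZ_MAX_LEVEL:
        raise ValueError(
            f"xz level must be in [{XZ_MIN_LEVEL}, {XZ_MAX_LEVEL}], got {level}"
        )
    return level


def compress(data: bytes, codec: str, xz_level: int = XZ_DEFAULT_LEVEL) -> bytes:
    """Compress `data` with a stdlib stand-in for the named Avro codec."""
    if codec == "deflate":
        # Raw deflate (RFC 1951, no zlib header), selected via negative wbits.
        co = zlib.compressobj(level=zlib.Z_DEFAULT_COMPRESSION, wbits=-15)
        return co.compress(data) + co.flush()
    if codec == "bzip2":
        return bz2.compress(data)
    if codec == "xz":
        # LZMA2 in an .xz container; `preset` is the 0-9 compression level.
        return lzma.compress(
            data, format=lzma.FORMAT_XZ, preset=validate_xz_level(xz_level)
        )
    raise ValueError(f"unsupported codec: {codec}")


if __name__ == "__main__":
    # Made-up, repetitive, record-like payload for a quick size comparison.
    sample = b"".join(
        b'{"id": %d, "name": "user_%d"}\n' % (i, i % 100) for i in range(10_000)
    )
    for codec in ("deflate", "bzip2", "xz"):
        out = compress(sample, codec)
        print(f"{codec:8s} {len(sample)} -> {len(out)} bytes")
```

Actual ratios depend heavily on the data, so the printed sizes are only indicative. In Spark itself, assuming the config name proposed here is the one that lands, the level would be set through the SQL conf, e.g. `spark.conf.set("spark.sql.avro.xz.level", "9")`.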
    
    ## How was this patch tested?
    
    It was tested manually and by an existing test, which was extended to check
    `xz` and `bzip2` compression.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 avro-xz-bzip2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21902.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21902
    
----
commit e3b8856c6f8769cf1c2646e7cf5ae41fb3c8d626
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T20:15:04Z

    Support bzip2

commit 7b9dd253e313fb7b5f674672f8bd5447812522a3
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T20:40:18Z

    Support xz

commit d4dbeb10656283d957c9c52327da97170f9ad080
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T21:12:54Z

    Refactoring

commit 3e1139af293cb2e06e125edfd443a5b5a0265b84
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T21:30:30Z

    Fix comments

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org