fornaix opened a new pull request #32751:
URL: https://github.com/apache/spark/pull/32751
### What changes were proposed in this pull request?
This PR aims to support LZ4 compression in the ORC data source.
### Why are the changes needed?
Apache ORC supports LZ4 compression, but the ORC data source rejects `lz4` as a value for the `compression` option.
**BEFORE**
```scala
scala> spark.range(10).write.option("compression", "lz4").orc("/tmp/lz4")
java.lang.IllegalArgumentException: Codec [lz4] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none, zstd.
```
**AFTER**
```scala
scala> spark.range(10).write.option("compression", "lz4").orc("/tmp/lz4")
```
```bash
$ orc-tools meta /tmp/lz4
Processing data file file:/tmp/lz4/part-00000-6a244eee-b092-4c79-a977-fb8a69dde2eb-c000.lz4.orc [length: 222]
Structure for file:/tmp/lz4/part-00000-6a244eee-b092-4c79-a977-fb8a69dde2eb-c000.lz4.orc
File Version: 0.12 with ORC_517
Rows: 10
Compression: LZ4
Compression size: 262144
Type: struct<id:bigint>
Stripe Statistics:
  Stripe 1:
    Column 0: count: 10 hasNull: false
    Column 1: count: 10 hasNull: false bytesOnDisk: 7 min: 0 max: 9 sum: 45
File Statistics:
  Column 0: count: 10 hasNull: false
  Column 1: count: 10 hasNull: false bytesOnDisk: 7 min: 0 max: 9 sum: 45
Stripes:
  Stripe: offset: 3 data: 7 rows: 10 tail: 35 index: 35
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 24
    Stream: column 1 section DATA start: 38 length 7
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
File length: 222 bytes
Padding length: 0 bytes
Padding ratio: 0%
User Metadata:
  org.apache.spark.version=3.2.0
```
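The following simplified, hypothetical Scala model makes the before/after difference concrete. It shows how a `compression` option value might be resolved to an ORC codec name and a file-name suffix. The object `OrcCodecSketch`, its maps, and its methods are illustrative assumptions rather than Spark's actual implementation. The `lz4` entries stand in for what this PR enables, and the error text mirrors the BEFORE message.

```scala
import java.util.Locale

// Hypothetical, simplified model of ORC codec-name resolution.
// This is an illustrative sketch, not Spark's actual source code.
object OrcCodecSketch {
  // Short names accepted by the `compression` option, mapped to ORC codec names.
  // The "lz4" entry models the capability this PR adds.
  private val shortNames: Map[String, String] = Map(
    "none" -> "NONE",
    "uncompressed" -> "NONE",
    "snappy" -> "SNAPPY",
    "zlib" -> "ZLIB",
    "lzo" -> "LZO",
    "zstd" -> "ZSTD",
    "lz4" -> "LZ4")

  // File-name extension per codec, e.g. ".lz4" in part-...-c000.lz4.orc.
  private val extensions: Map[String, String] = Map(
    "NONE" -> "",
    "SNAPPY" -> ".snappy",
    "ZLIB" -> ".zlib",
    "LZO" -> ".lzo",
    "ZSTD" -> ".zstd",
    "LZ4" -> ".lz4")

  // Case-insensitive lookup; unknown names fail with a message shaped like the BEFORE error.
  def resolve(option: String): String = {
    val key = option.toLowerCase(Locale.ROOT)
    shortNames.getOrElse(key, throw new IllegalArgumentException(
      s"Codec [$key] is not available. Available codecs are " +
        s"${shortNames.keys.mkString(", ")}."))
  }

  // Suffix appended to each written part file, including the ".orc" extension.
  def fileSuffix(option: String): String = extensions(resolve(option)) + ".orc"

  def main(args: Array[String]): Unit = {
    println(resolve("LZ4"))
    println(fileSuffix("lz4"))
  }
}
```

In this model, removing the two `lz4`/`LZ4` entries reproduces the BEFORE failure. Keeping them yields the `.lz4.orc` suffix seen in the AFTER file name.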
### Does this PR introduce _any_ user-facing change?
Yes. Users can now write ORC files with LZ4 compression by setting the `compression` option to `lz4`.
### How was this patch tested?
Passed the newly added test case.