GitHub user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/20511
First of all, ORC 1.4.2 was very safe because its only change was ORC-235,
which removed redundant dependencies.
ORC 1.4.3 includes the following five patches.
1. ORC-298 Move the benchmark code base to non-Apache repository
2. ORC-240 Fix warnings from Maven
3. ORC-217 Duplicate rat plugins in pom.xml
The above three are trivial.
4. ORC-285 Empty vector batches of floats or doubles get
java.io.EOFException
5. ORC-296 Work around HADOOP-15171; also fix stream contract
(4) only adds a workaround for `batchSize=0`. (5) may cause a
performance difference.
In general, the patches look necessary, but I didn't run a full test against
ORC 1.4.3.
Of the five, only ORC-296 might affect performance.