GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/21070
SPARK-23972: Update Parquet to 1.10.0.
## What changes were proposed in this pull request?
This updates Parquet to 1.10.0 and adapts the vectorized read path to
Parquet's buffer management changes. Parquet 1.10.0 uses ByteBufferInputStream
instead of byte arrays in encoders. This allows Parquet to break allocations
into smaller chunks, which are easier on the garbage collector than large
contiguous arrays.
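
To illustrate the chunked allocation described above, here is a minimal sketch of a stream that reads across several small buffers instead of one large byte array. The class `ChunkedBufferStream` and its helper `allocateChunks` are hypothetical names made up for this example. This is not Parquet's `ByteBufferInputStream` and not the code in this patch; it only shows the general idea under those assumptions.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT Parquet's ByteBufferInputStream.
// It reads bytes from a list of small buffers rather than one large array.
final class ChunkedBufferStream extends InputStream {
    private final List<ByteBuffer> chunks;
    private int current = 0; // index of the chunk currently being read

    ChunkedBufferStream(List<ByteBuffer> chunks) {
        this.chunks = chunks;
    }

    // Copies the payload into separately allocated buffers of at most
    // chunkSize bytes each (assumed helper, for demonstration).
    static List<ByteBuffer> allocateChunks(byte[] payload, int chunkSize) {
        List<ByteBuffer> out = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            int len = Math.min(chunkSize, payload.length - offset);
            ByteBuffer buf = ByteBuffer.allocate(len);
            buf.put(payload, offset, len);
            buf.flip(); // switch the buffer from writing to reading
            out.add(buf);
        }
        return out;
    }

    @Override
    public int read() {
        // Advance past exhausted chunks; return -1 once all are consumed.
        while (current < chunks.size()) {
            ByteBuffer buf = chunks.get(current);
            if (buf.hasRemaining()) {
                return buf.get() & 0xFF; // unsigned byte value, per InputStream contract
            }
            current++;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        List<ByteBuffer> chunks = allocateChunks(payload, 4);
        ChunkedBufferStream in = new ChunkedBufferStream(chunks);
        int sum = 0;
        int b;
        while ((b = in.read()) != -1) {
            sum += b;
        }
        System.out.println(chunks.size() + " chunks, byte sum " + sum);
    }
}
```

A reader consuming this stream never needs the whole payload in a single contiguous array, so each allocation stays small.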
## How was this patch tested?
Existing Parquet tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark SPARK-23972-update-parquet-to-1.10.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21070.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21070
----
commit 4df17a6e9726cb22e499d479a9ab48f5db18a538
Author: Ryan Blue <blue@...>
Date: 2017-12-01T01:25:53Z
SPARK-23972: Update Parquet to 1.10.0.
This updates the vectorized path for changes in Parquet 1.10.0, which
uses ByteBufferInputStream instead of byte arrays in encoders. This
allows Parquet to break allocations into smaller chunks that are better
for garbage collection.
----