[GitHub] [spark] parthchandra opened a new pull request #34471: [SPARK-36879][SQL]Support Parquet v2 data page encoding (DELTA_BINARY_PACKED) for the vectorized path

GitBox Tue, 02 Nov 2021 15:13:22 -0700


parthchandra opened a new pull request #34471:
URL: https://github.com/apache/spark/pull/34471



   ### What changes were proposed in this pull request?
   Implements a vectorized version of the Parquet reader for the DELTA_BINARY_PACKED encoding.
   This PR includes a previous PR for this issue, which was not vectorized and instead passed the read request through to the Parquet implementation. The current PR builds on top of that PR (hence both are included).
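
For readers unfamiliar with the encoding, the following is a minimal sketch of the idea behind DELTA_BINARY_PACKED, not the code in this PR. It assumes a single block and works on already-unpacked integers, so it skips the header varints, the miniblocks, and the bit-packing of the real format. The function names `encode_single_block` and `decode_delta_blocks` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# One block is modeled as (min_delta, relative_deltas), where each
# relative delta is (delta - min_delta) and is therefore non-negative.
Block = Tuple[int, List[int]]


def encode_single_block(values: List[int]) -> Tuple[int, List[Block]]:
    """Split values into a first value plus one block of relative deltas.

    Simplification: the real format splits deltas into fixed-size blocks
    and bit-packs each miniblock; here everything goes into one block.
    """
    if not values:
        raise ValueError("values must not be empty")
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return values[0], []
    min_delta = min(deltas)
    return values[0], [(min_delta, [d - min_delta for d in deltas])]


def decode_delta_blocks(first_value: int, blocks: Iterable[Block]) -> List[int]:
    """Rebuild the values with a running sum over the stored deltas."""
    values = [first_value]
    for min_delta, relative_deltas in blocks:
        for rel in relative_deltas:
            # Each value is the previous value plus the restored delta.
            values.append(values[-1] + min_delta + rel)
    return values


first, blocks = encode_single_block([7, 5, 3, 1, 2, 3, 4, 5])
decoded = decode_delta_blocks(first, blocks)
```

In this sketch the deltas are -2, -2, -2, 1, 1, 1, 1, so the minimum delta is -2 and the stored relative deltas are small non-negative numbers, which is what makes bit-packing them compact. A vectorized decoder would typically restore a whole batch of values per call rather than one value at a time.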
   
   ### Why are the changes needed?
   Currently, Spark throws an exception when reading data with these encodings if the vectorized reader is enabled.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Additional unit tests for the encoding, covering both long and integer types (mirroring the unit tests in the Parquet implementation).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]