[GitHub] [spark] LuciferYang opened a new pull request #35163: [SPARK-37864][SQL] Support vectorized read boolean values use RLE encoding with Parquet DataPage V2

GitBox Tue, 11 Jan 2022 00:42:13 -0800


LuciferYang opened a new pull request #35163:
URL: https://github.com/apache/spark/pull/35163



   ### What changes were proposed in this pull request?
   Parquet v2 data pages write Boolean values using RLE encoding. Reading such 
values through the vectorized path currently throws the following exception:
   
   ```java
   Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
       at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305) ~[classes/:?]
       at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277) ~[classes/:?]
       at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344) ~[classes/:?]
       at 
   ```
   
   This PR extends `readBooleans` and `skipBooleans` in 
`VectorizedRleValuesReader` so that the scenario above is read successfully.
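
For readers unfamiliar with the encoding, here is a minimal sketch of decoding Boolean values from Parquet's RLE/bit-packed hybrid format with a bit width of 1. It is not the code in this PR: the class `RleBooleanSketch` and its `decode` method are hypothetical names used only for illustration, and the sketch assumes the input starts at the first run header (any length prefix in front of the runs has already been consumed).

```java
import java.util.Arrays;

// Hypothetical illustration only; not Spark's VectorizedRleValuesReader.
public class RleBooleanSketch {

    // Decodes `total` booleans from RLE/bit-packed hybrid data (bit width 1).
    // Assumes `data` begins at the first run header.
    static boolean[] decode(byte[] data, int total) {
        boolean[] out = new boolean[total];
        int pos = 0; // read offset into data
        int n = 0;   // number of values decoded so far
        while (n < total) {
            // Each run starts with an unsigned LEB128 varint header.
            int header = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                header |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);

            if ((header & 1) == 0) {
                // RLE run: header >>> 1 repetitions of one value,
                // stored in a single byte when the bit width is 1.
                int runLength = header >>> 1;
                boolean value = (data[pos++] & 1) != 0;
                int end = Math.min(total, n + runLength);
                Arrays.fill(out, n, end, value);
                n = end;
            } else {
                // Bit-packed run: header >>> 1 groups of 8 values,
                // one byte per group, least significant bit first.
                int groups = header >>> 1;
                for (int g = 0; g < groups; g++) {
                    int bits = data[pos++] & 0xFF;
                    for (int i = 0; i < 8 && n < total; i++) {
                        out[n++] = ((bits >>> i) & 1) != 0;
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x08 0x01: RLE run of 4 x true; 0x03 0x05: one bit-packed group.
        byte[] data = {0x08, 0x01, 0x03, 0x05};
        System.out.println(Arrays.toString(decode(data, 7)));
    }
}
```

Running `main` prints `[true, true, true, true, true, false, true]`: four values from the RLE run, then the first three bits of the bit-packed byte `0x05`. Padding bits at the end of the last group are ignored once `total` values have been produced.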
   
   ### Why are the changes needed?
   To support Parquet v2 data page RLE encoding of Boolean values on the vectorized read path.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a new test case.
   

