LuciferYang opened a new pull request #35163:
URL: https://github.com/apache/spark/pull/35163
### What changes were proposed in this pull request?
Parquet v2 data pages write Boolean values using RLE encoding. Reading v2
Boolean values through the vectorized reader currently throws the following exception:
```java
Caused by: java.lang.UnsupportedOperationException: Unsupported encoding: RLE
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.getValuesReader(VectorizedColumnReader.java:305) ~[classes/:?]
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.initDataReader(VectorizedColumnReader.java:277) ~[classes/:?]
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readPageV2(VectorizedColumnReader.java:344) ~[classes/:?]
at
```
This PR extends `readBooleans` and `skipBooleans` in
`VectorizedRleValuesReader` so that RLE-encoded Boolean values in v2 data pages can be read.
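For readers unfamiliar with the encoding, here is a minimal, standalone sketch of decoding Boolean values from Parquet's RLE/bit-packed hybrid format with a bit width of 1. It is not the code in this PR: the class `RleBooleanSketch` and its `decode` method are hypothetical names used only for illustration, and the sketch assumes any length prefix in front of the encoded data has already been consumed.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not Spark's VectorizedRleValuesReader. */
final class RleBooleanSketch {

    /** Decodes `count` booleans from RLE/bit-packed hybrid data with bit width 1. */
    static boolean[] decode(byte[] data, int count) {
        boolean[] out = new boolean[count];
        int pos = 0; // read offset into data
        int n = 0;   // number of values decoded so far
        while (n < count) {
            // Each run starts with an unsigned LEB128 varint header.
            int header = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                header |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);

            if ((header & 1) == 0) {
                // RLE run: (header >>> 1) repetitions of one value stored in a single byte.
                int runLength = header >>> 1;
                boolean value = (data[pos++] & 1) != 0;
                int end = Math.min(count, n + runLength);
                Arrays.fill(out, n, end, value);
                n = end;
            } else {
                // Bit-packed run: (header >>> 1) groups of 8 values,
                // one byte per group at bit width 1, least significant bit first.
                int groups = header >>> 1;
                for (int g = 0; g < groups; g++) {
                    int packed = data[pos++] & 0xFF;
                    for (int bit = 0; bit < 8 && n < count; bit++) {
                        out[n++] = ((packed >>> bit) & 1) != 0;
                    }
                }
            }
        }
        return out;
    }
}
```

For example, the bytes `0x08 0x01 0x03 0x05` describe an RLE run of four `true` values followed by one bit-packed group, so `RleBooleanSketch.decode(bytes, 10)` yields four `true` values and then `true, false, true, false, false, false`.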
### Why are the changes needed?
Support RLE encoding of Boolean values in Parquet v2 data pages on the vectorized read path.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a new test case.