[GitHub] [spark] parthchandra commented on a change in pull request #34471: [SPARK-36879][SQL] Support Parquet v2 data page encoding (DELTA_BINARY_PACKED) for the vectorized path

GitBox Mon, 13 Dec 2021 15:57:24 -0800


parthchandra commented on a change in pull request #34471:
URL: https://github.com/apache/spark/pull/34471#discussion_r768213546




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
##########
@@ -147,6 +159,16 @@ object DataSourceReadBenchmark extends SqlBasedBenchmark {
           }
         }
 
+        sqlBenchmark.addCase("SQL Parquet Vectorized (Delta Binary)") { _ =>

Review comment:
   I guess I didn't understand the implication when you suggested I needed to rebase!
   It looks like the Boolean type breaks vectorized reading: the boolean column gets encoded as RLE, but the vectorized RLE reader does not support booleans.
   ```
   [info] Caused by: java.lang.UnsupportedOperationException: only readInts is valid.
   [info]       at org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readBooleans(VectorizedRleValuesReader.java:368)
   ```
   I guess I could fix this. Would you recommend a separate PR, or should I make the change in this one?
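
For context, here is a minimal sketch of what decoding RLE-encoded booleans might involve. It is not Spark's implementation: the class `RleBooleanSketch` and its `decode` method are hypothetical names used only for illustration. The sketch assumes Parquet's RLE/bit-packing hybrid layout at bit width 1, and that the buffer starts at the first run header (any length prefix is assumed to have been stripped already).

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Spark: decodes the RLE/bit-packed hybrid at bit width 1. */
final class RleBooleanSketch {

  /** Decodes `count` booleans from `data`, assumed to start at the first run header. */
  static boolean[] decode(byte[] data, int count) {
    boolean[] out = new boolean[count];
    int pos = 0; // read position in data
    int n = 0;   // number of values decoded so far
    while (n < count) {
      // Each run starts with an unsigned LEB128 varint header.
      int header = 0;
      int shift = 0;
      int b;
      do {
        b = data[pos++] & 0xFF;
        header |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);

      if ((header & 1) == 0) {
        // RLE run: (header >>> 1) copies of one value stored in a single byte.
        int runLength = header >>> 1;
        boolean value = (data[pos++] & 1) != 0;
        int end = Math.min(count, n + runLength);
        Arrays.fill(out, n, end, value);
        n = end;
      } else {
        // Bit-packed run: (header >>> 1) groups of 8 values, one byte per group,
        // least significant bit first.
        int groups = header >>> 1;
        for (int g = 0; g < groups; g++) {
          int packed = data[pos++] & 0xFF;
          for (int bit = 0; bit < 8 && n < count; bit++) {
            out[n++] = ((packed >>> bit) & 1) != 0;
          }
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // An RLE run of three `true` values (header 0x06, value 0x01),
    // followed by one bit-packed group holding 0b00000101.
    byte[] page = {0x06, 0x01, 0x03, 0x05};
    System.out.println(Arrays.toString(decode(page, 11)));
  }
}
```

A real vectorized reader would write into a column vector in batches rather than allocate a `boolean[]`, so this only illustrates the run structure a `readBooleans` implementation would have to handle.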




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


