sunchao commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r654010520
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##########
@@ -340,6 +398,71 @@ public Binary readBinary(int len) {
throw new UnsupportedOperationException("only readInts is valid.");
}
+ @Override
+ public void skipIntegers(int total) {
+ int left = total;
+ while (left > 0) {
+ if (this.currentCount == 0) this.readNextGroup();
+ int n = Math.min(left, this.currentCount);
+ advance(n);
+ left -= n;
+ }
+ }
+
+ @Override
+ public void skipBooleans(int total) {
+ throw new UnsupportedOperationException("only skipIntegers is supported");
Review comment:
I can change it to `only skipIntegers is valid`. It's because Parquet
RLE encoding only support integers which is why all the others are
unimplemented.
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
##########
@@ -378,11 +380,77 @@ class ParquetIOSuite extends QueryTest with ParquetTest
with SharedSparkSession
.withWriterVersion(PARQUET_1_0)
.withCompressionCodec(GZIP)
.withRowGroupSize(1024 * 1024)
- .withPageSize(1024)
+ .withPageSize(pageSize)
+ .withDictionaryPageSize(dictionaryPageSize)
.withConf(hadoopConf)
.build()
}
+ test("test multiple pages with different sizes and nulls") {
Review comment:
Sure wil do
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
##########
@@ -61,6 +61,14 @@ public final void readBooleans(int total,
WritableColumnVector c, int rowId) {
}
}
+ @Override
+ public final void skipBooleans(int total) {
+ // TODO: properly vectorize this
Review comment:
This follows a few other TODOs in the file when handling booleans. Yes I
can file a JIRA too
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]