GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21295#discussion_r190379077
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
@@ -147,7 +147,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
this.sparkSchema =
StructType$.MODULE$.fromString(sparkRequestedSchemaString);
this.reader = new ParquetFileReader(
configuration, footer.getFileMetaData(), file, blocks,
requestedSchema.getColumns());
- for (BlockMetaData block : blocks) {
+ // use the blocks from the reader in case some do not match filters and will not be read
--- End diff --
Yes, we will need to backport this to the 2.3.x line. No rush to make it
for 2.3.1 though, since dictionary filtering is off by default and this isn't a
correctness problem.
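
For context, the patch stops iterating over the footer's pre-filter `blocks` list and instead uses the row groups the reader actually retained, since dictionary filtering can drop some of them. A minimal self-contained sketch of that idea, with a hypothetical `Block` class standing in for Parquet's `BlockMetaData` (this is not the actual Spark/Parquet API, just an illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RowCountSketch {
  // Hypothetical stand-in for Parquet's BlockMetaData: a row group with a
  // row count and a flag saying whether it survives dictionary filtering.
  static final class Block {
    final long rowCount;
    final boolean matchesFilter;
    Block(long rowCount, boolean matchesFilter) {
      this.rowCount = rowCount;
      this.matchesFilter = matchesFilter;
    }
  }

  // Sum row counts over whatever block list we are handed.
  static long totalRows(List<Block> blocks) {
    return blocks.stream().mapToLong(b -> b.rowCount).sum();
  }

  public static void main(String[] args) {
    // Blocks as listed in the file footer, before any filtering.
    List<Block> footerBlocks = Arrays.asList(
        new Block(100, true),
        new Block(200, false),  // dropped by dictionary filtering
        new Block(300, true));

    // Blocks the reader will actually read, after filtering.
    List<Block> readerBlocks = footerBlocks.stream()
        .filter(b -> b.matchesFilter)
        .collect(Collectors.toList());

    // Counting over the footer's blocks includes rows that will never be
    // read; counting over the reader's retained blocks matches the data
    // actually returned.
    System.out.println(totalRows(footerBlocks) + " vs " + totalRows(readerBlocks));
    // prints "600 vs 400"
  }
}
```

The bug the patch fixes follows the same shape: totals computed from the unfiltered footer list disagree with what the reader delivers once dictionary filtering removes row groups.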
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]