Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21295#discussion_r190252764
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
@@ -225,7 +226,8 @@ protected void initialize(String path, List<String> columns) throws IOException
     this.sparkSchema = new ParquetToSparkSchemaConverter(config).convert(requestedSchema);
     this.reader = new ParquetFileReader(
       config, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
-    for (BlockMetaData block : blocks) {
+    // use the blocks from the reader in case some do not match filters and will not be read
+    for (BlockMetaData block : reader.getRowGroups()) {
--- End diff ---
I think this is an existing issue; does your test case fail on Spark 2.3 too?
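For context on the change under review: `ParquetFileReader` applies row-group filters when it is constructed, so iterating the footer's `blocks` list can count row groups (and their rows) that the reader will never actually read, while `reader.getRowGroups()` returns only the surviving ones. The following is a minimal, self-contained sketch of that discrepancy; `FakeParquetFileReader`, the simplified `BlockMetaData` stand-in, and the `minValue` statistic are hypothetical illustrations, not the real Parquet API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for Parquet's BlockMetaData, reduced to what the
// illustration needs: a row count and one fake column statistic.
class BlockMetaData {
    final long rowCount;
    final long minValue; // pretend min-value statistic a row-group filter would consult
    BlockMetaData(long rowCount, long minValue) {
        this.rowCount = rowCount;
        this.minValue = minValue;
    }
}

// Hypothetical stand-in for ParquetFileReader: the real reader applies
// row-group filters at construction time; this fake mimics that by
// dropping any footer block the predicate rejects.
class FakeParquetFileReader {
    private final List<BlockMetaData> filteredBlocks = new ArrayList<>();
    FakeParquetFileReader(List<BlockMetaData> footerBlocks, Predicate<BlockMetaData> filter) {
        for (BlockMetaData b : footerBlocks) {
            if (filter.test(b)) {
                filteredBlocks.add(b);
            }
        }
    }
    List<BlockMetaData> getRowGroups() {
        return filteredBlocks;
    }
}

public class RowGroupCountDemo {
    // Sums row counts over whichever block list the caller iterates,
    // mirroring the loop changed in the diff.
    static long totalRowCount(List<BlockMetaData> blocks) {
        long total = 0;
        for (BlockMetaData b : blocks) {
            total += b.rowCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<BlockMetaData> footerBlocks = new ArrayList<>();
        footerBlocks.add(new BlockMetaData(100, 5));
        footerBlocks.add(new BlockMetaData(200, 50)); // pruned by the filter below

        // Filter keeps only row groups whose fake min value is below 10.
        FakeParquetFileReader reader =
            new FakeParquetFileReader(footerBlocks, b -> b.minValue < 10);

        // Iterating the footer blocks over-counts rows that will never be read.
        System.out.println("footer total = " + totalRowCount(footerBlocks));
        System.out.println("reader total = " + totalRowCount(reader.getRowGroups()));
    }
}
```

The diff's fix corresponds to switching the loop from the footer list to `reader.getRowGroups()`, so downstream row-count bookkeeping matches what is actually read.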
---