Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21295#discussion_r190378887
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
---
@@ -225,7 +226,8 @@ protected void initialize(String path, List<String> columns) throws IOException
     this.sparkSchema = new ParquetToSparkSchemaConverter(config).convert(requestedSchema);
     this.reader = new ParquetFileReader(
         config, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
-    for (BlockMetaData block : blocks) {
+    // use the blocks from the reader in case some do not match filters and will not be read
+    for (BlockMetaData block : reader.getRowGroups()) {
--- End diff --
Dictionary filtering is off by default in 1.8.x; it was enabled in 1.9.x after we built confidence in its correctness.
We should also backport this fix to 2.3.x, but the only downside of not having it is that dictionary filtering throws an exception when enabled, so the feature simply isn't available there.
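To illustrate why the diff iterates `reader.getRowGroups()` instead of the footer's block list: when a filter (such as dictionary filtering) drops row groups, the pre-filter list overcounts the rows the reader will actually deliver. The sketch below is plain Java with hypothetical types, not the real Parquet API, and simulates the mismatch under that assumption.

```java
import java.util.List;

// Minimal sketch (plain Java, not the Parquet API) of the bug this diff
// fixes: the total row count must come from the row groups the reader will
// actually read, not from the footer's pre-filter block list. The RowGroup
// record here is a stand-in for BlockMetaData.
public class RowGroupCount {
  record RowGroup(long rowCount, boolean matchesFilter) {}

  // Pre-filter view: sums over every row group in the footer.
  static long countFromFooter(List<RowGroup> blocks) {
    return blocks.stream().mapToLong(RowGroup::rowCount).sum();
  }

  // Post-filter view: sums only over the row groups the reader keeps,
  // analogous to iterating reader.getRowGroups().
  static long countFromReader(List<RowGroup> blocks) {
    return blocks.stream()
        .filter(RowGroup::matchesFilter)
        .mapToLong(RowGroup::rowCount)
        .sum();
  }

  public static void main(String[] args) {
    List<RowGroup> blocks = List.of(
        new RowGroup(100, true),
        new RowGroup(50, false),  // dropped by dictionary filtering
        new RowGroup(25, true));
    System.out.println(countFromFooter(blocks));  // 175: overcounts
    System.out.println(countFromReader(blocks));  // 125: rows actually read
  }
}
```

With the pre-filter count, the reader eventually runs out of rows before reaching the expected total, which is the failure mode the fix avoids.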
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]