GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22197#discussion_r213597570
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -366,18 +367,29 @@ class ParquetFileFormat
     
           val sharedConf = broadcastedHadoopConf.value.value
     
    -      lazy val footerFileMetaData =
    +      val footerFileMetaData =
             ParquetFileReader.readFooter(sharedConf, filePath, SKIP_ROW_GROUPS).getFileMetaData
    +
    +      val parquetRequestedSchema = {
    +        val schemaString = sharedConf.get(ParquetReadSupport.SPARK_ROW_CATALYST_REQUESTED_SCHEMA)
    +        assert(schemaString != null, "Catalyst requested schema not set.")
    +        val catalystRequestedSchema = StructType.fromString(schemaString)
    +        val parquetSchema = footerFileMetaData.getSchema
    +        ParquetReadSupport.clipParquetSchema(
    +          parquetSchema, catalystRequestedSchema, isCaseSensitive)
    +      }
    +      sharedConf.set(ParquetReadSupport.SPARK_ROW_PARQUET_REQUESTED_SCHEMA,
    --- End diff ---
    
    We are already on the executor side here, so why do we need to set the conf
    at all? We could even pass `parquetRequestedSchema` to the reader directly
    via its constructor.
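    For illustration, here is a minimal sketch of that constructor-based
    alternative. `SimpleParquetReader` and its field are hypothetical
    stand-ins invented for this example, not the actual
    `VectorizedParquetRecordReader` API; the point is only that the executor
    already holds `parquetRequestedSchema`, so no conf round-trip is needed:

    ```scala
    // Hypothetical sketch, not Spark's real reader classes: the executor has
    // already computed the clipped schema, so it can be handed to the reader
    // through the constructor instead of being written into the shared conf.
    final case class SimpleParquetReader(requestedSchema: String) {
      def read(): Unit =
        println(s"reading with requested schema: $requestedSchema")
    }

    object ConstructorPassingSketch extends App {
      // Stand-in for the MessageType returned by clipParquetSchema(...).
      val parquetRequestedSchema =
        "message spark_schema { optional int32 id; optional binary name; }"

      // No sharedConf.set(...) needed: the value stays local to this task.
      SimpleParquetReader(parquetRequestedSchema).read()
    }
    ```

    Besides saving the round-trip through the conf, this would arguably keep
    the requested schema task-local rather than mutating a configuration
    object obtained from a broadcast.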

