GitHub user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7049#discussion_r33633463
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
    @@ -361,13 +355,15 @@ private[sql] class ParquetRelation2(
             rawFooters.map(footer => footer.getFile -> footer).toMap
           }
     
    +      footers = new RelationMemo(getFooters())
    +
           // If we already get the schema, don't need to re-compute it since the schema merging is
           // time-consuming.
           if (dataSchema == null) {
             dataSchema = {
               val dataSchema0 = maybeDataSchema
    -            .orElse(readSchema())
    --- End diff --
    
    The reason we put `readSchema()` in front of `maybeMetastoreSchema` is that Parquet is case sensitive and case preserving, while Hive is neither. Also, table fields in Hive are always nullable, while Parquet fields may be non-nullable.
    
    This change is OK for Parquet files that were written by Hive in the first place. However, Parquet files written by other systems/tools are sometimes registered as an external Hive table. In that case it may fail if uppercase letters and/or non-nullable fields appear in the Parquet schema. That's why we must read the Parquet schema first and reconcile possible schema conflicts with `ParquetRelation2.mergeMetastoreParquetSchema`.
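
    Below is a minimal sketch of the kind of reconciliation described above. It is written in plain Scala so it runs without Spark. `Field` and `SchemaReconcile.reconcile` are hypothetical names. They are not Spark's `StructField` or the actual `mergeMetastoreParquetSchema`. The policy of taking name casing and nullability from Parquet and the data type from the metastore is an assumption made for illustration.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for a schema field (not Spark's StructField).
final case class Field(name: String, dataType: String, nullable: Boolean)

object SchemaReconcile {
  // Sketch only: reconciles a Hive metastore schema with a schema read from
  // Parquet footers. Assumed policy: exact name casing and nullability come
  // from Parquet, the data type comes from the metastore.
  def reconcile(metastore: Seq[Field], parquet: Seq[Field]): Seq[Field] = {
    // Hive is case-insensitive, so fields are matched on lower-cased names.
    val parquetByLowerName: Map[String, Field] =
      parquet.map(f => f.name.toLowerCase(Locale.ROOT) -> f).toMap

    // Parquet is case-sensitive: "id" and "ID" may both exist, and Hive
    // cannot tell them apart, so treat that as an unresolvable conflict.
    if (parquetByLowerName.size != parquet.size)
      throw new IllegalArgumentException(
        "Parquet schema has fields that differ only in case")

    // Keep the metastore's column order; fail on fields Parquet lacks.
    metastore.map { m =>
      parquetByLowerName.get(m.name.toLowerCase(Locale.ROOT)) match {
        case Some(p) => m.copy(name = p.name, nullable = p.nullable)
        case None =>
          throw new IllegalArgumentException(
            s"Metastore field '${m.name}' not found in Parquet schema")
      }
    }
  }
}
```

    Matching on lower-cased names mirrors Hive's case insensitivity. Copying the Parquet name keeps the exact casing that the case-sensitive Parquet side expects. In this sketch, extra Parquet fields absent from the metastore schema are simply dropped.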
    
    I think one thing we could do here is add a `SQLConf` option, `spark.sql.parquet.trustMetastoreSchema`, set to `false` by default.
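
    To make the proposal concrete, here is a sketch of how such an option could steer schema resolution. `SchemaSource.resolve` and its parameters are hypothetical. The behavior when the option is `true` is an assumed reading of the proposal, not existing Spark behavior. In that case the sketch uses the metastore schema and skips reading footers.

```scala
// Hypothetical helper illustrating the proposed (not existing) option
// spark.sql.parquet.trustMetastoreSchema. The schema type S is left generic.
object SchemaSource {
  def resolve[S](
      userSchema: Option[S],
      metastoreSchema: Option[S],
      readParquetSchema: () => Option[S],
      reconcile: (S, S) => S,
      trustMetastoreSchema: Boolean = false): Option[S] =
    // A user-specified schema always wins; orElse is lazy, so nothing else runs.
    userSchema.orElse {
      if (trustMetastoreSchema && metastoreSchema.isDefined) {
        // Opt-in path: use the metastore schema and skip reading footers.
        metastoreSchema
      } else {
        // Default path: read the Parquet schema first, then reconcile it with
        // the metastore schema (if any) to fix casing/nullability conflicts.
        readParquetSchema() match {
          case Some(p) => Some(metastoreSchema.map(m => reconcile(m, p)).getOrElse(p))
          case None    => metastoreSchema
        }
      }
    }
}
```

    Passing the Parquet read as a function leaves the default path unchanged. It also lets the opt-in path avoid the time-consuming footer read and schema merge entirely.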
