[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

DaimonPl Mon, 09 Oct 2017 05:26:06 -0700

Github user DaimonPl commented on the issue:

    https://github.com/apache/spark/pull/16578
  
    @mallman @viirya as I understand it, the current workaround covers the case of reading columns that are not in the file schema.
    
    > Parquet-mr will throw an exception if we try to read a superset of the 
file's schema.
    
    Doesn't that depend on the schema merging setting? 
http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging
    
    > Since schema merging is a relatively expensive operation, and is not a 
necessity in most cases, we turned it off by default starting from 1.5.0. You 
may enable it by
    > * setting data source option mergeSchema to true when reading Parquet 
files (as shown in the examples below), or
    > * setting the global SQL option spark.sql.parquet.mergeSchema to true.
    
    Wouldn't it work fine with `spark.sql.parquet.mergeSchema` enabled?
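    
    For reference, here is a minimal sketch of the two ways to enable schema merging described in the quoted documentation. The path, app name, and object name are hypothetical, and this only shows how the setting is turned on; whether it actually avoids the parquet-mr exception is the open question.
    
    ```scala
    import org.apache.spark.sql.SparkSession

    object MergeSchemaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("merge-schema-sketch") // hypothetical app name
          .master("local[*]")
          .getOrCreate()

        // Hypothetical location of Parquet files written with differing schemas.
        val path = "/tmp/example/parquet_table"

        // Option 1: enable schema merging for a single read
        // via the data source option.
        val mergedPerRead = spark.read
          .option("mergeSchema", "true")
          .parquet(path)
        mergedPerRead.printSchema()

        // Option 2: enable schema merging globally for this session
        // via the SQL option.
        spark.conf.set("spark.sql.parquet.mergeSchema", "true")
        val mergedGlobally = spark.read.parquet(path)
        mergedGlobally.printSchema()

        spark.stop()
      }
    }
    ```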


