[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

mallman Tue, 14 Nov 2017 19:35:14 -0800

Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16578#discussion_r151026919
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -961,6 +961,15 @@ object SQLConf {
         .booleanConf
         .createWithDefault(true)
     
    +  val NESTED_SCHEMA_PRUNING_ENABLED =
    +    buildConf("spark.sql.nestedSchemaPruning.enabled")
    +      .internal()
    +      .doc("Prune nested fields from a logical relation's output which are unnecessary in " +
    +        "satisfying a query. This optimization allows columnar file format readers to avoid " +
    +        "reading unnecessary nested column data.")
    +      .booleanConf
    +      .createWithDefault(true)
    --- End diff --
    
    Giving it more thought, I believe it's prudent to choose correctness over performance. I will change the default to `false`. "Power users" can set it to `true` and (hopefully) report a problem if they run into one.
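
For illustration, a minimal sketch of how a user might opt in once the default is `false`. It assumes the flag is merged under the key shown in the diff above and remains a runtime SQL conf; the object name and app name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object NestedPruningOptIn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-pruning-opt-in") // hypothetical app name
      .master("local[*]")
      // Key taken from the diff above; assumes the default has been changed to false.
      .config("spark.sql.nestedSchemaPruning.enabled", "true")
      .getOrCreate()

    // The flag can also be toggled per session at runtime, e.g. to turn
    // pruning back off while investigating a suspected correctness problem.
    spark.conf.set("spark.sql.nestedSchemaPruning.enabled", "false")
    println(spark.conf.get("spark.sql.nestedSchemaPruning.enabled"))

    spark.stop()
  }
}
```

Setting the key on the session rather than cluster-wide keeps the opt-in local to the jobs that want it, which fits the "power users" framing.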


