Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22887
Spark SQL's SET command can't update any static config or Spark core configs, but I think hadoop configs are different. They are not static, as users can update them at runtime via `SparkContext.hadoopConfiguration`.
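For illustration, a minimal runnable sketch of mutating the shared hadoop conf at runtime (the `fs.s3a.connection.maximum` key is just an example here, not something this PR touches):

```scala
import org.apache.spark.sql.SparkSession

object HadoopConfAtRuntime {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hadoop-conf-at-runtime")
      .getOrCreate()

    // Hadoop configs are mutable at runtime: this updates the shared
    // Configuration that new Hadoop FileSystem/InputFormat instances will read.
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.connection.maximum", "200")
    println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.connection.maximum"))

    spark.stop()
  }
}
```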
`SparkSession.SessionState.newHadoopConf()` is a mechanism that allows users to set hadoop configs per-session in Spark SQL. So it's reasonable for users to expect that, if they set a hadoop config via the SQL SET command, it should override the one in `spark-defaults.conf`.
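A sketch of that per-session expectation, assuming a local session (`fs.s3a.connection.maximum` is again only an example key):

```scala
import org.apache.spark.sql.SparkSession

object PerSessionHadoopConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("per-session-hadoop-conf")
      .getOrCreate()

    // The SQL SET command stores the key in this session's runtime conf;
    // sessionState.newHadoopConf() (an internal API) later copies such entries
    // on top of sparkContext.hadoopConfiguration, so Hive/Hadoop reads in this
    // session should see the overridden value.
    spark.sql("SET fs.s3a.connection.maximum=500")
    println(spark.conf.getOption("fs.s3a.connection.maximum")) // Some(500)

    spark.stop()
  }
}
```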
Looking back at `appendS3AndSparkHadoopConfigurations`, it has 2 parameters: a spark conf and a hadoop conf. The spark conf comes from `spark-defaults.conf` plus any user-provided configs when building the `SparkContext`; the user-provided configs override `spark-defaults.conf`. The hadoop conf is either an empty config (if `appendS3AndSparkHadoopConfigurations` is called from `SparkHadoopUtil.newHadoopConfiguration`), or comes from `SparkSession.SessionState.newHadoopConf()` (if `appendS3AndSparkHadoopConfigurations` is called from `HadoopTableReader`). For the first case, there is nothing to worry about. For the second case, I think the hadoop conf should take priority, as it contains the configs specified by users at runtime.
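To make that precedence concrete, here is a hand-rolled sketch (not Spark's actual implementation) in which entries already present in the session's hadoop conf win over `spark.hadoop.*` entries from the spark conf:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

object AppendHadoopConfsSketch {
  // Copy `spark.hadoop.*` entries from the SparkConf into the Hadoop conf,
  // but let keys that are already set in the Hadoop conf take priority,
  // since those were set by the user at runtime (e.g. via SQL SET).
  def appendSparkHadoopConfigs(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
    for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
      val hadoopKey = key.stripPrefix("spark.hadoop.")
      // Only fall back to the SparkConf value when the session-level
      // Hadoop conf has not already overridden this key.
      if (hadoopConf.get(hadoopKey) == null) {
        hadoopConf.set(hadoopKey, value)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .set("spark.hadoop.fs.s3a.connection.maximum", "100") // e.g. from spark-defaults.conf
    val sessionHadoopConf = new Configuration(false)
    sessionHadoopConf.set("fs.s3a.connection.maximum", "500") // set by the user at runtime

    appendSparkHadoopConfigs(sparkConf, sessionHadoopConf)
    // Prints 500: the per-session value takes priority over spark-defaults.conf.
    println(sessionHadoopConf.get("fs.s3a.connection.maximum"))
  }
}
```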