Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22887
Spark SQL's SET command can't update any static config or Spark core configs, but I think hadoop configs are different. They are not static, as users can update them at runtime via `SparkContext.hadoopConfiguration`.
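For illustration, a minimal runnable sketch of mutating the shared hadoop conf at runtime (the `fs.s3a.connection.maximum` key is just an example here, not something this PR touches):

```scala
import org.apache.spark.sql.SparkSession

object HadoopConfAtRuntime {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hadoop-conf-at-runtime")
      .getOrCreate()

    // Hadoop configs are mutable at runtime: this updates the shared
    // Configuration that new Hadoop FileSystem/InputFormat instances will read.
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.connection.maximum", "200")
    println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.connection.maximum"))

    spark.stop()
  }
}
```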
`SparkSession.SessionState.newHadoopConf()` is a mechanism that allows users to set hadoop configs per-session in Spark SQL. So it's reasonable for users to expect that, if they set a hadoop config via the SQL SET command, it should override the one in `spark-defaults.conf`.
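A sketch of that per-session expectation, assuming a local session (`fs.s3a.connection.maximum` is again only an example key):

```scala
import org.apache.spark.sql.SparkSession

object PerSessionHadoopConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("per-session-hadoop-conf")
      .getOrCreate()

    // The SQL SET command stores the key in this session's runtime conf;
    // sessionState.newHadoopConf() (an internal API) later copies such entries
    // on top of sparkContext.hadoopConfiguration, so Hive/Hadoop reads in this
    // session should see the overridden value.
    spark.sql("SET fs.s3a.connection.maximum=500")
    println(spark.conf.getOption("fs.s3a.connection.maximum")) // Some(500)

    spark.stop()
  }
}
```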
Looking back at `appendS3AndSparkHadoopConfigurations`, it has 2 parameters: a spark conf and a hadoop conf. The spark conf comes from `spark-defaults.conf` plus any user-provided configs when building the `SparkContext`; the user-provided configs override `spark-defaults.conf`. The hadoop conf is either an empty config (if `appendS3AndSparkHadoopConfigurations` is called from `SparkHadoopUtil.newHadoopConfiguration`), or comes from `SparkSession.SessionState.newHadoopConf()` (if `appendS3AndSparkHadoopConfigurations` is called from `HadoopTableReader`). For the first case, there is nothing to worry about. For the second case, I think the hadoop conf should take priority, as it contains the configs specified by users at runtime.
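To make that precedence concrete, here is a hand-rolled sketch (not Spark's actual implementation) in which entries already present in the session's hadoop conf win over `spark.hadoop.*` entries from the spark conf:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

object AppendHadoopConfsSketch {
  // Copy `spark.hadoop.*` entries from the SparkConf into the Hadoop conf,
  // but let keys that are already set in the Hadoop conf take priority,
  // since those were set by the user at runtime (e.g. via SQL SET).
  def appendSparkHadoopConfigs(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
    for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
      val hadoopKey = key.stripPrefix("spark.hadoop.")
      // Only fall back to the SparkConf value when the session-level
      // Hadoop conf has not already overridden this key.
      if (hadoopConf.get(hadoopKey) == null) {
        hadoopConf.set(hadoopKey, value)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .set("spark.hadoop.fs.s3a.connection.maximum", "100") // e.g. from spark-defaults.conf
    val sessionHadoopConf = new Configuration(false)
    sessionHadoopConf.set("fs.s3a.connection.maximum", "500") // set by the user at runtime

    appendSparkHadoopConfigs(sparkConf, sessionHadoopConf)
    // Prints 500: the per-session value takes priority over spark-defaults.conf.
    println(sessionHadoopConf.get("fs.s3a.connection.maximum"))
  }
}
```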