Github user adambalogh commented on the issue:

    https://github.com/apache/spark/pull/22289
  
    Thank you for the detailed explanation! @vanzin 
    
    I agree with what you are saying; however, I'm not sure about some of your 
points about configs, so I would like to find common ground on how 
Hadoop/YARN configuration is supposed to work.
    
    Regarding your 3 points about how configs work, I agree with point 1. 
For point 2, however, I could not find any documentation about the RM adding its 
own Hadoop config files to the AM/executors' classpath. Is that documented 
somewhere, or is it configurable? I did some experimenting where I placed 
invalid configuration in `HADOOP_CONF_DIR`'s `hdfs-site.xml` (but _not_ in the 
YARN `Client`'s configs on the classpath), and the AM failed to start up. That 
indicates it is actually using the configs from 
[`LOCALIZED_HADOOP_CONF_DIR`](https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1204),
 which is built from the contents of `HADOOP_CONF_DIR`, and not the RM's 
`hdfs-site.xml`, which had the correct configuration.
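    
    In case it's useful, this is roughly how I checked which `hdfs-site.xml` 
wins; the object name is just a placeholder for my test, and `dfs.nameservices` 
stands in for whatever key I made invalid:
    
    ```scala
    import org.apache.hadoop.conf.Configuration

    // Run with the AM's classpath to see which hdfs-site.xml
    // Hadoop's Configuration actually resolves.
    object WhichConfigWins {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.addResource("hdfs-site.xml") // classpath lookup, first copy found wins
        // URL of the hdfs-site.xml that was picked up; in my test it pointed
        // at the localized copy shipped from HADOOP_CONF_DIR, not the RM's file.
        println(conf.getResource("hdfs-site.xml"))
        // Placeholder key: prints the bogus value I planted in HADOOP_CONF_DIR.
        println(conf.get("dfs.nameservices"))
      }
    }
    ```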
    
    For point 3, the YARN `Client` does distribute its own Hadoop configs as 
[`SPARK_HADOOP_CONF_FILE`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L411),
 which should be overlaid on top of the AM's/executors' configs, as you said. 
However, it looks like `ApplicationMaster` does not actually do that, because it 
[doesn't 
use](https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L83)
 the [`newConfiguration` instance 
method](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L112)
 from `SparkHadoopUtil`; instead it uses the [static `newConfiguration` 
method](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L434),
 which does not do the overlaying. Is that intentional? It seems it was 
introduced [here](https://github.com/apache/spark/pull/19631/files#diff-f442537993cdfc7444783a606b3bd7a4L60)
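    
    To make the difference concrete, here is a minimal sketch of the two code 
paths as I understand them; it's an illustration using Hadoop's `Configuration` 
API, not the actual `SparkHadoopUtil` code, and `__spark_hadoop_conf__.xml` is 
the file name I see behind `SPARK_HADOOP_CONF_FILE`:
    
    ```scala
    import org.apache.hadoop.conf.Configuration

    // Sketch only: names mirror SparkHadoopUtil, but this is not its real code.
    object ConfPaths {
      // Static-style path (what ApplicationMaster appears to use): only the
      // node's classpath resources (core-site.xml, hdfs-site.xml, ...).
      def staticStyle(): Configuration = new Configuration()

      // Instance-style path: same as above, then overlay the file the Client
      // shipped. In Hadoop's Configuration, resources added later override
      // earlier ones (unless a property is marked final), so the Client's
      // values would win here.
      def instanceStyle(): Configuration = {
        val conf = staticStyle()
        conf.addResource("__spark_hadoop_conf__.xml") // SPARK_HADOOP_CONF_FILE
        conf
      }
    }
    ```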
    
    Sorry for the long comment, and please let me know if I got something wrong.


