GitHub user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20785#discussion_r175153149
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
        */
       def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
         val sparkValue = conf.get(key, default)
    -    if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
    +    if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
    --- End diff ---
    
    I'm not sure I follow what you're saying, but let me explain how the configuration is expected to work.
    
    "spark." options are set in "SparkConf". "spark.hadoop.*" options, on top 
of those, should also be reflected in any Hadoop `Configuration` objects that 
are created.
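    To make that concrete, here's a minimal sketch of the copy step (the real logic lives in `SparkHadoopUtil`; the method name below is made up for illustration):
    
    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.SparkConf
    
    // Illustrative only: every "spark.hadoop.foo" entry in the SparkConf is
    // exposed as a plain "foo" entry in the Hadoop Configuration.
    def copySparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
      for ((key, value) <- conf.getAll if key.startsWith("spark.hadoop.")) {
        hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
      }
    }
    ```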
    
    So you should never need to reference "spark.hadoop." properties directly in Spark code. They are not meant to be used by Spark; they are meant to be Hadoop configs. That's why I'm saying your code should not be doing what it's doing.
    
    From what I understand of what you're trying to do, you want "spark.shuffle.service.port" to take precedence over the YARN configuration. For that, just do what I suggested above: check whether it's set in the Spark configuration before you even look at any Hadoop configuration.
    
    The current order of precedence (highest to lowest) should be:
    - spark.hadoop.spark.shuffle.service.port (since it overrides the Hadoop config)
    - Hadoop config (spark.shuffle.service.port set in XML files)
    - spark.shuffle.service.port
    
    You're proposing moving the lowest one to the top. That's a simple change. If you're trying to fix something else as well, that means there's a problem somewhere else.
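    
    As a rough sketch of that reordering, assuming the method keeps its current shape from the diff above (illustrative, not a drop-in patch):
    
    ```scala
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.spark.SparkConf
    import org.apache.spark.deploy.SparkHadoopUtil
    import org.apache.spark.launcher.SparkLauncher
    
    // Sketch: an explicitly set key in SparkConf wins outright; otherwise
    // fall back to the YARN/Hadoop configuration, then to the default.
    def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
      if (conf.contains(key)) {
        conf.get(key)
      } else if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
        new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(conf)).get(key, default)
      } else {
        default
      }
    }
    ```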

