GitHub user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20785#discussion_r175153149
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
*/
def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
val sparkValue = conf.get(key, default)
- if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+ if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
--- End diff --
I'm not sure I follow what you're saying, but let me explain how the
configuration is expected to work.
"spark." options are set in "SparkConf". "spark.hadoop.*" options, on top
of those, should also be reflected in any Hadoop `Configuration` objects that
are created.
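For context, the translation is roughly this (a simplified sketch of the
mechanism; the real logic lives in `SparkHadoopUtil.newConfiguration`, and
the helper name here is made up):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

// Sketch only: copy every "spark.hadoop.*" entry from the SparkConf into a
// Hadoop Configuration, stripping the "spark.hadoop." prefix. For example,
// "spark.hadoop.spark.shuffle.service.port" ends up as the Hadoop key
// "spark.shuffle.service.port".
def newHadoopConfSketch(conf: SparkConf): Configuration = {
  val hadoopConf = new Configuration()
  for ((key, value) <- conf.getAll if key.startsWith("spark.hadoop.")) {
    hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
  }
  hadoopConf
}
```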
So you should never need to reference "spark.hadoop.*" properties directly
in Spark code. They are not meant to be used by Spark; they are meant to be
Hadoop configs. That's why I'm saying your code should not be doing what it
is doing.
From what I understand, you want "spark.shuffle.service.port" to take
precedence over the YARN configuration. For that, just do what I suggested
above: check whether the key is set in the Spark configuration before you
even look at any Hadoop configuration.
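Concretely, I mean something along these lines (a sketch of the suggested
shape, not a polished patch; it assumes the existing method in `Utils.scala`
and reuses `SparkHadoopUtil.get.newConfiguration`):

```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.spark.SparkConf
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.launcher.SparkLauncher

def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
  if (conf.contains(key)) {
    // Explicitly set in the Spark configuration: it wins, so the Hadoop /
    // YARN configuration is never consulted.
    conf.get(key)
  } else if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
    // Not set in SparkConf: fall back to the YARN configuration (which
    // already includes any "spark.hadoop.*" overrides), then the default.
    new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(conf)).get(key, default)
  } else {
    default
  }
}
```

With that, a "spark.shuffle.service.port" set in `SparkConf` always wins,
which is the precedence change you're after.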
The current order of precedence should be, from highest to lowest:
- spark.hadoop.spark.shuffle.service.port (since it overrides the Hadoop config)
- Hadoop config (spark.shuffle.service.port set in XML files)
- spark.shuffle.service.port
You're proposing moving the lowest one to the top. That's a simple change.
If you're trying to also fix something else, then it means there's a problem in
another place.
---