[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov closed the pull request at: https://github.com/apache/spark/pull/20327
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176827527
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---

```diff
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     checkResult(finalState, result)
   }

+  private def testClusterDriverBind(
+      uiEnabled: Boolean,
+      localHost: String,
+      localIp: String,
+      success: Boolean): Unit = {
+    val result = File.createTempFile("result", null, tempDir)
+    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
+      appArgs = Seq(result.getAbsolutePath()),
+      extraConf = Map(
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+        "spark.ui.enabled" -> uiEnabled.toString
+      ))
+    if (success) {
+      checkResult(finalState, result, "success")
+    } else {
+      finalState should be (SparkAppHandle.State.FAILED)
+    }
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
```
--- End diff --

Yes, we are, sorry. I'm referring to your change in YarnRMClient.scala.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176825696
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

Maybe we are talking about different parts of the patch. This thread is attached to my test code, where I am demonstrating how this happens. It doesn't explicitly validate the generated URLs, only how bind works/fails.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176823881
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

Yes, but your change isn't touching that argument. I was wondering what the effect of the code you're actually changing is.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176822894
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

Correct: SPARK_LOCAL_IP/HOSTNAME and SPARK_PUBLIC_DNS play a role in how the tracking URL is generated.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176817667
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

Isn't that the `trackingUrl`, which is a separate parameter to the call?
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176816848
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

The URL should have a reachable authority for proxying and direct use.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176599395
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

> RM is not actively involved here.

Right, I meant NM. The rest of the comment about why using Spark's bind address is not correct is still valid, though.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176598095
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

I will see what to do with this PR since #20883 is moving faster.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176597452
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

> it assumes that both the RM and the Spark app have the same configuration w.r.t. which interfaces they're binding to.

The RM is not actively involved here. The driver executes on the NM. The driver's launch context instructs the worker node to set `SPARK_LOCAL_IP` and `SPARK_LOCAL_HOSTNAME`. The AM then RPCs the tracking URL, computed on the NM, back to the RM.
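The NM-side lookup described above can be sketched as follows. This is a hypothetical helper, not Spark's actual YarnRMClient code; the `env` parameter and the hostname fallback are assumptions made purely for illustration. YARN exports `NM_HOST` into each container's environment, so a process on the worker can prefer it over the JVM's own view of the local hostname:

```scala
import java.net.InetAddress

object NmHostSketch {
  // Prefer the NM-provided NM_HOST; fall back to the JVM's local hostname.
  // `env` stands in for sys.env so the fallback path is easy to exercise.
  def amHost(env: Map[String, String]): String =
    env.getOrElse("NM_HOST", InetAddress.getLocalHost.getHostName)

  def main(args: Array[String]): Unit = {
    // With NM_HOST set (as it is inside a YARN container):
    println(amHost(Map("NM_HOST" -> "worker-3.example.com")))
    // Without it, the JVM's own hostname is used instead.
    println(amHost(Map.empty))
  }
}
```

This mirrors the argument in the thread that the host reported back to the RM should be the host part of YARN's NodeId rather than whatever address Spark happens to bind.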
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176585546
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

> the regression of not being able to bind webUI to a specific interface is fixed

This can be a unit test for `JettyUtils` if you really care about that.

> they demonstrate how to bind RPC and webUI to different interfaces

This can be added to the docs. Nobody is going to look at unit test code to figure out how to do that.

> It's not wrong. it's one of the reasonable choices.

It actually is, because it assumes that both the RM and the Spark app have the same configuration w.r.t. which interfaces they're binding to. But as you say, it's better to use `$NM_HOST` to find the NM's address instead of the current code.

> We need the fix in JettyUtils

There's a separate PR with that fix too (the one I referenced above was closed and the correct one was opened as #20883).
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176584238
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

> So I'm not sure the tests are actually that useful. The way they're written, as yarn apps, actually makes them very expensive, and this is testing basic networking config that we know will not work if the IPs are invalid.

I agree the tests are not cheap. However, they show:
- the regression of not being able to bind the webUI to a specific interface is fixed
- they demonstrate how to bind RPC and the webUI to different interfaces

> The actual change you're introducing - using the bind address as the address of the NM, is actually wrong if you think about it. It just happens that the default value of that config is the local host name.

It's not wrong; it's one of the reasonable choices. Moreover, it's one that is consistent with the setting executors use to bind RPC. Obviously there can be others.

> So basically if setting those env variables in the AM fixes the issue I'm not sure there's any need to change anything in Spark.

We need the fix in JettyUtils. After thinking about it more, the change in YarnRMClient.scala should use NM_HOST for registerApplicationMaster because it's the host part of YARN's NodeId.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176516514
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (same hunk as above) ---

`NM_HOST`.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r176183580
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---

```diff
@@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
   def bind(): Unit = {
     assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
     try {
-      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
+      val host = if (Utils.isClusterMode(conf)) {
```
--- End diff --

I am sorry, it does not make sense to me to treat RPC and HTTP endpoints inconsistently. Therefore I am removing the logic borrowed from YARN for RPC. Now we have a simpler PR. We can achieve what I need either with `appMasterEnv.SPARK_LOCAL_IP` or a cluster-side config. At the same time we keep the prior behavior, as you requested.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175938595
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

Or, again, you can do nothing, leaving the existing code here, and nothing will be broken.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175934086
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

At best, the way you would do this is like this on each worker:

```
yarn.nodemanager.admin-env
  MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,SPARK_LOCAL_IP=$NM_HOST
```

which builds on the fact that NM_HOST is defined earlier in the launch script (or use some other value), or maybe:

```
yarn.nodemanager.admin-env
  MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,SPARK_LOCAL_IP=${yarn.nodemanager.bind-host}
```

As previously mentioned, if I have to break abstractions and intermix YARN worker settings with the Spark environment (which will also be unnecessarily passed to other apps), the only thing we need from this patch is the fix of setting the hostname in JettyUtils.scala.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175873355
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

You can set `SPARK_LOCAL_IP`. If you really want to change this, you must not change the current default behavior, which is to bind to all interfaces. If you manage to do that without doing cluster-specific checks, then I'm ok with it. But none of your current solutions really pass that bar.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175869822
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

You should not force users to bind to unintended addresses without providing reasonable means to change that.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175836822
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

I'd rather not change it unless it's fixing a problem, and I don't see a problem being fixed here. Also, we should avoid adding more and more cluster-specific logic.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175672010
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

How about the compromise I was suggesting about making the webUI bind change YARN-only? YARN's default bind address at the NM is still `0.0.0.0`, but if we change it at the YARN level we get it for both Spark RPC and the webUI as well.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175629947
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

There's a big difference between the RPC endpoint and the web UI. For the RPC endpoint, your change is fixing a potential issue. For the web UI, this change is potentially introducing a new issue in non-YARN mode. So please revert this one change, and all the rest is good to go. Otherwise I can't merge this.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175623955
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

I formulated the problem more broadly in the title of the PR: "NM host for driver end points". It's not an intuitive default behavior to bind to `0.0.0.0` if the backend (YARN) is explicitly configured not to, and we need a mechanism to allow Spark to inherit the YARN-determined bind address on the NM. You convinced me that the client mode is less critical, and it's easy to override via the environment of spark-submit (after the bug fix). That said, I'd prefer using bindAddress everywhere for consistency and because that's how it's documented.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r17284
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

I thought the point of this PR was for the driver RPC endpoint to bind to the correct address (since it doesn't support binding to 0.0.0.0)? I don't see how the http server changes that.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175551377
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

If we don't bind all listener sockets consistently, I don't see a point in this PR (modulo bug fixes). This is supposed to simplify deployment in restricted environments and make it consistent with the documentation. If we leave one of these untreated, we force users to keep `SPARK_LOCAL_IP` in sync with the related YARN settings.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175545093
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

Also, same question as before: why not just leave the code as is? It will still work on YARN.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175544922
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

Even if it's not explicitly documented, that's what the previous code did.
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175544171
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala (same hunk as above) ---

I can make the change YARN-only. However, I'd like to point out that I don't see any documentation stating `0.0.0.0` as the default.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175533480 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,11 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = if (Utils.isClusterMode(conf)) { --- End diff -- Sorry, but this is still bothering me. It probably works fine in YARN mode because the UI is proxied by the RM. But that's not true for any other mode, and the default before was to bind to `0.0.0.0` and now it is binding to a specific hostname. And that may not be the public one in a multi-homed machine, so now the UI wouldn't be accessible without extra configuration.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r173611826 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,11 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = if (Utils.isClusterMode(conf)) { --- End diff -- This is the same issue as with the RPC address. The preference should be given to the setting by YARN admins. Opening a port on an unexpected network is an additional vulnerability. That said, YARN's default is also 0.0.0.0, which users will get with this patch as well.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r173611565 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -342,13 +342,13 @@ private[spark] object JettyUtils extends Logging { -1, -1, connectionFactories: _*) +connector.setHost(hostName) connector.setPort(port) connector.start() // Currently we only use "SelectChannelConnector" // Limit the max acceptor number to 8 so that we don't waste a lot of threads connector.setAcceptQueueSize(math.min(connector.getAcceptors, 8)) --- End diff -- good point
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r173030832 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,11 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = if (Utils.isClusterMode(conf)) { --- End diff -- Is this change actually required by the rest of your changes? What is the advantage of binding the UI to a specific host instead of "0.0.0.0"? Leaving it as is would also avoid adding `isClusterMode`.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r173031021 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -68,6 +69,20 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends private val securityMgr = new SecurityManager(sparkConf) + private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf)) + + if (isClusterMode) { +// If the public address of NM does not adhere Utils.localCanonicalHostname +// there is no way to correctly configure it from the client. To this end, +// this logic replicates the way NM determines the address to bind listeners to +// from yarnConf. It has to be executed on the NM node itself in order to resolve correctly. +// --- End diff -- nit: remove this line
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r173030160 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -342,13 +342,13 @@ private[spark] object JettyUtils extends Logging { -1, -1, connectionFactories: _*) +connector.setHost(hostName) connector.setPort(port) connector.start() // Currently we only use "SelectChannelConnector" // Limit the max acceptor number to 8 so that we don't waste a lot of threads connector.setAcceptQueueSize(math.min(connector.getAcceptors, 8)) --- End diff -- Should this also be moved before the `start()` call? (Unrelated to your change, but since you're here...)
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171749148 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf)) + if (isClusterMode) { +// this logic replicates the way YARN NM determines the address to bind listeners to --- End diff -- Please review my updated comment.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171749008 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,7 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = conf.get(DRIVER_BIND_ADDRESS) --- End diff -- This is a great point, which helped identify another bug where this `host` is entirely ignored. I reworked it to behave as intended on the client node.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171748638 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -123,6 +123,10 @@ class YarnClusterSuite extends BaseYarnClusterSuite { checkResult(finalState, result) } + test("yarn-cluster should use NM's public address instead of SPARK_LOCAL_*") { +testBasicYarnApp(false, conf = Map("spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> "1.1.1.1")) --- End diff -- They are both valid addresses. `ping`-ability is orthogonal. The point was that you can't bind to it. ``` $ nc -l 1.1.1.1 4040 nc: Can't assign requested address ``` Your comment inspired me, though, to use 0.0.0.1, which can be used only as a [source address](https://tools.ietf.org/rfc/rfc3330.txt) to signify the issue.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171747629 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf)) + if (isClusterMode) { +// this logic replicates the way YARN NM determines the address to bind listeners to +// from yarnConf +// +val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST, + WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf)) +val (nmHost, _) = Utils.parseHostPort(nmHostPort) + +sparkConf.set(DRIVER_HOST_ADDRESS, nmHost) +// propagate to the user class +System.setProperty(DRIVER_HOST_ADDRESS.key, nmHost) --- End diff -- good catch. done
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171379498 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf)) + if (isClusterMode) { +// this logic replicates the way YARN NM determines the address to bind listeners to +// from yarnConf +// +val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST, + WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf)) +val (nmHost, _) = Utils.parseHostPort(nmHostPort) + +sparkConf.set(DRIVER_HOST_ADDRESS, nmHost) +// propagate to the user class +System.setProperty(DRIVER_HOST_ADDRESS.key, nmHost) --- End diff -- If you move the block in L77 after the code you're adding, you don't need this.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171381824 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -123,6 +123,10 @@ class YarnClusterSuite extends BaseYarnClusterSuite { checkResult(finalState, result) } + test("yarn-cluster should use NM's public address instead of SPARK_LOCAL_*") { +testBasicYarnApp(false, conf = Map("spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> "1.1.1.1")) --- End diff -- 1.1.1.1 is a valid IP address (try it out, ping it!); you should probably use something invalid here instead (like "0.1.1.1", which you shouldn't be able to use as a target IP address).
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171379388 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf)) + if (isClusterMode) { +// this logic replicates the way YARN NM determines the address to bind listeners to --- End diff -- This explains what it does, but it should explain why it does it.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r171382335 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,7 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = conf.get(DRIVER_BIND_ADDRESS) --- End diff -- This actually changes behavior, because now you'll always be listening on a specific interface instead of "0.0.0.0". Try it out: run spark-shell in local mode; it should try to figure out your non-loopback address for the driver, but you should still be able to connect to "127.0.0.1:4040" in your browser. With your changes that will probably stop working.
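[Editor's note] The behavioral difference described above can be seen with a minimal sketch using plain `java.net` sockets (not Spark's Jetty setup): a server bound to the wildcard address `0.0.0.0` stays reachable through loopback, which is exactly what would stop working with a specific-interface bind.

```scala
import java.net.{InetAddress, ServerSocket, Socket}

// A server bound to 0.0.0.0 listens on every interface, so a client can
// still reach it via 127.0.0.1; a server bound to one specific
// non-loopback interface would refuse the same loopback connection.
object WildcardBindDemo {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(0, 1, InetAddress.getByName("0.0.0.0"))
    try {
      val client = new Socket("127.0.0.1", server.getLocalPort)
      println(s"loopback reachable: ${client.isConnected}")
      client.close()
    } finally {
      server.close()
    }
  }
}
```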
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r170682155 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -115,6 +116,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends } } + if (isClusterMode) { +val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST, --- End diff -- Can this be done above where the configuration is prepared? Then you wouldn't need to manually add this to the system properties since it would be done by the existing code. Also it would be good to add a comment explaining why this is needed.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r170682734 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -115,6 +116,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends } } + if (isClusterMode) { +val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST, + WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf)) +val (nmHost, _) = Utils.parseHostPort(nmHostPort) + +Seq(DRIVER_BIND_ADDRESS, DRIVER_HOST_ADDRESS) --- End diff -- `DRIVER_BIND_ADDRESS` is something that is not set by Spark anywhere, so it feels a little weird to override it. If the user is setting that explicitly, the code should assume the user knows what he's doing.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r169257050 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging { } // Ignore invalid spark.driver.host in cluster modes. -if (deployMode == CLUSTER) { +if (isYarnCluster) { + sparkConf.set("spark.driver.host", "${env:NM_HOST}") --- End diff -- Sorry for the delay. I verified that moving the `${env:NM_HOST}` setting to `yarn/Client.scala` works. However, the most robust method is to use the YARN cluster-side config by replicating how the NodeManager determines the public address. I added a unit test failing before the PR.
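[Editor's note] The cluster-side resolution described above boils down to reading the NM's bind URL from `yarnConf` on the NM node and keeping only the host part. A rough sketch follows; `parseHostPort` here is a simplified stand-in for Spark's `Utils.parseHostPort`, and the input string is a made-up example rather than a value actually read from `WebAppUtils.getWebAppBindURL`.

```scala
object NmHostResolution {
  // Simplified stand-in for Utils.parseHostPort: split a "host:port" bind
  // URL into its host and port components. Real inputs would come from the
  // NM's configuration, resolved on the NM node itself.
  def parseHostPort(hostPort: String): (String, Int) = {
    val idx = hostPort.lastIndexOf(':')
    if (idx < 0) (hostPort, 0) // no port component given
    else (hostPort.substring(0, idx), hostPort.substring(idx + 1).toInt)
  }

  def main(args: Array[String]): Unit = {
    val (nmHost, _) = parseHostPort("nm-host.example.com:8042")
    println(nmHost) // the host to bind driver endpoints to
  }
}
```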
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r165496876 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging { } // Ignore invalid spark.driver.host in cluster modes. -if (deployMode == CLUSTER) { +if (isYarnCluster) { + sparkConf.set("spark.driver.host", "${env:NM_HOST}") --- End diff -- re: sql, this is not a sql conf object, so sql options have no effect. I think it would be better to have this in YARN's `Client.scala` instead, in the spirit of not adding more backend-specific logic to this class.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r165206579 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging { } // Ignore invalid spark.driver.host in cluster modes. -if (deployMode == CLUSTER) { +if (isYarnCluster) { + sparkConf.set("spark.driver.host", "${env:NM_HOST}") --- End diff -- Good to know. It seems that substitute is unconditional in core for this `ConfigEntryWithDefaultString`. I liked the brevity of this approach. Pending @vanzin's feedback I can redo it in explicit code again.
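[Editor's note] The `${env:...}` expansion under discussion can be pictured as a small substitution pass over the config value. The sketch below is illustrative only; the regex, fallback-to-empty behavior, and object name are assumptions, not core's actual config-substitution implementation.

```scala
import java.util.regex.Matcher

object EnvSubstitution {
  // Matches references of the form ${env:NAME}.
  private val EnvRef = """\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}""".r

  // Replace each ${env:NAME} with the value from the supplied environment
  // map; unknown names expand to the empty string in this sketch.
  // quoteReplacement guards against '$' in values being read as group refs.
  def substitute(value: String, env: Map[String, String]): String =
    EnvRef.replaceAllIn(value,
      m => Matcher.quoteReplacement(env.getOrElse(m.group(1), "")))

  def main(args: Array[String]): Unit = {
    println(substitute("${env:NM_HOST}", Map("NM_HOST" -> "nm1.example.com")))
  }
}
```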
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r165193521 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging { } // Ignore invalid spark.driver.host in cluster modes. -if (deployMode == CLUSTER) { +if (isYarnCluster) { + sparkConf.set("spark.driver.host", "${env:NM_HOST}") --- End diff -- This actually requires `spark.sql.variable.substitute` set to `true`; it is by default, but if a user accidentally changes it, the code won't work.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r163491802 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -385,7 +386,13 @@ class SparkContext(config: SparkConf) extends Logging { // Set Spark driver host and port system properties. This explicitly sets the configuration // instead of relying on the default value of the config constant. -_conf.set(DRIVER_HOST_ADDRESS, _conf.get(DRIVER_HOST_ADDRESS)) +_conf.set(DRIVER_HOST_ADDRESS, + if (master == "yarn") { +System.getenv(ApplicationConstants.Environment.NM_HOST.toString) --- End diff -- The changes here will also affect yarn client mode.
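[Editor's note] The lookup in the diff above amounts to reading YARN's `NM_HOST` environment variable, with some fallback when it is absent (e.g. in client mode outside a container). A hedged sketch; `nmHostOrLocal` and its function-typed env parameter are illustrative, not Spark's actual API.

```scala
import java.net.InetAddress

object DriverHostResolver {
  // Prefer the host YARN advertises for this container (NM_HOST);
  // fall back to the locally resolved hostname when the variable is
  // absent, which is the case outside a YARN container.
  def nmHostOrLocal(env: String => Option[String]): String =
    env("NM_HOST").getOrElse(InetAddress.getLocalHost.getHostName)

  def main(args: Array[String]): Unit = {
    // Simulate a YARN container environment with a fake env map.
    val fakeEnv = Map("NM_HOST" -> "nm2.example.com")
    println(nmHostOrLocal(fakeEnv.get))
  }
}
```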
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
GitHub user gerashegalov opened a pull request: https://github.com/apache/spark/pull/20327 [SPARK-12963][CORE] NM host for driver end points ## What changes were proposed in this pull request? Driver end points on YARN in the cluster mode are potentially bound to incorrect network interfaces because the host name is not retrieved from YARN as in the executor container case. ## How was this patch tested? On a cluster previously experiencing `Service 'Driver' failed after 16 retries (on a random free port) ...` You can merge this pull request into a Git repository by running: $ git pull https://github.com/gerashegalov/spark gera/driver-host-on-in-yarn-cluster-mode Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20327.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20327 commit 016323a8d163e31052776a50590b47d9a38b6cdb Author: Gera Shegalov Date: 2018-01-19T06:05:32Z [SPARK-12963][CORE] NM host for driver end points