[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-04-09 Thread gerashegalov
Github user gerashegalov closed the pull request at:

https://github.com/apache/spark/pull/20327


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-23 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176827527
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     checkResult(finalState, result)
   }
 
+  private def testClusterDriverBind(
+      uiEnabled: Boolean,
+      localHost: String,
+      localIp: String,
+      success: Boolean): Unit = {
+    val result = File.createTempFile("result", null, tempDir)
+    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
+      appArgs = Seq(result.getAbsolutePath()),
+      extraConf = Map(
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+        "spark.ui.enabled" -> uiEnabled.toString
+      ))
+    if (success) {
+      checkResult(finalState, result, "success")
+    } else {
+      finalState should be (SparkAppHandle.State.FAILED)
+    }
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
--- End diff --

Yes, we are, sorry. I'm referring to your change in YarnRMClient.scala.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-23 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176825696
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

Maybe we are talking about different parts of the patch. This thread is 
attached to my test code, where I am demonstrating how this happens. It doesn't 
explicitly validate the generated URLs, only how the bind succeeds or fails.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-23 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176823881
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

Yes, but your change isn't touching that argument. I was wondering about 
the effect of the code you're actually changing.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-23 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176822894
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

Correct, SPARK_LOCAL_IP/HOSTNAME and SPARK_PUBLIC_DNS play a role in how 
the tracking URL is generated.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-23 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176817667
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

Isn't that the `trackingUrl`, which is a separate parameter to the call?


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-23 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176816848
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

The URL should have a reachable authority for proxying and direct use.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176599395
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

> RM is not actively involved here. 

Right, I meant NM. The rest of the comment about why using Spark's bind 
address is not correct still holds, though.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176598095
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

I will see what to do with this PR, since #20883 is moving faster.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176597452
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

> it assumes that both the RM and the Spark app have the same configuration 
w.r.t. which interfaces they're binding to.

The RM is not actively involved here. The driver executes on the NM. The launch 
context of the driver prescribes assigning `SPARK_LOCAL_IP` and 
`SPARK_LOCAL_HOSTNAME` on the worker node. Then the AM sends the tracking URL 
computed on the NM back to the RM over RPC.




---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176585546
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

> the regression of not being able to bind webUI to a specific interface is 
fixed

This can be a unit test for `JettyUtils` if you really care about that.

> they demonstrate how to bind RPC and webUI to different interfaces

This can be added to the docs. Nobody is going to look at unit test code to 
figure out how to do that.

> It's not wrong. it's one of the reasonable choices. 

It actually is, because it assumes that both the RM and the Spark app have 
the same configuration w.r.t. which interfaces they're binding to.

But as you say, it's better to use `$NM_HOST` to find the NM's address 
instead of the current code.

> We need the fix in JettyUtils

There's a separate PR with that fix too (the one I referenced above was 
closed and the correct one was opened as #20883).



---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176584238
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

> So I'm not sure the tests are actually that useful. The way they're 
written, as yarn apps, actually makes them very expensive, and this is testing 
basic networking config that we know will not work if the IPs are invalid.

I agree the tests are not cheap. However, they show:
- that the regression of not being able to bind the webUI to a specific interface is fixed
- how to bind RPC and the webUI to different interfaces

> The actual change you're introducing - using the bind address as the 
address of the NM, is actually wrong if you think about it. It just happens 
that the default value of that config is the local host name.

It's not wrong; it's one of the reasonable choices. Moreover, it's consistent 
with the setting executors use to bind RPC. Obviously there can be others.

> So basically if setting those env variables in the AM fixes the issue I'm 
not sure there's any need to change anything in Spark.

We need the fix in JettyUtils. After thinking about it more, 
YarnRMClient.scala should use NM_HOST for registerApplicationMaster, because 
that is the host part of YARN's NodeId.
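For concreteness, a minimal sketch of that idea (not the actual patch; the 
port and tracking URL values are illustrative, and the `AMRMClient` setup is 
only hinted at):
```scala
import org.apache.hadoop.yarn.api.ApplicationConstants
import org.apache.hadoop.yarn.client.api.AMRMClient

// YARN sets NM_HOST in every container's environment, so reading it in the
// AM yields the host part of the NM's NodeId.
val nmHost = System.getenv(ApplicationConstants.Environment.NM_HOST.toString)

val amClient = AMRMClient.createAMRMClient[AMRMClient.ContainerRequest]()
// ... amClient.init(conf); amClient.start() ...

// Register under the NM's host instead of Spark's bind address
// (port 0 and the proxy URL below are placeholder values).
amClient.registerApplicationMaster(nmHost, 0, "http://example.com/proxy")
```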



 




---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176516514
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

`NM_HOST`.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-21 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176183580
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
@@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
   def bind(): Unit = {
     assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
     try {
-      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
+      val host = if (Utils.isClusterMode(conf)) {
--- End diff --

I am sorry, it does not make sense to me to treat the RPC and HTTP endpoints 
inconsistently, so I am removing the logic borrowed from YARN for RPC. 
Now we have a simpler PR. We can achieve what I need either with 
`appMasterEnv.SPARK_LOCAL_IP` or with a cluster-side config. At the same time we keep 
the prior behavior, as you requested.
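For the first option, a hedged sketch via `SparkLauncher` (the application 
jar, main class, and IP below are made-up values):
```scala
import org.apache.spark.launcher.SparkLauncher

// Pin the cluster-mode driver's bind address through the AM environment,
// with no Spark code change (all values below are illustrative).
val handle = new SparkLauncher()
  .setMaster("yarn")
  .setDeployMode("cluster")
  .setAppResource("/path/to/app.jar")   // hypothetical application jar
  .setMainClass("com.example.MyApp")    // hypothetical main class
  .setConf("spark.yarn.appMasterEnv.SPARK_LOCAL_IP", "10.0.0.5")
  .startApplication()
```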


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-20 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175938595
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

Or, again, you can do nothing, leaving the existing code here, and nothing 
will be broken.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-20 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175934086
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

At best, the way you would do this is like this on each worker:
```
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,SPARK_LOCAL_IP=$NM_HOST</value>
</property>
```
which builds on the fact that NM_HOST is defined earlier in the launch 
script, or some other value, or maybe:
```
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,SPARK_LOCAL_IP=${yarn.nodemanager.bind-host}</value>
</property>
```
As previously mentioned, if I have to break abstractions and intermix YARN 
worker settings with the Spark environment (which will also be unnecessarily passed to 
other apps), the only thing we need from this patch is the fix of setting the 
hostname in JettyUtils.scala.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-20 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175873355
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

You can set `SPARK_LOCAL_IP`.

If you really want to change this, you must not change the current default 
behavior, which is to bind to all interfaces. If you manage to do that without 
doing cluster-specific checks, then I'm ok with it. But none of your current 
solutions really pass that bar.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-20 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175869822
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

you should not force users to bind to unintended addresses without 
providing reasonable means to change that.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-20 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175836822
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

I'd rather not change it unless it's fixing a problem. I don't see a 
problem being fixed here. Also, we should avoid adding more and more 
cluster-specific logic.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-20 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175672010
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

How about the compromise I was suggesting, making the webUI bind 
change only for YARN? YARN's default bind address at the NM is still `0.0.0.0`, 
but if we change it at the YARN level, we get it for both Spark RPC and the webUI as 
well.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175629947
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

There's a big difference between the RPC endpoint and the web UI.

For the RPC endpoint, your change is fixing a potential issue. For the web 
UI, this change is potentially introducing a new issue in non-YARN mode. So 
please, revert this one change and all the rest is good to go.

Otherwise I can't merge this.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175623955
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

I formulated the problem more broadly in the title of the PR: "NM host for 
driver end points". It's not an intuitive default behavior to bind to `0.0.0.0` 
if the backend (YARN) is explicitly configured not to, and we need a mechanism 
that allows Spark to inherit the YARN-determined bind address on the NM.
You convinced me that the client mode is less critical, and it's easy to 
override via the environment of spark-submit (after the bug fix), although I'd 
prefer using bindAddress everywhere, for consistency and as documented.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r17284
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

I thought the point of this PR was for the driver RPC endpoint to bind to the 
correct address (since it doesn't support binding to 0.0.0.0)?

I don't see how the HTTP server changes that.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175551377
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

If we don't bind all listener sockets consistently, I don't see a point in 
this PR (modulo bug fixes). This is supposed to simplify deployment in 
restricted environments and make behavior consistent with the documentation. If we 
leave one of these untreated, we force users to keep `SPARK_LOCAL_IP` in sync 
with related YARN settings.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175545093
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

Also, same question as before: why not just leave the code as is? It will 
still work on YARN.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175544922
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

Even if it's not explicitly documented, that's what the previous code did.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175544171
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

I can make the change YARN-only. 
However, I'd like to point out I don't see any documentation stating 
`0.0.0.0` as the default.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175533480
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

Sorry, but this is still bothering me. It probably works fine in YARN mode 
because the UI is proxied by the RM. But that's not true for any other mode: 
the default before was to bind to `0.0.0.0`, whereas now it binds to a 
specific hostname. And that may not be the public one on a multi-homed machine, 
so the UI wouldn't be accessible without extra configuration.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-09 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r173611826
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

This is the same issue as with the RPC address: the preference should be 
given to the setting made by YARN admins. Opening a port on an unintended network is 
an additional vulnerability. That said, YARN's default is also 0.0.0.0, which 
users will get with this patch as well.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-09 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r173611565
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -342,13 +342,13 @@ private[spark] object JettyUtils extends Logging {
       -1,
       -1,
       connectionFactories: _*)
+    connector.setHost(hostName)
     connector.setPort(port)
     connector.start()
 
     // Currently we only use "SelectChannelConnector"
     // Limit the max acceptor number to 8 so that we don't waste a lot of threads
     connector.setAcceptQueueSize(math.min(connector.getAcceptors, 8))
--- End diff --

good point


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r173030832
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

Is this change actually required by the rest of your changes? What is the 
advantage of binding the UI to a specific host instead of "0.0.0.0"?

Leaving it as is would also avoid adding `isClusterMode`.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r173031021
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -68,6 +69,20 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 
   private val securityMgr = new SecurityManager(sparkConf)
 
+  private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
+
+  if (isClusterMode) {
+    // If the public address of the NM does not adhere to Utils.localCanonicalHostname
+    // there is no way to correctly configure it from the client. To this end,
+    // this logic replicates the way the NM determines the address to bind listeners to
+    // from yarnConf. It has to be executed on the NM node itself in order to resolve correctly.
+    //
--- End diff --

nit: remove this line


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r173030160
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
(same diff context as quoted above)
--- End diff --

Should this also be moved before the `start()` call? (Unrelated to your 
change, but since you're here...)
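For reference, a standalone Jetty sketch of that ordering (hedged; this is 
plain Jetty, not Spark's `JettyUtils`, and the host and port are illustrative):
```scala
import org.eclipse.jetty.server.{Server, ServerConnector}

val server = new Server()
val connector = new ServerConnector(server)

// Apply connector settings before anything starts: host, port, and the
// acceptor queue size should all be in place when the listener comes up.
connector.setHost("127.0.0.1")
connector.setPort(4040)
connector.setAcceptQueueSize(math.min(connector.getAcceptors, 8))

server.addConnector(connector)
server.start()  // starts the connector with the settings above
```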


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171749148
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 
   private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
 
+  if (isClusterMode) {
+    // this logic replicates the way YARN NM determines the address to bind listeners to
--- End diff --

please review my updated comment


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171749008
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
@@ -126,7 +126,7 @@ private[spark] abstract class WebUI(
   def bind(): Unit = {
     assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
     try {
-      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
+      val host = conf.get(DRIVER_BIND_ADDRESS)
--- End diff --

This is a great point, which helped identify another bug where this `host` 
is entirely ignored. 

I reworked it to behave as intended on the client node.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171748638
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
@@ -123,6 +123,10 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     checkResult(finalState, result)
   }
 
+  test("yarn-cluster should use NM's public address instead of SPARK_LOCAL_*") {
+    testBasicYarnApp(false, conf = Map("spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> "1.1.1.1"))
--- End diff --

They are both valid addresses; `ping`-ability is orthogonal. The point was 
that you can't bind to it:
```
$ nc -l 1.1.1.1 4040
nc: Can't assign requested address
```
Your comment inspired me, though, to use 0.0.0.1, which can be used only as a 
[source address](https://tools.ietf.org/rfc/rfc3330.txt), to signify the issue.
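The same failure can be shown from plain JDK sockets (a hedged illustration; 
it assumes the machine does not own 1.1.1.1):
```scala
import java.net.{BindException, InetSocketAddress, ServerSocket}

// Binding requires an address owned by a local interface; a globally valid,
// pingable address is not enough.
val socket = new ServerSocket()
try {
  socket.bind(new InetSocketAddress("1.1.1.1", 4040))
} catch {
  case e: BindException => println(s"bind failed: ${e.getMessage}")
} finally {
  socket.close()
}
```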
 


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171747629
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
 
   private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
 
+  if (isClusterMode) {
+    // this logic replicates the way YARN NM determines the address to bind listeners to
+    // from yarnConf
+    //
+    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
+      WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
+    val (nmHost, _) = Utils.parseHostPort(nmHostPort)
+
+    sparkConf.set(DRIVER_HOST_ADDRESS, nmHost)
+    // propagate to the user class
+    System.setProperty(DRIVER_HOST_ADDRESS.key, nmHost)
--- End diff --

good catch. done


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171379498
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
(same diff context as quoted above)
--- End diff --

If you move the block in L77 after the code you're adding, you don't need 
this.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171381824
  
--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
(same diff context as quoted above)
--- End diff --

1.1.1.1 is a valid IP address (try it out, ping it!); you should probably 
use something invalid here instead (like "0.1.1.1", which you shouldn't be able 
to use as a target IP address).


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171379388
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ---
@@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: 
ApplicationMasterArguments) extends
 
   private val yarnConf = new 
YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
 
+  if (isClusterMode) {
+// this logic replicates the way YARN NM determines the address to 
bind listeners to
--- End diff --

This explains what it does, but it should explain why it does it.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-28 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r171382335
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
(same diff context as quoted above)
--- End diff --

This actually changes behavior, because now you'll always be listening on a 
specific interface instead of "0.0.0.0".

Try it out: run spark-shell in local mode; it should try to figure out your 
non-loopback address for the driver, but you should still be able to connect to 
"127.0.0.1:4040" in your browser. With your changes that will probably stop 
working.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r170682155
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -115,6 +116,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     }
   }
 
+  if (isClusterMode) {
+    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
--- End diff --

Can this be done above where the configuration is prepared? Then you 
wouldn't need to manually add this to the system properties since it would be 
done by the existing code.

Also it would be good to add a comment explaining why this is needed.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r170682734
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -115,6 +116,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     }
   }
 
+  if (isClusterMode) {
+    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
+      WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
+    val (nmHost, _) = Utils.parseHostPort(nmHostPort)
+
+    Seq(DRIVER_BIND_ADDRESS, DRIVER_HOST_ADDRESS)
--- End diff --

`DRIVER_BIND_ADDRESS` is something that is not set by Spark anywhere, so it 
feels a little weird to override it. If the user is setting that explicitly, 
the code should assume the user knows what he's doing.
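For context, a hedged sketch of the two user-facing configs in play (the 
values are illustrative): `spark.driver.bindAddress` is the interface the 
driver listens on, `spark.driver.host` the address it advertises to executors.
```scala
import org.apache.spark.SparkConf

// Listen on all interfaces but advertise a routable name; both keys are
// real Spark configs, the values are made up.
val conf = new SparkConf()
  .set("spark.driver.bindAddress", "0.0.0.0")
  .set("spark.driver.host", "driver.example.com")
```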


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-20 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r169257050
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging {
     }
 
     // Ignore invalid spark.driver.host in cluster modes.
-    if (deployMode == CLUSTER) {
+    if (isYarnCluster) {
+      sparkConf.set("spark.driver.host", "${env:NM_HOST}")
--- End diff --

Sorry for the delay. I verified that moving the setting of `${env:NM_HOST}` to 
`yarn/Client.scala` works. However, the most robust method is to use the YARN 
cluster-side config, by replicating how the NodeManager determines the public 
address. I added a unit test that fails without the PR.
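In outline, that resolution looks like the following hedged sketch (the Hadoop 
calls are the ones from the diff above; the host/port split is simplified):
```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.webapp.util.WebAppUtils

// Resolve the NM's advertised web address the way the NodeManager itself
// does, honoring yarn.nodemanager.bind-host when it is set.
val yarnConf = new YarnConfiguration()
val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf,
  YarnConfiguration.NM_BIND_HOST,
  WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
val nmHost = nmHostPort.split(":").head  // keep only the host part
```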


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-02-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r165496876
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
(same diff context as quoted above)
--- End diff --

re: sql, this is not a sql conf object, so sql options have no effect.

I think it would be better to have this in YARN's `Client.scala` instead, 
in the spirit of not adding more backend-specific logic to this class.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-31 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r165206579
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
(same diff context as quoted above)
--- End diff --

Good to know. It seems that substitution is unconditional in core for this 
`ConfigEntryWithDefaultString`. I liked the brevity of this approach. Pending 
@vanzin's feedback, I can redo it in explicit code again.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-31 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r165193521
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
(same diff context as quoted above)
--- End diff --

This actually requires `spark.sql.variable.substitute` to be set to `true`. 
Though it is by default, if some user accidentally changes it, the code won't 
work.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r163491802
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -385,7 +386,13 @@ class SparkContext(config: SparkConf) extends Logging {
 
     // Set Spark driver host and port system properties. This explicitly sets the configuration
     // instead of relying on the default value of the config constant.
-    _conf.set(DRIVER_HOST_ADDRESS, _conf.get(DRIVER_HOST_ADDRESS))
+    _conf.set(DRIVER_HOST_ADDRESS,
+      if (master == "yarn") {
+        System.getenv(ApplicationConstants.Environment.NM_HOST.toString)
--- End diff --

The changes here will also affect yarn client mode.
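A hedged sketch of why that matters (`spark.master` and 
`spark.submit.deployMode` are the real config keys; the `conf` here is a 
stand-in):
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()  // in SparkContext this would be the live conf

// master == "yarn" holds in both client and cluster deploy modes, so a
// cluster-only change must consult the deploy mode as well.
val isYarnClusterMode =
  conf.get("spark.master", "") == "yarn" &&
  conf.get("spark.submit.deployMode", "client") == "cluster"
```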


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-01-18 Thread gerashegalov
GitHub user gerashegalov opened a pull request:

https://github.com/apache/spark/pull/20327

[SPARK-12963][CORE] NM host for driver end points

## What changes were proposed in this pull request?

Driver end points on YARN in cluster mode are potentially bound to 
incorrect network interfaces because the host name is not retrieved from YARN, 
as it is in the executor container case.

## How was this patch tested?

On a cluster previously experiencing `Service 'Driver' failed after 16 
retries  (on a random free port) ...`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gerashegalov/spark gera/driver-host-on-in-yarn-cluster-mode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20327


commit 016323a8d163e31052776a50590b47d9a38b6cdb
Author: Gera Shegalov 
Date:   2018-01-19T06:05:32Z

[SPARK-12963][CORE] NM host for driver end points




---
