[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2017-03-02 Thread jerryshao
Github user jerryshao closed the pull request at:

https://github.com/apache/spark/pull/14789


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14789#discussion_r87119403
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -138,6 +138,14 @@ class SparkHadoopUtil extends Logging {
   }

   /**
+   * Update credentials manually. This triggers the AM to renew the credentials immediately;
+   * the executors and the driver (in client mode) will then update their credentials accordingly.
+   *
+   * Note this only works under the YARN cluster manager.
+   */
+  def updateCredentials(sc: SparkContext): Unit = { }
--- End diff --

Agreed, `Future` might be better; I will change it.





[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14789#discussion_r87116199
  
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -189,6 +190,39 @@ private[spark] abstract class YarnSchedulerBackend(
   }

   /**
+   * Renew and update the credentials. This method triggers credential renewal on the AM
+   * side; once renewal succeeds, it triggers credential updates on the executor and
+   * driver side.
+   */
+  def updateCredentials(): Unit = {
+    yarnSchedulerEndpoint.amEndpoint match {
+      case Some(am) =>
+        val future = am.ask[Boolean](UpdateCredentials)
+
+        future.onSuccess {
+          case true =>
+            // Update credentials on the executor side only after the AM has successfully
+            // renewed the credentials.
+            synchronized {
+              executorDataMap.values.foreach(_.executorEndpoint.send(UpdateCredentials))
+            }
+            // This will trigger credential updating on the driver side.
+            SparkHadoopUtil.get.triggerCredentialUpdater()
--- End diff --

Note that when we modify the above to return a Future, this will also need to be done
within that Future (or it can always be done inline).





[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14789#discussion_r87072908
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -64,7 +64,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   // must be protected by `CoarseGrainedSchedulerBackend.this`. Besides, `executorDataMap` should
   // only be modified in `DriverEndpoint.receive/receiveAndReply` with protection by
   // `CoarseGrainedSchedulerBackend.this`.
-  private val executorDataMap = new HashMap[String, ExecutorData]
+  protected val executorDataMap = new HashMap[String, ExecutorData]
--- End diff --

@tgravescs @vanzin Any thoughts on keeping this private and executing a closure over
the map's values for use cases like this (instead of exposing the field or creating a
copy via toMap)? This would also ensure that outside users maintain the expected
thread-safety invariants.
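For illustration, that alternative could look something like the following minimal sketch (class and method names are hypothetical, not Spark's actual API): the map stays private, and callers get a synchronized closure-based accessor instead of the field itself.

```scala
import scala.collection.mutable

// Minimal sketch of the suggestion above (hypothetical names, not the
// actual Spark API): keep the map private and expose a method that runs
// a closure over its values while holding the backend's lock, so callers
// can neither break the locking invariant nor need a defensive toMap copy.
class BackendSketch {
  private val executorDataMap = new mutable.HashMap[String, String]

  def register(id: String, data: String): Unit = synchronized {
    executorDataMap(id) = data
  }

  // Applies `f` to every executor entry under the lock.
  def foreachExecutor(f: String => Unit): Unit = synchronized {
    executorDataMap.values.foreach(f)
  }
}
```

With this shape, `YarnSchedulerBackend` could call `foreachExecutor` without the field being widened to `protected`.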





[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14789#discussion_r87075067
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -138,6 +138,14 @@ class SparkHadoopUtil extends Logging {
   }

   /**
+   * Update credentials manually. This triggers the AM to renew the credentials immediately;
+   * the executors and the driver (in client mode) will then update their credentials accordingly.
+   *
+   * Note this only works under the YARN cluster manager.
+   */
+  def updateCredentials(sc: SparkContext): Unit = { }
--- End diff --

You might want to return a Future for this, which can indicate when the credentials
have been updated: as of now, developers invoking this API have no way to know when
(if at all) the credentials have been propagated to the driver/executors.
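As a rough illustration of that shape (all names hypothetical, not the actual Spark API), a Future-returning variant might look like:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hedged sketch of a Future-returning updateCredentials: the returned
// Future completes only once renewal has been confirmed, so callers can
// await it or chain follow-up work on it. `renewViaAm` is a hypothetical
// stand-in for the real AM RPC round-trip.
def updateCredentialsSketch(renewViaAm: () => Boolean): Future[Unit] =
  Future {
    if (!renewViaAm()) {
      sys.error("AM failed to renew credentials")
    }
    // ...propagation to executors/driver would happen here before completing.
  }
```

A caller could then `Await.result` on the Future with a timeout, or chain follow-up work with `map`/`onComplete`.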





[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14789#discussion_r87078341
  
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -189,6 +190,39 @@ private[spark] abstract class YarnSchedulerBackend(
   }

   /**
+   * Renew and update the credentials. This method triggers credential renewal on the AM
+   * side; once renewal succeeds, it triggers credential updates on the executor and
+   * driver side.
+   */
+  def updateCredentials(): Unit = {
+    yarnSchedulerEndpoint.amEndpoint match {
+      case Some(am) =>
+        val future = am.ask[Boolean](UpdateCredentials)
+
+        future.onSuccess {
+          case true =>
+            // Update credentials on the executor side only after the AM has successfully
+            // renewed the credentials.
+            synchronized {
+              executorDataMap.values.foreach(_.executorEndpoint.send(UpdateCredentials))
--- End diff --

Use ask instead of send.
This allows consumers of the API to have reasonable confidence (based on the Future's
timeout) about whether the update was propagated.

send could result in a task being scheduled before the update is processed.
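A sketch of that ask-based pattern (names hypothetical; real code would use `RpcEndpointRef.ask` with a timeout): each executor ask yields a `Future[Boolean]` ack, and `Future.sequence` lets the caller wait until every executor has confirmed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Each element of `askExecutors` models asking one executor endpoint and
// receiving an ack. The combined Future succeeds with true only if every
// executor acknowledged the credential update.
def propagateUpdateSketch(askExecutors: Seq[() => Future[Boolean]]): Future[Boolean] =
  Future.sequence(askExecutors.map(_.apply())).map(_.forall(identity))
```

The caller can then `Await.result` on the combined Future with a timeout, which is exactly the "reasonable confidence" a fire-and-forget `send` cannot provide.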





[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-08-24 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/14789

[SPARK-17209][YARN] Add the ability to manually update credentials for 
Spark running on YARN

## What changes were proposed in this pull request?

This PR proposes adding a new API to `SparkHadoopUtil` to trigger manual credential
updating at run time.

This is mainly useful for long-running Spark applications that need to access
different secured systems at run time. For example, when Zeppelin / Spark Shell needs
to access a different secured HBase cluster or Hive metastore service at run time, it
requires tokens from the new services and needs to propagate them to the executors
immediately. Previously we either had to relaunch the application to get new tokens,
or wait for the old tokens to expire before getting new ones.

With this new API, users can manually trigger credential updating at run time when
required. Credentials will be renewed by the AM and then updated on the executor and
driver side.
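The renew-then-propagate flow described here can be modeled as a toy sketch (all names hypothetical; the real implementation lives in the AM and scheduler backend): renew on the AM first, and only on success fan the fresh tokens out to the executors.

```scala
// Toy model of the flow (hypothetical names, not Spark internals).
final case class Tokens(version: Int)

class CredentialFlowSketch {
  private var amTokens = Tokens(0)
  var executorTokens: Map[String, Tokens] = Map("exec-1" -> Tokens(0))

  // Stands in for the AM renewing delegation tokens with the secured services.
  private def renewOnAm(): Tokens = {
    amTokens = Tokens(amTokens.version + 1)
    amTokens
  }

  // Renew on the AM first; only then push the fresh tokens to every executor.
  def updateCredentials(): Tokens = {
    val renewed = renewOnAm()
    executorTokens = executorTokens.map { case (id, _) => id -> renewed }
    renewed
  }
}
```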

## How was this patch tested?

Manually verified in a secured cluster.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-17209

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14789.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14789


commit 841bec0dfae79ecfce5dc535949879f397313736
Author: jerryshao 
Date:   2016-08-24T07:12:22Z

Support manual credential update

commit 85370e787b719136cc97cfc358f0e50c55582bb8
Author: jerryshao 
Date:   2016-08-24T13:23:36Z

Change the comments



