[GitHub] spark issue #21150: [SPARK-24075][MESOS] Option to limit number of retries f...

2018-10-22 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/21150
  
ok to test


---




[GitHub] spark pull request #21150: [SPARK-24075][MESOS] Option to limit number of re...

2018-10-22 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21150#discussion_r227028091
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -728,6 +729,28 @@ private[spark] class MesosClusterScheduler(
   state == MesosTaskState.TASK_LOST
   }
 
+  /**
+   * Check if the driver has exceeded the number of retries.
+   * When "spark.mesos.driver.supervise.maxRetries" is not set,
+   * the default behavior is to retry indefinitely.
+   *
+   * @param retryState Retry state of the driver
+   * @param conf Spark configuration to check for
+   * "spark.mesos.driver.supervise.maxRetries"
+   * @return true if the driver has reached the retry limit,
+   *         false if the driver can be retried
+   */
+  private[scheduler] def hasDriverExceededRetries(retryState: Option[MesosClusterRetryState],
--- End diff --

Please fix the param style:
hasDriverExceededRetries(
    retryState: Option[MesosClusterRetryState],
    conf: SparkConf)
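
A minimal Scala sketch of the check under discussion, in the requested parameter style (illustrative only: the `retries` field on MesosClusterRetryState and the exact config lookup are assumed from context, not taken from the PR):

    private[scheduler] def hasDriverExceededRetries(
        retryState: Option[MesosClusterRetryState],
        conf: SparkConf): Boolean = {
      conf.getOption("spark.mesos.driver.supervise.maxRetries") match {
        case Some(max) => retryState.exists(_.retries >= max.toInt)
        case None => false // no limit configured: retry indefinitely
      }
    }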


---




[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-09-12 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/22146
  
The PR works for me now as well for adding volumes to executors.


---




[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-08-30 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r214205873
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 ---
@@ -59,5 +66,28 @@ private[spark] object KubernetesUtils {
 }
   }
 
+  def loadPodFromTemplate(
+  kubernetesClient: KubernetesClient,
+  templateFile: File): SparkPod = {
+try {
+  val pod = kubernetesClient.pods().load(templateFile).get()
+  pod.getSpec.getContainers.asScala.toList match {
+case first :: rest => SparkPod(
+  new PodBuilder(pod)
+.editSpec()
+  .withContainers(rest.asJava)
+  .endSpec()
+.build(),
+  first)
+case Nil => SparkPod(pod, new ContainerBuilder().build())
+  }
+} catch {
+  case e: Exception =>
+logError(
+  s"Encountered exception while attempting to load initial pod spec from file", e)
+throw new SparkException("Could not load driver pod from template file.", e)
--- End diff --

This error message is misleading: this code throws when either the executor or 
the driver pod fails to load from its own template. Either remove "driver" or be 
more specific about whether it's the executor or the driver.
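
One way to make the message specific, sketched under the assumption that callers pass in which pod type they are loading (the podType parameter is hypothetical; the body mirrors the quoted diff):

    def loadPodFromTemplate(
        kubernetesClient: KubernetesClient,
        templateFile: File,
        podType: String): SparkPod = {
      try {
        val pod = kubernetesClient.pods().load(templateFile).get()
        pod.getSpec.getContainers.asScala.toList match {
          case first :: rest => SparkPod(
            new PodBuilder(pod)
              .editSpec()
                .withContainers(rest.asJava)
              .endSpec()
            .build(),
            first)
          case Nil => SparkPod(pod, new ContainerBuilder().build())
        }
      } catch {
        case e: Exception =>
          logError(s"Encountered exception while attempting to load initial " +
            s"$podType pod spec from file", e)
          throw new SparkException(
            s"Could not load $podType pod from template file.", e)
      }
    }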


---




[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-08-29 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/22146
  
I've been looking to mount additional volumes into the executor containers 
and just tried out the PR.
It doesn't seem possible: if you add the container in the pod template, 
BasicExecutorFeatureStep still adds another executor container, and the 
result is an invalid pod spec.
I think it's worth considering whether we want pod templates to be able to 
modify the elements that the code adds to the pod spec, or whether they are 
only for adding attributes beyond what the code already does.
The latter seems simplest, but I'm just flagging that for features like 
adding volumes to executors it won't work.



---




[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-13 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/22071
  
LGTM as well


---




[GitHub] spark pull request #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Serv...

2018-08-13 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22071#discussion_r209714362
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala
 ---
@@ -51,6 +51,13 @@ private[mesos] class MesosClusterDispatcher(
 conf: SparkConf)
   extends Logging {
 
+  {
+val authKey = SecurityManager.SPARK_AUTH_SECRET_CONF
--- End diff --

Got it. My reasoning is that it could be hard for someone looking at the code 
to figure out why this is not allowed, since we don't really mention the REST 
server, which is the component that actually requires security to be turned off.
Another reason it would be beneficial to have the check in MesosRestServer is 
that the MesosClusterDispatcher framework could technically be decoupled from 
MesosRestServer and receive requests another way. So, to increase flexibility 
and avoid someone forgetting why the check is here, my suggestion is to move it 
closer to where it's actually required; that will make this a bit easier to 
maintain.
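
A minimal sketch of the suggested placement, assuming the same fail-fast require that the dispatcher uses (the message text here is illustrative; masterConf is the SparkConf visible in the quoted MesosRestServer diff):

    // In MesosRestServer rather than MesosClusterDispatcher:
    private val authKey = SecurityManager.SPARK_AUTH_SECRET_CONF
    require(masterConf.getOption(authKey).isEmpty,
      s"The REST submission server cannot authenticate requests, so $authKey " +
        "must not be set when it is enabled.")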


---




[GitHub] spark pull request #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Serv...

2018-08-12 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22071#discussion_r209464703
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala
 ---
@@ -51,6 +51,13 @@ private[mesos] class MesosClusterDispatcher(
 conf: SparkConf)
   extends Logging {
 
+  {
+val authKey = SecurityManager.SPARK_AUTH_SECRET_CONF
--- End diff --

I think it might be better to place this in the MesosRestServer code, since 
it's not really about the framework (MesosClusterDispatcher) but about the REST 
server receiving requests.


---




[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

2018-08-08 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21027#discussion_r208468846
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -63,6 +63,8 @@ private[spark] abstract class RestSubmissionServer(
 s"$baseContext/create/*" -> submitRequestServlet,
 s"$baseContext/kill/*" -> killRequestServlet,
 s"$baseContext/status/*" -> statusRequestServlet,
+"/health" -> new ServerStatusServlet(this),
+"/status" -> new ServerStatusServlet(this),
--- End diff --

Also, what is the intended user for this information? 


---




[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

2018-08-08 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21027#discussion_r208468697
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/deploy/rest/mesos/MesosRestServer.scala
 ---
@@ -50,6 +50,24 @@ private[spark] class MesosRestServer(
 new MesosKillRequestServlet(scheduler, masterConf)
   protected override val statusRequestServlet =
 new MesosStatusRequestServlet(scheduler, masterConf)
+
+  override def isServerHealthy(): Boolean = !scheduler.isSchedulerDriverStopped()
+
+  override def serverStatus(): ServerStatusResponse = {
+val s = new ServerStatusResponse
+s.schedulerDriverStopped = scheduler.isSchedulerDriverStopped()
+s.queuedDrivers = scheduler.getQueuedDriversSize
+s.launchedDrivers = scheduler.getLaunchedDriversSize
+s.pendingRetryDrivers = scheduler.getPendingRetryDriversSize
+s.success = true
+s.message = "iamok"
--- End diff --

How about leaving this blank?


---




[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

2018-08-08 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21027#discussion_r208468594
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -331,3 +345,15 @@ private class ErrorServlet extends RestServlet {
 sendResponse(error, response)
   }
 }
+
+private class ServerStatusServlet(server: RestSubmissionServer) extends RestServlet {
+  override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+val path = req.getRequestURI
+if (!server.isServerHealthy() && path == "/health") {
--- End diff --

I would switch the order (check the path first). Also, from this logic, if the 
server is healthy and the request is asking for /health, it will return the 
status instead?
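
A sketch of the suggested control flow: dispatch on the path first, so /health always reports health and /status always reports status (the status codes and the sendResponse call are assumed from the surrounding servlet code):

    override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
      req.getRequestURI match {
        case "/health" =>
          // /health only signals liveness via the status code.
          resp.setStatus(
            if (server.isServerHealthy()) HttpServletResponse.SC_OK
            else HttpServletResponse.SC_SERVICE_UNAVAILABLE)
        case _ =>
          // /status always returns the full status body.
          sendResponse(server.serverStatus(), resp)
      }
    }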


---




[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

2018-08-08 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21027#discussion_r208468305
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -63,6 +63,8 @@ private[spark] abstract class RestSubmissionServer(
 s"$baseContext/create/*" -> submitRequestServlet,
 s"$baseContext/kill/*" -> killRequestServlet,
 s"$baseContext/status/*" -> statusRequestServlet,
+"/health" -> new ServerStatusServlet(this),
--- End diff --

This impacts the REST submission server in general too. I do like the idea of 
providing an endpoint to get status, but I'm not sure this is a paradigm Spark 
is going for. I know the common pattern is to poll Spark metrics to understand 
the status of the components. @felixcheung do you have thoughts on this?



---




[GitHub] spark pull request #21027: [SPARK-23943][MESOS][DEPLOY] Improve observabilit...

2018-08-08 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21027#discussion_r208467704
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -160,7 +161,10 @@ trait MesosSchedulerUtils extends Logging {
   logError("driver.run() failed", e)
   error = Some(e)
   markErr()
-  }
+  } finally {
+logWarning("schedulerDriver stopped")
+schedulerDriverStopped.set(true)
+}
--- End diff --

Fix indent


---




[GitHub] spark issue #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driver.memo...

2018-08-01 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/21006
  
Besides the test comment, everything else LGTM


---




[GitHub] spark pull request #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driv...

2018-08-01 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21006#discussion_r207060617
  
--- Diff: 
resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala
 ---
@@ -199,6 +200,33 @@ class MesosClusterSchedulerSuite extends SparkFunSuite 
with LocalSparkContext wi
 })
   }
 
+  test("supports spark.mesos.driver.memoryOverhead") {
+setScheduler()
+
+val mem = 1000
+val cpu = 1
+
+val response = scheduler.submitDriver(
+  new MesosDriverDescription("d1", "jar", mem, cpu, true,
+command,
+Map("spark.mesos.executor.home" -> "test",
+  "spark.app.name" -> "test"),
+"s1",
+new Date()))
+assert(response.success)
+
+val offer = Utils.createOffer("o1", "s1", mem*2, cpu)
+scheduler.resourceOffers(driver, List(offer).asJava)
+val tasks = Utils.verifyTaskLaunched(driver, "o1")
+// 1384.0
+val taskMem = tasks.head.getResourcesList
+  .asScala
+  .filter(_.getName.equals("mem"))
+  .map(_.getScalar.getValue)
+  .head
+assert(1384.0 === taskMem)
--- End diff --

Can we also test the 10% case as well?
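
A sketch of the suggested 10% case, assuming the overhead follows the usual max(384, 10% of driver memory) rule; it mirrors the quoted test, and the expected value is illustrative:

    test("supports spark.mesos.driver.memoryOverhead at 10% of memory") {
      setScheduler()

      // 10% of 5000 MB (500) exceeds the 384 MB floor, so overhead = 500.
      val mem = 5000
      val cpu = 1

      val response = scheduler.submitDriver(
        new MesosDriverDescription("d1", "jar", mem, cpu, true,
          command,
          Map("spark.mesos.executor.home" -> "test",
            "spark.app.name" -> "test"),
          "s1",
          new Date()))
      assert(response.success)

      val offer = Utils.createOffer("o1", "s1", mem * 2, cpu)
      scheduler.resourceOffers(driver, List(offer).asJava)
      val tasks = Utils.verifyTaskLaunched(driver, "o1")
      val taskMem = tasks.head.getResourcesList
        .asScala
        .filter(_.getName.equals("mem"))
        .map(_.getScalar.getValue)
        .head
      assert(5500.0 === taskMem)
    }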


---




[GitHub] spark issue #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driver.memo...

2018-08-01 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/21006
  
jenkins ok to test


---




[GitHub] spark pull request #20451: [SPARK-23146][WIP] Support client mode for Kubern...

2018-07-04 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20451#discussion_r200020286
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
 ---
@@ -18,43 +18,68 @@ package org.apache.spark.scheduler.cluster.k8s
 
 import java.io.File
 
-import io.fabric8.kubernetes.client.Config
+import io.fabric8.kubernetes.client.{Config, KubernetesClient}
 
-import org.apache.spark.{SparkContext, SparkException}
+import org.apache.spark.{SparkContext, SparkConf}
 import org.apache.spark.deploy.k8s.{KubernetesUtils, 
SparkKubernetesClientFactory}
 import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.deploy.k8s.Constants._
 import org.apache.spark.internal.Logging
 import org.apache.spark.scheduler.{ExternalClusterManager, 
SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
 import org.apache.spark.util.ThreadUtils
 
-private[spark] class KubernetesClusterManager extends 
ExternalClusterManager with Logging {
+trait ManagerSpecificHandlers {
+   def createKubernetesClient(sparkConf: SparkConf): KubernetesClient
+ }
 
-  override def canCreate(masterURL: String): Boolean = 
masterURL.startsWith("k8s")
+private[spark] class KubernetesClusterManager extends 
ExternalClusterManager
+  with ManagerSpecificHandlers with Logging {
 
-  override def createTaskScheduler(sc: SparkContext, masterURL: String): 
TaskScheduler = {
-if (masterURL.startsWith("k8s") &&
-  sc.deployMode == "client" &&
-  !sc.conf.get(KUBERNETES_DRIVER_SUBMIT_CHECK).getOrElse(false)) {
-  throw new SparkException("Client mode is currently not supported for 
Kubernetes.")
+ class InClusterHandlers extends ManagerSpecificHandlers {
+   override def createKubernetesClient(sparkConf: SparkConf): 
KubernetesClient =
+   SparkKubernetesClientFactory.createKubernetesClient(
+   KUBERNETES_MASTER_INTERNAL_URL,
+   Some(sparkConf.get(KUBERNETES_NAMESPACE)),
+   APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX,
--- End diff --

Why do we need a separate conf prefix as well?


---




[GitHub] spark pull request #20451: [SPARK-23146][WIP] Support client mode for Kubern...

2018-07-01 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20451#discussion_r199340795
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -140,13 +140,6 @@ private[spark] class Client(
   throw e
   }
 
-  if (waitForAppCompletion) {
--- End diff --

Why is this not needed anymore? If we enable cluster mode we still want the 
same behavior defined here right?


---




[GitHub] spark pull request #20451: [SPARK-23146][WIP] Support client mode for Kubern...

2018-07-01 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20451#discussion_r199340757
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -88,6 +103,56 @@ private[spark] object SparkKubernetesClientFactory {
 new DefaultKubernetesClient(httpClientWithCustomDispatcher, config)
   }
 
+  def createOutClusterKubernetesClient(
+ master: String,
--- End diff --

Fix indent


---




[GitHub] spark pull request #20451: [SPARK-23146][WIP] Support client mode for Kubern...

2018-07-01 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20451#discussion_r199340748
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -88,6 +103,56 @@ private[spark] object SparkKubernetesClientFactory {
 new DefaultKubernetesClient(httpClientWithCustomDispatcher, config)
   }
 
+  def createOutClusterKubernetesClient(
+ master: String,
+ namespace: Option[String],
+ kubernetesAuthConfPrefix: String,
+ sparkConf: SparkConf,
+ maybeServiceAccountToken: Option[File],
+ maybeServiceAccountCaCert: Option[File]): 
KubernetesClient = {
+ val oauthTokenFileConf = 
s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_FILE_CONF_SUFFIX"
+ val oauthTokenConf = 
s"$kubernetesAuthConfPrefix.$OAUTH_TOKEN_CONF_SUFFIX"
+ val oauthTokenFile = sparkConf.getOption(oauthTokenFileConf)
+   .map(new File(_))
+   .orElse(maybeServiceAccountToken)
+ val oauthTokenValue = sparkConf.getOption(oauthTokenConf)
+ OptionRequirements.requireNandDefined(
--- End diff --

Since it's only used once, I'm not sure it warrants a separate file/method for 
checking Options.
Also, the method signature doesn't make it clear to me what it does 
(especially the "requireN" part).
How about just the simple match that @squito suggested here?
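
A sketch of the simpler inline check, under the assumption that requireNandDefined means "at most one of the two may be set":

    // Reject the configuration if both the token file and the token value
    // are defined; either one alone (or neither) is fine.
    (oauthTokenFile, oauthTokenValue) match {
      case (Some(_), Some(_)) =>
        throw new SparkException("Cannot specify an OAuth token through both " +
          s"a file ($oauthTokenFileConf) and a value ($oauthTokenConf).")
      case _ => // at most one is defined
    }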


---




[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

2018-04-12 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/21033
  
LGTM


---




[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...

2018-04-12 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21033#discussion_r181199842
  
--- Diff: 
resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala
 ---
@@ -165,18 +165,47 @@ class MesosCoarseGrainedSchedulerBackendSuite extends 
SparkFunSuite
   }
 
 
-  test("mesos does not acquire more than spark.mesos.gpus.max") {
-val maxGpus = 5
-setBackend(Map("spark.mesos.gpus.max" -> maxGpus.toString))
+  test("mesos acquires spark.mesos.executor.gpus number of gpus per 
executor") {
+setBackend(Map("spark.mesos.gpus.max" -> "5",
+   "spark.mesos.executor.gpus" -> "2"))
 
 val executorMemory = backend.executorMemory(sc)
-offerResources(List(Resources(executorMemory, 1, maxGpus + 1)))
+offerResources(List(Resources(executorMemory, 1, 5)))
 
 val taskInfos = verifyTaskLaunched(driver, "o1")
 assert(taskInfos.length == 1)
 
 val gpus = backend.getResource(taskInfos.head.getResourcesList, "gpus")
-assert(gpus == maxGpus)
+assert(gpus == 2)
+  }
+
+
+  test("mesos declines offers that cannot satisfy 
spark.mesos.executor.gpus") {
+setBackend(Map("spark.mesos.gpus.max" -> "5",
--- End diff --

I think it's worth testing setting the max lower than the number of executor 
gpus as well.
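
A sketch of that case (illustrative: verifyDeclinedOffer and createOfferId are assumed to be the suite's existing helpers, and the exact decline semantics may differ):

    test("mesos declines offers when gpus.max is below executor.gpus") {
      setBackend(Map("spark.mesos.gpus.max" -> "1",
                     "spark.mesos.executor.gpus" -> "2"))

      val executorMemory = backend.executorMemory(sc)
      // The offer itself has enough GPUs, but launching an executor with
      // 2 GPUs would exceed the configured max of 1.
      offerResources(List(Resources(executorMemory, 1, 5)))
      verifyDeclinedOffer(driver, createOfferId("o1"))
    }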


---




[GitHub] spark issue #17714: [SPARK-20428][Core]REST interface about 'v1/submissions/...

2017-06-01 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/17714
  
Unfortunately I'm not a committer, so we need to loop in someone who is to 
help merge it. @srowen do you know who's responsible for the general 
deploy package?


---



[GitHub] spark pull request #17714: [SPARK-20428][Core]REST interface about 'v1/submi...

2017-04-30 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17714#discussion_r114089418
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala ---
@@ -214,15 +214,15 @@ private[rest] abstract class KillRequestServlet 
extends RestServlet {
   protected override def doPost(
   request: HttpServletRequest,
   response: HttpServletResponse): Unit = {
-val submissionId = parseSubmissionId(request.getPathInfo)
-val responseMessage = submissionId.map(handleKill).getOrElse {
+val submissionIds = parseSubmissionId(request.getPathInfo)
--- End diff --

I don't think parsing submission ids from the request path is a good idea.
I would assume batch delete matters most when you have a large number of 
drivers to delete (otherwise you would be fine deleting a few one by one), 
but most URLs are length-limited.
You might be better off creating a new request for this that takes a body.
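
A sketch of that alternative: read the ids from the POST body instead of the path (the comma-separated body format is just for illustration, and a handleKill overload accepting multiple ids is assumed):

    protected override def doPost(
        request: HttpServletRequest,
        response: HttpServletResponse): Unit = {
      // The ids travel in the request body, e.g. "driver-0001,driver-0002",
      // so the count is not limited by URL length.
      val body = scala.io.Source.fromInputStream(request.getInputStream).mkString
      val submissionIds = body.split(",").map(_.trim).filter(_.nonEmpty).toSeq
      val responseMessage = handleKill(submissionIds)
      sendResponse(responseMessage, response)
    }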


---



[GitHub] spark pull request #17109: [SPARK-19740][MESOS]Add support in Spark to pass ...

2017-04-15 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17109#discussion_r111670608
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala
 ---
@@ -99,6 +99,26 @@ private[mesos] object MesosSchedulerBackendUtil extends 
Logging {
 .toList
   }
 
+  /**
+   * Parse a list of docker parameters, each of which
+   * takes the form key=value
+   */
+  private def parseParamsSpec(params: String): List[Parameter] = {
+params.split(",").map(_.split("=")).flatMap { spec: Array[String] =>
--- End diff --

I see, so we should split with a limit instead.
@yanji84 can you fix this?
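
A sketch of the split-with-limit fix, mirroring the quoted helper (Parameter is the Mesos protobuf builder type already used in this file):

    private def parseParamsSpec(params: String): List[Parameter] = {
      // Split each pair on the first '=' only, so values such as
      // "label=com.example/key=value" keep their embedded '='.
      params.split(",").map(_.split("=", 2)).flatMap { spec: Array[String] =>
        spec match {
          case Array(key, value) =>
            Some(Parameter.newBuilder().setKey(key).setValue(value).build())
          case _ =>
            logWarning(s"Unable to parse docker parameter: ${spec.mkString("=")}")
            None
        }
      }.toList
    }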


---



[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-04-11 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/17109
  
@srowen Appreciate the help you're doing. I think we're doing what we can to 
review these patches and make sure Mesos support is still maintained and 
improved over time.
If you trust our judgement, and trust that we'll still be around to fix 
issues when they arise, then we really just need someone like you to help 
merge patches.
Ensuring that someone who's been contributing to this area can become a 
committer is an ever-ongoing problem that we're still hoping can one day be 
addressed. Another parallel effort that I think is well worth investigating is 
decoupling the cluster manager integration from Spark, which I believe is 
becoming more relevant now that we have more integrations coming.

Long story short: if you can still help in the meantime, it will be greatly 
appreciated, since it lets improvements around the Mesos integration keep 
happening.


---



[GitHub] spark pull request #17109: [SPARK-19740][MESOS]Add support in Spark to pass ...

2017-03-14 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17109#discussion_r106053115
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala
 ---
@@ -99,6 +99,26 @@ private[mesos] object MesosSchedulerBackendUtil extends 
Logging {
 .toList
   }
 
+  /**
+   * Parse a list of docker parameters, each of which
+   * takes the form key=value
+   */
+  private def parseParamsSpec(params: String): List[Parameter] = {
+params.split(",").map(_.split("=")).flatMap { spec: Array[String] =>
--- End diff --

Hmm, if a value contains an '=' we will have a parsing error.


---



[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/17109
  
@srowen @mgummelt PTAL


---



[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/17109
  
Hey, sorry for the late response. The code looks good to me; however, we need 
to add documentation for the new flag. Can you modify the Mesos configuration 
docs in the docs folder?


---



[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-02-28 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/17109
  
@yanji84 can you add a test for this?


---



[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-02-28 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/17109
  
ok to test


---



[GitHub] spark issue #13072: [SPARK-15288] [Mesos] Mesos dispatcher should handle gra...

2017-01-22 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13072
  
LGTM, @srowen can you please take a look?


---



[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-12-06 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13077
  
@devaraj-kavali let us know if you can still update this; otherwise I'll 
close it, as it's no longer being updated.


---



[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-12-06 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/12933
  
@srowen can you help review this? Besides my minor comment, overall it looks 
fine to me.


---



[GitHub] spark pull request #12933: [Spark-15155][Mesos] Optionally ignore default ro...

2016-12-06 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12933#discussion_r91195201
  
--- Diff: 
mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -50,6 +50,44 @@ trait MesosSchedulerUtils extends Logging {
   protected var mesosDriver: SchedulerDriver = null
 
   /**
+   * Returns the configured set of roles that an offer can be selected from
+   * @param conf Spark configuration
+   */
+  protected def getAcceptedResourceRoles(conf: SparkConf): Set[String] = {
+getAcceptedResourceRoles(
+  conf.getBoolean("spark.mesos.ignoreDefaultRoleResources", false),
+  conf.getOption("spark.mesos.role"))
+  }
+  /**
+   * Returns the configured set of roles that an offer can be selected from
+   * @param props Mesos driver description schedulerProperties map
+   */
+  protected def getAcceptedResourceRoles(props: Map[String, String]): 
Set[String] = {
+getAcceptedResourceRoles(
+  props.get("spark.mesos.ignoreDefaultRoleResources") match {
+case Some(truth) => truth.toBoolean
+case None => false
+  },
+  props.get("spark.mesos.role"))
+  }
+  /**
+   * Internal version of getAcceptedResourceRoles
+   * @param ignoreDefaultRoleResources user specified property
+   * @param role user specified property
+   */
+  private def getAcceptedResourceRoles(
+  ignoreDefaultRoleResources: Boolean,
+  role: Option[String]) = {
+val roles = ignoreDefaultRoleResources match {
+  case true if role.isDefined => Set(role)
+  case _ => Set(Some("*"), role)
+}
+val acceptedRoles = roles.flatten
+logDebug(s"Accepting resources from role(s): 
${acceptedRoles.mkString(",")}")
--- End diff --

I think we should move this log outside of this helper method, as there may 
be other contexts calling it in the future.


---



[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2016-12-06 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14936
  
@philipphoffmann Sorry for the long delay; one last ask: can you add a simple 
unit test to verify it works?


---



[GitHub] spark issue #15684: [SPARK-18171][MESOS] Show correct framework address in m...

2016-12-06 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/15684
  
LGTM, @srowen can you help on this?


---



[GitHub] spark issue #16092: [SPARK-18662] Move resource managers to separate directo...

2016-12-01 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/16092
  
Changes suggested by @rxin


---



[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/16061
  
@rxin Makes sense. @srowen also talked about starting a discussion about 
better support for external cluster managers.


---



[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r90063528
  
--- Diff: 
kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterScheduler.scala
 ---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster.kubernetes
+
+import java.io.File
+import java.util.Date
+import java.util.concurrent.atomic.AtomicLong
+
+import io.fabric8.kubernetes.client.{ConfigBuilder, 
DefaultKubernetesClient, KubernetesClient}
+import io.fabric8.kubernetes.api.model.{PodBuilder, ServiceBuilder}
+import io.fabric8.kubernetes.client.dsl.LogWatch
+import org.apache.spark.deploy.Command
+import org.apache.spark.deploy.kubernetes.ClientArguments
+import org.apache.spark.{io, _}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+import collection.JavaConverters._
+import org.apache.spark.util.Utils
+
+import scala.util.Random
+
+private[spark] object KubernetesClusterScheduler {
+  def defaultNameSpace = "default"
+  def defaultServiceAccountName = "default"
+}
+
+/**
+  * This is a simple extension to ClusterScheduler
+  * */
+private[spark] class KubernetesClusterScheduler(conf: SparkConf)
+extends Logging {
+  private val DEFAULT_SUPERVISE = false
+  private val DEFAULT_MEMORY = Utils.DEFAULT_DRIVER_MEM_MB // mb
+  private val DEFAULT_CORES = 1.0
+
+  logInfo("Created KubernetesClusterScheduler instance")
+
+  var client = setupKubernetesClient()
+  val driverName = s"spark-driver-${Random.alphanumeric take 5 
mkString("")}".toLowerCase()
+  val svcName = s"spark-svc-${Random.alphanumeric take 5 
mkString("")}".toLowerCase()
+  val nameSpace = conf.get(
+"spark.kubernetes.namespace",
+KubernetesClusterScheduler.defaultNameSpace)
+  val serviceAccountName = conf.get(
+"spark.kubernetes.serviceAccountName",
+KubernetesClusterScheduler.defaultServiceAccountName)
+
+  // Anything that should either not be passed to driver config in the 
cluster, or
+  // that is going to be explicitly managed as command argument to the 
driver pod
+  val confBlackList = scala.collection.Set(
+"spark.master",
+"spark.app.name",
+"spark.submit.deployMode",
+"spark.executor.jar",
+"spark.dynamicAllocation.enabled",
+"spark.shuffle.service.enabled")
+
+  def start(args: ClientArguments): Unit = {
+startDriver(client, args)
+  }
+
+  def stop(): Unit = {
+client.pods().inNamespace(nameSpace).withName(driverName).delete()
+client
+  .services()
+  .inNamespace(nameSpace)
+  .withName(svcName)
+  .delete()
+  }
+
+  def startDriver(client: KubernetesClient,
+  args: ClientArguments): Unit = {
+logInfo("Starting spark driver on kubernetes cluster")
+val driverDescription = buildDriverDescription(args)
+
+// image needs to support shim scripts "/opt/driver.sh" and 
"/opt/executor.sh"
+val sparkImage = 
conf.getOption("spark.kubernetes.sparkImage").getOrElse {
+  // TODO: this needs to default to some standard Apache Spark image
+  throw new SparkException("Spark image not set. Please configure 
spark.kubernetes.sparkImage")
+}
+
+// This is the URL of the client jar.
+val clientJarUri = args.userJar
+
+// This is the kubernetes master we're launching on.
+val kubernetesHost = "k8s://" + client.getMasterUrl().getHost()
+logInfo("Using as kubernetes-master: " + kubernetesHost.toString())
+
+val submitArgs = sca

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r90063456
  
--- Diff: 
kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster.kubernetes
+
+import collection.JavaConverters._
+import io.fabric8.kubernetes.api.model.PodBuilder
+import io.fabric8.kubernetes.api.model.extensions.JobBuilder
+import io.fabric8.kubernetes.client.{ConfigBuilder, 
DefaultKubernetesClient}
+import org.apache.spark.internal.config._
+import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster._
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
+import org.apache.spark.{SparkConf, SparkContext, SparkException}
+import org.apache.spark.rpc.RpcEndpointAddress
+import org.apache.spark.scheduler.TaskSchedulerImpl
+import org.apache.spark.util.Utils
+
+import scala.collection.mutable
+import scala.util.Random
+import scala.concurrent.Future
+
+private[spark] class KubernetesClusterSchedulerBackend(
+  scheduler: 
TaskSchedulerImpl,
--- End diff --

Fix the formatting to conform with Spark style


---



[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r90063380
  
--- Diff: 
kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster.kubernetes
+
+import collection.JavaConverters._
+import io.fabric8.kubernetes.api.model.PodBuilder
+import io.fabric8.kubernetes.api.model.extensions.JobBuilder
+import io.fabric8.kubernetes.client.{ConfigBuilder, 
DefaultKubernetesClient}
+import org.apache.spark.internal.config._
+import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster._
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
+import org.apache.spark.{SparkConf, SparkContext, SparkException}
+import org.apache.spark.rpc.RpcEndpointAddress
+import org.apache.spark.scheduler.TaskSchedulerImpl
+import org.apache.spark.util.Utils
+
+import scala.collection.mutable
+import scala.util.Random
+import scala.concurrent.Future
+
+private[spark] class KubernetesClusterSchedulerBackend(
+  scheduler: 
TaskSchedulerImpl,
+  sc: SparkContext)
+  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv) {
+
+  val client = new DefaultKubernetesClient()
+
+  val DEFAULT_NUMBER_EXECUTORS = 2
+  val sparkExecutorName = s"spark-executor-${Random.alphanumeric take 5 
mkString("")}".toLowerCase()
+
+  // TODO: do these need mutex guarding?
+  // key is executor id, value is pod name
+  var executorToPod = mutable.Map.empty[String, String] // active executors
+  var shutdownToPod = mutable.Map.empty[String, String] // pending shutdown
+  var executorID = 0
+
+  val sparkImage = conf.get("spark.kubernetes.sparkImage")
+  val clientJarUri = conf.get("spark.executor.jar")
+  val ns = conf.get(
+"spark.kubernetes.namespace",
+KubernetesClusterScheduler.defaultNameSpace)
+  val dynamicExecutors = Utils.isDynamicAllocationEnabled(conf)
+
+  // executor back-ends take their configuration this way
+  if (dynamicExecutors) {
+conf.setExecutorEnv("spark.dynamicAllocation.enabled", "true")
+conf.setExecutorEnv("spark.shuffle.service.enabled", "true")
+  }
+
+  override def start(): Unit = {
+super.start()
+createExecutorPods(getInitialTargetExecutorNumber(sc.getConf))
+  }
+
+  override def stop(): Unit = {
+// Kill all executor pods indiscriminately
+killExecutorPods(executorToPod.toVector)
+killExecutorPods(shutdownToPod.toVector)
+super.stop()
+  }
+
+  // Dynamic allocation interfaces
+  override def doRequestTotalExecutors(requestedTotal: Int): 
Future[Boolean] = {
+logInfo(s"Received doRequestTotalExecutors: $requestedTotal")
+val n = executorToPod.size
+val delta = requestedTotal - n
+if (delta > 0) {
+  logInfo(s"Adding $delta new executors")
+  createExecutorPods(delta)
+} else if (delta < 0) {
+  val d = -delta
--- End diff --

This shouldn't happen, assert instead
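
A sketch of what the suggestion might look like (illustrative; whether a negative delta is truly unreachable depends on how the backend handles downscaling):

    // Executors are removed via doKillExecutors, so a request that shrinks
    // the total is unexpected here; fail loudly instead of handling it.
    val delta = requestedTotal - executorToPod.size
    assert(delta >= 0, s"requested total $requestedTotal is below the current count")
    if (delta > 0) {
      logInfo(s"Adding $delta new executors")
      createExecutorPods(delta)
    }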


---



[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r90062695
  
--- Diff: dev/make-distribution.sh ---
@@ -154,7 +154,9 @@ export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g 
-XX:MaxPermSize=512M -XX:ReservedCodeCac
 # Store the command as an array because $MVN variable might have spaces in 
it.
 # Normal quoting tricks don't work.
 # See: http://mywiki.wooledge.org/BashFAQ/050
-BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@)
+# BUILD_COMMAND=("$MVN" -T 1C clean package -DskipTests $@)
+
+BUILD_COMMAND=("$MVN" -T 2C package -DskipTests $@)
--- End diff --

?


---



[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r90062639
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -596,6 +599,26 @@ object SparkSubmit extends CommandLineUtils {
   }
 }
 
+if (isKubernetesCluster) {
--- End diff --

What if in Kubernetes and client mode?


---



[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-13 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/12933
  
I ran `mvn test` inside of the mesos folder.

On Thu, Oct 13, 2016 at 3:21 AM, Chris Heller <notificati...@github.com>
wrote:

> You saw the error with `./dev/run-tests`? Ok I'll figure this out.
>
> > On Oct 13, 2016, at 12:24 AM, Timothy Chen <notificati...@github.com>
> > wrote:
> >
> > I just tried running it locally and I'm getting the same error. It seems
> > like with your change that test is simply declining the offer.



---



[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-12 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/12933
  
I just tried running it locally and I'm getting the same error. It seems 
like with your change that test is simply declining the offer.


---



[GitHub] spark pull request #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch ...

2016-10-10 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13713#discussion_r82695978
  
--- Diff: docs/running-on-mesos.md ---
@@ -506,8 +506,13 @@ See the [configuration page](configuration.html) for 
information on Spark config
 since this configuration is just a upper limit and not a guaranteed 
amount.
   
 
-
-
+<tr>
+  <td><code>spark.mesos.fetchCache.enable</code></td>
+  <td><code>false</code></td>
+  <td>
+If set to `true`, all URIs in `spark.mesos.uris` will be eligible for caching by the [Mesos fetch cache](http://mesos.apache.org/documentation/latest/fetcher/)
--- End diff --

From the implementation, you actually set all downloadable URIs (like 
spark.executor.uri, jarUrl, etc.) to be fetcher-cacheable. I think we need to 
be more explicit here that it's more than just spark.mesos.uris.


---



[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...

2016-10-10 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13713
  
Other than the 2 comments, the changes LGTM. @mgummelt @srowen 


---



[GitHub] spark pull request #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch ...

2016-10-10 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13713#discussion_r82695810
  
--- Diff: 
mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala
 ---
@@ -463,6 +463,21 @@ class MesosCoarseGrainedSchedulerBackendSuite extends 
SparkFunSuite
 assert(launchedTasks.head.getCommand.getUrisList.asScala(0).getValue 
== url)
   }
 
+  test("mesos supports setting fetcher") {
--- End diff --

s/supports setting fetcher/supports setting fetcher cache/g


---



[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2016-10-10 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14936
  
Hmm, it is currently Integer.MAX_VALUE because we assume the cluster scheduler 
is long-lived, and without setting it to a large value Mesos will automatically 
terminate the framework when it disconnects. I think currently all Spark jobs 
don't have it specified, so when one disconnects it's simply removed.

I think we should keep the same semantics and not impose one default value for 
everything: leave it at 0 in the coarse-grained scheduler, and default to 
Int.MaxValue in the cluster scheduler. But the user can always override it no 
matter what.
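
A sketch of that proposal, borrowing the conf and fwInfoBuilder names that 
appear in the diffs below (illustrative only, not the merged code):

    // Cluster scheduler: default the failover timeout to a very large value so
    // Mesos does not tear down a disconnected framework; the coarse-grained
    // scheduler keeps 0, and users can override either via the config key.
    val failoverTimeout =
      conf.getDouble("spark.mesos.failoverTimeout", Int.MaxValue.toDouble)
    fwInfoBuilder.setFailoverTimeout(failoverTimeout)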






[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-10-09 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14644
  
I just tested it with a GPU instance and it works. @mgummelt @klueska any 
more comments? Otherwise @srowen I think we should merge, as there are no 
outstanding comments.





[GitHub] spark issue #9287: SPARK-11326: Split networking in standalone mode

2016-10-08 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/9287
  
This has been stale for a while; we should close it if there is no further 
update.





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-08 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/12933
  
@hellertime Are you able to rebase?





[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-10-08 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13077
  
@devaraj-kavali Are you still able to update this patch?





[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...

2016-10-08 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13713
  
@drcrallen Are you still planning to update this? It's quite a useful 
feature, so I'm hoping it can get in. Also, since fine-grained mode is 
deprecated, I don't think we need to update that path too.





[GitHub] spark pull request #14936: [SPARK-7877][MESOS] Allow configuration of framew...

2016-10-07 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14936#discussion_r82430755
  
--- Diff: 
mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -69,38 +68,51 @@ trait MesosSchedulerUtils extends Logging {
   conf: SparkConf,
   webuiUrl: Option[String] = None,
   checkpoint: Option[Boolean] = None,
-  failoverTimeout: Option[Double] = None,
   frameworkId: Option[String] = None): SchedulerDriver = {
-val fwInfoBuilder = 
FrameworkInfo.newBuilder().setUser(sparkUser).setName(appName)
+val fwInfo = createFrameworkInfo(sparkUser, appName, conf, webuiUrl, 
checkpoint, frameworkId)
 val credBuilder = Credential.newBuilder()
-webuiUrl.foreach { url => fwInfoBuilder.setWebuiUrl(url) }
-checkpoint.foreach { checkpoint => 
fwInfoBuilder.setCheckpoint(checkpoint) }
-failoverTimeout.foreach { timeout => 
fwInfoBuilder.setFailoverTimeout(timeout) }
-frameworkId.foreach { id =>
-  fwInfoBuilder.setId(FrameworkID.newBuilder().setValue(id).build())
-}
 conf.getOption("spark.mesos.principal").foreach { principal =>
-  fwInfoBuilder.setPrincipal(principal)
   credBuilder.setPrincipal(principal)
 }
 conf.getOption("spark.mesos.secret").foreach { secret =>
   credBuilder.setSecret(secret)
 }
-if (credBuilder.hasSecret && !fwInfoBuilder.hasPrincipal) {
+if (credBuilder.hasSecret && !fwInfo.hasPrincipal) {
   throw new SparkException(
 "spark.mesos.principal must be configured when spark.mesos.secret 
is set")
 }
-conf.getOption("spark.mesos.role").foreach { role =>
-  fwInfoBuilder.setRole(role)
-}
 if (credBuilder.hasPrincipal) {
   new MesosSchedulerDriver(
-scheduler, fwInfoBuilder.build(), masterUrl, credBuilder.build())
+scheduler, fwInfo, masterUrl, credBuilder.build())
 } else {
-  new MesosSchedulerDriver(scheduler, fwInfoBuilder.build(), masterUrl)
+  new MesosSchedulerDriver(scheduler, fwInfo, masterUrl)
 }
   }
 
+  def createFrameworkInfo(
+sparkUser: String,
+appName: String,
+conf: SparkConf,
+webuiUrl: Option[String] = None,
+checkpoint: Option[Boolean] = None,
+frameworkId: Option[String] = None): FrameworkInfo = {
+val fwInfoBuilder = 
FrameworkInfo.newBuilder().setUser(sparkUser).setName(appName)
+webuiUrl.foreach { url => fwInfoBuilder.setWebuiUrl(url) }
+checkpoint.foreach { checkpoint => 
fwInfoBuilder.setCheckpoint(checkpoint) }
+
fwInfoBuilder.setFailoverTimeout(conf.getDouble("spark.mesos.failoverTimeout", 
10))
--- End diff --

This new flag needs to get added to the documentation as well





[GitHub] spark pull request #14936: [SPARK-7877][MESOS] Allow configuration of framew...

2016-10-07 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14936#discussion_r82430668
  
--- Diff: 
mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -69,38 +68,51 @@ trait MesosSchedulerUtils extends Logging {
   conf: SparkConf,
   webuiUrl: Option[String] = None,
   checkpoint: Option[Boolean] = None,
-  failoverTimeout: Option[Double] = None,
   frameworkId: Option[String] = None): SchedulerDriver = {
-val fwInfoBuilder = 
FrameworkInfo.newBuilder().setUser(sparkUser).setName(appName)
+val fwInfo = createFrameworkInfo(sparkUser, appName, conf, webuiUrl, 
checkpoint, frameworkId)
 val credBuilder = Credential.newBuilder()
-webuiUrl.foreach { url => fwInfoBuilder.setWebuiUrl(url) }
-checkpoint.foreach { checkpoint => 
fwInfoBuilder.setCheckpoint(checkpoint) }
-failoverTimeout.foreach { timeout => 
fwInfoBuilder.setFailoverTimeout(timeout) }
-frameworkId.foreach { id =>
-  fwInfoBuilder.setId(FrameworkID.newBuilder().setValue(id).build())
-}
 conf.getOption("spark.mesos.principal").foreach { principal =>
-  fwInfoBuilder.setPrincipal(principal)
   credBuilder.setPrincipal(principal)
 }
 conf.getOption("spark.mesos.secret").foreach { secret =>
   credBuilder.setSecret(secret)
 }
-if (credBuilder.hasSecret && !fwInfoBuilder.hasPrincipal) {
+if (credBuilder.hasSecret && !fwInfo.hasPrincipal) {
   throw new SparkException(
 "spark.mesos.principal must be configured when spark.mesos.secret 
is set")
 }
-conf.getOption("spark.mesos.role").foreach { role =>
-  fwInfoBuilder.setRole(role)
-}
 if (credBuilder.hasPrincipal) {
   new MesosSchedulerDriver(
-scheduler, fwInfoBuilder.build(), masterUrl, credBuilder.build())
+scheduler, fwInfo, masterUrl, credBuilder.build())
 } else {
-  new MesosSchedulerDriver(scheduler, fwInfoBuilder.build(), masterUrl)
+  new MesosSchedulerDriver(scheduler, fwInfo, masterUrl)
 }
   }
 
+  def createFrameworkInfo(
+sparkUser: String,
--- End diff --

Fix the parameter indentation.
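
For reference, a sketch of the indentation being asked for (Spark style puts 
each parameter on its own line, indented four spaces, with the body at two):

    def createFrameworkInfo(
        sparkUser: String,
        appName: String,
        conf: SparkConf,
        webuiUrl: Option[String] = None,
        checkpoint: Option[Boolean] = None,
        frameworkId: Option[String] = None): FrameworkInfo = {
      // body unchanged
    }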





[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-09-29 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14644
  
Ya, the default GPU requirement I have is 0 (cores per executor/node is 1). 
I'm still gathering feedback on what's the most sensible thing to do for GPUs: 
we can either set a configurable amount that each executor has to use, or have 
a max.







[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-09-29 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14644
  
1. Good catch, my old patch had docs but I rebased and they didn't apply for 
some reason. Let me add them.

2, 3: we don't fail if you ask for more GPUs, since it's not a hard 
requirement but simply a max, just like how cores.max works. I didn't add a 
required-amount setting, but we can certainly add one in the future.
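
To make the cap-versus-requirement semantics concrete, a hypothetical 
configuration (the values are made up):

    val conf = new SparkConf()
      .set("spark.cores.max", "8")       // aggregate cap on cores, not a hard ask
      .set("spark.mesos.gpus.max", "4")  // cap on GPUs accepted from offers; default 0 asks for none

A job configured this way still launches on offers with fewer than 4 GPUs; 
the max only bounds what it will accept.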





[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-09-28 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14644
  
@mgummelt @srowen Please review as well





[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-09-22 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14644
  
@klueska Just updated the patch and I think it's using the right semantics 
now, where it has a global gpus max just like cores. Can you try it out?





[GitHub] spark pull request #14644: [MESOS] Enable GPU support with Mesos

2016-09-12 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14644#discussion_r78348094
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -103,6 +103,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   private val stateLock = new ReentrantLock
 
   val extraCoresPerExecutor = conf.getInt("spark.mesos.extra.cores", 0)
+  val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
--- End diff --

I see, in this case it's the same semantics as cpus.max, so I think using a 
really big number seems right to me. 





[GitHub] spark pull request #14644: [MESOS] Enable GPU support with Mesos

2016-09-11 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14644#discussion_r78298417
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -103,6 +103,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   private val stateLock = new ReentrantLock
 
   val extraCoresPerExecutor = conf.getInt("spark.mesos.extra.cores", 0)
+  val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
--- End diff --

Which sounds sensible to me, since GPUs are not usually required to run your 
Spark job. Also, cores.max is an aggregate max, whereas gpus.max in the 
current patch is a per-node max. I think I will change this to work like 
cores.max, but default to 0.





[GitHub] spark pull request #14644: [MESOS] Enable GPU support with Mesos

2016-09-08 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14644#discussion_r78002761
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -103,6 +103,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   private val stateLock = new ReentrantLock
 
   val extraCoresPerExecutor = conf.getInt("spark.mesos.extra.cores", 0)
+  val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
--- End diff --

My thought was that with only a Boolean flag, a Spark job either uses all 
GPUs on a host or none, so different GPU devices can't be shared by different 
jobs. By specifying a limit there is at least a way for a job to say how many 
GPUs it should grab per node. Thoughts?





[GitHub] spark issue #14644: [MESOS] Enable GPU support with Mesos

2016-08-15 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/14644
  
@srowen Mesos supports node labels as well (which is how constraints are 
implemented in the Spark framework). However, GPUs are implemented as a 
resource (since we want to account for the number of GPUs instead of just 
placing a task there).

As for the config name, I just picked that to begin with. I was also thinking 
we should consider a generic config name (spark.gpus?) as I believe it could 
be reused. But I wasn't sure how we'd like to account for this yet, as GPUs 
are quite different from CPUs (Mesos currently just handles an integer number 
of GPUs, with no sharing or topology information yet). Do you have suggestions?





[GitHub] spark pull request #14644: Enable GPU support with Mesos

2016-08-15 Thread tnachen
GitHub user tnachen opened a pull request:

https://github.com/apache/spark/pull/14644

Enable GPU support with Mesos

## What changes were proposed in this pull request?

Enable GPU resources to be used when running coarse grain mode with Mesos.


## How was this patch tested?

Manual test with GPU.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tnachen/spark gpu_mesos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14644


commit 163cfa49b2116612f981aa8158054e006d40b52d
Author: Timothy Chen <tnac...@gmail.com>
Date:   2016-05-23T23:23:51Z

Enable GPU with Mesos on Spark

commit 4edc6db5329a19f49af9303897ee0a2f1fc91a14
Author: Timothy Chen <tnac...@gmail.com>
Date:   2016-08-15T06:39:05Z

Enable GPU support with Mesos







[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...

2016-07-27 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13713
  
We also fetch URIs for running drivers in cluster mode 
(MesosClusterScheduler.scala). I'm thinking we should allow this 
configuration to affect that too.
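
Concretely, something like this where the cluster scheduler builds its 
CommandInfo URIs (a sketch; fetcherCacheEnabled is a made-up local for the 
new config, and jarUrl stands in for whichever URI is being added):

    import org.apache.mesos.Protos.CommandInfo

    val fetcherCacheEnabled = conf.getBoolean("spark.mesos.fetchCache.enable", false)
    val uri = CommandInfo.URI.newBuilder()
      .setValue(jarUrl)
      .setCache(fetcherCacheEnabled)  // let the Mesos fetcher cache driver URIs too
      .build()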





[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-07-24 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13077
  
ok to test





[GitHub] spark issue #13051: [SPARK-15271] [MESOS] Allow force pulling executor docke...

2016-07-24 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13051
  
@srowen Or if you could help :)





[GitHub] spark pull request #14275: [SPARK-16637] Unified containerizer

2016-07-24 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14275#discussion_r72003350
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala
 ---
@@ -105,16 +105,27 @@ private[mesos] object MesosSchedulerBackendUtil 
extends Logging {
   def addDockerInfo(
   container: ContainerInfo.Builder,
   image: String,
+  containerizer: String,
   volumes: Option[List[Volume]] = None,
-  network: Option[ContainerInfo.DockerInfo.Network] = None,
   portmaps: Option[List[ContainerInfo.DockerInfo.PortMapping]] = 
None): Unit = {
 
-val docker = ContainerInfo.DockerInfo.newBuilder().setImage(image)
+containerizer match {
--- End diff --

Can we have a sensible message/exception when we pass in an unknown 
containerizer?
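
Something along these lines, for instance (a sketch; the "docker" and "mesos" 
case labels are my assumption of the two supported values):

    import org.apache.spark.SparkException

    def validateContainerizer(containerizer: String): Unit = containerizer match {
      case "docker" | "mesos" =>
        // supported; the existing DockerInfo/MesosInfo builders handle these
      case unknown =>
        throw new SparkException(
          s"Unsupported containerizer: $unknown (expected 'docker' or 'mesos')")
    }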





[GitHub] spark pull request #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse...

2016-07-06 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13950#discussion_r69690304
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -502,6 +502,9 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 _applicationId = _taskScheduler.applicationId()
 _applicationAttemptId = taskScheduler.applicationAttemptId()
 _conf.set("spark.app.id", _applicationId)
+if (_conf.getBoolean("spark.ui.reverseProxy", false)) {
+  System.setProperty("spark.ui.proxyBase", "/target/" + _applicationId)
--- End diff --

Ah I see, sorry, I thought they would be just opaque ids, but worker- and app- 
prefixes look fine to me.





[GitHub] spark pull request #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse...

2016-07-06 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13950#discussion_r69678895
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -502,6 +502,9 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 _applicationId = _taskScheduler.applicationId()
 _applicationAttemptId = taskScheduler.applicationAttemptId()
 _conf.set("spark.app.id", _applicationId)
+if (_conf.getBoolean("spark.ui.reverseProxy", false)) {
+  System.setProperty("spark.ui.proxyBase", "/target/" + _applicationId)
--- End diff --

I wonder if this is better named /proxy/app/applicationId for applications, 
and /proxy/worker/workerId for workers, so it's clearer what the destination 
target is?
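
A minimal sketch of that naming, adapting the proxyBase line from this diff 
(the exact layout is just a suggestion):

    if (_conf.getBoolean("spark.ui.reverseProxy", false)) {
      // "/proxy/app/<appId>" for application UIs; the master side would mount
      // "/proxy/worker/<workerId>" for worker UIs.
      System.setProperty("spark.ui.proxyBase", "/proxy/app/" + _applicationId)
    }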





[GitHub] spark pull request #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse...

2016-07-05 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13950#discussion_r69676891
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -186,6 +188,67 @@ private[spark] object JettyUtils extends Logging {
 contextHandler
   }
 
+  /** Create a handler for proxying request to Workers and Application 
Drivers */
+  def createProxyHandler(
+  prefix: String,
+  target: String): ServletContextHandler = {
+val servlet = new ProxyServlet {
+  override def rewriteTarget(request: HttpServletRequest): String = {
+val path = request.getRequestURI();
+if (!path.startsWith(prefix)) return null
+
+val uri = new StringBuilder(target)
+if (target.endsWith("/")) uri.setLength(uri.length() - 1)
+val rest = path.substring(prefix.length())
+if (!rest.isEmpty())
+{
--- End diff --

Move the { to the previous line.
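
i.e., following the Spark brace style:

    if (!rest.isEmpty()) {
      // existing body unchanged
    }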





[GitHub] spark pull request #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse...

2016-07-05 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13950#discussion_r69676869
  
--- Diff: docs/configuration.md ---
@@ -598,6 +598,20 @@ Apart from these, the following properties are also 
available, and may be useful
   
 
 
+  spark.ui.reverseProxy
+  false
+  
+To enable running Spark Master, worker and application UI behined a 
reverse proxy. In this mode, Spark master will reverse proxy the worker and 
application UIs to enable access.
+  
+
+
+  spark.ui.reverseProxyUrl
+  http://localhost:8080
+  
+This is the URL where your proxy is running. Make sure this is a 
complete URL includeing scheme (http/https) and port to reach your proxy.
--- End diff --

includeing -> including





[GitHub] spark pull request #13143: [SPARK-15359] [Mesos] Mesos dispatcher should han...

2016-06-23 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13143#discussion_r68194509
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -120,14 +120,25 @@ private[mesos] trait MesosSchedulerUtils extends 
Logging {
 val ret = mesosDriver.run()
 logInfo("driver.run() returned with code " + ret)
 if (ret != null && ret.equals(Status.DRIVER_ABORTED)) {
-  error = Some(new SparkException("Error starting driver, 
DRIVER_ABORTED"))
-  markErr()
+  val ex = new SparkException("Error starting driver, 
DRIVER_ABORTED")
+  // if the driver gets aborted after the successful 
registration
--- End diff --

Also, to simplify the code, can we just throw a SparkException here? The 
catch will then handle all cases.
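
Roughly like this (a sketch reusing the names from the diff above, so the 
single catch block records the error whether run() aborts or throws):

    try {
      val ret = mesosDriver.run()
      logInfo("driver.run() returned with code " + ret)
      if (ret != null && ret.equals(Status.DRIVER_ABORTED)) {
        throw new SparkException("Error starting driver, DRIVER_ABORTED")
      }
    } catch {
      case e: Exception =>
        error = Some(e)
        markErr()
    }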





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/12933
  
@hellertime ping





[GitHub] spark issue #13072: [SPARK-15288] [Mesos] Mesos dispatcher should handle gra...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13072
  
ok to test





[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13143
  
Is it because MesosDriver actually threw an exception?





[GitHub] spark pull request #13143: [SPARK-15359] [Mesos] Mesos dispatcher should han...

2016-06-21 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13143#discussion_r67988954
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
 ---
@@ -120,14 +120,25 @@ private[mesos] trait MesosSchedulerUtils extends 
Logging {
 val ret = mesosDriver.run()
 logInfo("driver.run() returned with code " + ret)
 if (ret != null && ret.equals(Status.DRIVER_ABORTED)) {
-  error = Some(new SparkException("Error starting driver, 
DRIVER_ABORTED"))
-  markErr()
+  val ex = new SparkException("Error starting driver, 
DRIVER_ABORTED")
+  // if the driver gets aborted after the successful 
registration
--- End diff --

s/after the successful registration/after registration/g





[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13077
  
@devaraj-kavali Ping





[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13326
  
jenkins please retest





[GitHub] spark issue #13713: [SPARK-15994] [MESOS] Allow enabling Mesos fetch cache i...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13713
  
ok to test





[GitHub] spark pull request #13715: [SPARK-15992] [MESOS] Refactor MesosCoarseGrained...

2016-06-21 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13715#discussion_r67988575
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -382,59 +382,97 @@ private[spark] class 
MesosCoarseGrainedSchedulerBackend(
   for (offer <- offers) {
 val slaveId = offer.getSlaveId.getValue
 val offerId = offer.getId.getValue
-val resources = remainingResources(offerId)
-
-if (canLaunchTask(slaveId, resources)) {
-  // Create a task
-  launchTasks = true
-  val taskId = newMesosTaskId()
-  val offerCPUs = getResource(resources, "cpus").toInt
-
-  val taskCPUs = executorCores(offerCPUs)
-  val taskMemory = executorMemory(sc)
-
-  slaves.getOrElseUpdate(slaveId, new 
Slave(offer.getHostname)).taskIDs.add(taskId)
-
-  val (afterCPUResources, cpuResourcesToUse) =
-partitionResources(resources, "cpus", taskCPUs)
-  val (resourcesLeft, memResourcesToUse) =
-partitionResources(afterCPUResources.asJava, "mem", taskMemory)
-
-  val taskBuilder = MesosTaskInfo.newBuilder()
-
.setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
-.setSlaveId(offer.getSlaveId)
-.setCommand(createCommand(offer, taskCPUs + 
extraCoresPerExecutor, taskId))
-.setName("Task " + taskId)
-.addAllResources(cpuResourcesToUse.asJava)
-.addAllResources(memResourcesToUse.asJava)
-
-  sc.conf.getOption("spark.mesos.executor.docker.image").foreach { 
image =>
-MesosSchedulerBackendUtil
-  .setupContainerBuilderDockerInfo(image, sc.conf, 
taskBuilder.getContainerBuilder)
+val availableResources = remainingResources(offerId)
+val offerMem = getResource(availableResources, "mem")
+val offerCpu = getResource(availableResources, "cpus")
+
+// Catch offer limits
+calculateUsableResources(
+  sc,
+  offerCpu.toInt,
+  offerMem.toInt
+).flatMap(
+  {
+// Catch "global" limits
+case (taskCPUs: Int, taskMemory: Int) =>
+  if (numExecutors() >= executorLimit) {
+logTrace(s"${numExecutors()} exceeds limit of 
$executorLimit")
+None
+  } else if (
+slaves.get(slaveId).map(_.taskFailures).getOrElse(0) >= 
MAX_SLAVE_FAILURES
+  ) {
+logTrace(s"Slave $slaveId exceeded limit of 
$MAX_SLAVE_FAILURES failures")
+None
+  } else {
+Some((taskCPUs, taskMemory))
+  }
   }
-
-  tasks(offer.getId) ::= taskBuilder.build()
-  remainingResources(offerId) = resourcesLeft.asJava
-  totalCoresAcquired += taskCPUs
-  coresByTaskId(taskId) = taskCPUs
+) match {
+  case Some((taskCPUs: Int, taskMemory: Int)) =>
+// Create a task
+launchTasks = true
+val taskId = newMesosTaskId()
+
+slaves.getOrElseUpdate(slaveId, new 
Slave(offer.getHostname)).taskIDs.add(taskId)
+
+val (afterCPUResources, cpuResourcesToUse) =
+  partitionResources(availableResources, "cpus", taskCPUs)
+val (resourcesLeft, memResourcesToUse) =
+  partitionResources(afterCPUResources.asJava, "mem", 
taskMemory)
+
+val taskBuilder = MesosTaskInfo.newBuilder()
+  
.setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
+  .setSlaveId(offer.getSlaveId)
+  .setCommand(createCommand(offer, taskCPUs + 
extraCoresPerExecutor, taskId))
+  .setName("Task " + taskId)
+  .addAllResources(cpuResourcesToUse.asJava)
+  .addAllResources(memResourcesToUse.asJava)
+
+sc.conf.getOption("spark.mesos.executor.docker.image").foreach 
{ image =>
+  MesosSchedulerBackendUtil
+.setupContainerBuilderDockerInfo(image, sc.conf, 
taskBuilder.getContainerBuilder)
+}
+
+tasks(offer.getId) ::= taskBuilder.build()
+remainingResources(offerId) = resourcesLeft.asJava
+totalCoresAcquired += taskCPUs
+coresByTaskId(taskId) = taskCPUs
+  case None => logDebu

[GitHub] spark issue #13715: [SPARK-15992] [MESOS] Refactor MesosCoarseGrainedSchedul...

2016-06-21 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13715
  
ok to test





[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-09 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13323
  
Thanks @devaraj-kavali, this LGTM. @andrewor14 can you take a look?





[GitHub] spark issue #13051: [SPARK-15271] [MESOS] Allow force pulling executor docke...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13051
  
@andrewor14 PTAL, this PR LGTM





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/12933
  
@hellertime Can you rebase and submit again to rerun the tests?





[GitHub] spark issue #10949: [SPARK-12832][MESOS] mesos scheduler respect agent attri...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/10949
  
@atongen please rebase and try again.





[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13077
  
I think instead of just moving it to finished drivers, can we also show that 
message on the UI?





[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13077
  
ok to test





[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13323
  
ok to test





[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13143
  
ok to test





[GitHub] spark issue #13051: [SPARK-15271] [MESOS] Allow force pulling executor docke...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13051
  
ok to test





[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-02 Thread tnachen
Github user tnachen commented on the issue:

https://github.com/apache/spark/pull/13323
  
Nice catch! Can you add a unit test covering this?





[GitHub] spark pull request #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers wa...

2016-06-02 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13326#discussion_r65580076
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -188,10 +188,10 @@ private[spark] class MesosClusterScheduler(
 mesosDriver.killTask(task.taskId)
 k.success = true
 k.message = "Killing running driver"
-  } else if (removeFromQueuedDrivers(submissionId)) {
--- End diff --

We should just rename and change the existing function; I don't think it's 
being used elsewhere.





[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-02 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r65579886
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -115,4 +142,58 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 }
 sb.toString()
   }
+
+  private def getIp4(ip: Int): String = {
+val buffer = ByteBuffer.allocate(4)
+buffer.putInt(ip)
+// we need to check about that because protocolbuf changes the order
+// which by mesos api is considered to be network order (big endian).
+val result = if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) {
+  buffer.array.toList.reverse
+} else {
+  buffer.array.toList
+}
+result.map{byte => byte & 0xFF}.mkString(".")
+  }
+
+  private def getListFromJson(value: JValue): List[Map[String, Any]] = {
+value.values.asInstanceOf[List[Map[String, Any]]]
+  }
+
+  private def getTaskDirectory(masterUri: String, driverFwId: String, 
slaveId: String):
+  Option[String] = {
+
--- End diff --

Remove extra white spaces here.





[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-02 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r65579463
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
+  val masterUri = masterInfo.map{info => 
s"http://${getIp4(info.getIp)}:${info.getPort}"}.get
+  val directory = getTaskDirectory(masterUri, id, 
state.slaveId.getValue)
+
--- End diff --

Remove extra space





[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-02 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r65579454
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
+  val masterUri = masterInfo.map{info => 
s"http://${getIp4(info.getIp)}:${info.getPort}"}.get
+  val directory = getTaskDirectory(masterUri, id, 
state.slaveId.getValue)
+
+  if(directory.isDefined) {
--- End diff --

Add a space between if and (.





[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-02 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r65579412
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
--- End diff --

Kill space





[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-02 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r65579276
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
+  val masterUri = masterInfo.map{info => 
s"http://${getIp4(info.getIp)}:${info.getPort}"}.get
+  val directory = getTaskDirectory(masterUri, id, 
state.slaveId.getValue)
+
+  if(directory.isDefined) {
+val sandBoxUri = s"$masterUri" +
+  s"/#/slaves/${state.slaveId.getValue}" +
+  s"/browse?path=${directory.get}"
+  Sandbox
--- End diff --

I think we should add a property like @mgummelt suggested to override 
masterUri if available.
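
For example (both the property name and the conf handle here are 
hypothetical):

    // Prefer an operator-supplied master UI address when building sandbox
    // links, falling back to the IP derived from MasterInfo.
    val masterUri = conf.getOption("spark.mesos.proxy.baseURL")
      .getOrElse(masterInfo.map { info =>
        s"http://${getIp4(info.getIp)}:${info.getPort}"
      }.get)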




