[GitHub] spark pull request: [SPARK-4158] Fix for missing resources.

2014-10-30 Thread brndnmtthws
GitHub user brndnmtthws opened a pull request: https://github.com/apache/spark/pull/3024 [SPARK-4158] Fix for missing resources. Mesos offers may not contain all resources, and Spark needs to check to ensure they are present and sufficient. Spark may throw an erroneous
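The SPARK-4158 description says a Mesos offer may not contain every resource, so Spark must check both presence and amount before accepting it. A minimal, self-contained sketch of that idea; the object and method names are hypothetical and the offer is modeled as a plain name-to-amount map rather than the Mesos protobuf types the real patch works with:

```scala
// Hypothetical sketch, not the actual patch: a scalar resource in an offer
// counts as usable only if it is both present and at least the required size.
object OfferCheck {
  // offerResources: resource name -> offered scalar amount (e.g. "cpus", "mem")
  def hasSufficient(offerResources: Map[String, Double],
                    name: String,
                    required: Double): Boolean =
    offerResources.get(name).exists(_ >= required)
}
```

Checking with `exists` makes a missing resource and an undersized one fail the same way, instead of throwing when the resource is absent from the offer.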

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-03 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57833282 Done as per @andrewor14's suggestions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-03 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57814884 That test failure appears to be unrelated.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-03 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57806924 Fixed typos (also switched from `Math.max` to `math.max` because that seems to be the Scala way).

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57703146 Updated to match YARN.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57700183 It's counterintuitive to the policy used elsewhere, but if it will appease the Spark folks, I will make the change.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57698498 The definition is indeed the same, but I don't see how the YARN patch solves that better than this one.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57697287 Mesos will indeed kill your containers as well (provided cgroup limits are enabled). I also don't see how this would necessarily apply differently for Python,

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57696587 As expressed elsewhere, I think the model that #2485 uses doesn't make sense. The most important knob is the overhead fraction, rather than the minimum numb

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-57676255 This code has been tested & verified on a cluster of this size: ![screen shot 2014-10-02 at 11 13 57 am](https://cloud.githubusercontent.com/assets/312

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-10-02 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57676227 This code has been tested & verified on a cluster of this size: ![screen shot 2014-10-02 at 11 13 57 am](https://cloud.githubusercontent.com/assets/312

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-27 Thread brndnmtthws
Github user brndnmtthws commented on a diff in the pull request: https://github.com/apache/spark/pull/2401#discussion_r18122508 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MemoryUtils.scala --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-57025923 Okay, I'll rebase.

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-57023257 Build error appears to be unrelated to my patch.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57023278 Build error appears to be unrelated to my patch.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57008399 I've updated the PR to more closely match what #2485 does. I'd like to keep the fractional param. Having successfully operated various services o

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-57008082 That's fair. I'm updating the PR to make that Mesos specific now.

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-57006489 Naturally you wouldn't want to have to change yours. I'll drop the `.minimum` thing, and prefix the config params with `.mesos`, like you'v

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-57004940 So I guess there's nothing to do.

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-57004118 Why can't they both share the same config parameters, for example? I understand the implementation differences, but we shouldn't need to have distinct con

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-57001899 In particular, look at how I put the logic into a common function, `calculateTotalMemory`.
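The thread argues over an overhead model: a flat minimum (the YARN approach) versus a fraction of executor memory (argued for elsewhere in the thread, at 15%). A minimal sketch of what a shared `calculateTotalMemory` combining both could look like; the object name, constant values, and combination via `math.max` are illustrative assumptions, not the merged Spark code:

```scala
// Hypothetical sketch, not the actual Spark patch: total memory to request
// from the cluster manager = heap + overhead, where overhead is the larger
// of a fraction of the heap (15%, the figure argued for in this thread) and
// a flat floor (384 MB, similar to the YARN-style default).
object MemoryUtilsSketch {
  val OverheadFraction = 0.15   // fractional knob discussed in the thread
  val MinimumOverheadMB = 384   // flat floor, illustrative value

  // math.max (the Scala idiom mentioned earlier in the thread), not Math.max.
  def calculateTotalMemory(executorMemoryMB: Int): Int =
    executorMemoryMB +
      math.max((executorMemoryMB * OverheadFraction).toInt, MinimumOverheadMB)
}
```

Under this model, small executors are dominated by the floor (1024 MB -> 1408 MB total) while large ones scale with the fraction (10240 MB -> 11776 MB total), which is the behavior the fractional-knob argument is about.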

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-57001671 Can you refactor this to be non-YARN specific? It would be good to share code between this and #2401.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-57001557 That code is still YARN specific. Shouldn't we have common code for this? Also, I disagree on the 7% overhead. I think 15% is a better default.

[GitHub] spark pull request: [SPARK-3535][Mesos] Fix resource handling.

2014-09-19 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56216904 I thought there was some desire to have the same thing in #1391 as well? Furthermore, from my experience writing frameworks, I think a much better model is the

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-19 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-56184849 I did indeed test it.

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-18 Thread brndnmtthws
GitHub user brndnmtthws opened a pull request: https://github.com/apache/spark/pull/2453 [SPARK-3597][Mesos] Implement `killTask`. The MesosSchedulerBackend did not previously implement `killTask`, resulting in an exception. You can merge this pull request into a Git repository

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56126568 I've cleaned up the patch again. I spent about an hour trying to apply this to the YARN code, but it was pretty difficult to follow so I gave up.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56123833 I can emulate the YARN behaviour, but it seems better to just do the same thing with both Mesos and YARN. Thoughts? I can refactor this (including the YARN code

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on a diff in the pull request: https://github.com/apache/spark/pull/2401#discussion_r17763516 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CommonProps.scala --- @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56120347 Forgot to mention: I also set the executor CPUs correctly.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56120329 Updated as per @andrewor14's suggestions.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on a diff in the pull request: https://github.com/apache/spark/pull/2401#discussion_r17763087 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CommonProps.scala --- @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-17 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55921158 When you need to provide guarantees for other services, it's better to stick to hard limits. Having other tasks get randomly OOM killed is a bad exper

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-17 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55903894 Oh, and one more thing you may want to think about is the OS filesystem buffers. Again, as you scale up the heap, you may want to proportionally reserve a slice of

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-17 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55903580 That implies that as you grow the heap, you're not adding threads (or other things that use off-heap memory). I'm not familiar with Spark's executio

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-16 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55821735 YARN has a similar strategy, from what I can tell. There's a separate config value, `spark.yarn.executor.memoryOverhead`. We could go the other way,

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-16 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55820092 Updated diff based on coding. I've also added the same thing to the coarse scheduler, and updated the documentation accordingly.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-15 Thread brndnmtthws
GitHub user brndnmtthws opened a pull request: https://github.com/apache/spark/pull/2401 [SPARK-3535][Mesos] Add 15% task memory overhead. You can merge this pull request into a Git repository by running: $ git pull https://github.com/brndnmtthws/spark master Alternatively

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-09-15 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-55659794 It seems that this is a symptom of the following issue: https://issues.apache.org/jira/browse/SPARK-3535

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-09-11 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-55310679 Yep, also hitting this same problem. We're running Spark 1.0.2 and Mesos 0.20.0. From a quick analysis, it looks like a bug in Spark.