[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/634#discussion_r15173511
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -30,6 +30,11 @@ private[spark] class YarnClientSchedulerBackend(
   extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
   with Logging {
 
+  if 
(conf.getOption(spark.scheduler.minRegisteredExecutorsRatio).isEmpty) {
+minRegisteredRatio = 0.8
--- End diff --

No real reason other then it might take longer to get 100%.  Its just kind 
of a number we choose to hopefully give the user a good experience without 
having to wait to long if the cluster is really busy.   The user can change it 
if they want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/634#discussion_r15173596
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala
 ---
@@ -37,14 +37,4 @@ private[spark] class YarnClientClusterScheduler(sc: 
SparkContext, conf: Configur
 val retval = YarnAllocationHandler.lookupRack(conf, host)
 if (retval != null) Some(retval) else None
   }
-
-  override def postStartHook() {
-
-super.postStartHook()
--- End diff --

YarnClientClusterScheduler extends TaskSchedulerImpl, so it will just fall 
through to TaskSchedulerImpl.postStartHook() which calls waitBackendReady


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49643611
  
Looks good. +1, Thanks @sryza.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/634


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/634#discussion_r15156184
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala
 ---
@@ -37,14 +37,4 @@ private[spark] class YarnClientClusterScheduler(sc: 
SparkContext, conf: Configur
 val retval = YarnAllocationHandler.lookupRack(conf, host)
 if (retval != null) Some(retval) else None
   }
-
-  override def postStartHook() {
-
-super.postStartHook()
--- End diff --

when it is yarn-client mode, super.postStartHook() donot be deleted. 
because it can waitBackendReady when configured ratio executors have not been 
registered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/634#discussion_r15156221
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -30,6 +30,11 @@ private[spark] class YarnClientSchedulerBackend(
   extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
   with Logging {
 
+  if 
(conf.getOption(spark.scheduler.minRegisteredExecutorsRatio).isEmpty) {
+minRegisteredRatio = 0.8
--- End diff --

why minRegisteredRatio  is 0.8 and is not 1.0?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-20 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49554989
  
Updated patch sets the min registered executors ratio to .8 for YARN


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49555111
  
QA tests have started for PR 634. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16877/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49558003
  
QA results for PR 634:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16877/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-18 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49429447
  
He had increased it to 30 seconds based on some experiments he did.   I 
guess it depends on how well the scheduler recovers from starting with fewer 
executors.  I know they have been doing a bit of work around this area but I'm 
not sure the real affect.  I have definitely seen issues with jobs when they 
start to soon so I think I would rather leave it at 30 for now unless we have 
more concrete data saying its not longer an issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-16 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49126877
  
That's a good point.  Changing it for YARN seems like the right thing, and 
80% sounds reasonable to me.

Another thing is the wait time.  Previously it was 6 seconds, but now 
spark.scheduler.maxRegisteredExecutorsWaitingTime defaults to 5 times that.  30 
seconds seems a little excessive to me in general - at least for jobs without 
caching, after a couple seconds the wait outweighs scheduling some non-local 
tasks.  What do you think about decreasing this to 6 seconds in general?  Or at 
least for YARN.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-15 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/634#discussion_r14940726
  
--- Diff: 
yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -416,19 +407,8 @@ object ApplicationMaster extends Logging {
   // This is to ensure that we have reasonable number of containers before 
we start
--- End diff --

we can remove this whole comment block now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-15 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/634#discussion_r14940751
  
--- Diff: 
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -370,7 +359,6 @@ object ApplicationMaster extends Logging {
   // This is to ensure that we have reasonable number of containers before 
we start
--- End diff --

same here, we can remove the comment block.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-15 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-49048527
  
I just thought of something bad here.  Now the default behavior is to wait 
for 0% and we have removed the bit of sleep there so if someone is just using 
the default they aren't very likely to get many before starting.

originally I was thinking keep the default 0 so behavior doesn't change, 
but personally I would rather see it higher so out of the box behavior is 
better.  thoughts on changing it for yarn?  To say 80%?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-14 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-48957597
  
@sryza I merged in SPARK-1946, would you want to update this to handle both 
cluster mode and client mode.   I think we can actually also remove the 
waitForInitialAllocations() call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-48969752
  
QA tests have started for PR 634. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16642/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-14 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-48971649
  
Posted a patch that takes out the sleeps and waitForInitialAllocations


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-48971773
  
QA tests have started for PR 634. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16644/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-48976630
  
QA results for PR 634:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16642/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-48978149
  
QA results for PR 634:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds no public classesbrbrFor more 
information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16644/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-06-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-45267979
  
@sryza  can we close this and just do SPARK-1453 and/or SPARK-1946


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-06-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-45274561
  
I guess we could keep this around incase we do SPARK-1946


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-05 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42172774
  
Btw I dont recall exactly why the actual value of 3 seconds is there; I 
think earlier spark used to crash in case there are no containers available 
when job is going to be launched - thankfully that is not the case anymore !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42188695
  
Note I had filed SPARK-1453 to make the wait configurable, see the 
discussions there.   I'm fine with removing this once the other one is 
configurable.  I would hesitate to do it before that.  
Like was already mentioned there are 2 different timeouts, this is the 
second and hardcoded to 3 seconds. I'm not really sure what the difference is 
between this one and the ALLOCATOR_LOOP_WAIT_COUNT. I was originally thinking 
of removing this one and just making the other configurable but I haven't 
investigated in depth yet.

I don't know the details of spark.locality.wait to know if it works in all 
cases. I assume the spark.locality.wait would work fine as long as you have an 
input from like hadoop or somewhere that tells you the locations, but for RDD's 
without the locations it wouldn't work to wait?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-05 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42209580
  
I like the idea of making the first wait configurable and taking out the 
second.  For the second, we could possibly still wait for the executors we've 
launched to register.

Regarding spark.locality.wait, yes, it needs to locations, but it doesn't 
need the preferredLocalityData to be explicitly passed in by the user in the 
way it's needed for requesting where to put executors. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42256611
  
@sryza Feel free to take over that jira to make it configurable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread sryza
GitHub user sryza opened a pull request:

https://github.com/apache/spark/pull/634

SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sryza/spark sandy-spark-1707

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #634


commit 49f6cf4398e179ae6163a1c4a098e2b5006fa1e6
Author: Sandy Ryza sa...@cloudera.com
Date:   2014-05-04T18:58:50Z

SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42141909
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42141915
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42142889
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14647/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42142888
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42146148
  
@sryza @mridulm what is this sleep actually for? We should really have 
in-code documentation for things like this.

If it's necessary to sleep like this, could we restructure it as a 1 second 
busy-wait on a particular condition? That way we wouldn't need to hard code any 
particular value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42149808
  
Ah I see - sorry I didn't read the JIRA.

So since we don't know a-priori how many executors to expect, I don't think 
we can wait on any particular condition. Even in YARN mode, AFAIK the executor 
number is just a request to the RM and not a guarentee.

I'm still not 100% sure why this needs to exist. @mridulm  is the main 
concern here just launching non-local tasks? If so, why not just set 
`spark.locality.wait` threshold (that's the whole reason it exists).

In standalone mode this may not really be necessary because the executors 
typically launch in under 3 seconds which is the default locality wait.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42149840
  
@sryza @mridulm if we do decide we want to have a new type of waiting here, 
maybe we could add `spark.scheduler.initialWait` or something to give time to 
allow executors to show up before we try to schedule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42159221
  
Keep in mind this sleep is after we've already waited to receive a set of 
containers from the YARN ResourceManager and sent RPCs to the NodeManagers to 
start them.  So we do know how many executors to expect. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42159371
  
@sryza Ah I see, in that case, we could wait on the specific condition, but 
what if e.g. there is a straggler and one container takes a long time to launch 
(or never launches)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42159417
  
Yeah, I think we'd want some (probably configurable) max wait time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-05-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/634#issuecomment-42159450
  
Seems reasonable - still curious exactly what the case @mridulm is dealing 
with is. I guess it's just the case where the containers take a long time to 
launch (once allocated), much longer than the delay scheduling threhsold?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---