Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/900#issuecomment-45786350
  
    @li-zhihui it looks like the JIRA you created 
(https://issues.apache.org/jira/browse/SPARK-1946) describes the issue that was 
fixed by #892 and is also covered by a duplicate issue 
(https://issues.apache.org/jira/browse/SPARK-1937). As @mridulm explained 
(thanks!!), the primary set of issues addressed by this pull request centers on 
the fact that Spark-on-YARN has various performance problems when not enough 
executors have registered yet.  Could you update SPARK-1946 accordingly?
    
    @tgravescs I'm a little nervous about adding more scheduler config options, 
because I think the average user would have a very difficult time figuring out 
that their performance problems could be fixed by tuning this particular set of 
options.  The scheduler already has quite a few config options and I think we 
should be very cautious in adding more (cc @pwendell).  On the other hand, as 
you pointed out, it seems like a user typically wants to wait for some number 
of executors to become available, and those semantics aren't available to the 
application -- so we're stuck with adding something to the scheduler code.  Is 
it possible to do this only for the YARN scheduler / do you think it's 
necessary in standalone too?  Doing it only for YARN (and naming the config 
variable accordingly) could help signal to a naive user when tuning this might 
help.  From @mridulm's description, it sounds like many of the issues here are 
YARN-specific.
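
    For illustration only, here is a minimal Scala sketch of a YARN-scoped knob 
with the semantics described above (wait for a minimum fraction of executors to 
register, with a timeout). The configuration keys used here are hypothetical 
and are not names proposed in this pull request:

        // Sketch only: the two configuration keys below are hypothetical and just
        // illustrate "don't schedule tasks until enough executors have registered".
        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf()
          .setAppName("executor-registration-wait-sketch")
          // Hypothetical key: wait until 80% of the requested executors have
          // registered with the driver before scheduling the first tasks.
          .set("spark.yarn.scheduler.minRegisteredExecutorsRatio", "0.8")
          // Hypothetical key: stop waiting after 30 seconds (in milliseconds)
          // even if the ratio above has not been reached.
          .set("spark.yarn.scheduler.maxRegisteredExecutorsWaitingTime", "30000")

        val sc = new SparkContext(conf)

    Scoping the key names under a spark.yarn.* prefix is what would signal to a 
user that the knob only matters when running on YARN.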

