[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

YanTangZhai Tue, 13 Jan 2015 01:27:43 -0800

Github user YanTangZhai commented on the pull request:

    https://github.com/apache/spark/pull/3810#issuecomment-69716974
  
    @srowen I've updated this PR and resolved the conflict. Please review it 
again. Thanks.
    Let me respond to your three points:
    1. I am not sure the description makes a case that it's significant enough 
to bother...
    Let me give two examples:
    (1) I started ./bin/spark-sql from the command line in yarn-client mode 
with the following resource requests:
    spark.executor.instances 100
    spark.executor.memory 4g
    spark.executor.cores 1
    However, I did not enter a SQL query right away, because I was 
interrupted: for example, I was called to an important meeting, or I had to go 
firefighting in our cluster. Sometimes I even forgot to enter a query at all.
    The application then ran overnight, holding 100 * 4g * 12h of memory and 
100 * 1 * 12h of cores, but it did nothing.
    (2) A SparkContext was initialized with spark.executor.instances 100, 
spark.executor.memory 4g and spark.executor.cores 1, and HadoopRDD then 
scanned 11596 files, taking 29.253s to compute splits. Only after that was the 
job submitted by DAGScheduler. In the meantime, 100 * 4g * 29s of memory and 
100 * 1 * 29s of cores sat idle.
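
    To make the arithmetic in the two examples above concrete, here is a 
small sketch in Python. The function name idle_resources and its parameters 
are made up for illustration and are not part of Spark; it only multiplies 
the figures quoted above.

```python
def idle_resources(executors, memory_gb, cores, idle_time):
    """Return (memory_gb * time, cores * time) held but unused.

    idle_time may be in hours or seconds; the result uses the same unit.
    """
    idle_memory = executors * memory_gb * idle_time
    idle_cores = executors * cores * idle_time
    return idle_memory, idle_cores


# Example (1): 100 executors, 4g and 1 core each, idle for about 12 hours.
overnight_gb_hours, overnight_core_hours = idle_resources(100, 4, 1, 12)
print(overnight_gb_hours, overnight_core_hours)  # 4800 1200

# Example (2): the same executors idle for the 29.253s spent computing splits.
split_gb_seconds, split_core_seconds = idle_resources(100, 4, 1, 29.253)
print(round(split_gb_seconds, 1), round(split_core_seconds, 1))
```

    That is 4800 GB-hours and 1200 core-hours in example (1), and roughly 
11700 GB-seconds and 2925 core-seconds in example (2).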
    2. There are several new API methods and changes here.
    SparkContext first gets the applicationId from taskScheduler and uses it 
to initialize blockManager and eventLogger. Then, when dagScheduler runs a 
job, it submits the resource requests to the cluster master.
    To allow this, getting the applicationId and submitting resource requests 
to the cluster master are split into two methods.
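
    The split described above can be sketched roughly as follows. This is 
only an illustration in Python, not the code in this PR and not Spark's API: 
the classes SketchSchedulerBackend and SketchContext, their methods, and the 
placeholder application ID are all hypothetical.

```python
class SketchSchedulerBackend:
    """Hypothetical backend that separates registration from resource requests."""

    def __init__(self, executor_instances):
        self.executor_instances = executor_instances
        self.application_id = None
        self.requested_executors = 0

    def register_application(self):
        # Phase 1: obtain an application ID without asking for executors.
        self.application_id = "application_0001"  # placeholder value
        return self.application_id

    def request_resources(self):
        # Phase 2: ask the cluster master for executors.
        if self.application_id is None:
            raise RuntimeError("register_application() must be called first")
        self.requested_executors = self.executor_instances


class SketchContext:
    """Hypothetical context that defers resource requests until the first job."""

    def __init__(self, backend):
        self.backend = backend
        # The application ID is available immediately, so components that
        # need it (block manager, event logger) could be initialized here.
        self.application_id = backend.register_application()
        self.resources_requested = False

    def run_job(self, job):
        # Executors are requested only when the first job is actually run.
        if not self.resources_requested:
            self.backend.request_resources()
            self.resources_requested = True
        return job()


backend = SketchSchedulerBackend(executor_instances=100)
context = SketchContext(backend)
print(backend.requested_executors)  # 0
context.run_job(lambda: "done")
print(backend.requested_executors)  # 100
```

    In this sketch no executors are held between creating the context and 
running the first job, which is the idle period the two examples describe.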
    3. My overall impression is that this adds different code paths and 
behaviors in different modes for little gain.
    I'm sorry, but I could not find a way with the Mesos APIs to split getting 
the applicationId and submitting resource requests to the cluster master into 
two methods.
    Thus, the slow start of an application is currently supported only in YARN 
mode.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
