Github user YanTangZhai commented on the pull request:
https://github.com/apache/spark/pull/3810#issuecomment-69716974
@srowen I've updated this PR and resolved the conflict. Please review again.
Thanks.
Let me address your three points:
1. I am not sure the description makes a case that it's significant enough
to bother...
Let me give two examples:
(1) I ran ./bin/spark-sql from the command line in yarn-client mode with
the following resource requests:
spark.executor.instances 100
spark.executor.memory 4g
spark.executor.cores 1
However, I didn't enter a SQL query right away because I was interrupted:
for example, I was called to an important meeting, or I had to go fight a
fire in our cluster. Sometimes I simply forgot to enter the query.
The application then ran overnight, holding 100 * 4g * 12h of memory and
100 * 1 * 12h of cores, but it did nothing.
(2) A SparkContext was initialized with spark.executor.instances 100,
spark.executor.memory 4g, and spark.executor.cores 1, and HadoopRDD then
scanned 11596 files, taking 29.253s to compute splits. Only after that was
the job submitted by DAGScheduler. In the meantime, 100 * 4g * 29s of memory
and 100 * 1 * 29s of cores sat idle.
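The arithmetic in the two examples above can be checked with a short sketch. The function name idle_cost and its parameters are made up for illustration, and the 12h and 29s durations are the rounded figures quoted above.

```python
def idle_cost(executors, memory_gb, cores, idle_time):
    """Return (memory, cores) held idle, in GB*time and core*time units.

    The time unit of the result is whatever unit idle_time is given in.
    """
    return executors * memory_gb * idle_time, executors * cores * idle_time


# Example (1): spark-sql left open overnight, roughly 12 hours.
overnight_gb_hours, overnight_core_hours = idle_cost(100, 4, 1, 12)

# Example (2): executors waiting about 29 seconds for split computation.
splits_gb_seconds, splits_core_seconds = idle_cost(100, 4, 1, 29)

print(overnight_gb_hours, overnight_core_hours)  # 4800 1200
print(splits_gb_seconds, splits_core_seconds)    # 11600 2900
```

That is 4800 GB-hours and 1200 core-hours in the first case, and 11600 GB-seconds and 2900 core-seconds in the second.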
2. There are several new API methods and changes here.
SparkContext first gets the applicationId from taskScheduler and uses it to
initialize blockManager and eventLogger. Then dagScheduler runs the job and
submits the resource requests to the cluster master.
Getting the applicationId and submitting resource requests to the cluster
master are split into two methods.
3. My overall impression is that this adds different code paths and
behaviors in different modes for little gain.
I'm sorry, but I couldn't find Mesos APIs that would let me split getting the
applicationId and submitting resource requests to the cluster master into two
methods.
Thus slow start of an application is currently supported only in YARN mode.
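As a rough illustration of the split described in point 2, here is a minimal sketch in Python rather than the PR's actual Scala code. The names TwoPhaseBackend, Context, register_application and request_resources are hypothetical and are not Spark APIs. The sketch only shows the ordering: the application ID is obtained at initialization, and executors are requested when the first job runs.

```python
class TwoPhaseBackend:
    """Hypothetical scheduler backend; not Spark's actual API."""

    def __init__(self, executors):
        self.executors = executors
        self.app_id = None
        self.resources_requested = False

    def register_application(self):
        # Phase 1: obtain an application ID without asking for executors.
        self.app_id = "application_0001"  # placeholder ID for illustration
        return self.app_id

    def request_resources(self):
        # Phase 2: ask the cluster master for executors, at most once.
        if self.app_id is None:
            raise RuntimeError("register_application() must be called first")
        if not self.resources_requested:
            self.resources_requested = True
        return self.executors


class Context:
    """Hypothetical stand-in for the context that owns the backend."""

    def __init__(self, backend):
        self.backend = backend
        # Only the ID is needed to set up components such as logging.
        self.app_id = backend.register_application()

    def run_job(self, job):
        # Resources are requested lazily, when the first job is submitted.
        self.backend.request_resources()
        return job()
```

With this ordering, a context that never runs a job never requests executors, which is the idle case from example (1).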