Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/2250#issuecomment-55940146
  
    @JoshRosen Thanks for your advice. I tried to use the application id for the metrics name and ran into a couple of difficulties.
    
    Problem 1. We need the application id before creating SparkEnv
    On the driver, we need the application id before SparkEnv is created because some metrics sources are loaded and registered within SparkEnv.create. To be exact, SparkEnv.create instantiates MetricsSystem, and the MetricsSystem constructor invokes registerSources, which loads the sources declared in metrics.properties.
    Unfortunately, we cannot delay creating SparkEnv until after the application id is available. The application id is obtained from SchedulerBackend (or one of its subclasses), but a SchedulerBackend instance cannot be created before SparkEnv exists; for example, TaskSchedulerImpl needs SparkEnv, and TaskSchedulerImpl and the SchedulerBackend are created at the same time. A mock of this ordering is sketched below.
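    To make the ordering concrete, here is a small self-contained mock of the driver-side initialization order. All classes and names in it are stand-ins I made up for illustration, not the real Spark classes:

```scala
// Mock of the driver-side initialization order (illustrative stand-ins only).
object InitOrderSketch {
  class MockMetricsSystem(metricsNamePrefix: String) {
    // In real Spark, the MetricsSystem constructor loads metrics.properties and
    // registers the sources, so the metrics name is fixed at construction time.
    println(s"registering sources under prefix '$metricsNamePrefix'")
  }

  class MockSparkEnv(metricsNamePrefix: String) {
    val metricsSystem = new MockMetricsSystem(metricsNamePrefix)
  }

  class MockSchedulerBackend(env: MockSparkEnv) {
    // The backend (and therefore the application id) only exists after SparkEnv.
    def applicationId(): String = "app-20140917-0001"
  }

  def main(args: Array[String]): Unit = {
    // 1. SparkEnv is created first, so a metrics name has to be chosen here.
    val env = new MockSparkEnv(metricsNamePrefix = "<app id not known yet>")
    // 2. Only now can the scheduler backend exist and report an application id.
    val backend = new MockSchedulerBackend(env)
    println(s"application id arrives too late: ${backend.applicationId()}")
  }
}
```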
    
    Problem 2. It is difficult to pass the application id to executors via SparkConf
    Considering all of the SchedulerBackend implementations, we can only get the application id after invoking "taskScheduler.start()" in SparkContext.
    However, executors may be launched and fetch the SparkConf from the DriverActor before "taskScheduler.start()" finishes. In other words, executors can fetch the SparkConf before the application id has been set in it.
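    A stand-in sketch of that timing (not the real Spark code): the executor snapshots the driver's properties when it registers, which can happen before the application id has been written into them.

```scala
// Stand-in sketch of the executor/driver timing problem (not real Spark code).
object ExecutorConfTimingSketch {
  import scala.collection.mutable

  // Mock of the driver-side properties handed out to registering executors.
  private val driverConf = mutable.Map("spark.app.name" -> "example")

  private def executorFetchesConf(): Map[String, String] = driverConf.toMap

  def main(args: Array[String]): Unit = {
    // The executor registers early and copies the conf...
    val executorConf = executorFetchesConf()
    // ...and only afterwards does the driver learn the application id.
    driverConf += ("spark.app.id" -> "app-20140917-0001")
    println(executorConf.get("spark.app.id")) // None: the executor never sees it
  }
}
```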
    
    So I have two solutions.
    The 1st is this PR, which is a compromise. In YARN cluster mode we can get the application id via SparkConf.get("spark.yarn.app.id") before SparkEnv is created; in the other modes we use System.currentTimeMillis instead.
    
    The 2nd is #2432.
    To register the metrics sources only after the application id is known, SparkEnv#create neither registers metrics sources nor starts the MetricsSystem when the creator is the driver; instead, the sources are registered and the MetricsSystem is started once the application id has been obtained. This addresses problem 1.
    
    For problem 2, the launcher passes the application id to the ExecutorBackends when launching them. Mesos is not covered because MesosSchedulerBackend does not return an application id, so when running on Mesos, System.currentTimeMillis is used instead of the application id.
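    A rough self-contained mock of the #2432 idea; the classes and the --app-id flag below are stand-ins of my own, not the exact API in that PR:

```scala
// Mock of deferred metrics registration plus explicit app-id hand-off
// (all names illustrative).
object DeferredMetricsSketch {
  class MockMetricsSystem {
    // In SparkEnv.create the system would only be constructed, not started.
    def start(appId: String): Unit =
      println(s"sources registered under prefix '$appId'")
  }

  def main(args: Array[String]): Unit = {
    val metricsSystem = new MockMetricsSystem   // created in SparkEnv.create
    val appId = "app-20140917-0001"             // known after taskScheduler.start()
    metricsSystem.start(appId)                  // deferred registration: problem 1

    // Problem 2: the launcher hands the id to the executor backend directly
    // instead of relying on SparkConf (flag name illustrative).
    val executorCmd = Seq("CoarseGrainedExecutorBackend", "--app-id", appId)
    println(executorCmd.mkString(" "))
  }
}
```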

