GitHub user Sherry302 opened a pull request:

    https://github.com/apache/spark/pull/14312

    [SPARK-15857]Add caller context in Spark: invoke YARN/HDFS API to set…

    ## What changes were proposed in this pull request?
    1. Pass 'jobId' to Task.
    2. Add a new function 'setCallerContext' in Utils. The 'setCallerContext' 
function calls the APIs of 'org.apache.hadoop.ipc.CallerContext' to set up 
Spark caller contexts, which are written into the HDFS hdfs-audit.log or the 
YARN resource manager log.
    3. The 'setCallerContext' function is called in the YARN Client, 
ApplicationMaster, and Task classes.
     
     The Spark caller context written into the HDFS log will be 
"JobID_stageID_stageAttemptId_taskID_attemptNumber on Spark", and the Spark 
caller context written into the YARN log will be "{spark.app.name} running on Spark".
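
    As a rough illustration of change 2, the call into 
'org.apache.hadoop.ipc.CallerContext' can be made via reflection so the code 
still compiles and runs against Hadoop versions that predate the API. The 
class, builder, and 'setCurrent' method below are Hadoop's; the helper name 
'setCallerContext' mirrors the function this PR adds, but this is an 
illustrative sketch, not the PR's actual code:

    ```java
    public class CallerContextSketch {
        // Illustrative sketch: sets the Hadoop IPC caller context via reflection.
        // Returns false (instead of failing) when the running Hadoop version lacks
        // org.apache.hadoop.ipc.CallerContext, matching the fallback described in
        // this PR: the logs simply carry no Spark caller context.
        public static boolean setCallerContext(String context) {
            try {
                Class<?> builderClass =
                    Class.forName("org.apache.hadoop.ipc.CallerContext$Builder");
                Object builder =
                    builderClass.getConstructor(String.class).newInstance(context);
                Object callerContext = builderClass.getMethod("build").invoke(builder);
                Class<?> contextClass = Class.forName("org.apache.hadoop.ipc.CallerContext");
                contextClass.getMethod("setCurrent", contextClass).invoke(null, callerContext);
                return true; // subsequent HDFS/YARN RPCs from this thread carry the context
            } catch (ReflectiveOperationException e) {
                return false; // Hadoop without the CallerContext API (pre-2.8)
            }
        }

        public static void main(String[] args) {
            // e.g. the YARN-side context string from this PR's description
            System.out.println("set: " + setCallerContext("SparkKMeans running on Spark"));
        }
    }
    ```

    On a cluster where the API exists, the string passed to the builder is what 
later shows up in the 'callerContext=' field of hdfs-audit.log.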
    
    ## How was this patch tested?
    Manual tests against several Spark applications in YARN client mode and 
cluster mode, checking that the Spark caller contexts were successfully 
written into the HDFS hdfs-audit.log and the YARN resource manager log.
    
    For example, after running SparkKMeans on Spark, the YARN resource manager 
log will contain a record with the Spark caller context:
    ...
    2016-07-21 13:36:26,318 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang IP=127.0.0.1 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1469125587135_0004 CALLERCONTEXT=SparkKMeans running on Spark
    ...
    
     In the HDFS hdfs-audit.log, there will be records with Spark caller contexts:
    ...
    2016-07-21 13:38:30,799 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc callerContext=SparkKMeans running on Spark
    ...
    2016-07-21 13:39:35,584 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0 on Spark
    ...
    
    If the Hadoop version on which Spark runs does not have the CallerContext 
APIs, those logs will contain no Spark caller context information.
    
    … up caller context

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sherry302/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14312
    
----
commit 38c4f58dbf30d541260ee1b0381993a9bec393f8
Author: Weiqing Yang <[email protected]>
Date:   2016-07-22T01:21:03Z

    [SPARK-15857]Add caller context in Spark: invoke YARN/HDFS API to set up 
caller context

----

