Github user Sherry302 commented on the issue:
https://github.com/apache/spark/pull/14659
Hi @tgravescs, thank you very much for the review. I have updated the PR
based on all of your comments, including adding a CallerContext class, updating
the Javadoc, and making the caller context string shorter. I manually tested
several Spark applications in YARN client mode and YARN cluster mode, and the
Spark caller contexts were written into the HDFS `hdfs-audit.log` successfully.
The following is a screenshot of the audit log (SparkKMeans in YARN
client mode):
<img width="1407" alt="screen shot 2016-09-14 at 10 34 25 pm"
src="https://cloud.githubusercontent.com/assets/8546874/18539563/1eb16748-7acd-11e6-840a-0e8bfabf5954.png">
This is the caller context which was written into `hdfs-audit.log` by `Yarn
Client`:
```
2016-09-14 22:28:59,341 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007
```
The caller context above has the form `SPARK_AppName_***_AppID_***`.
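For illustration, the driver-side string above could be assembled like this (a minimal sketch; the helper name and parameters are hypothetical, inferred from the log output rather than taken from the PR's actual code):

```scala
// Hypothetical helper that assembles the driver-side caller context string.
// The optional application attempt ID only shows up in YARN cluster mode,
// as in the cluster-mode records further below.
def driverCallerContext(appName: Option[String],
                        appId: String,
                        appAttemptId: Option[String] = None): String = {
  val name = appName.map(n => s"_AppName_$n").getOrElse("")
  val attempt = appAttemptId.map(a => s"_$a").getOrElse("")
  s"SPARK${name}_AppID_$appId$attempt"
}
```

With `appName = Some("SparkKMeans")` and the application ID from the log, this reproduces the `SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007` string shown above.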
These are the caller contexts that were written into `hdfs-audit.log` by
tasks:
```
2016-09-14 22:29:06,525 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_0_0
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0
```
The caller context above has the form
`SPARK_AppID_***_JobID_***_StageID_***_(StageAttemptID)_TaskId_***_(TaskAttemptNumber)`.
The static strings `jobAttemptID`, `stageAttemptID`, and `attemptNumber` have
been removed from the task caller context. (For `jobAttemptID`, please refer to
the following records produced by SparkKMeans run in YARN cluster mode.)
The records below were written into `hdfs-audit.log` when SparkKMeans ran
in YARN cluster mode:
```
2016-09-14 22:25:30,100 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs
src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1473908768790_0006/container_1473908768790_0006_01_000001/spark-warehouse
dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc
callerContext=SPARK_AppName_org.apache.spark.examples.SparkKMeans_AppID_application_1473908768790_0006_1
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_0_0
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_2_0
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_1_0
```
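Since `org.apache.hadoop.ipc.CallerContext` only exists in Hadoop 2.8+, one way to set the context without a hard compile-time dependency on it is reflection. This is a hedged sketch of that approach (error handling simplified), not necessarily what this PR does:

```scala
import scala.util.control.NonFatal

// Sketch: set the HDFS caller context through reflection so the code still
// runs on Hadoop versions (< 2.8) that lack org.apache.hadoop.ipc.CallerContext.
// Returns true if the context was set, false if the class is unavailable.
def setCallerContext(context: String): Boolean =
  try {
    val ctxClass = Class.forName("org.apache.hadoop.ipc.CallerContext")
    val builderClass = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
    // new CallerContext.Builder(context).build()
    val builder = builderClass.getConstructor(classOf[String]).newInstance(context)
    val callerCtx = builderClass.getMethod("build").invoke(builder)
    // CallerContext.setCurrent(callerCtx) -- a static method, hence null receiver
    ctxClass.getMethod("setCurrent", ctxClass).invoke(null, callerCtx)
    true
  } catch {
    case NonFatal(_) => false
  }
```

On a classpath without the Hadoop IPC classes this simply returns `false` instead of throwing, so callers on older Hadoop versions are unaffected.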