Github user Sherry302 commented on the issue:
https://github.com/apache/spark/pull/14659
Hi @steveloughran, thanks a lot for the comments.
If users set a configuration in spark-defaults.conf such as
`spark.eventLog.dir hdfs://localhost:9000/spark-history`, a record like the one
below appears in the HDFS audit log:
```
2016-08-21 23:47:50,834 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=setPermission
src=/spark-history/application_1471835208589_0013.lz4.inprogress dst=null
perm=wyang:supergroup:rwxrwx--- proto=rpc
```
We can see the application ID `application_1471835208589_0013` above. Apart
from that case, the audit log contains no Spark application information such as
the application name and application ID (or, on YARN, appId + attemptId). So I
think it is better to include the application name/ID in the caller context,
and I have updated the PR to do so.
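For background, HDFS picks up the caller context per thread through Hadoop's
`org.apache.hadoop.ipc.CallerContext` API (available since Hadoop 2.8). Here is
a minimal sketch of setting it from Spark code; the helper name and its
parameters are illustrative, not the PR's actual code:
```scala
import org.apache.hadoop.ipc.CallerContext

// Hypothetical helper: install an application-level caller context for the
// current thread. HDFS RPCs issued from this thread afterwards carry the
// string, and the NameNode records it as callerContext=... in its audit log.
def setAppCallerContext(appName: String, appId: String): Unit = {
  val context =
    new CallerContext.Builder(s"Spark_AppName_${appName}_AppId_${appId}").build()
  CallerContext.setCurrent(context)
}
```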
In commit
[5ab2a41](https://github.com/apache/spark/pull/14659/commits/5ab2a41b93bfd73baf3798ba66fc7554b10b78e6),
the application name, application ID, and attempt ID (the last only in yarn
cluster mode) are included in the value of the caller context when the YARN
`Client` (for applications running in yarn client mode) or the
`ApplicationMaster` (for applications running in yarn cluster mode) performs
operations in HDFS. So in the audit log, you can see
`callerContext=Spark_AppName_**_AppId_**_AttemptId_**`:
_Applications in yarn cluster mode_
```
2016-08-21 22:55:44,568 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo
src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,573 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,583 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,589 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:46,163 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs
src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1471835208589_0010/container_1471835208589_0010_01_000001/spark-warehouse
dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
```
_Applications in yarn client mode_
```
2016-08-21 22:59:20,775 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo
src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,778 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,785 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,791 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
```
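To make the format explicit, here is an illustrative sketch (not the PR's
actual code; the helper name is hypothetical) of how the driver/AM-side string
is assembled, with `AttemptId` appended only when an attempt ID exists, i.e. in
yarn cluster mode:
```scala
// Illustrative assembly of the driver/AM-side caller context string.
def buildAppContext(appName: String, appId: String, attemptId: Option[String]): String = {
  val base = s"Spark_AppName_${appName}_AppId_${appId}"
  attemptId.fold(base)(id => s"${base}_AttemptId_$id")
}

// yarn cluster mode:
//   buildAppContext("org.apache.spark.examples.SparkKMeans",
//     "application_1471835208589_0010", Some("1"))
// yarn client mode:
//   buildAppContext("SparkKMeans", "application_1471835208589_0011", None)
```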
In commit
[1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205),
the application ID and attempt ID (the latter only in yarn cluster mode),
together with the job, stage, and task IDs, are included in the value of the
caller context when `Task`s perform operations in HDFS. So in the audit log,
you can see
`callerContext=Spark_AppId_**_AppAttemptId_**_JobId_**_StageID_**_stageAttemptId_**_taskID_**_attemptNumber_**`:
_Applications in yarn cluster mode_
```
2016-08-21 22:55:50,977 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_3_attemptNumber_0
2016-08-21 22:55:50,978 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_5_attemptNumber_0
2016-08-21 22:55:50,978 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0
```
_Applications in yarn client mode_
```
2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_3_attemptNumber_0
2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_5_attemptNumber_0
2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt
dst=null perm=null proto=rpc
callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0
```
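The task-side string follows the same pattern; a hypothetical sketch (again,
not the PR's actual code) that reproduces the values seen above, where a
missing application attempt in yarn client mode shows up as
`AppAttemptId_None`:
```scala
// Illustrative assembly of the task-side caller context string.
def buildTaskContext(
    appId: String,
    appAttemptId: Option[String],
    jobId: Int,
    stageId: Int,
    stageAttemptId: Int,
    taskId: Long,
    attemptNumber: Int): String = {
  s"Spark_AppId_${appId}_AppAttemptId_${appAttemptId.getOrElse("None")}" +
    s"_JobId_${jobId}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
    s"_taskID_${taskId}_attemptNumber_${attemptNumber}"
}
```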
For commit
[1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205),
the application ID and attempt ID are passed down to `Task`; is it appropriate
for `Task` to see that application information? What do you think about this,
@steveloughran? Thanks.