cnauroth opened a new pull request, #49858:
URL: https://github.com/apache/spark/pull/49858
### What changes were proposed in this pull request?
Initialize the Hadoop RPC `CallerContext` during History Server startup,
before any `FileSystem` access, so that calls to HDFS are tagged in the audit
log as originating from the History Server.
### Why are the changes needed?
Other Spark processes already set the `CallerContext`, so that additional
auditing context propagates in Hadoop RPC calls. This PR provides the same
kind of auditing context for calls from the History Server. Other callers also
include details such as app ID and attempt ID. The History Server serves
multiple apps/attempts, so it does not include those details and only
identifies itself as the caller.
### Does this PR introduce _any_ user-facing change?
Yes. In environments that configure `hadoop.caller.context.enabled=true`,
users will now see additional information in the HDFS audit logs explicitly
stating that calls originated from the History Server.
### How was this patch tested?
A new unit test has been added. All tests pass in the history package.
```
build/mvn -pl core test -Dtest=none -DmembersOnlySuites=org.apache.spark.deploy.history
```
When the changes are deployed to a running cluster, the new caller context
is visible in the HDFS audit logs.
```
2025-02-07 23:00:54,657 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0012 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,683 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,699 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,715 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,729 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,743 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,755 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,767 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,779 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:01:04,160 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=listStatus src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
```
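For anyone verifying a deployment, audit records like these can be filtered by caller context. The following is a minimal sketch and not part of this PR. It assumes one audit record per line with whitespace-separated `key=value` fields; the helper names `parse_audit_line` and `count_by_command` are hypothetical, and the two sample records are taken from the log excerpt in this description.

```python
import re
from collections import Counter

# Matches key=value fields; a value runs up to the next whitespace.
# Tokens without "=" (timestamp, logger name, "(auth:SIMPLE)") are skipped.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Return the key=value fields of one audit record as a dict."""
    return dict(FIELD_RE.findall(line))


def count_by_command(lines, caller_context="SPARK_HISTORY"):
    """Count audit records per cmd for the given caller context."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("callerContext") == caller_context:
            counts[fields.get("cmd", "unknown")] += 1
    return counts


# Two records from the excerpt, each joined into a single line.
sample = [
    "2025-02-07 23:00:54,657 INFO "
    "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true "
    "ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open "
    "src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/"
    "application_1738779819434_0012 dst=null perm=null proto=rpc "
    "callerContext=SPARK_HISTORY",
    "2025-02-07 23:01:04,160 INFO "
    "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true "
    "ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=listStatus "
    "src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history dst=null "
    "perm=null proto=rpc callerContext=SPARK_HISTORY",
]

print(dict(count_by_command(sample)))  # {'open': 1, 'listStatus': 1}
```

The filter uses an exact match on `callerContext` because the History Server context shown here carries no app or attempt ID suffix; contexts from other callers would need a prefix match instead.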
### Was this patch authored or co-authored using generative AI tooling?
No.