sririshindra commented on code in PR #49814:
URL: https://github.com/apache/spark/pull/49814#discussion_r1955487024


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##########
@@ -171,6 +172,11 @@ private[hive] class HiveClientImpl(
   private def newState(): SessionState = {
     val hiveConf = newHiveConf(sparkConf, hadoopConf, extraConfig, Some(initClassLoader))
     val state = new SessionState(hiveConf)
+    // When SessionState is initialized, the caller context is overridden by hive
+    // so we need to reset it back to the DRIVER

Review Comment:
   @pan3793, @cnauroth I was finally able to properly test this upstream 
version in a Docker-based cluster with the current branch. It looks like the 
change in HiveClientImpl is not needed in Spark 4. I checked whether the caller 
context is being set during SessionState initialization in HiveClientImpl, and 
it looks like it is not. So, once the CallerContext is set inside the 
SparkContext class, it is not overridden by anything else in the driver process 
(a sketch of how that tag gets set follows the audit log excerpt below).
   
   ```
   2025-02-14 02:26:23,249 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.4        cmd=getfileinfo src=/warehouse/sample   dst=null        perm=null       proto=rpc       callerContext=SPARK_DRIVER_application_1739496632907_0005
   2025-02-14 02:26:23,265 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.4        cmd=listStatus  src=/warehouse/sample   dst=null        perm=null       proto=rpc       callerContext=SPARK_DRIVER_application_1739496632907_0005
   2025-02-14 02:26:25,519 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.5        cmd=open        src=/warehouse/sample/part-00000-dd473344-76b1-4179-91ae-d15a8da4a888-c000      dst=null        perm=null       proto=rpc       callerContext=SPARK_TASK_application_1739496632907_0005_JId_0_SId_0_0_TId_0_0
   2025-02-14 02:26:26,345 INFO FSNamesystem.audit: allowed=true   ugi=root (auth:SIMPLE)  ip=/192.168.97.5        cmd=open        src=/warehouse/sample/part-00000-dd473344-76b1-4179-91ae-d15a8da4a888-c000      dst=null        perm=null       proto=rpc       callerContext=SPARK_TASK_application_1739496632907_0005_JId_1_SId_1_0_TId_1_0
   ```
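   For context, here is a minimal sketch of how a caller context tag like the 
ones in the audit log above is typically installed, using Hadoop's public 
`org.apache.hadoop.ipc.CallerContext` API. The tag string and the `appId` value 
are only illustrative (not copied from Spark's code), and the audit entries 
assume the NameNode has caller-context propagation enabled 
(`hadoop.caller.context.enabled=true`):
   ```scala
   import org.apache.hadoop.ipc.CallerContext

   // Illustrative application id taken from the log above; in a real job it
   // comes from the running application.
   val appId = "application_1739496632907_0005"

   // Build a caller context and install it for the current process/thread.
   // The NameNode records this string in its audit log as `callerContext=...`.
   val driverContext = new CallerContext.Builder("SPARK_DRIVER_" + appId).build()
   CallerContext.setCurrent(driverContext)
   ```
   The `SPARK_TASK_...` entries on the executor side would be produced the same 
way, just with a task-specific tag string.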
   I wasn't able to test with Iceberg, though, because Iceberg doesn't support 
Spark 4 yet. Once an Iceberg release with Spark 4 support is available, I will 
retest and make any changes needed in a separate PR. For now, I have removed 
the change that was in HiveClientImpl.scala.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

