[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

tgravescs Wed, 14 Sep 2016 09:24:01 -0700

Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14659#discussion_r78783274
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -79,6 +82,13 @@ private[spark] abstract class Task[T](
           metrics)
         TaskContext.setTaskContext(context)
         taskThread = Thread.currentThread()
    +
    +    val callerContext =
    +      s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" +
    +        s"_JobId_${jobId.getOrElse("0")}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
    +        s"_taskID_${taskAttemptId}_attemptNumber_${attemptNumber}"
    --- End diff --
    
    One concern I have about this is the length of the string. It is going
    to be sent on every Hadoop RPC call and then, I believe, written to the
    HDFS audit log. The audit log does have a config to truncate it (default
    128), but the full string still gets sent over RPC. So I would like to
    keep this string as small as possible while still being useful. The one
    above seems very long to me, since the static text alone is roughly 90
    characters. I would like to see some of those static strings removed or
    abbreviated.
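
To illustrate the size concern, here is a minimal sketch (not the PR's code) that builds the verbose string from the diff alongside an abbreviated variant and prints both lengths. The short keys (`A`, `AA`, `J`, `S`, `T`), the object and method names, the parameter types, and the sample IDs are all hypothetical and chosen only for illustration; the actual abbreviations would be up to the PR author.

```scala
object CallerContextLength {
  // Verbose format, mirroring the string built in the diff above.
  // Parameter types are assumptions made for this sketch.
  def verbose(
      appId: Option[String],
      appAttemptId: Option[String],
      jobId: Option[Int],
      stageId: Int,
      stageAttemptId: Int,
      taskAttemptId: Long,
      attemptNumber: Int): String =
    s"Spark_AppId_${appId.getOrElse("")}_AppAttemptId_${appAttemptId.getOrElse("None")}" +
      s"_JobId_${jobId.getOrElse(0)}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
      s"_taskID_${taskAttemptId}_attemptNumber_${attemptNumber}"

  // Hypothetical abbreviated format: same fields, shorter static keys.
  def compact(
      appId: Option[String],
      appAttemptId: Option[String],
      jobId: Option[Int],
      stageId: Int,
      stageAttemptId: Int,
      taskAttemptId: Long,
      attemptNumber: Int): String =
    s"Spark_A_${appId.getOrElse("")}_AA_${appAttemptId.getOrElse("None")}" +
      s"_J_${jobId.getOrElse(0)}_S_${stageId}_${stageAttemptId}" +
      s"_T_${taskAttemptId}_${attemptNumber}"

  def main(args: Array[String]): Unit = {
    // Made-up sample values, not taken from a real application.
    val appId = Some("application_1473868061000_0001")
    val v = verbose(appId, Some("1"), Some(0), 3, 0, 42L, 0)
    val c = compact(appId, Some("1"), Some(0), 3, 0, 42L, 0)
    println(s"verbose: ${v.length} chars -> $v")
    println(s"compact: ${c.length} chars -> $c")
  }
}
```

Both variants carry the same identifiers; only the static labels differ, so the saving applies to every RPC call regardless of how long the IDs themselves are.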



