Nishanth28 commented on code in PR #52931:
URL: https://github.com/apache/spark/pull/52931#discussion_r2525720637
##########
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala:
##########
@@ -134,6 +133,15 @@ private[spark] object BasePythonRunner extends Logging {
} else None
}
+  /**
+   * Creates a task identifier string for logging, following Spark's standard format.
+   * Format: "task <partition>.<attempt> in stage <stageId> (TID <taskAttemptId>)"
+   */
+  private[spark] def taskIdentifier(context: TaskContext): String = {
+    s"task ${context.partitionId()}.${context.attemptNumber()} in stage ${context.stageId()} " +
Review Comment:
Thank you @HyukjinKwon for your input on this. Yes, TaskContext does provide
all of those methods (partitionId(), stageId(), taskAttemptId(), etc.).
I created the `taskIdentifier()` helper because the same identifier string is
built **9 times** across PythonRunner's log statements, mainly for
observability: when debugging UDF-related issues, a consistent identifier
makes it easy to track which task is involved and how much data it is
processing.
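A minimal, self-contained sketch of the deduplication this helper provides. `FakeTaskContext` below is a hypothetical stand-in with just the four fields the helper reads; the actual method takes Spark's `TaskContext`, whose accessors are methods like `partitionId()`:

```scala
// Stand-in for the handful of TaskContext accessors the helper uses
// (hypothetical; the real helper takes org.apache.spark.TaskContext).
final case class FakeTaskContext(
    partitionId: Int,
    attemptNumber: Int,
    stageId: Int,
    taskAttemptId: Long)

// Mirrors the documented format:
// "task <partition>.<attempt> in stage <stageId> (TID <taskAttemptId>)"
def taskIdentifier(ctx: FakeTaskContext): String =
  s"task ${ctx.partitionId}.${ctx.attemptNumber} in stage ${ctx.stageId} " +
    s"(TID ${ctx.taskAttemptId})"

// Each of the ~9 log call sites then shrinks from four interpolations
// to a single helper call, e.g.:
//   logInfo(s"Starting ${taskIdentifier(ctx)}")
```

Centralizing the format also means a future change to the identifier (say, adding the attempt's executor ID) touches one method instead of every log line.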
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]