AngersZhuuuu commented on code in PR #53247:
URL: https://github.com/apache/spark/pull/53247#discussion_r2587169686
##########
core/src/main/scala/org/apache/spark/InternalAccumulator.scala:
##########
@@ -40,6 +40,7 @@ private[spark] object InternalAccumulator {
val RESULT_SERIALIZATION_TIME = METRICS_PREFIX + "resultSerializationTime"
val MEMORY_BYTES_SPILLED = METRICS_PREFIX + "memoryBytesSpilled"
val DISK_BYTES_SPILLED = METRICS_PREFIX + "diskBytesSpilled"
+ val SPILL_TIME = METRICS_PREFIX + "spillTime"
Review Comment:
However, time reflects the impact on overall performance more directly. Spill size is not a direct measure of performance: it tells us how much pressure spills put on storage and where targeted optimizations could help, whereas spill time directly quantifies the impact on execution performance and the effectiveness of those optimizations. Moreover, adding this metric is cheap, analogous to tracking both the `build size` and `build time` of hash joins.
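
The pattern being proposed is cheap because it only wraps the spill with a timer and accumulates the elapsed nanoseconds, the same shape as existing counters like `memoryBytesSpilled`. A minimal, self-contained sketch of that pattern (not the actual Spark implementation; `LongCounter` and `timedSpill` are hypothetical stand-ins for a Spark accumulator registered under the `internal.metrics.spillTime` name added in the diff above):

```scala
object SpillTimingSketch {
  // Stand-in for a Spark LongAccumulator; just a mutable long counter.
  final class LongCounter {
    private var v = 0L
    def add(n: Long): Unit = v += n
    def value: Long = v
  }

  // Task-level counter for total time spent spilling, in nanoseconds.
  val spillTimeNanos = new LongCounter

  // Wrap any spill operation and record how long it took, even on failure.
  def timedSpill[T](spill: => T): T = {
    val start = System.nanoTime()
    try spill
    finally spillTimeNanos.add(System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    timedSpill { Thread.sleep(5) } // simulate a spill to disk
    println(s"spill time (ns): ${spillTimeNanos.value}")
  }
}
```

Because the timer only brackets the spill itself (which already pays disk I/O costs), the added overhead is two `System.nanoTime()` calls per spill.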
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]