Github user LucaCanali commented on a diff in the pull request:
https://github.com/apache/spark/pull/22218#discussion_r212933292
--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
@@ -73,6 +75,13 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends
     registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
   }
+  // Dropwizard metrics gauge measuring the executor's process (JVM) CPU time.
+  // The value is returned in nanoseconds; the method returns -1 if this operation is not supported.
+  val osMXBean = ManagementFactory.getOperatingSystemMXBean.asInstanceOf[OperatingSystemMXBean]
+  metricRegistry.register(MetricRegistry.name("executorCPUTime"), new Gauge[Long] {
+    override def getValue: Long = osMXBean.getProcessCpuTime()
--- End diff ---
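For reference, a minimal self-contained sketch of the registration pattern shown in the diff, assuming Dropwizard Metrics on the classpath; the object name and the standalone registry are illustrative, not Spark's actual wiring:

    import java.lang.management.ManagementFactory

    import com.codahale.metrics.{Gauge, MetricRegistry}
    import com.sun.management.OperatingSystemMXBean

    object ProcessCpuGaugeSketch {
      def main(args: Array[String]): Unit = {
        val registry = new MetricRegistry

        // Casting to com.sun.management.OperatingSystemMXBean exposes
        // getProcessCpuTime, which reports the cumulative CPU time of the
        // whole JVM process in nanoseconds, or -1 if unsupported.
        val osMXBean = ManagementFactory.getOperatingSystemMXBean
          .asInstanceOf[OperatingSystemMXBean]

        registry.register(MetricRegistry.name("executorCPUTime"), new Gauge[Long] {
          override def getValue: Long = osMXBean.getProcessCpuTime
        })

        // Every read returns the current cumulative value straight from the OS.
        println(s"process CPU time: ${registry.getGauges.get("executorCPUTime").getValue} ns")
      }
    }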
I believe the proposed metric tracking the executor CPU time is useful and adds information and convenience on top of the task CPU metric implemented in SPARK-22190. A couple of considerations supporting this, based on recent findings and experimentation:
- the process CPU time covers all the CPU consumed by the JVM, notably including the CPU spent on garbage collection, which can be significant in some cases and is definitely something we want to measure and analyze
- the CPU time collected from tasks is harder to consume in a dashboard, because the value is only updated when a task completes successfully, which is a problem for long-running tasks. In contrast, the executor process CPU time Dropwizard gauge reads the value directly from the OS, so it is up to date at any time; see the sampling sketch after this list.
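To illustrate that last point, here is a hypothetical sampler doing roughly what a dashboard sink would do, reading the value on its own schedule instead of waiting for task completion (the 1-second interval and loop count are assumptions for the example, not anything Spark does):

    import java.lang.management.ManagementFactory
    import com.sun.management.OperatingSystemMXBean

    // Poll the cumulative process CPU time once per second; the value
    // keeps increasing while tasks are still running, unlike task-level
    // CPU metrics, which are only reported at task end.
    val osMXBean = ManagementFactory.getOperatingSystemMXBean
      .asInstanceOf[OperatingSystemMXBean]
    for (_ <- 1 to 5) {
      println(s"process CPU time: ${osMXBean.getProcessCpuTime} ns")
      Thread.sleep(1000L) // assumed sampling interval
    }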
---