Github user LucaCanali commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22218#discussion_r212933292
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala ---
    @@ -73,6 +75,13 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends
         registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0)
       }
     
    +  // Dropwizard metrics gauge measuring the executor's process (JVM) CPU time.
    +  // The value is returned in nanoseconds; the method returns -1 if this operation is not supported.
    +  val osMXBean = ManagementFactory.getOperatingSystemMXBean.asInstanceOf[OperatingSystemMXBean]
    +  metricRegistry.register(MetricRegistry.name("executorCPUTime"), new Gauge[Long] {
    +    override def getValue: Long = osMXBean.getProcessCpuTime()
    --- End diff ---
    
    I believe the proposed metric tracking the executor CPU time is useful: it adds information and convenience on top of the task CPU metric implemented in SPARK-22190. A couple of considerations from recent findings and experimentation support this argument:
    - the process CPU time covers all the CPU consumed by the JVM, notably including the CPU spent on garbage collection, which can be significant in some workloads and is definitely something we want to measure and analyze
    - the CPU time collected from tasks is harder to consume in a dashboard, as the value is only updated at the end of a task's successful execution; with long-running tasks the dashboard receives no updates for long stretches. In contrast, the executor process CPU time Dropwizard gauge reads the value directly from the OS, so it reports an up-to-date figure for the CPU consumed by the executor at any time (a standalone sketch of this gauge pattern follows below).
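
    For reference, here is a minimal self-contained sketch of the same gauge pattern outside of Spark. This is not the PR code: the object name, the "processCPUTime" metric name, and the sampling loop at the end are illustrative assumptions, added to show how a dashboard-style consumer could derive a CPU-usage rate from two successive gauge readings.

    import java.lang.management.ManagementFactory
    import com.sun.management.OperatingSystemMXBean
    import com.codahale.metrics.{Gauge, MetricRegistry}

    // Standalone sketch (not Spark code); names are illustrative.
    object ProcessCpuGaugeSketch {
      def main(args: Array[String]): Unit = {
        val registry = new MetricRegistry

        // The com.sun.management extension of the OS MXBean exposes
        // getProcessCpuTime (available on HotSpot/OpenJDK JVMs).
        val osMXBean = ManagementFactory.getOperatingSystemMXBean
          .asInstanceOf[OperatingSystemMXBean]

        // Gauge returning the JVM process CPU time in nanoseconds,
        // or -1 if the platform does not support the operation.
        val cpuGauge = new Gauge[Long] {
          override def getValue: Long = osMXBean.getProcessCpuTime()
        }
        registry.register(MetricRegistry.name("processCPUTime"), cpuGauge)

        // A dashboard can poll the gauge at any moment and derive the
        // average CPU usage (in cores) from two successive samples.
        val cpu1 = cpuGauge.getValue
        val t1 = System.nanoTime()
        Thread.sleep(1000)
        val cpu2 = cpuGauge.getValue
        val t2 = System.nanoTime()
        println(f"Average CPU usage: ${(cpu2 - cpu1).toDouble / (t2 - t1)}%.2f cores")
      }
    }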


