Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r181611071
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }
     
    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --
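    (Editor's illustration, not the PR's actual code: a minimal sketch of the per-executor, per-stage peak tracking that the doc comment above describes. The ExecutorMetrics case class and the tracker below are simplified stand-ins based on the field list later in this thread.)

    import scala.collection.mutable

    // Illustrative stand-in for the metrics discussed in this thread.
    case class ExecutorMetrics(
        jvmUsedHeapMemory: Long,
        jvmUsedNonHeapMemory: Long,
        onHeapExecutionMemory: Long,
        offHeapExecutionMemory: Long,
        onHeapStorageMemory: Long,
        offHeapStorageMemory: Long)

    class PeakMemoryTracker {
      // executorId -> running peaks for the current stage
      private val peaks = mutable.Map.empty[String, ExecutorMetrics]

      // Called on stage submission: start a fresh window, so peaks
      // are tracked per executor per stage.
      def reset(): Unit = peaks.clear()

      // Called on each metrics update: returns true (i.e. "log it")
      // only if some metric reached a new peak for this executor.
      def update(executorId: String, m: ExecutorMetrics): Boolean = {
        peaks.get(executorId) match {
          case None =>
            peaks(executorId) = m
            true
          case Some(p) =>
            val merged = ExecutorMetrics(
              math.max(p.jvmUsedHeapMemory, m.jvmUsedHeapMemory),
              math.max(p.jvmUsedNonHeapMemory, m.jvmUsedNonHeapMemory),
              math.max(p.onHeapExecutionMemory, m.onHeapExecutionMemory),
              math.max(p.offHeapExecutionMemory, m.offHeapExecutionMemory),
              math.max(p.onHeapStorageMemory, m.onHeapStorageMemory),
              math.max(p.offHeapStorageMemory, m.offHeapStorageMemory))
            val changed = merged != p
            if (changed) peaks(executorId) = merged
            changed
        }
      }
    }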
    
    ExecutorMetrics currently has: jvmUsedHeapMemory, jvmUsedNonHeapMemory, onHeapExecutionMemory, offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. For logging at stage end, we can log the peak for each of these, but unified memory is more problematic. We could add new fields for on-heap/off-heap unified memory, but I'm inclined to remove unified memory (from all the places it is currently used) rather than add more fields. Users can still sum the peak execution and peak storage values; the sum may be larger than the actual peak unified memory if the two are not at their peaks at the same time, but it should still be a reasonable estimate for sizing.
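
    (To make that sizing estimate concrete, a small sketch reusing the illustrative ExecutorMetrics above: summing the separately tracked peaks gives an upper bound on peak unified memory, since execution and storage need not peak simultaneously.)

    // Upper-bound estimate of peak unified memory from separately
    // tracked peaks. May overestimate: execution and storage memory
    // are not necessarily at their peaks at the same time.
    def estimatedPeakUnified(peak: ExecutorMetrics): (Long, Long) = {
      val onHeap = peak.onHeapExecutionMemory + peak.onHeapStorageMemory
      val offHeap = peak.offHeapExecutionMemory + peak.offHeapStorageMemory
      (onHeap, offHeap)
    }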

