Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r180287725
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }
     
    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --
    
    The original analysis focused on some test applications and larger applications. Looking at a broader sample of our applications, the logging overhead is higher: it averages about 14% per application, and about 4% overall (sum of logged ExecutorMetricsUpdate events / sum of Spark history log sizes). The overhead for larger Spark history logs is mostly pretty small, but it increases significantly for smaller ones (there's one at 49%).
    
    There is often some logging before the first stage starts; this adds extra overhead, especially for smaller applications/history logs, without providing useful information. The overhead can also be high when a stage takes a long time to run and memory increases gradually rather than reaching its peak quickly -- logging at stage end would work better for that case.
    
    I should also note that these numbers are for the case where only 4 longs are recorded. With more metrics, the overhead would be higher, both in the size of each logged event and in the number of potential peaks, since a new peak for any metric would trigger a logged event.
    
    Since more metrics will be added, and the cost is higher than originally measured, logging at stage end could be a better choice. We would lose some information about overlapping stages, but this information wouldn't be visible anyway with the currently planned REST APIs or web UI, which just show the peaks per stage and per executor.
    
    For logging at stage end, we can log an ExecutorMetricsUpdate event for each executor that has sent a heartbeat during the stage, just before logging the stage end -- this event would carry the peak value for each metric. This should be the minimum amount of logging needed to capture peak values per stage per executor. Alternatively, the information could be added to the StageCompleted event for more compaction, but the code would be more complicated, with two paths for reading in values. Logging one event per executor at stage end seems like a reasonable choice: not too much extra logging and not too much extra complexity.
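    Roughly, the stage-end approach would look like the sketch below (again illustrative, not actual code; the object and method names are mine): heartbeats only update in-memory peaks, and one event per executor is written just before the StageCompleted event.

    ```scala
    import scala.collection.mutable

    // Illustrative sketch of the stage-end alternative.
    object StageEndLoggingSketch {
      // executorId -> peak value per metric, for the current stage
      private val stagePeaks = mutable.Map.empty[String, Array[Long]]

      def onExecutorHeartbeat(executorId: String, metricValues: Array[Long]): Unit = {
        val current =
          stagePeaks.getOrElseUpdate(executorId, Array.fill(metricValues.length)(Long.MinValue))
        for (i <- metricValues.indices if metricValues(i) > current(i)) {
          current(i) = metricValues(i)
        }
        // Nothing is logged here; heartbeats only update the in-memory peaks.
      }

      def onStageCompleted(stageId: Int): Unit = {
        // One logged event per executor that heartbeated during this stage,
        // carrying the peak value for each metric.
        for ((executorId, peaks) <- stagePeaks) {
          println(s"log ExecutorMetricsUpdate(stage=$stageId, executor=$executorId, " +
            s"peaks=${peaks.mkString(",")})")
        }
        stagePeaks.clear()
        // ... followed by logging the StageCompleted event itself.
      }
    }
    ```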
    
    What are your thoughts?


---
