wypoon edited a comment on issue #23767: [SPARK-26329][CORE] Faster polling of 
executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#issuecomment-494165380
 
 
   @squito I have implemented your suggestions.
   
   Also, @squito and @edwinalu, I ran some experiments. Previously, I had run
experiments with spark.executor.heartbeatInterval set to 1s. This time I did
not set it, so it defaulted to 10s. In this case, if I also did not set
spark.executor.metrics.pollingInterval (so that polling only happens at
heartbeats), we sometimes see reported metric peaks that are all zero. This
happens when a task is short, on the order of 10s or less. The executor
heartbeat does not necessarily start when the executor starts, but at some
random time up to 10s later, so a short task can finish before any poll has
happened. All-zero metric peaks appear both in the task metrics in
SparkListenerTaskEnd events and in the executor metrics in
SparkListenerStageExecutorMetrics events.
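To make the timing concrete, here is a hypothetical model (not Spark code) that
counts how many polls land inside a task's lifetime, assuming polls happen at a
first-poll time and then at a fixed interval. With polling only at heartbeats,
the first poll is the first heartbeat, which per the description above may come
up to 10s after the executor starts. The dedicated-poller case assumes, for
illustration only, a 100 ms interval starting at executor start. The class and
method names are made up for this sketch.

```java
// Hypothetical model, not Spark code: polls happen at firstPollMs, then every intervalMs.
final class PollTiming {
    private PollTiming() {}

    /** Number of polls that fall within the task lifetime [taskStartMs, taskEndMs]. */
    static long pollsDuringTask(long taskStartMs, long taskEndMs,
                                long firstPollMs, long intervalMs) {
        if (taskEndMs < firstPollMs) {
            // The task ended before the first poll: its peaks stay all zero.
            return 0;
        }
        // Index of the first poll at or after the task start.
        long firstIdx = taskStartMs <= firstPollMs
                ? 0
                : ceilDiv(taskStartMs - firstPollMs, intervalMs);
        // Index of the last poll at or before the task end.
        long lastIdx = (taskEndMs - firstPollMs) / intervalMs;
        return Math.max(0, lastIdx - firstIdx + 1);
    }

    // Ceiling division for non-negative a and positive b.
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long heartbeatMs = 10_000; // default heartbeat interval, per the comment above
        // A 3s task starting at executor start; the first heartbeat lands at 7s.
        System.out.println(pollsDuringTask(0, 3_000, 7_000, heartbeatMs)); // prints 0
        // Same task, but the first heartbeat happens to land at 2s.
        System.out.println(pollsDuringTask(0, 3_000, 2_000, heartbeatMs)); // prints 1
        // Assumed dedicated poller: every 100 ms, starting at executor start.
        System.out.println(pollsDuringTask(0, 3_000, 0, 100)); // prints 31
    }
}
```

In this model, a task whose lifetime contains zero polls is exactly the case
that produces all-zero peaks; a short polling interval makes that case unlikely
regardless of when the first heartbeat fires.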
   
   When a task starts in an executor, an entry for it is created in a
ConcurrentHashMap (CHM). The metrics associated with this entry are all zero to
begin with and do not change until a poll happens. On task end, if no poll has
happened in the executor, the metrics are still all zero, and a
SparkListenerTaskEnd event will be written with zeros for the metrics. The
EventLoggingListener keeps track of metric peaks per stage per executor; it
updates the peaks on task end and on executor update (which happens on
heartbeat). On stage end, a set of SparkListenerStageExecutorMetrics events
(one for each executor) is written to the event log. If a stage ends before any
heartbeat, and thus any poll, has happened in an executor, we will see a
SparkListenerStageExecutorMetrics event for that executor with zeros for the
metric peaks.
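The flow described above can be sketched as a simplified, hypothetical model.
`TaskPeakMap` stands in for the executor-side ConcurrentHashMap of per-task
peaks, and `StageExecutorPeaks` stands in for the per-stage, per-executor peak
tracking that the comment attributes to EventLoggingListener. Neither class,
nor any of their methods, is Spark's actual API; metrics are a plain `long[]`,
and executor updates are assumed to carry a stage ID directly.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demo: a task that ends before any poll yields an all-zero stage record.
final class ZeroPeaksDemo {
    public static void main(String[] args) {
        TaskPeakMap executor = new TaskPeakMap(2);
        StageExecutorPeaks listener = new StageExecutorPeaks();
        executor.onTaskStart(1L);
        // No poll (no heartbeat yet) before the task ends.
        listener.onTaskEnd(0, "1", executor.onTaskEnd(1L));
        System.out.println(Arrays.toString(listener.onStageEnd(0).get("1"))); // prints [0, 0]
    }
}

// Hypothetical executor-side model: one entry per running task.
final class TaskPeakMap {
    private final int numMetrics;
    private final ConcurrentHashMap<Long, long[]> peaksByTask = new ConcurrentHashMap<>();

    TaskPeakMap(int numMetrics) {
        this.numMetrics = numMetrics;
    }

    // On task start, the entry begins as all zeros.
    void onTaskStart(long taskId) {
        peaksByTask.put(taskId, new long[numMetrics]);
    }

    // A poll folds the current metric values into every running task's peaks.
    void poll(long[] current) {
        peaksByTask.replaceAll((taskId, peaks) -> {
            long[] next = peaks.clone();
            for (int i = 0; i < numMetrics; i++) {
                next[i] = Math.max(next[i], current[i]);
            }
            return next;
        });
    }

    // On task end, the entry is removed; with no poll in between it is still all zeros.
    long[] onTaskEnd(long taskId) {
        long[] peaks = peaksByTask.remove(taskId);
        return peaks != null ? peaks : new long[numMetrics];
    }
}

// Hypothetical driver-side model of peaks tracked per stage per executor.
final class StageExecutorPeaks {
    private final Map<Integer, Map<String, long[]>> peaks = new HashMap<>();

    private void update(int stageId, String executorId, long[] metrics) {
        long[] cur = peaks
                .computeIfAbsent(stageId, s -> new HashMap<>())
                .computeIfAbsent(executorId, e -> new long[metrics.length]);
        for (int i = 0; i < metrics.length; i++) {
            cur[i] = Math.max(cur[i], metrics[i]);
        }
    }

    void onTaskEnd(int stageId, String executorId, long[] taskPeaks) {
        update(stageId, executorId, taskPeaks);
    }

    // Executor updates arrive with heartbeats; here they are assumed to carry a stage ID.
    void onExecutorUpdate(int stageId, String executorId, long[] executorPeaks) {
        update(stageId, executorId, executorPeaks);
    }

    // On stage end, one peak record per executor is returned and the stage's state dropped.
    Map<String, long[]> onStageEnd(int stageId) {
        Map<String, long[]> perExecutor = peaks.remove(stageId);
        return perExecutor != null ? perExecutor : new HashMap<>();
    }
}
```

In this model, the all-zero stage record appears whenever neither a poll nor an
executor update reached the tracker before the stage ended, which matches the
behavior described above.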
   
   We discussed this possibility, at least for task metrics, early in the
discussion above, and the consensus was that it was OK to report zeros. I still
think this is OK, but it would be helpful to have some documentation that
describes this behavior. I'm not sure where the appropriate place to document
it would be.
