JoshRosen opened a new pull request, #39770:
URL: https://github.com/apache/spark/pull/39770

   ### What changes were proposed in this pull request?
   
   This change updates `JsonProtocol` to add logic to exclude the "Task 
Executor Metrics" field from SparkListenerTaskEnd events in cases where all 
metric values are zero.
   
   
   ### Why are the changes needed?
   
   This saves space in event logs when Spark runs under its default 
out-of-the-box configuration and tasks are shorter than the executor heartbeat 
interval.
   
   [SPARK-26329](https://issues.apache.org/jira/browse/SPARK-26329) added "Task 
Executor Metrics" to JsonProtocol SparkListenerTaskEnd JSON. With the default 
`spark.executor.metrics.pollingInterval = 0` configuration these metric values 
are only updated when heartbeats occur. If a task launches and finishes between 
executor heartbeats then all of the "Task Executor Metrics" values will be 
zero. For jobs with large numbers of short tasks, this contributes to 
significant event log bloat.
   
   JsonProtocol already knows how to handle the absence of the "Task Executor 
Metrics" field, so I think it's safe for us to omit this field when all values 
are zero.
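
As a rough illustration of what "handling the absence of the field" means for a log consumer, here is a minimal Python sketch. It is not Spark's `JsonProtocol` code. The function name `read_task_executor_metrics` is hypothetical, and `METRIC_NAMES` is only an illustrative subset of the metric names, which is an assumption made for brevity.

```python
import json

# Illustrative subset of metric names (assumption); real event logs carry more.
METRIC_NAMES = ["JVMHeapMemory", "JVMOffHeapMemory"]


def read_task_executor_metrics(event_json: str) -> dict:
    """Return task executor metrics for one event, defaulting to zeros.

    Hypothetical helper: a missing "Task Executor Metrics" field is treated
    the same way as a field whose values are all zero.
    """
    event = json.loads(event_json)
    metrics = event.get("Task Executor Metrics") or {}
    return {name: metrics.get(name, 0) for name in METRIC_NAMES}
```

A consumer written this way sees the same result whether the field was omitted or written out with all-zero values.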
   
   There is a possibility that third-party code which directly consumes Spark 
event logs might be relying on the presence of this field. As an escape hatch 
to avoid breaking such workloads, I have introduced a new configuration, 
`spark.eventLog.includeAllZeroTaskExecutorMetrics` (default `false`), which can 
be set to `true` to restore the old behavior.
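
The omission rule and its escape hatch can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual `JsonProtocol` change: `task_end_to_json` is a hypothetical function, the event is reduced to two fields, and the `include_all_zero` parameter stands in for `spark.eventLog.includeAllZeroTaskExecutorMetrics`.

```python
import json


def task_end_to_json(task_executor_metrics: dict, include_all_zero: bool = False) -> str:
    """Serialize a simplified task-end event (hypothetical sketch).

    The "Task Executor Metrics" field is written only if at least one value
    is non-zero, unless include_all_zero is set to restore the old behavior.
    """
    event = {"Event": "SparkListenerTaskEnd"}
    all_zero = all(value == 0 for value in task_executor_metrics.values())
    if include_all_zero or not all_zero:
        event["Task Executor Metrics"] = task_executor_metrics
    return json.dumps(event)
```

With the default `include_all_zero=False`, an all-zero metrics map yields an event without the field. Passing `True` corresponds to launching the application with `--conf spark.eventLog.includeAllZeroTaskExecutorMetrics=true`.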
   
   ### Does this PR introduce _any_ user-facing change?
   No user-facing changes in the history server.
   
   This could be considered a user-facing change from the perspective of 
third-party code which does its own processing of Spark logs, hence the config. 
I think it's reasonable to set a sensible default which shrinks event logs for 
most users instead of keeping a conservative default to support a hypothetical 
third-party use case of our event logs.
   
   ### How was this patch tested?
   
   Added new test cases in JsonProtocolSuite.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

