Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20532#discussion_r166855772
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -53,10 +53,21 @@ package object config {
       .booleanConf
       .createWithDefault(false)

-  private[spark] val EVENT_LOG_BLOCK_UPDATES =
-    ConfigBuilder("spark.eventLog.logBlockUpdates.enabled")
-      .booleanConf
-      .createWithDefault(false)
+  private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION =
+    ConfigBuilder("spark.eventLog.logBlockUpdates.fraction")
+      .doc("Expected number of times each blockUpdated event is chosen to log, " +
+        "fraction must be [0, 1]. 0 by default, means disabled")
+      .doubleConf
+      .checkValue(_ >= 0, "The fraction must not be negative")
--- End diff --
But if you're using sampling, you may miss some events. For example, if you're
tracking memory usage, you could miss the peak memory usage events that are
important to your analysis.

Ideally, I think you could write a custom SparkListener that dumps executor
metrics to a time series DB like OpenTSDB, which might work better than
analyzing the static event log files.
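
For illustration, a minimal sketch of such a listener. It assumes an OpenTSDB
instance reachable at its standard /api/put HTTP endpoint; the class name,
metric name, and tags below are hypothetical, not part of this PR:

    import java.io.OutputStreamWriter
    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets

    import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

    // Hypothetical listener: forwards each block update's memory size to OpenTSDB.
    class BlockUpdateMetricsListener(tsdbUrl: String) extends SparkListener {

      override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
        val info = event.blockUpdatedInfo
        // OpenTSDB's /api/put expects {"metric", "timestamp", "value", "tags"}.
        val json =
          s"""{"metric":"spark.block.memSize","timestamp":${System.currentTimeMillis()},""" +
            s""""value":${info.memSize},"tags":{"executor":"${info.blockManagerId.executorId}",""" +
            s""""block":"${info.blockId.name}"}}"""
        post(json)
      }

      private def post(body: String): Unit = {
        val conn = new URL(tsdbUrl).openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        val writer = new OutputStreamWriter(conn.getOutputStream, StandardCharsets.UTF_8)
        try writer.write(body) finally writer.close()
        conn.getResponseCode  // fire the request; errors are ignored in this sketch
        conn.disconnect()
      }
    }

It could be registered with
sc.addSparkListener(new BlockUpdateMetricsListener("http://tsdb-host:4242/api/put")),
so every block update (including the peaks) is captured without inflating the
event log.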
---