Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20532#discussion_r166852617

    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -53,10 +53,21 @@ package object config {
           .booleanConf
           .createWithDefault(false)

    -  private[spark] val EVENT_LOG_BLOCK_UPDATES =
    -    ConfigBuilder("spark.eventLog.logBlockUpdates.enabled")
    -      .booleanConf
    -      .createWithDefault(false)
    +  private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION =
    +    ConfigBuilder("spark.eventLog.logBlockUpdates.fraction")
    +      .doc("Expected number of times each blockUpdated event is chosen to log, " +
    +        "fraction must be [0, 1]. 0 by default, means disabled")
    +      .doubleConf
    +      .checkValue(_ >= 0, "The fraction must not be negative")
    --- End diff --

> how about control the max number of events recorded per time split?

I think this approach still makes it hard to balance user requirements against event log size: Spark may drop exactly the events a user needs at a specific point in time. IMO, a plain "true"/"false" switch might be a feasible solution - either dump all the events or ignore them entirely. For normal users the default (false) should be enough, but if you want further analysis, you can enable it and accept the risk of a large event log file. For the configuration, I think we could use something like "spark.eventLog.logVerboseEvent.enabled" to control all the verbose events in one place.
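The boolean switch suggested in this comment could be sketched like the following, following the `ConfigBuilder` pattern shown in the diff. Note this is only an illustration of the proposal: the key `spark.eventLog.logVerboseEvent.enabled` and the val name `EVENT_LOG_VERBOSE_EVENTS` are this comment's suggestion, not an existing Spark config.

```scala
// Hypothetical config fragment for core/src/main/scala/org/apache/spark/internal/config/package.scala.
// A single on/off flag instead of a sampling fraction: off by default to keep
// event logs small, on when the user wants every verbose event for analysis.
private[spark] val EVENT_LOG_VERBOSE_EVENTS =
  ConfigBuilder("spark.eventLog.logVerboseEvent.enabled")
    .doc("Whether to log verbose events (such as blockUpdated) to the event log. " +
      "Disabled by default; enabling it may produce very large event log files.")
    .booleanConf
    .createWithDefault(false)
```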