Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20532#discussion_r166855772
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -53,10 +53,21 @@ package object config {
           .booleanConf
           .createWithDefault(false)
     
    -  private[spark] val EVENT_LOG_BLOCK_UPDATES =
    -    ConfigBuilder("spark.eventLog.logBlockUpdates.enabled")
    -      .booleanConf
    -      .createWithDefault(false)
    +  private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION =
    +    ConfigBuilder("spark.eventLog.logBlockUpdates.fraction")
    +      .doc("Expected number of times each blockUpdated event is chosen to 
log, " +
    +        "fraction must be [0, 1]. 0 by default, means disabled")
    +      .doubleConf
    +      .checkValue(_ >= 0, "The fraction must not be negative")
    --- End diff --
    
    But with sampling you may miss some events; for example, if you're tracking memory usage, you could miss the peak memory usage events that are important to your analysis.
    
    Ideally, I think you can write a custom SparkListener to dump executor metrics to some time series DB like OpenTSDB, which might be better than analyzing the static event log files.
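    
    As a rough sketch of what such a listener could look like (the class name and the `tsdbWrite` helper below are hypothetical placeholders, not an existing API):
    
    ```scala
    import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}
    
    // Sketch of a listener that forwards block-update metrics to a time series DB
    // instead of sampling them into the event log. `tsdbWrite` is a placeholder
    // for whatever client you use (e.g. an HTTP POST to OpenTSDB's /api/put).
    class BlockUpdateMetricsListener extends SparkListener {
    
      override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
        val info = event.blockUpdatedInfo
        tsdbWrite(
          metric = "spark.block.memSize",
          value = info.memSize,
          tags = Map(
            "executor" -> info.blockManagerId.executorId,
            "block" -> info.blockId.name))
      }
    
      // Placeholder: replace with a real TSDB client call.
      private def tsdbWrite(metric: String, value: Long, tags: Map[String, String]): Unit = ()
    }
    ```
    
    Such a listener could then be registered via `spark.extraListeners`, so every update reaches the DB rather than only a sampled subset ending up in the log.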

