Eric Vandenberg created SPARK-21598:
---------------------------------------

             Summary: Collect usability/events information from Spark History Server
                 Key: SPARK-21598
                 URL: https://issues.apache.org/jira/browse/SPARK-21598
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 2.0.2
            Reporter: Eric Vandenberg
            Priority: Minor


The Spark History Server currently has no way to collect usability/performance 
data on its main activity, the loading/replay of history files.  We'd like to 
collect this information to monitor, target, and measure improvements in the 
Spark debugging experience (via History Server usage).  Once available, these 
usability events could be analyzed using other analytics tools.

The event info to collect:
    case class SparkHistoryReplayEvent(
        logPath: String,
        logCompressionType: String,
        logReplayException: String, // if an error
        logReplayAction: String, // user replay vs. checkForLogs replay
        logCompleteFlag: Boolean,
        logFileSize: Long,
        logFileSizeUncompressed: Long,
        logLastModifiedTimestamp: Long,
        logCreationTimestamp: Long,
        logJobId: Long,
        logNumEvents: Int,
        logNumStages: Int,
        logNumTasks: Int,
        logReplayDurationMillis: Long
    )

The main Spark engine has a SparkListenerInterface through which all compute 
engine events are broadcast.  It probably doesn't make sense to reuse this 
abstraction for broadcasting Spark History Server events, since the two kinds 
of "events" are neither related to nor compatible with one another.  Also note 
that the metrics registry collects history caching metrics but doesn't provide 
the kind of information listed above.

The proposal here is to add some basic event listener infrastructure to 
capture History Server activity events.  This would work similarly to the 
SparkListener infrastructure and could be configured in a similar manner, 
e.g. spark.history.listeners=MyHistoryListenerClass.
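
To make the proposal more concrete, here is a minimal sketch of what such 
infrastructure could look like.  All names below (HistoryListener, 
HistoryListenerBus, CollectingHistoryListener, fromClassNames) are 
hypothetical and are not existing Spark APIs, and the event is trimmed to a 
few of the fields listed above to keep the example short.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Trimmed-down event; the full field list is in the issue description.
final case class SparkHistoryReplayEvent(
    logPath: String,
    logReplayAction: String,
    logNumEvents: Int,
    logReplayDurationMillis: Long)

// Hypothetical listener interface, kept separate from SparkListenerInterface.
trait HistoryListener {
  def onReplay(event: SparkHistoryReplayEvent): Unit
}

// Hypothetical bus that fans each event out to every registered listener.
final class HistoryListenerBus {
  private val listeners = ArrayBuffer.empty[HistoryListener]

  def addListener(listener: HistoryListener): Unit = synchronized {
    listeners += listener
  }

  // A failing listener should not break history loading, so non-fatal
  // errors are logged to stderr and otherwise ignored.
  def post(event: SparkHistoryReplayEvent): Unit = synchronized {
    listeners.foreach { l =>
      try l.onReplay(event)
      catch { case NonFatal(e) => System.err.println(s"History listener failed: $e") }
    }
  }
}

object HistoryListenerBus {
  // Builds a bus from a comma-separated list of class names, which is the
  // assumed format of the proposed spark.history.listeners value.
  // Each listed class needs a public no-arg constructor.
  def fromClassNames(classNames: String): HistoryListenerBus = {
    val bus = new HistoryListenerBus
    classNames.split(",").map(_.trim).filter(_.nonEmpty).foreach { name =>
      val instance = Class.forName(name).getDeclaredConstructor().newInstance()
      bus.addListener(instance.asInstanceOf[HistoryListener])
    }
    bus
  }
}

// Example listener that simply records the events it receives.
final class CollectingHistoryListener extends HistoryListener {
  val seen = ArrayBuffer.empty[SparkHistoryReplayEvent]
  override def onReplay(event: SparkHistoryReplayEvent): Unit = seen += event
}
```

In this sketch the History Server would call post(...) once per replay, 
after measuring the replay duration.  Dispatch is synchronous here for 
simplicity; an asynchronous queue would be another option if listeners are 
slow.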

Open to feedback / suggestions / comments on the approach or alternatives.

cc: [~vanzin] [~cloud_fan] [~ajbozarth] [~jiangxb1987]




