[ https://issues.apache.org/jira/browse/SPARK-46094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-46094:
-------------------------------------

    Assignee: Parth Chandra

> Support Executor JVM Profiling
> ------------------------------
>
>                 Key: SPARK-46094
>                 URL: https://issues.apache.org/jira/browse/SPARK-46094
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>            Priority: Major
>              Labels: pull-request-available
>
> To profile a Spark application, a user or developer has to run a Spark job
> locally on the development machine and use a tool like Java Flight Recorder,
> YourKit, or async-profiler to record profiling information. Because profiling
> can be expensive, the profiler is typically attached to the Spark JVM process
> after the process has started, and detached once sufficient profiling data
> has been collected.
> The developer's environment is frequently different from the production
> environment and may not yield accurate information.
> Profiling in production is hard, however, when a Spark application runs as a
> distributed job on a cluster where the developer may have limited access to
> the actual nodes where the executor processes are running. Also, in
> environments like Kubernetes, where the executor pods may be removed as soon
> as the job completes, retrieving the profiling information from each executor
> pod can become quite tricky.
> This feature adds a low-overhead sampling profiler such as async-profiler
> as a built-in capability of the Spark job that can be turned on using only
> user-configurable parameters. (async-profiler is a low-overhead profiler that
> can be invoked programmatically and is available as a single multi-platform
> jar for Linux and Mac.)
> In addition, for convenience, the feature would save profiling output files
> to the distributed file system so that information from all executors is
> available in a single place.
> The feature would add an executor plugin that does not add any overhead
> unless enabled and that accepts profiler arguments as a configuration
> parameter.
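
The plugin behaviour described in the issue (no work unless enabled, profiler arguments passed through configuration, one output location per executor under a shared directory) can be sketched roughly as follows. This is an illustrative model in plain Python, not Spark's plugin API and not the implementation proposed in the ticket. The configuration keys, the class name, the output file naming, and the `start_profiler` callback are all hypothetical and stand in for whatever the real feature defines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical configuration keys, invented for this sketch only.
ENABLED_KEY = "example.executor.profiling.enabled"
OPTIONS_KEY = "example.executor.profiling.options"
OUTPUT_DIR_KEY = "example.executor.profiling.outputDir"


@dataclass
class ProfilerSettings:
    options: str      # profiler argument string, passed through untouched
    output_path: str  # per-executor file under the shared output directory


class ExecutorProfilerSketch:
    """Stand-in for an executor plugin that is inert unless enabled."""

    def __init__(self, start_profiler: Callable[[str], None]):
        # start_profiler is a hypothetical hook that would attach the profiler.
        self._start_profiler = start_profiler
        self.settings: Optional[ProfilerSettings] = None

    def init(self, conf: Dict[str, str], app_id: str, executor_id: str) -> None:
        # Disabled by default: return before touching the profiler at all.
        if conf.get(ENABLED_KEY, "false").lower() != "true":
            return
        base = conf.get(OUTPUT_DIR_KEY, "/tmp/profiles").rstrip("/")
        self.settings = ProfilerSettings(
            options=conf.get(OPTIONS_KEY, ""),
            # Assumed layout: one file per executor, grouped by application.
            output_path=f"{base}/{app_id}/executor-{executor_id}.jfr",
        )
        self._start_profiler(self.settings.options)
```

With an empty configuration, `init` returns immediately and the callback is never invoked. With the enabled key set to `true`, the option string is handed to the callback unchanged and every executor derives its own path under the same base directory, which is what lets the output from all executors end up in one place.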