Parth Chandra created SPARK-46094:
-------------------------------------

             Summary: Add support for code profiling executors
                 Key: SPARK-46094
                 URL: https://issues.apache.org/jira/browse/SPARK-46094
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Parth Chandra


To profile a Spark application, a user or developer has to run a Spark job 
locally on the development machine and use a tool like Java Flight Recorder, 
YourKit, or async-profiler to record profiling information. Because profiling 
can be expensive, the profiler is typically attached to the Spark JVM process 
after the process has started and detached once sufficient profiling data is 
collected.

The developer's environment is frequently different from the production 
environment, so a local profile may not accurately reflect production behavior.

Profiling in production is hard, however, when a Spark application runs as a 
distributed job on a cluster where the developer may have limited access to the 
actual nodes where the executor processes are running. Also, in environments 
like Kubernetes, where the executor pods may be removed as soon as the job 
completes, retrieving the profiling information from each executor pod can 
become quite tricky.

This feature adds a low-overhead sampling profiler, such as async-profiler, as 
a built-in capability of a Spark job that can be turned on using only 
user-configurable parameters. (async-profiler is a low-overhead profiler that 
can be invoked programmatically and is available as a single multi-platform jar 
for Linux and macOS.)

In addition, for convenience, the feature would save profiling output files to 
the distributed file system so that information from all executors can be 
available in a single place.

The feature would add an executor plugin that adds no overhead unless it is 
enabled and that accepts profiler arguments through a configuration parameter.
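
A minimal Java sketch of the behaviour described above might look like the 
following. It is an illustration under stated assumptions, not the proposed 
implementation: the class names, the configuration keys under 
"example.profiling.*", and the output file naming are all hypothetical, and the 
ProfilerBackend interface is a stand-in so the sketch compiles without Spark or 
async-profiler on the classpath. In a real plugin the backend would presumably 
delegate to async-profiler's programmatic API, and the output would be copied 
to the distributed file system rather than written to a plain path. The 
comma-separated command strings are modelled on async-profiler's argument 
style.

```java
import java.util.Map;

/** Stand-in for the profiler; a real backend would wrap the actual profiler API. */
interface ProfilerBackend {
    String execute(String command);
}

/** Hypothetical config-driven profiling hook; not Spark's actual plugin API. */
final class ProfilingHook {
    // Hypothetical configuration keys, chosen for illustration only.
    static final String ENABLED_KEY = "example.profiling.enabled";
    static final String ARGS_KEY = "example.profiling.args";
    static final String OUTPUT_DIR_KEY = "example.profiling.outputDir";

    private final ProfilerBackend backend;
    private final String executorId;
    private final boolean enabled;
    private final String profilerArgs;
    private final String outputDir;
    private boolean running = false;

    ProfilingHook(Map<String, String> conf, String executorId, ProfilerBackend backend) {
        this.backend = backend;
        this.executorId = executorId;
        // Disabled by default, so the hook does nothing unless the user opts in.
        this.enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        this.profilerArgs = conf.getOrDefault(ARGS_KEY, "event=cpu");
        this.outputDir = conf.getOrDefault(OUTPUT_DIR_KEY, "/tmp/profiles");
    }

    /** One file per executor, so all executors can write under a single directory. */
    String outputFile() {
        return outputDir + "/profile-exec-" + executorId + ".jfr";
    }

    /** Starts profiling if enabled and not already running; otherwise a no-op. */
    void start() {
        if (!enabled || running) {
            return;
        }
        backend.execute("start," + profilerArgs + ",file=" + outputFile());
        running = true;
    }

    /** Stops profiling if it was started; otherwise a no-op. */
    void stop() {
        if (!running) {
            return;
        }
        backend.execute("stop,file=" + outputFile());
        running = false;
    }
}
```

With the enabled key left unset, start() and stop() never touch the backend, 
which is the "no overhead unless enabled" property. Passing the profiler 
arguments through as an opaque string keeps the hook independent of any 
particular profiler option.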


