pan3793 commented on PR #49483: URL: https://github.com/apache/spark/pull/49483#issuecomment-2591070767
@parthchandra I did some homework on integrating the profiler with the Spark UI flame graph. The first important question is what the pipeline for collecting and aggregating profiling events should look like. The current Spark UI building pipeline is:

- Live UI: events from the Spark event bus => aggregated data in KVStore => Spark UI
- History UI: events from event logs persisted on DFS => aggregated data in KVStore => Spark UI

The JDK built-in JFR provides methods to read JFR events both from disk and in-process, so we could follow the current Spark UI approach: use in-process JFR monitoring for the live UI flame graph, and read JFR results from DFS for the History UI flame graph.

From https://openjdk.org/jeps/349:

> There are three factory methods to create a stream. `EventStream::openRepository(Path)` constructs a stream from a disk repository. This is a way to monitor other processes by working directly against the file system. The location of the disk repository is stored in the system property `jdk.jfr.repository` that can be read using the attach API. It is also possible to perform in-process monitoring using the `EventStream::openRepository()` method. Unlike `RecordingStream`, it does not start a recording. Instead, the stream receives events only when recordings are started by external means, for example using JCMD or JMX. The method `EventStream::openFile(Path)` creates a stream from a recording file. It complements the `RecordingFile` class that already exists today.
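To make the record-then-replay idea concrete, here is a minimal sketch of the `RecordingFile` side of the JEP 349 API: it records a custom event in-process, dumps it to a `.jfr` file, and then reads the events back from disk, which is the same read path a History UI aggregator would use. The event class and file name are illustrative only, not anything Spark defines.

```java
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class JfrReadDemo {
    @Label("Hello")
    static class HelloEvent extends Event {
        @Label("Message")
        String message;
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of("demo.jfr");

        // Record a single custom event, then persist it to disk.
        try (Recording r = new Recording()) {
            r.start();
            HelloEvent e = new HelloEvent();
            e.message = "profiling";
            e.commit();
            r.stop();
            r.dump(file);
        }

        // Replay the persisted events from the file, as a history-side
        // aggregator would after fetching the .jfr from DFS.
        try (RecordingFile rf = new RecordingFile(file)) {
            while (rf.hasMoreEvents()) {
                RecordedEvent ev = rf.readEvent();
                if (ev.getEventType().getName().endsWith("HelloEvent")) {
                    System.out.println(ev.getString("message")); // prints "profiling"
                }
            }
        }
    }
}
```

For live monitoring, `jdk.jfr.consumer.RecordingStream` offers the same event model in a push style, so the aggregation code can be shared between the two paths.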
However, I think `async-profiler` does not support in-process monitoring (correct me if I'm wrong), so we must persist the results to disk first and then read them back to replay the events, aggregate them, and draw the flame graph. The pipeline would then be unified for both live and history UIs:

- JFR results persisted on DFS => aggregated data in KVStore => flame graph in Spark UI (live and history)

If so, drawing the flame graph is decoupled from how we collect and generate the JFR results, as long as the JFR results follow a stable folder layout and name pattern on the DFS. As you can see, the proposed refactor does not change that:

```
<baseDir>/{{APP_ID}}/profile-driver.jfr               -- newly added for the driver
<baseDir>/{{APP_ID}}/profile-exec-{{EXECUTOR_ID}}.jfr -- unchanged for executors
```

Before making the Spark UI display the flame graph directly, I'd like to allow users to download the JFR results from the SHS listing page, so that they can import them into local tools like JDK Mission Control or IDEA to analyze their jobs.
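As a sketch of what the "replay and aggregate" step could look like, the snippet below folds `jdk.ExecutionSample` stacks from a persisted `.jfr` file into the semicolon-joined "folded" lines that common flame-graph renderers consume. The event choice, file name, and folding scheme are my assumptions for illustration; Spark's actual aggregation layer is not specified in this thread. The demo records a short self-profile so it has something to fold.

```java
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

public class JfrFold {
    // Fold jdk.ExecutionSample stack traces into flame-graph "folded" keys:
    // root-first frames joined with ';', mapped to their sample counts.
    public static Map<String, Long> fold(Path jfr) throws Exception {
        Map<String, Long> counts = new HashMap<>();
        try (RecordingFile rf = new RecordingFile(jfr)) {
            while (rf.hasMoreEvents()) {
                RecordedEvent ev = rf.readEvent();
                if (!ev.getEventType().getName().equals("jdk.ExecutionSample")) continue;
                RecordedStackTrace st = ev.getStackTrace();
                if (st == null) continue;
                // JFR lists frames top-down; reverse to root-first order.
                List<RecordedFrame> frames = new ArrayList<>(st.getFrames());
                Collections.reverse(frames);
                String key = frames.stream()
                        .map(f -> f.getMethod().getType().getName() + "." + f.getMethod().getName())
                        .collect(Collectors.joining(";"));
                counts.merge(key, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of("samples.jfr");
        try (Recording r = new Recording()) {
            r.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(1));
            r.start();
            // Burn some CPU so the sampler has something to record.
            long acc = 1;
            long end = System.nanoTime() + 500_000_000L;
            while (System.nanoTime() < end) acc = acc * 31 + 1;
            r.stop();
            r.dump(file);
        }
        Map<String, Long> folded = fold(file);
        // Sample counts depend on the JVM's sampler, so no fixed output here.
        folded.forEach((stack, n) -> System.out.println(stack + " " + n));
        System.out.println("distinct stacks: " + folded.size());
    }
}
```

The same folding logic would apply whether the events come from JFR files produced by the JVM or by `async-profiler` in JFR output mode, which is what keeps the rendering step independent of the collector.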
