pan3793 commented on PR #49483: URL: https://github.com/apache/spark/pull/49483#issuecomment-2589508768
SPARK-46094 added the JVM profiler but limited its scope to executors only, and all classes and configurations live under the "executor" namespace. It would look a little odd if we accept this PR to extend the scope to also cover the driver. Additionally, Spark already has the following configurations for profiling the Python process:

- `spark.python.profile`
- `spark.python.profile.memory`
- `spark.python.profile.dump`

Given that, I propose the following changes:

- rename `ExecutorProfilerPlugin` to `ProfilerPlugin` (or `JVMProfilerPlugin`)
- move the classes to the `org.apache.spark.profile` (`profile` or `profiler`) package
- move the configurations to the `spark.profile.` (or `spark.jvm.profile.`) namespace

Then the configurations might look like:

```
spark.profile.driver.enabled
spark.profile.executor.enabled
spark.profile.executor.fraction
spark.profile.dfsDir
spark.profile.local
spark.profile.options
spark.profile.writeInterval
```

Some thoughts:

- `spark.profile.` vs. `spark.jvm.profile.`: I prefer the former because 1) the JVM is the main process of Spark, so we don't need to emphasize it, and 2) async-profiler 3.0 added support for profiling native processes, which gives us a chance to extend the profiling scope beyond the JVM process in the future.
- `spark.profile.options`: currently we forcibly use `async-profiler`, while modern JDKs also have built-in JFR support. If we want to make the profiler implementation substitutable, we may need to rename it to `spark.profile.asyncProfiler.options`.
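For concreteness, here is a rough sketch of how a few of the renamed entries might be declared with Spark's internal `ConfigBuilder` DSL. The object name, docs, defaults, and version string are illustrative assumptions on my part, not a final proposal:

```scala
// Hypothetical sketch: spark.profile.* entries declared via Spark's internal
// ConfigBuilder DSL (org.apache.spark.internal.config). Names, defaults, and
// the version string are illustrative assumptions only.
package org.apache.spark.internal.config

private[spark] object Profile {

  val PROFILE_DRIVER_ENABLED = ConfigBuilder("spark.profile.driver.enabled")
    .doc("Whether to enable JVM profiling on the driver.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(false)

  val PROFILE_EXECUTOR_ENABLED = ConfigBuilder("spark.profile.executor.enabled")
    .doc("Whether to enable JVM profiling on executors.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(false)

  val PROFILE_EXECUTOR_FRACTION = ConfigBuilder("spark.profile.executor.fraction")
    .doc("Fraction of executors on which profiling is enabled.")
    .version("4.0.0")
    .doubleConf
    .checkValue(v => v >= 0.0 && v <= 1.0, "Fraction must be in [0, 1].")
    .createWithDefault(0.1)

  val PROFILE_OPTIONS = ConfigBuilder("spark.profile.options")
    .doc("Options passed to the underlying profiler (currently async-profiler).")
    .version("4.0.0")
    .stringConf
    .createWithDefault("event=wall,interval=10ms,alloc=2m,lock=10ms")
  // If the profiler implementation becomes substitutable, this entry could
  // instead be namespaced as spark.profile.asyncProfiler.options.
}
```

Migrating to this layout would presumably also require deprecated aliases for the existing `spark.executor.profiling.*` keys so current users are not broken.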
