parthchandra commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1863253175
> A couple of queries:
>
> a) IIRC async profiler does not have a way to associate the Java thread with the underlying Linux thread, so from a future evolution point of view we cannot correlate this with a task's stack traces. Is this still the case? If yes, given the work being done as part of [SPARK-45151](https://issues.apache.org/jira/browse/SPARK-45151), is there a way to make this work with whatever @yaooqinn added support for? ([honest-profiler](https://github.com/jvm-profiling-tools/honest-profiler) has support for this, but unfortunately it does not have Maven artifacts yet :-( )

I believe async-profiler uses the JVM's internal `AsyncGetCallTrace` method (https://github.com/async-profiler/async-profiler?tab=readme-ov-file#cpu-profiling) to walk the stack for a thread, including native calls made by the JVM. AFAIK, honest-profiler does the same. There is also the `AsyncGetStackTrace` API in the works (https://openjdk.org/jeps/435); I am not sure whether it is available yet. At some point we can investigate how to leverage the new API to add native calls to the work done in SPARK-45151.

> b) This PR is creating one file per executor on HDFS, which would end up being very expensive as the number of executors per application and the number of applications increase. How are we looking to minimize this cost?

There is a configuration, `spark.executor.profiling.fraction`, to limit the number of executors to be profiled (default is 0.1). One can also specify profiler options to lower the sampling frequency, e.g. `spark.executor.profiling.options=...,interval=10ms,...`, which can limit the size of the JFR file created (useful for very long-running jobs).
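For illustration, the two settings mentioned above could be passed on the command line roughly as follows. This is only a sketch: the two configuration keys come from this comment, but the helper `to_submit_args` is hypothetical, the values are placeholders, and using `interval=10ms` on its own as the options string is an assumption (the other options elided with `...` above are not filled in). Enabling the profiler itself may need further settings that are not covered here.

```python
import shlex

# Keys are the ones named in the comment above; the values are illustrative
# assumptions, not recommendations.
profiling_conf = {
    # Fraction of executors to profile (0.1 is the default stated above).
    "spark.executor.profiling.fraction": "0.1",
    # Assumption: only the sampling interval is set; other profiler options
    # elided in the comment are intentionally left out.
    "spark.executor.profiling.options": "interval=10ms",
}


def to_submit_args(conf):
    """Render a conf dict as a flat list of `--conf key=value` arguments.

    Hypothetical helper for this sketch; it is not part of Spark.
    """
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a shell-safe spark-submit prefix carrying the profiling settings.
    print(shlex.join(["spark-submit", *to_submit_args(profiling_conf)]))
```

A larger sampling interval means fewer samples per second, which is what keeps the per-executor JFR file smaller on long-running jobs; a lower fraction reduces the number of files written.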
