parthchandra commented on PR #44021:
URL: https://github.com/apache/spark/pull/44021#issuecomment-1863253175

   > A couple of queries:
   > 
   > a) IIRC async profiler does not have a way to associate the java thread 
with the underlying linux thread - and so from a future evolution point of 
view, we cannot correlate this with a task's stack traces - is this still the 
case ? If yes, given work being done as part of 
[SPARK-45151](https://issues.apache.org/jira/browse/SPARK-45151), is there a 
way to make this work with whatever @yaooqinn added support for ? 
([honest-profiler](https://github.com/jvm-profiling-tools/honest-profiler) has 
support for this, but unfortunately it does not have maven artifacts yet :-( )
   
   I believe async-profiler uses the JVM's internal `AsyncGetCallTrace` API 
(https://github.com/async-profiler/async-profiler?tab=readme-ov-file#cpu-profiling)
 to walk a thread's stack, including native calls made by the JVM. AFAIK, 
honest-profiler does the same. There is also an `AsyncGetStackTrace` API in 
the works (https://openjdk.org/jeps/435), though I am not sure whether it is 
available yet. At some point we can investigate how to leverage the new API to 
add native calls to the work done in SPARK-45151.
   
   > b) This PR is creating one file per executor on hdfs - which would end up 
being very expensive as number of executors per application and number of 
applications increase. How are we looking to minimize this cost ?
   
   There is a configuration, `spark.executor.profiling.fraction`, that limits 
the fraction of executors that are profiled (default is 0.1). One can also pass 
profiler options to lower the sampling frequency, e.g. 
`spark.executor.profiling.options=...,interval=10ms,...`, which limits the 
size of the JFR file created (useful for very long-running jobs).
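
   As a rough illustration of the two settings above, here is a minimal Python 
sketch that builds the corresponding `spark-submit` arguments and estimates how 
many per-executor files to expect. The helper names are hypothetical, the 
options string carries only the `interval` setting (the other options elided 
above are left out), and the estimate assumes each executor is selected for 
profiling with probability equal to the fraction, which may not match the 
PR's actual selection logic.

```python
from typing import List


def profiling_conf_args(fraction: float, interval: str) -> List[str]:
    """Build spark-submit --conf arguments for the two settings discussed.

    Hypothetical helper: only the config keys come from the comment above.
    The profiler options string here contains just `interval`; any other
    options a real job would pass are intentionally omitted.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return [
        "--conf", f"spark.executor.profiling.fraction={fraction}",
        "--conf", f"spark.executor.profiling.options=interval={interval}",
    ]


def expected_profiled_executors(num_executors: int, fraction: float) -> float:
    """Expected number of profiled executors (and so output files).

    Assumption: every executor is profiled with probability `fraction`.
    """
    return num_executors * fraction


if __name__ == "__main__":
    # Print the arguments as they would appear on a spark-submit command line.
    print(" ".join(profiling_conf_args(0.1, "10ms")))
    # Rough file-count estimate for an application with 200 executors.
    print(expected_profiled_executors(200, 0.1))
```

   Under that assumption, the number of files grows with the fraction times the 
executor count rather than with the executor count itself, which is the cost 
lever the fraction setting provides.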


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

