mridulm commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1863873954
> `AsyncGetCallTrace` is used precisely to map calls in the native thread to calls in the java thread. Not sure exactly what you are looking for here. Are you looking to profile individual tasks? It certainly can be done, but would require some changes similar to [SPARK-45151](https://issues.apache.org/jira/browse/SPARK-45151) and some additional work if you want the profile available thru the UI. Or are you looking to enhance [SPARK-45151](https://issues.apache.org/jira/browse/SPARK-45151) and get a stack trace that includes native calls? This is a little harder via async_profiler since there is no API to get a snapshot. Note that getting a profile needs to be collected over a period of time and so is different from getting a snapshot as [SPARK-45151](https://issues.apache.org/jira/browse/SPARK-45151) is doing.

There is a difference between native thread IDs and Java thread IDs. Given the async profiler output, can we map it to the corresponding task (given the task's Java thread ID)? My understanding is that currently we cannot - but if I am missing something, do let me know.

Assuming no, this means the stack traces generated are for all threads in the executor JVM - and so it does not allow us to get stack traces and/or flamegraphs for a particular task, the tasks of a stage, etc.

If yes, this would be very useful - and will allow for future evolution as part of SPARK-44893 [1].

> > Simply dumping per executor flamegraphs or stack traces has limited utility (and can be done today).
>
> I would suggest that this PR makes it trivially simple to profile with no setup required. On K8s, with ephemeral storage, it is not a simple task to dump a profile to disk and get it off the pod before the pod is destroyed (it was in fact the original motivation behind doing this).

I am not seeing a lot of value in including this in Apache Spark itself - the plugin API is public, and users can leverage it to do precisely what the PR is proposing.
On the other hand, if the PR integrates well with SPARK-44893 [1] - and/or there is a path to leveraging it in that work - it would be more useful.

I am not exactly -1 on this @dongjoon-hyun, but I am not seeing a lot of value in it: I will let you make the call.

[1] This is the JIRA I was trying to paste, but GitHub mobile messed it up - and ended up referencing a subtask!
