parthchandra commented on PR #44021:
URL: https://github.com/apache/spark/pull/44021#issuecomment-1864980397

   > There is a difference between native thread IDs and Java thread IDs. Given the async-profiler output, can we map it to the corresponding task (given the task's Java thread ID)? My understanding is currently no, but if I am missing something, do let me know.
   
   Yes, we can map the stack traces to the Java thread. Here's how it looks (this is in IntelliJ's profiler window):
   <img width="2046" alt="Screenshot 2023-12-20 at 9 46 49 AM" src="https://github.com/apache/spark/assets/6529136/e87d4b9d-4e00-4a89-aa50-c107399359bc">
   
   > Assuming no, this means the stack traces generated are for all threads in the executor JVM, and so does not allow us to get stack traces and/or flamegraphs for a particular task, the tasks of a stage, etc.
   
   We can get stack traces for individual threads, and can even filter to profile a single thread. This PR, as written, profiles every thread in the executor.
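Once a profile is split per thread (for example, by running async-profiler with its `threads` option), narrowing it to one task is essentially a filter over the collapsed call traces. The sketch below assumes two things not stated in this thread: that each stack's root frame has the form `[<thread name> tid=<native id>]`, and that executor task threads carry names like `Executor task launch worker for task 0.0 in stage 1.0 (TID 7)`. `ThreadSampleFilter` is a hypothetical helper, not part of Spark or async-profiler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: per-thread views over collapsed call traces,
// one "frame;frame;...;frame count" entry per line.
// ASSUMPTION: the root frame of every stack looks like "[<thread name> tid=<native id>]".
final class ThreadSampleFilter {
    // The greedy (.*) backtracks to the last " tid=", so names containing spaces still parse.
    private static final Pattern THREAD_FRAME = Pattern.compile("^\\[(.*) tid=(\\d+)\\]$");

    private ThreadSampleFilter() {}

    /** Returns the stack part of a line (everything before the trailing count), or null. */
    private static String stackOf(String line) {
        int space = line.lastIndexOf(' ');
        return space < 0 ? null : line.substring(0, space);
    }

    /** Extracts the thread name from a stack's root frame, or "<unknown>" if there is none. */
    static String threadOf(String stack) {
        Matcher m = THREAD_FRAME.matcher(stack.split(";", 2)[0]);
        return m.matches() ? m.group(1) : "<unknown>";
    }

    /** Sums the trailing sample counts per thread name; lines without a count are skipped. */
    static Map<String, Long> samplesPerThread(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String stack = stackOf(line);
            if (stack == null) {
                continue;
            }
            long count = Long.parseLong(line.substring(stack.length() + 1).trim());
            totals.merge(threadOf(stack), count, Long::sum);
        }
        return totals;
    }

    /** Keeps only the lines whose thread name contains {@code marker}, e.g. "(TID 7)". */
    static List<String> filterByThread(List<String> lines, String marker) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String stack = stackOf(line);
            if (stack != null && threadOf(stack).contains(marker)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Matching on the thread name rather than the `tid=` value is deliberate: the `tid` in such a frame would be the native thread ID, which is not the Java thread ID a Spark task knows about. `filterByThread(lines, "(TID 7)")` would then yield just that task's stacks, ready to be turned into a flamegraph.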
   
   > If yes, this would be very useful - and will allow for future evolution as 
part of [SPARK-44893](https://issues.apache.org/jira/browse/SPARK-44893) [1].
   
   Ah, this JIRA makes it clearer. We can leverage async-profiler to provide the features not yet implemented in [SPARK-45209](https://issues.apache.org/jira/browse/SPARK-45209). The current implementation uses a simple snapshot of the task stack traces, which could be enhanced by using async-profiler for more accurate profiling.
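To make the "snapshot" versus "profiler" distinction concrete, here is a minimal sketch of snapshot-style sampling built only on the standard `java.lang.management.ThreadMXBean` API: it captures a thread's stack by Java thread ID and counts which method is on top across repeated snapshots. It is not necessarily how SPARK-45209 collects its stack traces, and `SnapshotSampler` is a hypothetical name. In HotSpot, such snapshots are captured when the target thread stops at a safepoint or handshake poll, so the counts are subject to safepoint bias; async-profiler avoids this by sampling asynchronously (AsyncGetCallTrace together with perf_events).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: "poor man's" sampling via repeated JVM stack snapshots.
// Results are subject to safepoint bias, unlike an async sampling profiler.
final class SnapshotSampler {
    private SnapshotSampler() {}

    /** One snapshot of the given Java thread's stack, or an empty array if it is not alive. */
    static StackTraceElement[] snapshot(long javaThreadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Integer.MAX_VALUE requests the full stack depth.
        ThreadInfo info = mx.getThreadInfo(javaThreadId, Integer.MAX_VALUE);
        return info == null ? new StackTraceElement[0] : info.getStackTrace();
    }

    /** Takes {@code samples} snapshots, {@code intervalMillis} apart, and counts top frames. */
    static Map<String, Integer> topFrameHistogram(long javaThreadId, int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = snapshot(javaThreadId);
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis);
        }
        return counts;
    }
}
```

Each snapshot is a single point in time; many of them approximate a profile only coarsely, which is the gap a real sampling profiler fills.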
    
   > I am not seeing a lot of value in including this into Apache Spark itself; the plugin API is public, and users can leverage it to do precisely what the PR is proposing. On the other hand, if the PR is integrating well with [SPARK-44893](https://issues.apache.org/jira/browse/SPARK-44893) [1], and/or there is a path to leveraging it in that work, it would be more useful.
   
   I think we can certainly leverage this work. This PR by itself does not have the APIs needed to enhance SPARK-45209; that would probably need to be a separate PR, because it may require changes to the UI implementation. We can get either a flamegraph (covering a period of time for a task) or collapsed call traces from which a flamegraph can be produced, and the choice will affect the UI.
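The two options differ mainly in where the folding happens. Collapsed call traces are one line per unique stack plus a sample count (the `frame;frame;... count` format that async-profiler can emit as `collapsed` output); a flamegraph is that same data folded into a prefix tree whose node widths are sample totals. A minimal sketch of the folding step, assuming that line format; `FlameNode` is a hypothetical class, not a Spark or async-profiler API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: fold collapsed call traces into the tree a flamegraph draws.
// Frames with the same name under the same parent are merged and their samples summed.
final class FlameNode {
    final String frame;
    long samples;
    final Map<String, FlameNode> children = new LinkedHashMap<>(); // keeps first-seen order

    FlameNode(String frame) {
        this.frame = frame;
    }

    /** Builds the tree; the synthetic root "all" holds the grand total. */
    static FlameNode fromCollapsed(Iterable<String> lines) {
        FlameNode root = new FlameNode("all");
        for (String line : lines) {
            int space = line.lastIndexOf(' ');
            if (space < 0) {
                continue; // skip lines without a trailing sample count
            }
            long count = Long.parseLong(line.substring(space + 1).trim());
            root.samples += count;
            FlameNode node = root;
            for (String frame : line.substring(0, space).split(";")) {
                node = node.children.computeIfAbsent(frame, FlameNode::new);
                node.samples += count;
            }
        }
        return root;
    }

    /** Renders the tree as indented text, one "frame samples" line per node. */
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        sb.append("  ".repeat(depth)).append(frame).append(' ').append(samples).append('\n');
        for (FlameNode child : children.values()) {
            child.render(sb, depth + 1);
        }
    }
}
```

If the executor ships collapsed traces, this folding (and the rendering) can happen in the UI; if it ships a finished flamegraph, the UI only has to display it.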
   
   > I am not -1 on this @dongjoon-hyun, but I am not seeing a lot of value in it: will let you make the call (also because I am on vacation, don't have my desktop handy to investigate in detail :) ).
   
   🙏🏾 
   
   