Re: [OMPI users] Tracing of openmpi internal functions

2022-11-16 Thread Joseph Schuchart via users

Arun,

You can use a small wrapper script like this one to store the perf data 
in separate files:


```
$ cat perfwrap.sh
#!/bin/bash
exec perf record -o "perf.data.$OMPI_COMM_WORLD_RANK" "$@"
```

Then run `mpirun -n <N> ./perfwrap.sh ./a.out` to run all processes under 
perf. You can also select a subset of processes based on 
$OMPI_COMM_WORLD_RANK.
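
To make the subset idea concrete, here is a minimal sketch of a wrapper that 
profiles only selected ranks and runs the others unmodified. The script name 
(perfwrap-subset.sh), the PERF_RANKS variable and the should_profile helper 
are made up for illustration; they are not Open MPI or perf features. Only 
$OMPI_COMM_WORLD_RANK and `perf record -o` come from the setup above.

```bash
#!/bin/bash
# perfwrap-subset.sh (hypothetical name): run only selected ranks under perf.
# PERF_RANKS is an assumed, user-defined variable holding a space-separated
# list of ranks to profile; it defaults to rank 0 only.
PERF_RANKS="${PERF_RANKS:-0}"
rank="${OMPI_COMM_WORLD_RANK:-0}"

# Return 0 (true) if the rank given as $1 appears in PERF_RANKS.
should_profile() {
    local r
    for r in $PERF_RANKS; do
        [ "$r" = "$1" ] && return 0
    done
    return 1
}

# Only act when a command was passed to the wrapper.
if [ "$#" -gt 0 ]; then
    if should_profile "$rank"; then
        # Selected rank: record samples into a per-rank file.
        exec perf record -o "perf.data.$rank" "$@"
    else
        # Any other rank: run the program directly, without perf.
        exec "$@"
    fi
fi
```

For example, `PERF_RANKS="0 3" mpirun -x PERF_RANKS -n 4 ./perfwrap-subset.sh ./a.out` 
should leave perf.data.0 and perf.data.3 behind, while ranks 1 and 2 run 
without perf. The `-x` option asks mpirun to export the variable to the 
launched processes.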


HTH,
Joseph


On 11/16/22 09:24, Chandran, Arun via users wrote:

Hi Jeff,

Thanks, I will check flamegraphs.

Sample generation with perf could be a problem. I don't think I can do 'mpirun -np <> 
perf record ' and get
the sampling done on all the cores and store each core's data (perf.data) 
separately for analysis. Is that possible?

I came to know that AMD uProf supports individual sample collection for MPI apps 
running on multiple cores; I need to investigate this further.

--Arun

From: users  On Behalf Of Jeff Squyres 
(jsquyres) via users
Sent: Monday, November 14, 2022 11:34 PM
To: users@lists.open-mpi.org
Cc: Jeff Squyres (jsquyres) ; arun c 

Subject: Re: [OMPI users] Tracing of openmpi internal functions


Open MPI uses plug-in modules for its implementations of the MPI collective 
algorithms.  From that perspective, once you understand that infrastructure, 
it's exactly the same regardless of whether the MPI job is using intra-node or 
inter-node collectives.

We don't have much in the way of detailed internal function call tracing inside 
Open MPI itself, due to performance considerations.  You might want to look 
into flamegraphs, or something similar...?

--
Jeff Squyres
mailto:jsquy...@cisco.com

From: users  on behalf of arun c via users 

Sent: Saturday, November 12, 2022 9:46 AM
To: mailto:users@lists.open-mpi.org 
Cc: arun c 
Subject: [OMPI users] Tracing of openmpi internal functions
  
Hi All,


I am new to openmpi and trying to learn the internals (source code
level) of data transfer during collective operations. At first, I will
limit it to intra-node (between cpu cores, and sockets) to minimize
the scope of learning.

What are the best options (Looking for only free and open methods) for
tracing the openmpi code? (say I want to execute alltoall collective
and trace all the function calls and event callbacks that happened
inside the libmpi.so on all the cores)

Linux kernel has something called ftrace, it gives a neat call graph
of all the internal functions inside the kernel with time, is
something similar available?

--Arun




Re: [OMPI users] Tracing of openmpi internal functions

2022-11-16 Thread Chandran, Arun via users
Hi Jeff,

Thanks, I will check flamegraphs.

Sample generation with perf could be a problem. I don't think I can do 'mpirun 
-np <> perf record ' and get
the sampling done on all the cores and store each core's data (perf.data) 
separately for analysis. Is that possible?

I came to know that AMD uProf supports individual sample collection for MPI apps 
running on multiple cores; I need to investigate this further.

--Arun
