On Tue, 3 Jan 2006, Graham E Fagg wrote:

> Do you have any tools such as Vampir (or its Intel equivalent) available
> to get a time line graph ? (even jumpshot of one of the bad cases such as
> the 128/32 for 256 floats below would help).

Hi Graham,

I have attached an slog2 file of an all-to-all run for 1024 floats (ompi
tuned alltoall). I could not get clog files for more than 32 processes;
is this perhaps a limitation of MPE? I therefore took the case of 32 CPUs
on 32 nodes, which is performance-critical as well. In the run output
below, 2 of the 5 counted tries are fast (about 0.003 s) while the other
3 are slow (about 0.26 s, marked with "!").

Carsten



ckutzne@node001:~/mpe> mpirun -hostfile ./bhost1 -np 32 ./phas_mpe.x
Alltoall Test on 32 CPUs. 5 repetitions.
--- New category (first test not counted) ---
MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) 
took ...    0.00690 seconds
---------------------------------------------
MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) 
took ...    0.00320 seconds
MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) 
took ...    0.26392 seconds !
MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) 
took ...    0.26868 seconds !
MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) 
took ...    0.26398 seconds !
MPI: sending    1024 floats (    4096 bytes) to 32 processes (      1 times) 
took ...    0.00339 seconds
Summary (5-run average, timer resolution 0.000001):
      1024 floats took 0.160632 (0.143644) seconds. Min: 0.003200  max: 0.268681
Writing logfile....
Finished writing logfile.
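
The summary line can be cross-checked against the five counted timings
printed above. The following is a minimal Python sketch, not part of the
original test program. It assumes that the value in parentheses in the
summary is the sample standard deviation (n - 1 in the denominator); the
output itself does not say so. The 0.1 s threshold used to separate fast
from slow runs is an arbitrary illustrative choice.

```python
import statistics

# Timings (seconds) of the five counted runs, copied from the output above.
# The first run (0.00690 s) is excluded, as the test program does not count it.
timings = [0.00320, 0.26392, 0.26868, 0.26398, 0.00339]

mean = statistics.mean(timings)
# Assumption: the parenthesized summary value is the sample standard deviation.
stdev = statistics.stdev(timings)

# Illustrative threshold (hypothetical, not from the test program).
THRESHOLD = 0.1
fast = [t for t in timings if t < THRESHOLD]
slow = [t for t in timings if t >= THRESHOLD]

print(f"mean  = {mean:.6f} s")
print(f"stdev = {stdev:.6f} s")
print(f"min   = {min(timings):.6f} s, max = {max(timings):.6f} s")
print(f"fast runs: {len(fast)}, slow runs: {len(slow)}")
print(f"slow/fast ratio: {statistics.mean(slow) / statistics.mean(fast):.1f}")
```

Because the printed timings are rounded to five decimals, the recomputed
mean and deviation agree with the summary line only to within rounding.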

Attachment: phas_mpe.x.slog2
Description: Binary data
