On Tue, 3 Jan 2006, Graham E Fagg wrote:

> Do you have any tools such as Vampir (or its Intel equivalent) available
> to get a time line graph ? (even jumpshot of one of the bad cases such as
> the 128/32 for 256 floats below would help).

Hi Graham,

I have attached an slog file of an all-to-all run for 1024 floats (ompi tuned alltoall). I could not get clog files for >32 processes - is this perhaps a limitation of MPE? So I decided to take the case of 32 CPUs on 32 nodes, which is performance-critical as well. From the run output you can see that 2 of the 5 tries yield a fast execution while the other 3 are slow (marked with "!" below).

Carsten

ckutzne@node001:~/mpe> mpirun -hostfile ./bhost1 -np 32 ./phas_mpe.x
Alltoall Test on 32 CPUs. 5 repetitions.
--- New category (first test not counted) ---
MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.00690 seconds
---------------------------------------------
MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.00320 seconds
MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.26392 seconds !
MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.26868 seconds !
MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.26398 seconds !
MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.00339 seconds
Summary (5-run average, timer resolution 0.000001):
1024 floats took 0.160632 (0.143644) seconds. Min: 0.003200 max: 0.268681
Writing logfile....
Finished writing logfile.
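
The summary line can be cross-checked from the five counted timings printed above. The following is only a minimal Python sketch, not part of the test program: it assumes the value in parentheses is the sample standard deviation (n-1 divisor), which is an inference from the numbers since the source of phas_mpe.x is not shown here. The 10x threshold used to label a run as slow is likewise an arbitrary choice for illustration.

```python
import statistics

# Timings (seconds) of the five counted runs, copied from the output above.
# The first, uncounted warm-up run (0.00690 s) is deliberately left out.
timings = [0.00320, 0.26392, 0.26868, 0.26398, 0.00339]

mean = statistics.mean(timings)

# Assumption: the parenthesised value in the summary line is the sample
# standard deviation (n-1 divisor), not the population one.
stdev = statistics.stdev(timings)

# Assumption (illustrative only): call a run "slow" if it takes more than
# ten times as long as the fastest counted run.
fastest = min(timings)
slow = [t for t in timings if t > 10 * fastest]
fast = [t for t in timings if t <= 10 * fastest]

print(f"mean  = {mean:.6f} s")
print(f"stdev = {stdev:.6f} s")
print(f"fast runs: {len(fast)}, slow runs: {len(slow)}")
print(f"slowdown of slow runs: about {statistics.mean(slow) / fastest:.0f}x")
```

Both values agree with the printed summary to within about 1e-5; the small remainder is expected because the per-run timings are only printed to five decimal places. The standard deviation is nearly as large as the mean, which reflects two distinct clusters (about 3 ms and about 265 ms) rather than gradual jitter.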

Attachment: phas_mpe.x.slog2 (binary data)