Hi Ray,
with the Tight Integration of Open MPI into SGE
(http://gridengine.sunsource.net/) you will get correct accounting. Every
process created with qrsh (a replacement for ssh) will have an
additional group id attached and SGE will accumulate them all.
Depending on the size of the cluster, you might want to look into a
batch queuing system. In fact, we use it even locally on some machines
to serialize the workflow.
-- Reuti
On 12.11.2008 at 14:40, Fabian Hänsel wrote:
So, to make sure I understand what happens... This command:
mpirun -np 2 myprog
starts the program "mpirun" and two processes of "myprog". So, what
the "real time" of /usr/bin/time reports is the wall clock for
mpirun.
Exactly.
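A minimal Python sketch of that measurement, in case it helps: it takes the wall clock around a launcher process the same way the "real" line of /usr/bin/time does. The function name wall_clock and the STAND_IN command are made up for illustration; STAND_IN is just a child that sleeps, standing in for "mpirun -np 2 myprog".

```python
import subprocess
import sys
import time


def wall_clock(cmd):
    """Run cmd, wait for it to exit, and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Hypothetical stand-in for ["mpirun", "-np", "2", "myprog"]:
# a child process that simply sleeps for 0.2 seconds.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(0.2)"]

if __name__ == "__main__":
    print(f"real: {wall_clock(STAND_IN):.2f} s")
```

The number covers the launcher from start to exit, so with a real mpirun it includes startup and teardown of the MPI job as well as the run of myprog itself.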
Does the user time have any meaning here?
None that you can rely on. It could be the time of the MPI
infrastructure setup, it could be setup plus the master thread, or it
could be something completely different, depending on the MPI
implementation.
I'm not very good with the
theory behind multi-processor programming... but Perl (for example) has
a "times" function (http://perldoc.perl.org/functions/times.html)
which "Returns a ... list ... for this process and the children of
this process". Are the two instances of myprog considered children
of mpirun?
In a single-system setup: generally yes.
In a multi-system setup: no. The MPI processes span many computers and
are started remotely, e.g. over ssh, so they are not children of the
local mpirun.
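For the single-system case, here is a small Python sketch of the "children" accounting, using os.times(), which reports the same kind of child counters as Perl's times. It assumes a Unix-like system; children_cpu_seconds and BURN are names invented for this example, and BURN is only a CPU-burning stand-in for a local worker process.

```python
import os
import subprocess
import sys

# Hypothetical worker: spins until it has used about 0.3 s of CPU time.
BURN = (
    "import time\n"
    "t = time.process_time()\n"
    "while time.process_time() - t < 0.3:\n"
    "    pass\n"
)


def children_cpu_seconds(cmd):
    """Return the CPU seconds (user + system) charged to waited-for children."""
    before = os.times()
    subprocess.run(cmd, check=True)  # starts the child and waits for it
    after = os.times()
    return ((after.children_user - before.children_user)
            + (after.children_system - before.children_system))


if __name__ == "__main__":
    used = children_cpu_seconds([sys.executable, "-c", BURN])
    print(f"children CPU time: {used:.2f} s")
```

Only children that this process started and waited for show up in these counters. A process launched on another machine is a child of something on that machine, so its CPU time never arrives here.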
Hmmmm, I guess user time does not matter since it is real time that
we are interested in reducing.
Right. Even if we *could* measure the user time of every MPI worker
process correctly, this would not be what you are interested in:
depending on the algorithm, a significant amount of time could be spent
waiting for MPI messages to arrive. That time would not count as user
time, but it would not be 'wasted' either, since something important is
happening.
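To illustrate the point about waiting, a rough Python sketch without any MPI: a thread and a queue stand in for a blocking receive. This assumes the wait really blocks (sleeps in the kernel) rather than polling in a loop; wait_for_message is a made-up name for this example.

```python
import queue
import threading
import time


def wait_for_message(delay):
    """Block until a message arrives; return (message, wall seconds, CPU seconds)."""
    inbox = queue.Queue()

    def sender():
        # Stand-in for a remote process that needs `delay` seconds to reply.
        time.sleep(delay)
        inbox.put("done")

    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this whole process
    worker = threading.Thread(target=sender)
    worker.start()
    message = inbox.get()  # blocks without using CPU until the message is there
    worker.join()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return message, wall, cpu


if __name__ == "__main__":
    message, wall, cpu = wait_for_message(0.3)
    print(f"{message}: wall {wall:.2f} s, cpu {cpu:.3f} s")
```

The wall clock advances by roughly the full delay while the CPU time stays close to zero, which is why summing up user time says little about how long the job really takes.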
Best regards
Fabian