On Mon, 14 Mar 2011, Esztermann, Ansgar wrote:
Hi List,
can anyone give me a hint as to what scheduler performance to expect,
and what would typically be the bottleneck? We have 6.2u5 running here,
and one scheduler run takes about 5 minutes (with 600 jobs and 800
nodes).
From what I've seen with params monitor=1 and strace, the scheduler[1]
has a list of running jobs almost instantaneously, then spends about
four minutes at 100% CPU writing nothing to common/schedule (and not
actually making any system calls other than futex() and write() to stdout).
During that time, it spews a lot of diagnostic messages about resource
utilization to stdout (see below[2]). Finally, reservations are made
(they take about four seconds each, which is not exactly fast, but quite
manageable), and jobs are started (very quickly).
Is such a long delay between the :RUNNING: and :RESERVING: lines normal?
I thought our disk might be at fault here -- /var is often maxed out in
terms of bandwidth. But then again, the thread with 100% CPU doesn't do
any read() calls.
...
You're running at a bigger scale than we are (~420 hosts) but...
I/O on the $SGE_ROOT directory can certainly cause the problems you
report. I would take a look at what your disks are doing with "iostat -x"
if I were you. You might see a large number of small I/O requests: we
certainly did.
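Something along these lines -- iostat comes with the sysstat package, and
the sampling interval is only a suggestion:

  # extended per-device statistics, one report every 5 seconds
  iostat -x 5

  # worth watching: r/s and w/s (request rates), avgrq-sz (request size),
  # await (average time per request) and %util -- a device near 100% util
  # serving lots of small requests is the pattern we saw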
* If $SGE_ROOT is not local to the qmaster, MONITOR=1 can itself generate
a large number of small I/Os and be a significant contributor to the
problem. Replacing common/schedule with a symlink to a disk local to the
qmaster resolved many "slow running" problems for us (rough recipe after
this list).
* Do your compute nodes spool to local disk, or to an NFS share?
("qconf -sconf | grep execd_spool_dir")
* Is $SGE_ROOT local to the qmaster?
* Are you using classic or BDB spooling?
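To make those checks concrete, here is roughly what they look like on our
install -- the cell name ("default") and the local path (/var/spool/sge)
are only examples, so substitute your own:

  # where do the execds spool? (local path vs. an NFS mount)
  qconf -sconf | grep execd_spool_dir
  df -hT $SGE_ROOT                  # is $SGE_ROOT itself on NFS?

  # classic vs. BDB spooling is recorded in the cell's bootstrap file
  grep spooling_method $SGE_ROOT/default/common/bootstrap

  # move common/schedule onto a disk local to the qmaster (best done
  # while the scheduler is idle or MONITOR is switched off)
  mv $SGE_ROOT/default/common/schedule /var/spool/sge/schedule
  ln -s /var/spool/sge/schedule $SGE_ROOT/default/common/schedule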
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------