> I/O on the $SGE_ROOT directory can certainly cause the problems you
> report. I would take a look at what your disks are doing with "iostat -x"
> if I were you. You might see a large number of small I/O requests: we
> certainly did.
There are many small requests, but they seem to go to /var, not $SGE_ROOT. Of
course, they might be caused by some process other than SGE. Our cluster
management software uses MySQL, and that uses /var as well.
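
To put a number on "small", here is a rough sketch that computes the average
request size per device. It assumes a Linux host and reads the cumulative
counters in /proc/diskstats (so it shows averages since boot, not the
per-interval figures that "iostat -x" reports). The function name
avg_request_kib is made up for this example.

```python
import os
from typing import Dict

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def avg_request_kib(diskstats_text: str) -> Dict[str, float]:
    """Return the average KiB per completed I/O request for each device."""
    result: Dict[str, float] = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # not a complete diskstats line
        name = fields[2]
        reads, sectors_read = int(fields[3]), int(fields[5])
        writes, sectors_written = int(fields[7]), int(fields[9])
        requests = reads + writes
        if requests == 0:
            continue  # idle device, nothing to average
        total_kib = (sectors_read + sectors_written) * SECTOR_BYTES / 1024
        result[name] = total_kib / requests
    return result


if __name__ == "__main__":
    path = "/proc/diskstats"  # Linux only
    if os.path.exists(path):
        with open(path) as fh:
            stats = avg_request_kib(fh.read())
        for dev, kib in sorted(stats.items()):
            print(f"{dev:12s} {kib:8.1f} KiB/request")
```

A device with a low average and a high request count would be the one to look
at first. This does not say which process is responsible.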
> * If $SGE_ROOT is not local to the qmaster, MONITOR=1 can itself generate
> a large number of small I/Os and be a significant contributor to the
> problem. Replacing common/schedule with a symlink to a disk local to the
> qmaster resolved many "slow running" problems for us.
>
> * Do your compute nodes spool to local disk, or to an NFS share?
> ("qconf -sconf | grep execd_spool_dir")
Local.
> * Is $SGE_ROOT local to the qmaster?
I was about to write "yes", but that's not entirely true. It's on DRBD.
> * Are you using classic or BDB spooling?
Classic.
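
For reference, the symlink approach suggested above for common/schedule could
be sketched roughly as follows. This is only an illustration: the function
name and the paths are hypothetical, and I am assuming the move is done while
nothing is writing to the file.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(original: str, local_dir: str) -> str:
    """Move `original` into `local_dir` and leave a symlink at the old path.

    Returns the new location of the file.
    """
    if os.path.islink(original):
        # Already a symlink; report where it points and do nothing else.
        return os.path.realpath(original)
    os.makedirs(local_dir, exist_ok=True)
    target = os.path.join(local_dir, os.path.basename(original))
    if os.path.exists(original):
        shutil.move(original, target)  # also works across filesystems
    else:
        open(target, "a").close()  # create an empty file to link to
    os.symlink(target, original)
    return target


if __name__ == "__main__":
    # Demonstration in a scratch directory, not on a real $SGE_ROOT.
    with tempfile.TemporaryDirectory() as tmp:
        common = os.path.join(tmp, "sge_root", "default", "common")
        os.makedirs(common)
        schedule = os.path.join(common, "schedule")
        with open(schedule, "w") as fh:
            fh.write("example\n")
        new_path = relocate_with_symlink(schedule, os.path.join(tmp, "local"))
        print(schedule, "->", new_path)
```

On a real installation the first argument would be the schedule file under
$SGE_ROOT and the second a directory on a disk local to the qmaster.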
A.
--
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105