> I/O on the $SGE_ROOT directory can certainly cause the problems you 
> report. I would take a look at what your disks are doing with "iostat -x" 
> if I were you. You might see a large number of small I/O requests: we 
> certainly did.

There are many small requests, but they seem to be on /var, not $SGE_ROOT. Of
course, they might come from some process other than SGE: our cluster
management software uses MySQL, and that uses /var as well.
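To check whether those small requests really land on /var, the average request size that "iostat -x" reports can also be computed per device from two samples of /proc/diskstats. A minimal sketch, assuming a Linux host; the helper names are made up for illustration, and "df /var" tells you which device to look at:

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def read_diskstats(text):
    """Parse /proc/diskstats text into {device: (reads, sectors_read, writes, sectors_written)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:  # major, minor, name + at least 11 counters
            continue
        name = fields[2]
        reads, sectors_read = int(fields[3]), int(fields[5])
        writes, sectors_written = int(fields[7]), int(fields[9])
        stats[name] = (reads, sectors_read, writes, sectors_written)
    return stats


def sample(path="/proc/diskstats"):
    """Take one snapshot of the kernel's per-device I/O counters."""
    with open(path) as f:
        return read_diskstats(f.read())


def avg_request_kib(before, after):
    """Average KiB per completed read/write request between two snapshots."""
    result = {}
    for dev, (r1, sr1, w1, sw1) in after.items():
        if dev not in before:
            continue
        r0, sr0, w0, sw0 = before[dev]
        ops = (r1 - r0) + (w1 - w0)
        if ops == 0:
            continue  # device was idle during the interval
        sectors = (sr1 - sr0) + (sw1 - sw0)
        result[dev] = sectors * SECTOR_BYTES / 1024 / ops
    return result
```

Usage: take "before = sample()", wait a few seconds with "time.sleep(10)", take "after = sample()" and print "avg_request_kib(before, after)". Values of only a few KiB on the device behind /var would be consistent with many tiny writes, from MySQL or from SGE.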

> * If $SGE_ROOT is not local to the qmaster, MONITOR=1 can itself generate 
> a large number of small I/Os and be a significant contributor to the 
> problem. Replacing common/schedule with a symlink to a disk local to the 
> qmaster resolved many "slow running" problems for us.
> 
> * Do your compute nodes spool to local disk, or to an NFS share?
> ("qconf -sconf | grep execd_spool_dir")

Local.
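To double-check that on a node, one could pull execd_spool_dir out of the "qconf -sconf" output and look up the filesystem type of that path in /proc/mounts. The function names and the NETWORK_FS list below are assumptions for illustration, and the whitespace parsing ignores the \040 escaping /proc/mounts uses for spaces. Host-specific configurations ("qconf -sconf <hostname>") can override the global value, so those are worth checking too.

```python
# Hypothetical list of network filesystem types; adjust for your site.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "lustre", "gpfs"}


def execd_spool_dir(sconf_text):
    """Return the execd_spool_dir value from `qconf -sconf` output, or None."""
    for line in sconf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "execd_spool_dir":
            return fields[1]
    return None


def fs_type(path, mounts_text):
    """Return the fstype of the longest mount point in /proc/mounts that contains path."""
    best_mp, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mp, fstype = fields[1], fields[2]
        prefix = mp.rstrip("/") + "/"  # "/var" -> "/var/", "/" -> "/"
        # ">=" so a later mount stacked on the same mount point wins
        if (path == mp or path.startswith(prefix)) and len(mp) >= len(best_mp):
            best_mp, best_type = mp, fstype
    return best_type


def is_network(path, mounts_text):
    """True if path lives on one of the filesystem types in NETWORK_FS."""
    return fs_type(path, mounts_text) in NETWORK_FS
```

Resolving the spool path with os.path.realpath first avoids being fooled by a symlink that points onto an NFS share.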

> * Is $SGE_ROOT local to the qmaster?

I was about to write "yes", but that's not entirely true: it's on DRBD.
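The common/schedule workaround quoted above amounts to moving the file onto a local disk and leaving a symlink in its place. A minimal sketch, assuming the usual $SGE_ROOT/$SGE_CELL/common layout and that the scheduler is not writing the file while this runs (e.g. MONITOR switched off temporarily); the function name and the local target path are hypothetical:

```python
import os
import shutil


def relocate_to_local(shared_path, local_path):
    """Move shared_path to local_path and leave a symlink at shared_path."""
    if os.path.islink(shared_path):
        return  # already relocated; nothing to do
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    if os.path.exists(shared_path):
        shutil.move(shared_path, local_path)  # keep the existing contents
    else:
        open(local_path, "a").close()  # create an empty target file
    # An absolute target keeps the link valid regardless of the link's directory.
    os.symlink(os.path.abspath(local_path), shared_path)
```

Usage, with a hypothetical local directory: relocate_to_local(os.path.join(os.environ["SGE_ROOT"], os.environ.get("SGE_CELL", "default"), "common", "schedule"), "/var/local/sge/schedule"). If the DRBD volume fails over between hosts, the local target directory would have to exist on every host that can run the qmaster.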

> * Are you using classic or BDB spooling?

Classic.


A.

-- 
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
