Hi Joshua,

How many cores do you have on your gridengine master?

Do you have any per-host quotas set? (You need at least 2 cores
for scheduling decisions involving per-host/per-user quotas to be
made in a timely manner.)

How large is your accounting file?

Any other programs, jobs, people, accessing your accounting file heavily?

Are you using the gridengine master for anything besides scheduling,
such as a cron job that copies out the common area along with a huge
accounting file?
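
To gather the first few answers quickly, something like the following
could be run on the master. It is only a minimal sketch: it assumes the
accounting file sits at $SGE_ROOT/$SGE_CELL/common/accounting and that
the cell is named "default" when $SGE_CELL is unset. The function name
master_snapshot and the /opt/sge fallback are made up for illustration.

```python
import os


def master_snapshot(sge_root, sge_cell="default"):
    """Collect the core count and accounting file size for a quick check."""
    # Assumed location: $SGE_ROOT/$SGE_CELL/common/accounting
    accounting = os.path.join(sge_root, sge_cell, "common", "accounting")
    try:
        size_bytes = os.path.getsize(accounting)
    except OSError:
        size_bytes = None  # file is missing or not readable
    return {
        "cores": os.cpu_count(),
        "accounting_path": accounting,
        "accounting_bytes": size_bytes,
    }


if __name__ == "__main__":
    # The fallback path is an assumption; set SGE_ROOT to match your install.
    root = os.environ.get("SGE_ROOT", "/opt/sge")
    cell = os.environ.get("SGE_CELL", "default")
    info = master_snapshot(root, cell)
    print("cores:", info["cores"])
    if info["accounting_bytes"] is None:
        print("accounting file not found at", info["accounting_path"])
    else:
        size_mb = info["accounting_bytes"] / (1024 * 1024)
        print("accounting file: %s (%.1f MB)" % (info["accounting_path"], size_mb))
```

This only reports numbers; whether a given accounting file counts as
"large" depends on your site.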

If you are not already, try running your scheduler on a 4-core, 16 GB
virtual machine whose underlying hardware has at least one
10 Gbit/s uplink.

Is your shared NFS area slow or getting hammered by users?

Just some points to help you identify the issues.

Ed Lauzier




-----Original Message-----
From: Joshua Baker-LePain [mailto:[email protected]]
Sent: Friday, November 1, 2013 01:44 PM
To: 'users'
Subject: [gridengine users] Debugging *really* long scheduling runs

I'm currently running Grid Engine 2011.11p1 on CentOS-6. I'm using classic
spooling to a local disk, local $SGE_ROOT (except for
$SGE_ROOT/$SGE_CELL/common), and local spooling directories on the nodes
(of which there are more than 600). I'm occasionally seeing *really* long
scheduling runs (the last two were 4005 and 4847 seconds). This leads to
extra fun like:

11/01/2013 08:35:39|event_|sortinghat|W|acknowledge timeout after 600 seconds for event client (schedd:0) on host "$SGE_MASTER"
11/01/2013 08:35:39|event_|sortinghat|E|removing event client (schedd:0) on host "$SGE_MASTER" after acknowledge timeout from event client list

I have "PROFILE=1" set, and of course most of the time is spent in "job
dispatching". But I'm really not sure how else to track down the cause of
this. Where should I be looking? Are there any other options I can set to
get more info?

Thanks.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
