On Mar 15, 2011, at 20:38, Rayson Ho wrote:

>> Thanks, that has helped a bit. I know now that most of the CPU time is spent 
>> in the dispatching stage. However, it is still unclear to me why dispatching 
>> should be such a time-consuming task.
> 
> Good, at least we are on the right track!! :-)
> 
> The stage is called "job dispatching" but it really is not about
> sending job start requests to the execution hosts -- in fact, the
> scheduler thread does not talk to the execution hosts directly. The
> job dispatching stage
> (daemons/qmaster/sge_sched_thread.c:dispatch_jobs()) in the scheduler
> tries to find a queue instance (think of it as a host or slot) that is
> suitable for running the job.

OK, that explains why it's CPU-bound (and in userspace, too).
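
To make the point concrete, here is a toy model of that matching step. It is only a sketch under my own assumptions, not the actual logic in dispatch_jobs(): the class names, the matching rule, and the "bigmem" boolean complex are all hypothetical. It does show the shape of the work, though: every pending job is compared against every queue instance, and soft requests add a ranking pass on top, all in memory with no I/O.

```python
from dataclasses import dataclass, field


@dataclass
class QueueInstance:
    name: str
    free_slots: int
    resources: dict = field(default_factory=dict)  # e.g. {"h_rt": 3600}


@dataclass
class Job:
    job_id: int
    hard: dict = field(default_factory=dict)  # must be satisfied
    soft: dict = field(default_factory=dict)  # nice to have


def matches(offered, wanted):
    """Toy rule: booleans must be equal, numbers must not exceed the offer."""
    if offered is None:
        return False
    if isinstance(wanted, bool):
        return offered is wanted
    return offered >= wanted


def satisfies_hard(qi, job):
    return all(matches(qi.resources.get(k), v) for k, v in job.hard.items())


def soft_violations(qi, job):
    return sum(1 for k, v in job.soft.items()
               if not matches(qi.resources.get(k), v))


def dispatch(jobs, queue_instances):
    """Assign each job to a suitable queue instance, in job order.

    The cost is roughly len(jobs) * len(queue_instances) comparisons,
    which is pure CPU work in user space.
    """
    assignments = {}
    for job in jobs:
        candidates = [qi for qi in queue_instances
                      if qi.free_slots > 0 and satisfies_hard(qi, job)]
        if not candidates:
            continue  # job stays pending
        # Soft requests only rank the candidates; they never exclude one.
        best = min(candidates, key=lambda qi: soft_violations(qi, job))
        best.free_slots -= 1
        assignments[job.job_id] = best.name
    return assignments


if __name__ == "__main__":
    queues = [
        QueueInstance("all.q@node1", 1, {"h_rt": 3600, "bigmem": False}),
        QueueInstance("all.q@node2", 1, {"h_rt": 86400, "bigmem": True}),
    ]
    jobs = [
        Job(1, hard={"h_rt": 7200}),
        Job(2, hard={"h_rt": 600}, soft={"bigmem": True}),
        Job(3, hard={"h_rt": 600, "bigmem": True}),
    ]
    print(dispatch(jobs, queues))
```

In this model, job 3 stays pending because both slots are already taken by the time it is considered.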

> With a few hundred jobs, grid engine (this applies to SGE forks like
> Open Grid Scheduler or Son of GE -- as we have not changed the
> scheduler code yet) can easily handle the load. But as your cluster is
> spending 5 minutes to decide where the jobs should go, I'm curious
> what kind of resource requirements they have and, most importantly,
> whether they have soft requests specified.

Hmm. I *think* there were no soft requests, but I am not sure. Right now, 
dispatching is back to 1..2 seconds, but I will be sure to take a look when 
times go up again.
As to the resource requirements, all our jobs have -l h_rt, and many have an 
additional boolean complex.


A.

-- 
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105

