On Mar 15, 2011, at 20:38, Rayson Ho wrote:

>> Thanks, that has helped a bit. I know now that most of the CPU time is spent
>> in the dispatching stage. However, it is still unclear to me why dispatching
>> should be such a time-consuming task.
>
> Good, at least we are on the right track!! :-)
>
> The stage is called "job dispatching" but it really is not about
> sending job start requests to the execution hosts -- in fact, the
> scheduler thread does not talk to the execution hosts directly. The
> job dispatching stage
> (daemons/qmaster/sge_sched_thread.c:dispatch_jobs()) in the scheduler
> tries to find a queue instance (think of it as a host or slot) that is
> suitable for running the job.

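To illustrate why a matching step like this is pure userspace CPU work, here is a minimal sketch of the idea. It is not the actual logic of dispatch_jobs() in sge_sched_thread.c; the names QueueInstance, Job, is_suitable and dispatch, and the simplified h_rt and complex checks, are all assumptions made up for illustration. The point is only that each pending job may be compared against many queue instances, so the work grows roughly with jobs times queue instances.

```python
from dataclasses import dataclass, field


@dataclass
class QueueInstance:
    """Hypothetical stand-in for a queue instance (a queue on one host)."""
    name: str
    free_slots: int
    h_rt_limit: int  # maximum allowed hard runtime, in seconds
    complexes: dict = field(default_factory=dict)  # e.g. boolean complexes


@dataclass
class Job:
    """Hypothetical stand-in for a pending job and its hard requests."""
    job_id: int
    h_rt: int  # requested hard runtime (-l h_rt), in seconds
    hard_requests: dict = field(default_factory=dict)


def is_suitable(job, qi):
    """Return True if the queue instance satisfies all hard requests."""
    if qi.free_slots < 1:
        return False
    if job.h_rt > qi.h_rt_limit:
        return False
    # Every requested complex must be offered with the requested value.
    return all(qi.complexes.get(k) == v for k, v in job.hard_requests.items())


def dispatch(jobs, queue_instances):
    """Assign each job to the first suitable queue instance.

    Returns the assignments and the number of suitability checks made,
    which is the quantity that dominates CPU time in this sketch.
    """
    assignments = {}
    checks = 0
    for job in jobs:
        for qi in queue_instances:
            checks += 1
            if is_suitable(job, qi):
                qi.free_slots -= 1
                assignments[job.job_id] = qi.name
                break
    return assignments, checks


if __name__ == "__main__":
    queues = [
        QueueInstance("all.q@node1", 1, 3600, {"gpu": False}),
        QueueInstance("all.q@node2", 2, 86400, {"gpu": True}),
    ]
    jobs = [Job(1, 1800), Job(2, 7200, {"gpu": True}), Job(3, 1800)]
    print(dispatch(jobs, queues))
```

In this sketch a job that fits nowhere costs a full pass over all queue instances, which is one plausible way a long pending list could become expensive; the real scheduler's behaviour may differ.
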
OK, that explains why it's CPU-bound (and in userspace, too).

> With a few hundred jobs, grid engine (this applies to SGE forks like
> Open Grid Scheduler or Son of GE -- as we have not changed the
> scheduler code yet) can easily handle the load. But as your cluster is
> spending 5 minutes to decide where the jobs should go, I'm curious
> what kind of resource requirements they have, and most importantly,
> do they have soft requests specified??

Hmm. I *think* there were no soft requests, but I am not sure. Right now,
dispatching is back to 1..2 seconds, but I will be sure to take a look when
times go up again.
As to the resource requirements, all our jobs have -l h_rt, and many have an
additional boolean complex.

A.

-- 
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
