On Tue, Mar 15, 2011 at 6:12 AM, Esztermann, Ansgar
<[email protected]> wrote:
> Thanks, that has helped a bit. I know now that most of the CPU time is spent 
> in the dispatching stage. However, it is still unclear to me why dispatching 
> should be such a time-consuming task.

Good, at least we are on the right track!! :-)

The stage is called "job dispatching", but it is not really about
sending job start requests to the execution hosts. In fact, the
scheduler thread does not talk to the execution hosts directly. The
job dispatching stage in the scheduler
(daemons/qmaster/sge_sched_thread.c:dispatch_jobs()) tries to find a
queue instance (think of it as a host or slot) that is suitable for
running the job.
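
To give a rough idea of what "finding a suitable queue instance" means,
here is a toy Python sketch. It is NOT the real dispatch_jobs() code,
which is C and handles much more than this. QueueInstance, Job,
satisfies() and this dispatch_jobs() are hypothetical names, and hard
requests are reduced to exact attribute matches. Even in this toy
version, each pending job may be checked against every queue instance,
and a parallel job has to collect slots from several instances before
it can be placed.

```python
from dataclasses import dataclass, field


@dataclass
class QueueInstance:
    # Hypothetical, heavily simplified stand-in for a queue instance (queue@host)
    name: str
    free_slots: int
    attrs: dict = field(default_factory=dict)


@dataclass
class Job:
    job_id: int
    slots: int  # 1 for a sequential job, >1 for a parallel (PE) job
    hard_requests: dict = field(default_factory=dict)


def satisfies(qi, requests):
    # Simplification: every hard request must be an exact attribute match
    return all(qi.attrs.get(k) == v for k, v in requests.items())


def dispatch_jobs(jobs, instances):
    """Toy matcher: return {job_id: [instance names]} for jobs that fit now."""
    orders = {}
    for job in jobs:
        needed, chosen = job.slots, []
        # Walk the queue instances, collecting slots until the job is covered
        for qi in instances:
            if needed == 0:
                break
            if qi.free_slots > 0 and satisfies(qi, job.hard_requests):
                take = min(qi.free_slots, needed)
                chosen.append((qi, take))
                needed -= take
        if needed == 0:
            # Book the slots only once the whole job fits; otherwise it stays pending
            for qi, take in chosen:
                qi.free_slots -= take
            orders[job.job_id] = [qi.name for qi, _ in chosen]
    return orders


instances = [
    QueueInstance("all.q@node1", 4, {"arch": "lx-amd64"}),
    QueueInstance("all.q@node2", 8, {"arch": "lx-amd64"}),
]
jobs = [Job(1, 6, {"arch": "lx-amd64"}), Job(2, 4, {"arch": "sol-sparc64"})]
print(dispatch_jobs(jobs, instances))  # {1: ['all.q@node1', 'all.q@node2']}
```

Here the 6-slot job gets split across both instances, and the job
asking for a different architecture simply stays pending.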

With a few hundred jobs, Grid Engine can easily handle the load (this
also applies to SGE forks like Open Grid Scheduler and Son of GE, as
we have not changed the scheduler code yet). But since your cluster is
spending more than 5 minutes deciding where the jobs should go, I'm
curious: what kind of resource requirements do they have, and, most
importantly, do they have soft requests specified?
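
Why do soft requests matter? Here is another toy sketch, again
hypothetical Python and not the actual scheduler code, using the same
simplification that a request is an exact attribute match. With only
hard requests, the search can stop at the first queue instance that
qualifies. With soft requests (what you get with qsub -soft -l ...),
any qualifying instance is acceptable, but in this sketch the best one
is the instance with the fewest soft-request violations, so every
candidate has to be scored. The helper names matches(),
pick_hard_only() and pick_with_soft() are made up for illustration.

```python
def matches(qi, requests):
    # Simplification: a request is met only by an exact attribute match
    return all(qi.get(k) == v for k, v in requests.items())


def pick_hard_only(instances, hard):
    """First fit: stop at the first instance that meets all hard requests."""
    examined = 0
    for qi in instances:
        examined += 1
        if matches(qi, hard):
            return qi["name"], examined
    return None, examined


def pick_with_soft(instances, hard, soft):
    """Best fit: score every qualifying instance by its soft-request violations."""
    best, best_violations, examined = None, None, 0
    for qi in instances:
        examined += 1
        if not matches(qi, hard):
            continue
        violations = sum(1 for k, v in soft.items() if qi.get(k) != v)
        if best_violations is None or violations < best_violations:
            best, best_violations = qi["name"], violations
    return best, examined


# 839 hypothetical queue instances; only the last one has 64G of memory
instances = [
    {"name": f"all.q@node{i:03d}", "arch": "lx-amd64",
     "mem": "64G" if i == 839 else "32G"}
    for i in range(1, 840)
]
print(pick_hard_only(instances, {"arch": "lx-amd64"}))                   # ('all.q@node001', 1)
print(pick_with_soft(instances, {"arch": "lx-amd64"}, {"mem": "64G"}))  # ('all.q@node839', 839)
```

With 839 instances (the QA count in your log), the hard-only search
looks at a single instance, while the soft search looks at all 839,
and that is per job.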

Rayson





>
> 03/15/2011 10:53:56|schedu|master1|P|PROF: job dispatching took 327.370 s (0 fast, 0 fast_soft, 8 pe, 0 pe_soft, 4 res)
> 03/15/2011 10:53:56|schedu|master1|P|PROF: parallel matching            878       262664         2634       159203       137234       159203       131007
> 03/15/2011 10:53:56|schedu|master1|P|PROF: sequential matching            0            0            0            0            0            0            0
> 03/15/2011 10:53:56|schedu|master1|P|PROF: create pending job orders: 0.000 s
> 03/15/2011 10:53:56|schedu|master1|P|PROF: scheduled in 327.450 (u 337.270 + s 7.960 = 345.230): 0 sequential, 0 parallel, 452 orders, 846 H, 214 Q, 839 QA, 10 J(qw), 431 J(r), 0 J(s), 0 J(h), 0 J(e), 0 J(x), 449 J(all), 57 C, 3 ACL, 149 PE, 12 U, 1 D, 0 PRJ, 1 ST, 0 CKPT, 0 RU, 1 gMes, 0 jMes, 452/3 pre-send, 0/0/0 pe-alg
>
> 03/15/2011 10:53:56|schedu|master1|P|PROF: send orders and cleanup took: 0.020 (u 0.020,s 0.000) s
> 03/15/2011 10:53:56|schedu|master1|P|PROF: schedd run took: 327.630 s (init: 0.000 s, copy: 0.130 s, run:327.470, free: 0.030 s, jobs: 449, categories: 43/0)
>
>
> A.
>
> --
> Ansgar Esztermann
> DV-Systemadministration
> Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
>
>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
