Hi List,

can anyone give me a hint as to what scheduler performance to expect, and what would typically be the bottleneck? We have 6.2u5 running here, and one scheduler run takes about 5 minutes (with 600 jobs and 800 nodes).

From what I've seen with params monitor=1 and strace, the scheduler[1] has a list of running jobs almost instantaneously, then spends about four minutes at 100% CPU writing nothing to common/schedule (and actually not making any system calls but futex() and write() to stdout). During that time, it spews a lot of diagnostic messages about resource utilization to stdout (see below[2]). Finally, reservations are made (they take about four seconds each, which is not exactly fast, but quite manageable), and jobs are started (very quickly).

Is such a long delay between the :RUNNING: and :RESERVING: lines normal? I thought our disk might be at fault here -- /var is often maxed out in terms of bandwidth. But then again, the thread at 100% CPU doesn't do any read() calls.

Thanks,

A.

[1] At least, I assume that's the scheduler -- it is simply the thread with the highest CPU percentage.

[2] It looks more or less like this:

-------------------------------
RUE_name (String) * = ////node11-33/
RUE_utilized_now (Double) = 0.000000
RUE_utilized (List) = empty
RUE_utilized_now_non (Double) = 0.000000
RUE_utilized_nonexcl (List) = empty

-- 
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105

_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users
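
For anyone who wants to put a number on the delay between the :RUNNING: and :RESERVING: lines: below is a minimal sketch of one way it could be timed, by stamping each line of common/schedule as it arrives. It assumes the state (RUNNING, RESERVING, ...) is the third colon-separated field of each line, which matches what I see with monitor=1 but should be checked against your own file. The function names and the script name gap.py are made up for illustration, not part of Grid Engine.

```python
import sys
import time


def state_of(line):
    """Return the third colon-separated field (e.g. 'RUNNING'), or None.

    Assumption: schedule lines look like jobid:taskid:STATE:... and
    separator lines consisting only of colons carry no state.
    """
    fields = line.strip().split(":")
    if len(fields) >= 3 and fields[2]:
        return fields[2]
    return None


def phase_gap(stamped_lines, before="RUNNING", after="RESERVING"):
    """Seconds between the last `before` line and the first `after` line
    that follows it.

    stamped_lines: iterable of (arrival_time_in_seconds, line) pairs.
    Returns None if no such transition is seen.
    """
    last_before = None
    for stamp, line in stamped_lines:
        state = state_of(line)
        if state == before:
            last_before = stamp
        elif state == after and last_before is not None:
            return stamp - last_before
    return None


def stamp_lines(stream):
    """Pair each incoming line with the wall-clock time it was read."""
    for line in stream:
        yield time.time(), line


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Reads lines from stdin until the first transition is seen.
    gap = phase_gap(stamp_lines(sys.stdin))
    if gap is None:
        print("no RUNNING -> RESERVING transition seen")
    else:
        print(f"gap: {gap:.1f} s")
```

It could be run as `tail -n 0 -f common/schedule | python3 gap.py -`; it exits after the first transition it sees. The timestamps are arrival times at the reader, so they include whatever buffering tail and the file system add, and should be taken as approximate.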
