On 26.04.2012 at 18:38, Stuart Barkley wrote:
> While looking at my previous issue with job reservations I have
> noticed a large performance issue with RQS and the scheduler.
>
> I have previously noticed my qmaster system often running at 100% when
> 1000 or more jobs were in the system. I had just assumed this was
> normal.
>
> When I set "max_reservation 8" the scheduler task takes almost 5
> minutes of 100% cpu on 1 core to run (running on a KVM virtual machine
> with 2 cores dedicated). This means that priority changes can take up
> to 10 minutes to be fully reflected in qstat output.
>
> I measure the scheduler run time with 'qconf -tsm'.
>
> starting configuration:
>
> max_reservation 8, max-slots-on-all-hosts enabled:
> Tue Apr 24 23:21:07 2012|-------------START-SCHEDULER-RUN-------------
> Tue Apr 24 23:25:36 2012|--------------STOP-SCHEDULER-RUN-------------
> Tue Apr 24 23:25:36 2012|-------------START-SCHEDULER-RUN-------------
> Tue Apr 24 23:30:09 2012|--------------STOP-SCHEDULER-RUN-------------
>
> max_reservation 0, max-slots-on-all-hosts enabled:
> Tue Apr 24 22:58:23 2012|-------------START-SCHEDULER-RUN-------------
> Tue Apr 24 22:59:22 2012|--------------STOP-SCHEDULER-RUN-------------
>
> This is with ~1800 jobs running, ~20 jobs in 'qw' state.
>
> I have just noticed that when I disable my max-slots-on-all-hosts RQS
> the scheduling time drops significantly.
>
> max_reservation 8, max-slots-on-all-hosts disabled:
> Thu Apr 26 11:56:24 2012|-------------START-SCHEDULER-RUN-------------
> Thu Apr 26 11:56:30 2012|--------------STOP-SCHEDULER-RUN-------------
>
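For comparing the three configurations, the START/STOP lines quoted above can be turned into run times. This is only a minimal sketch: the function name `scheduler_run_durations` is made up for illustration, and it assumes every line has the `timestamp|marker` shape shown in the quoted output and that an English/C locale is in use for the day and month names.

```python
from datetime import datetime

def scheduler_run_durations(lines):
    """Pair START/STOP markers and return each run's duration in seconds."""
    durations = []
    start = None
    for line in lines:
        # Tolerate mail quoting ("> ") in front of the log line.
        line = line.lstrip("> ").strip()
        if "|" not in line:
            continue
        stamp, _, marker = line.partition("|")
        try:
            when = datetime.strptime(stamp.strip(), "%a %b %d %H:%M:%S %Y")
        except ValueError:
            continue  # not a timestamped line, skip it
        if "START-SCHEDULER-RUN" in marker:
            start = when
        elif "STOP-SCHEDULER-RUN" in marker and start is not None:
            durations.append(int((when - start).total_seconds()))
            start = None
    return durations

# The lines quoted in this mail, in the order they appear.
log = """\
Tue Apr 24 23:21:07 2012|-------------START-SCHEDULER-RUN-------------
Tue Apr 24 23:25:36 2012|--------------STOP-SCHEDULER-RUN-------------
Tue Apr 24 23:25:36 2012|-------------START-SCHEDULER-RUN-------------
Tue Apr 24 23:30:09 2012|--------------STOP-SCHEDULER-RUN-------------
Tue Apr 24 22:58:23 2012|-------------START-SCHEDULER-RUN-------------
Tue Apr 24 22:59:22 2012|--------------STOP-SCHEDULER-RUN-------------
Thu Apr 26 11:56:24 2012|-------------START-SCHEDULER-RUN-------------
Thu Apr 26 11:56:30 2012|--------------STOP-SCHEDULER-RUN-------------
""".splitlines()

for seconds in scheduler_run_durations(log):
    print(f"{seconds // 60}m{seconds % 60:02d}s")
```

For the quoted lines this gives 4m29s and 4m33s with max_reservation 8 and the RQS enabled, 0m59s with max_reservation 0, and 0m06s with the RQS disabled.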
> For the record my RQS is now disabled:
> {
> name max-slots-on-all-hosts
> description "Don't over commit host slots"
> enabled FALSE
> limit hosts {*} to slots=$num_proc
> }
>
> <rant>My internal logic says it should not take anywhere near the
> original time to schedule 2000 jobs. But an awful lot of today's code
> will just consume resources without good reason. I come from a time
> when compute resources were actually very expensive and people paid
> attention to performance. Nowadays, it seems people are willing to
> just throw memory and CPU at problems instead of careful
> programming.</rant>
+1
The hardware is getting faster, but the software slower. In the end you get the
same speed. ;-)
What was your "schedule_interval" set to?
Was "schedd_job_info true" set by accident?
-- Reuti
> This restores my belief in the original Grid Engine coders.
>
> (still using sge6.2u5, CentOS 5)
>
> Stuart Barkley
> --
> I've never been lost; I was once bewildered for three days, but never lost!
> -- Daniel Boone
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users