Re: [gridengine users] RQS and scheduler performance (max-slots-on-all-hosts)

Reuti Thu, 26 Apr 2012 09:58:42 -0700

On 26.04.2012 at 18:38, Stuart Barkley wrote:

> While looking at my previous issue with job reservations, I have
> noticed a large performance issue with RQS and the scheduler.
> 
> I have previously noticed my qmaster system often running at 100% when
> 1000 or more jobs were in the system.  I had just assumed this was
> normal.
> 
> When I set "max_reservation 8" the scheduler task takes almost 5
> minutes of 100% cpu on 1 core to run (running on a KVM virtual machine
> with 2 cores dedicated).  This means that priority changes can take up
> to 10 minutes to be fully reflected in qstat output.
> 
> I measure the scheduler run time with 'qconf -tsm'.
> 
> starting configuration:
> 
> max_reservation 8, max-slots-on-all-hosts enabled:
>  Tue Apr 24 23:21:07 2012|-------------START-SCHEDULER-RUN-------------
>  Tue Apr 24 23:25:36 2012|--------------STOP-SCHEDULER-RUN-------------
>  Tue Apr 24 23:25:36 2012|-------------START-SCHEDULER-RUN-------------
>  Tue Apr 24 23:30:09 2012|--------------STOP-SCHEDULER-RUN-------------
> 
> max_reservation 0, max-slots-on-all-hosts enabled:
>  Tue Apr 24 22:58:23 2012|-------------START-SCHEDULER-RUN-------------
>  Tue Apr 24 22:59:22 2012|--------------STOP-SCHEDULER-RUN-------------
> 
> This is with ~1800 jobs running, ~20 jobs in 'qw' state.
> 
> I have just noticed that when I disable my max-slots-on-all-hosts RQS
> the scheduling time drops significantly.
> 
> max_reservation 8, max-slots-on-all-hosts disabled:
>  Thu Apr 26 11:56:24 2012|-------------START-SCHEDULER-RUN-------------
>  Thu Apr 26 11:56:30 2012|--------------STOP-SCHEDULER-RUN-------------
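A quick way to turn those START/STOP markers into run times is sketched below in Python. It assumes only the timestamp format visible in the quoted lines; the function name `run_durations` is made up for illustration. The last line checks one plausible reading of the "up to 10 minutes" remark: a priority change that lands just after a run starts has to wait for that run and the following one.

```python
from datetime import datetime

# Timestamp format as it appears in the quoted scheduler log lines, e.g.
# "Tue Apr 24 23:21:07 2012|-------------START-SCHEDULER-RUN-------------"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def run_durations(lines):
    """Return the length in seconds of each START/STOP pair, in log order."""
    durations = []
    start = None
    for line in lines:
        stamp, _, marker = line.strip().partition("|")
        if "START-SCHEDULER-RUN" in marker:
            start = datetime.strptime(stamp, TS_FORMAT)
        elif "STOP-SCHEDULER-RUN" in marker and start is not None:
            stop = datetime.strptime(stamp, TS_FORMAT)
            durations.append((stop - start).total_seconds())
            start = None
    return durations


# The max_reservation 8 / RQS enabled measurements from the message above.
log = """\
 Tue Apr 24 23:21:07 2012|-------------START-SCHEDULER-RUN-------------
 Tue Apr 24 23:25:36 2012|--------------STOP-SCHEDULER-RUN-------------
 Tue Apr 24 23:25:36 2012|-------------START-SCHEDULER-RUN-------------
 Tue Apr 24 23:30:09 2012|--------------STOP-SCHEDULER-RUN-------------
""".splitlines()

d = run_durations(log)
print(d)  # [269.0, 273.0]

# Assumed worst case: two consecutive runs back to back.
worst = max(a + b for a, b in zip(d, d[1:]))
print(f"worst-case latency: {worst / 60:.1f} min")
```

Applied to the other two measurements, the same function gives 59 s (max_reservation 0) and 6 s (RQS disabled).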
> 
> For the record my RQS is now disabled:
> {
>   name         max-slots-on-all-hosts
>   description  "Don't over commit host slots"
>   enabled      FALSE
>   limit        hosts {*} to slots=$num_proc
> }
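In plain terms, the rule above caps the total slots used on each host, summed over all queues, at that host's `num_proc`; the braces in `hosts {*}` make it a separate limit per host rather than one shared limit for all hosts. The Python sketch below models only that constraint, with made-up host names and processor counts. It says nothing about how sge_schedd evaluates resource quotas internally, so it does not explain the slowdown itself.

```python
from collections import Counter


def slots_in_use(assignments):
    """Sum slots per host over (host, slots) pairs taken from every queue."""
    used = Counter()
    for host, slots in assignments:
        used[host] += slots
    return used


def can_place(assignments, num_proc, host, slots):
    """True if adding `slots` on `host` keeps it within slots=$num_proc.

    Because `hosts {*}` applies the limit to each host on its own,
    only the target host's current total matters here.
    """
    return slots_in_use(assignments)[host] + slots <= num_proc[host]


# Hypothetical cluster state: two 8-core hosts, jobs from several queues.
num_proc = {"node01": 8, "node02": 8}
running = [("node01", 4), ("node01", 4), ("node02", 2)]

print(can_place(running, num_proc, "node01", 1))  # False: node01 is full
print(can_place(running, num_proc, "node02", 6))  # True: 2 + 6 <= 8
```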
> 
> <rant>My internal logic says it shouldn't take anywhere near the
> original time to schedule 2000 jobs.  But an awful lot of today's code
> will just consume resources without good reason.  I come from a time
> when compute resources were actually very expensive and people paid
> attention to performance.  Nowadays, it seems people are willing to
> just throw memory and CPU at problems instead of careful
> programming.</rant>


+1

The hardware is getting faster, but the software slower. In the end you get the 
same speed. ;-)

What was your "schedule_interval" set to?

Was "schedd_job_info true" set by accident?
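To check both questions against a running setup, one could dump the scheduler configuration with `qconf -ssconf` and scan it. The sketch below assumes that output has one `name value` pair per line and that `schedule_interval` is written as h:m:s; the helper names and the sample values are made up for illustration, not taken from Stuart's cluster. The 269 s run time is the first max_reservation 8 measurement quoted above.

```python
def parse_sched_conf(text):
    """Map each parameter name to its raw value from `name value` lines."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def interval_seconds(value):
    """Convert an h:m:s interval such as 0:0:15 to seconds."""
    h, m, s = (int(x) for x in value.split(":"))
    return h * 3600 + m * 60 + s


def check_sched_conf(conf, run_seconds):
    """Return warnings for the two settings asked about in this thread."""
    warnings = []
    if conf.get("schedd_job_info", "false").lower() == "true":
        warnings.append("schedd_job_info is true: the scheduler collects "
                        "why-not-scheduled messages for jobs on every run")
    interval = conf.get("schedule_interval")
    if interval and run_seconds > interval_seconds(interval):
        warnings.append(f"a run takes {run_seconds:.0f}s but schedule_interval "
                        f"is {interval}, so runs follow each other back to back")
    return warnings


# Hypothetical excerpt in the style of `qconf -ssconf` output.
sample = """\
algorithm                         default
schedule_interval                 0:0:15
schedd_job_info                   true
max_reservation                   8
"""

for warning in check_sched_conf(parse_sched_conf(sample), 269):
    print(warning)
```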

-- Reuti


> This restores my belief in the original Grid Engine coders.
> 
> (still using sge6.2u5, CentOS 5)
> 
> Stuart Barkley
> -- 
> I've never been lost; I was once bewildered for three days, but never lost!
>                                        --  Daniel Boone
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
