Is there a reason fairshare isn't being used? It sounds like, in your scenario, a single account is consuming most of the cluster's resources.
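A minimal way to try the functional fairshare policy would be something like the following sketch. The ticket and share values are illustrative examples, not recommendations -- check them against the docs for your 6.2u5 install:

```
# Give the functional policy some tickets in the scheduler config:
qconf -msconf
#   set: weight_tickets_functional 10000

# Have user objects created automatically with a default functional share,
# so every user competes on equal footing:
qconf -mconf
#   set: enforce_user      auto
#        auto_user_fshare  100
```

With that in place, pending jobs from the under-served user accumulate tickets relative to the heavy user, which should pull their jobs up the pending list without manual qalter -p adjustments.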
See "Scheduler Policies for Job Prioritization in the Sun N1 Grid Engine 6 System" by Charu Chaubal:
http://confluence.rcs.griffith.edu.au:8080/download/attachments/5668957/Sge-SCHEDULER+POLICIES-guide.pdf

Rayson

On Thu, Apr 26, 2012 at 12:20 PM, Stuart Barkley <[email protected]> wrote:
> Can someone give a quick overview of how job reservations are supposed
> to work?
>
> I have a large cluster where one user has ~1500 jobs executing, each of
> which takes several days to run. Every day several of the jobs finish
> and new ones from the array job start. These jobs have a minimal
> memory footprint (h_vmem=2G), so the new jobs fit exactly into the old
> footprint.
>
> I also have an array job from another user, but those tasks request
> h_vmem=20G. They have been starved for several days and none have
> started, since 20G never becomes free on a single host.
>
> h_vmem is consumable and generally works well at preventing memory
> over-allocation.
>
> In this case the blocking jobs turn over slowly, and I suspect
> fragmentation across nodes means that a single node is unlikely to
> become empty within any short time period.
>
> I have manually adjusted the job priority so the starved job is at the
> top of the waiting list (qalter -p 500).
>
> I've manually set the qalter "-R y" option, so the job should be
> considered for a reservation.
>
> I have "max_reservation 8" in sched_conf, so I believe an internal
> reservation should be made for the job.
>
> Previous experimentation with job reservations on faster-turnaround
> jobs appeared to have an effect; at some point it appeared that
> grid engine was clearing off nodes for larger jobs, but I didn't find
> any way to confirm that was actually what was happening.
>
> Is there any way to tell that grid engine has even noticed this and
> created a reservation? Is there a way to see what future resources
> have been "reserved"?
>
> Will the reservation adjust itself over time?
> The running jobs all have a huge h_rt, but the actual run time varies
> a lot. It would be good if grid engine would reallocate reservations
> as jobs end and better possibilities emerge.
>
> Current manual workaround:
>
> For this specific case I have created a temporary RQS limiting this
> specific user to 1000 slots, but I need to be sure I reset it once
> the currently blocked jobs get started.
>
> (still using SGE 6.2u5, CentOS 5)
>
> Thanks,
> Stuart Barkley
> --
> I've never been lost; I was once bewildered for three days, but never lost!
>     -- Daniel Boone

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
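To the reservation-visibility question in the quoted message: the scheduler can log its per-cycle decisions, including reservations, to a file. A sketch, assuming a default cell named "default" (paths may differ on your installation):

```
# Turn on scheduler decision logging:
qconf -msconf
#   set: params MONITOR=1

# Each scheduling run then appends records to the "schedule" file;
# RESERVING lines show resources being held back for a pending job:
grep RESERVING $SGE_ROOT/default/common/schedule
```

The file grows with every scheduling interval, so it is worth turning MONITOR off again once you have confirmed the reservation is being made.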
