On 26.04.2012 at 19:06, Stuart Barkley wrote:

> On Thu, 26 Apr 2012 at 12:35 -0000, Rayson Ho wrote:
> 
>> Is fairshare not used for a reason? It sounds to me that in your
>> scenario only one account is using most of the resources of the
>> cluster.
> 
> I didn't go into it because I don't think fairshare is directly
> related to the issue (I could be wrong).  I think I have fairshare
> working now, but our usage is bursty and I'm still attempting to learn
> about fairshare.  I'll write up my experiences once I learn more.
> 
> Fairshare is just a component of priority.  I used qalter to force the
> job priority up as a way to bypass possibly faulty fairshare
> configuration.
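[For anyone following along: forcing a pending job's priority up with qalter looks roughly like this. The job ID 4711 is a placeholder.]

```shell
# Raise the POSIX priority of pending job 4711 to the maximum (1024).
# The valid range is -1023..1024; values above 0 require operator or
# manager privileges.
qalter -p 1024 4711
```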
> 
> We also do not have preemption configured (another discussion for
> another time).
> 
> We do not limit users artificially, if there is nothing else running
> on the cluster a user can get 100% of it.  In the past when I have
> tried to limit the users to not being able to allocate more than ~75%
> of total resources I've received push back for having idle systems
> when jobs were waiting to run.  This is more a management/user
> relations issue than a technical one.
> 
> Some resources are reserved for shorter jobs and currently there are
> other user jobs flowing through the system which have shorter h_rt
> values.
> 
> For this specific case:
> 
> The first large array job was submitted when the system was idle, so
> this user was able to grab almost all of the resources.
> 
> Of 1500+ array job instances, a couple may end every hour, but these
> will mostly release small fragmented resources.
> 
> The top priority job (the one I'm concerned about) needs larger memory
> resources so cannot fit into the resources released by a single array
> task of the large job.
> 
> The scheduler then moves down the job list and finds another task of
> the first array job.  It fits into the available resources, so it
> gets started.

IIRC the array tasks are not handled as individual jobs but as one
job, i.e. once the first task was scheduled, all the others will follow.
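[For context, an array job of that size would have been submitted along these lines; the runtime request and script name are placeholders:]

```shell
# Submit an array job with 1500 tasks, each requesting a 24 h hard
# runtime limit; tasks are dispatched as resources become available.
qsub -t 1-1500 -l h_rt=86400 array_job.sh
```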

-- Reuti


> I was hoping/expecting that reservations (qsub -R and sched_conf
> max_reservations) were something that would help with this problem.
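[For anyone trying this: enabling resource reservation takes two steps, sketched below. The limit of 64 and the memory request are illustrative values, not a recommendation.]

```shell
# 1. In the scheduler configuration (qconf -msconf), allow the
#    scheduler to make reservations for up to 64 jobs per run:
#        max_reservations    64

# 2. Request a reservation for the important job at submission time
#    (the resource request and script name are placeholders):
qsub -R y -l h_vmem=32G big_memory_job.sh
```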
> 
> Stuart
> -- 
> I've never been lost; I was once bewildered for three days, but never lost!
>                                        --  Daniel Boone
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

