On Thu, 26 Apr 2012 at 12:35 -0000, Rayson Ho wrote:

> Is fairshare not used for a reason?? Sounds to me that in your
> scenario it is only 1 account using most of the resources of the
> cluster.

I didn't go into it because I don't think fairshare is directly
related to the issue (I could be wrong).  I think I have fairshare
working now, but our usage is bursty and I'm still attempting to learn
about fairshare.  I'll write up my experiences once I learn more.

Fairshare is just one component of priority.  I used qalter to force
the job priority up as a way to bypass a possibly faulty fairshare
configuration.

We also do not have preemption configured (another discussion for
another time).

We do not limit users artificially; if there is nothing else running
on the cluster, a user can get 100% of it.  In the past, when I have
tried to limit users to no more than ~75% of total resources, I've
received push back for having idle systems when jobs were waiting to
run.  This is more a management/user counsel issue than a technical
one.

Some resources are reserved for shorter jobs and currently there are
other user jobs flowing through the system which have shorter h_rt
values.

For this specific case:

The first large array job was submitted when the system was idle, so
this user was able to grab almost all of the resources.

Of 1500+ array job instances, a couple may end every hour, but these
will mostly release small fragmented resources.

The top priority job (the one I am concerned about) needs larger
memory resources, so it cannot fit into the resources released by a
single array task of the large job.

The scheduler then moves down the job list and finds another task of
the first array job.  This fits in the available resources, so it gets
started.

I was hoping/expecting that reservations (qsub -R y and the sched_conf
max_reservation setting) would help with this problem.
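
To make the sequence above concrete, here is a toy Python model of a
single host.  It is only a sketch of the starvation pattern and of what
I would expect a reservation to change; it is not Grid Engine's actual
scheduling algorithm.  The host size, the per-task memory, the "one
task ends per hour" rate and the function name are all made-up
assumptions for illustration.

```python
def hours_until_big_job_starts(reserve, max_hours=12):
    """Toy single-host model (hypothetical numbers, not real Grid Engine logic).

    Returns the hour at which the large-memory job starts, or None if it
    never starts within max_hours.
    """
    host_gb = 8        # assumed memory on the host
    task_gb = 2        # assumed memory per array task
    big_job_gb = 8     # assumed: the top priority job needs the whole host
    running = host_gb // task_gb   # host starts full of array tasks
    pending = 1500     # array tasks still waiting in the queue

    for hour in range(1, max_hours + 1):
        if running:
            running -= 1           # one array task ends this hour
        free_gb = host_gb - running * task_gb

        # The top priority job is considered first.
        if free_gb >= big_job_gb:
            return hour

        # Without a reservation the scheduler moves down the list and
        # refills the freed memory with further array tasks.
        if not reserve:
            while pending and free_gb >= task_gb:
                running += 1
                pending -= 1
                free_gb -= task_gb
        # With a reservation the freed memory is held for the big job.
    return None


print(hours_until_big_job_starts(reserve=False))  # None
print(hours_until_big_job_starts(reserve=True))   # 4
```

Without the reservation the freed 2 GB is refilled every hour, so the
large job never sees enough free memory.  With it, the host drains
until the large job fits.  The model ignores backfilling of short jobs
based on h_rt, which a real scheduler could still do.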

Stuart
-- 
I've never been lost; I was once bewildered for three days, but never lost!
                                        --  Daniel Boone
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
