Hi William, Christopher, all,

qtop does not yet visualize memory allocation info (or any other consumables, so far), but there is no reason why it couldn't:
https://github.com/qtop/qtop/tree/develop
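For a rough idea of the raw data such a view would draw on (assuming h_vmem is the memory consumable in question here), the per-host and per-queue numbers are already available at the command line:

  # per-host view of the h_vmem consumable
  qhost -F h_vmem
  # per-queue-instance view of the same complex
  qstat -F h_vmem

A memory-aware qtop view would mostly amount to a friendlier presentation of those numbers.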
If people come up with some scheme that makes sense for debugging job allocation information, it should be possible to extend qtop in some way that makes it helpful for that particular task. That all basically boils down to some python-scriptable logic to present the right information.

Hoping you find it useful,
F.

On 8 March 2016 at 09:48, William Hay <w....@ucl.ac.uk> wrote:
> On Mon, Mar 07, 2016 at 11:20:04PM +0000, Christopher Black wrote:
> > Greetings!
> > We are running SoGE (mix of 8.1.6 and 8.1.8, soon 8.1.8 everywhere) on a
> > ~300 node cluster.
> > We utilize RQS and memory reservation via a complex to allow most nodes
> > to be shared among multiple queues and to run a mix of single core and
> > multi core jobs.
> > Recently, when we hit 10k+ jobs in qw, the job dispatch rate does not
> > keep up with how quickly jobs are finishing, leaving cores idle.
> > Our jobs aren't particularly short (avg ~2h).
> > We sometimes have a case where thousands of jobs are not suitable for
> > execution because they hit a per-queue RQS rule, but we still want other
> > jobs to get started on idle cores.
> >
> > We have tried tuning some parameters but could use some advice, as we
> > are now having trouble keeping all the cores busy despite there being
> > many eligible jobs in qw.
>
> If you have a complex controlling memory and users are requesting lots of
> memory, then a node could be "full" with idle cores. Have you tried
> running qalter -w p on the highest-priority job you think should run?
> What does it say?
>
> > We have tried tuning max_advance_reservations,
> > max_functional_jobs_to_schedule, max_pending_tasks_per_job and
> > max_reservation, as well as disabling schedd_job_info. We have applied
> > some of the scaling best practices such as using local spools. I saw
> > mention of MAX_DYN_EC but have not tried that yet; is it fairly safe to
> > do so?
> > Any other changes we should consider?
>
> Is this just a delay in the scheduler running? What values do you have
> for schedule_interval, flush_submit_sec and flush_finish_sec in the
> scheduler config?
>
> Is the scheduler taking a long time to do its work? Set PROFILE to 1 in
> the scheduler's params and look for entries in the messages file (or
> wherever you send such messages if logging via syslog) containing the
> string "schedd run took:" to see how long it is taking. This number can
> vary a lot depending on cluster config and the sort of requests submitted
> to it. On one of our clusters a scheduling run takes 3-4 seconds (it
> could be shorter still if we disabled reservations). Another one was
> taking 10-15 minutes or more per scheduling run before I tweaked it.
> A lot of jobs can finish in 10-15 minutes.
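For anyone wanting to run William's checks, they look roughly like this on a stock SoGE install (the spool path assumes the default cell layout and local message logging, and <job_id> is a placeholder; adjust both for your site):

  # ask the scheduler why the highest-priority pending job is not starting
  qalter -w p <job_id>

  # current scheduler timing and diagnostic settings
  qconf -ssconf | grep -E 'interval|flush_|job_info|params'

  # after adding PROFILE=1 to the scheduler's params (qconf -msconf),
  # see how long each scheduling run is taking
  grep 'schedd run took:' $SGE_ROOT/$SGE_CELL/spool/qmaster/messages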
> > Any thoughts or suggestions?
> >
> > Also, we sometimes see the following in spool/qmaster/messages:
> > 03/07/2016 18:00:09|worker|pqmaster|E|resources no longer available for start of job 7499766.1
> > 03/07/2016 18:00:09|worker|pqmaster|E|debiting 31539000000.000000 of h_vmem on host pnode141.nygenome.org for 1 slots would exceed remaining capacity of 2986364800.000000
> > 03/07/2016 18:00:09|worker|pqmaster|E|resources no longer available for start of job 7499767.1
> > 03/07/2016 18:00:09|worker|pqmaster|E|debiting 18022000000.000000 of h_vmem on host pnode176.nygenome.org for 1 slots would exceed remaining capacity of 9887037760.000000
> >
> > I expect this is due to the memory reservation, but I'm not sure of the
> > exact cause, whether it is a problem, or whether a parameter change
> > might improve operations. One theory is that when doing reservations for
> > hundreds of jobs, by the time the scheduler gets part way through the
> > list, the memory that would have been reserved in the consumable
> > resource has already been allocated to another job, but I'm not sure,
> > as I don't see many hits on that log message.
> > (update: just found
> > http://arc.liv.ac.uk/pipermail/sge-bugs/2016-February.txt)
> > I don't know if this is a root cause of our problem of leaving cores
> > idle, as we see some of these even when everything is running fine.
>
> That message was from me. In that case the numbers were off by a
> fraction due, AFAICT, to the qmaster and the scheduler rounding
> differently. The examples you quote are very different. Not the same
> problem, I think.
>
> William
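For what it's worth, the two debiting messages quoted above are off by whole gigabytes, not fractions: job 7499766.1 asks for 31,539,000,000 bytes of h_vmem (~31.5 GB) where pnode141 has 2,986,364,800 bytes (~3.0 GB) left, and job 7499767.1 asks for 18,022,000,000 bytes (~18 GB) where pnode176 has 9,887,037,760 bytes (~9.9 GB) left, which fits William's reading that this is not the rounding issue. To confirm what a given job actually requested, something like this should do (the job id here is just the one from the log, purely illustrative):

  qstat -j 7499766 | grep resource_list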
--
echo "sysadmin know better bash than english"|sed s/min/mins/ \
 | sed 's/better bash/bash better/' # signal detected in a CERN forum