On Mon, Oct 10, 2016 at 02:39:21PM +0000, Yuri Burmachenko wrote:
>    We are using SoGE 8.1.8 and since recently approximately 2 months ago our
>    job schedule time raised up to 30-60 sec.
>     
> 
>    Any tips and advices where to look for the root cause and/or how can we
>    improve the situation, will be greatly appreciated.
> 
It isn't clear to me whether the cluster has ever run quickly under
something like the current load, or whether this is a decline in
performance from a scheduler that used to handle the load better.

You could enable PROFILE in the scheduler's params to get a bit more
info on what is going on.
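
For reference, a rough sketch of what that looks like in the scheduler
configuration (edited with "qconf -msconf", viewed with "qconf -ssconf").
I am assuming here that params is currently at its default of "none";
if it already holds other settings, keep them and add PROFILE alongside.
As far as I know the profiling lines then end up in the qmaster messages
file, one summary per scheduling run.

```
# excerpt from the scheduler configuration (qconf -msconf)
params                            PROFILE=1
```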

Check whether people are making soft requests for resources or queues,
as these are supposed to slow scheduling down a lot.
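
One way to check for soft requests is to scan saved "qstat -j" output
for the soft request fields. The sketch below is only an illustration:
it assumes the field labels are "soft resource_list" and
"soft_queue_list" and that each job's block starts with a "job_number"
line, so verify that against your own output first. The function name,
the sample text and the "fast=true" resource are made up for the example.

```python
# Field labels assumed to mark soft requests in `qstat -j` output.
SOFT_FIELDS = ("soft resource_list", "soft_queue_list")

# Hypothetical sample of saved `qstat -j` output for two jobs.
SAMPLE = """\
==============================================================
job_number:                 101
hard resource_list:         h_rt=3600
soft resource_list:         fast=true
==============================================================
job_number:                 102
hard resource_list:         h_rt=3600
"""


def jobs_with_soft_requests(qstat_j_text):
    """Map each job number to the soft request lines found for it."""
    found = {}
    current_job = None
    for line in qstat_j_text.splitlines():
        # Split "label: value" on the first colon; skip separator lines.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "job_number":
            current_job = value
        elif key in SOFT_FIELDS and current_job is not None:
            found.setdefault(current_job, []).append(f"{key}: {value}")
    return found


soft_jobs = jobs_with_soft_requests(SAMPLE)
```

Jobs that never show up in the result have only hard requests, so the
ones that do are the first candidates to talk to their owners about.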

Try to run the qmaster on a dedicated machine or, if that isn't
possible, on dedicated cores.

You could also try setting JC_FILTER in params, even though it is
officially deprecated.

We have two centrally run clusters at UCL.  The main difference between
them is that the newer one has a uniform flat InfiniBand network and
identical hosts.  The older cluster has a lot of different hardware and
isolated InfiniBand islands.  Each island has its own PE, which is
matched by a PE wildcard in job submissions.  The older cluster takes a
lot longer to schedule.  It sounds like you might have a similarly
complex config.


William