On Mon, Oct 10, 2016 at 02:39:21PM +0000, Yuri Burmachenko wrote:
> We are using SoGE 8.1.8 and since recently approximately 2 months ago our
> job schedule time raised up to 30-60 sec.
>
>
> Any tips and advices where to look for the root cause and/or how can we
> improve the situation, will be greatly appreciated.
>

It isn't clear to me if the cluster has ever run quickly with something
like the current load or whether this is a decline in performance from a
scheduler that used to handle load better.

You could enable PROFILE in the scheduler's params to get a bit more info
on what is going on.

Check whether people are soft requesting resources/queues, as soft requests
are supposed to slow scheduling down a lot.

Try to run the qmaster on a dedicated machine or, if that isn't possible,
on dedicated cores.

You could also try setting JC_FILTER in params, even though it is
officially deprecated.

We have two clusters run centrally at UCL. The main difference between
them is that the newer one has a uniform, flat InfiniBand network and
identical hosts. The older cluster has a lot of different hardware and
isolated InfiniBand islands. The InfiniBand islands each have their own PE,
which is matched by a PE wildcard in job submissions. The older cluster
takes a lot longer to schedule. It sounds like you might have a similarly
complex config.

William
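
For reference, a minimal sketch of what the PROFILE suggestion above could
look like in the scheduler configuration (edited with "qconf -msconf",
shown with "qconf -ssconf"). The value below is an illustration based on
sched_conf(5), not a copy of any real cluster's settings, and the "#" lines
are annotations rather than part of the file. The profiling output should
end up in the qmaster messages file, but check that against your own
installation.

```
# scheduler configuration, "params" line only (other lines unchanged)
# PROFILE=1 asks the scheduler to log profiling information per run
params                            PROFILE=1

# JC_FILTER (deprecated) is set in the same "params" attribute, e.g.
# JC_FILTER=true; see sched_conf(5) for how to combine several entries
```

It is probably worth changing one setting at a time and comparing the
schedule times before and after, so that any effect can be attributed to a
single change.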