Dear Will,

I am not trying to diminish the value of Slurm here; I only want to find a
solution to a trivial problem. I also think that Slurm was designed for HPC
and that it performs well in such environments. I agree with you that my
environment hardly qualifies as HPC, but one of the simplest concepts behind
any scheduler is not to overload some nodes while others are idling - can
that really be by design? I cannot speak for the developers either, but it
would probably take only a few lines of code to add this feature, given that
the data is already collected. As far as I understand there is no default
Slurm installation - you have to adapt it to your environment, and it is
quite flexible. I tried a lot, but unfortunately I failed to achieve my
simple goal.
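
The closest thing to a workaround I can see is to build the exclude list
myself from the load column that sinfo already reports and pass it to srun,
roughly along these lines (only a sketch; the 1.0 load threshold and the job
name "task" are placeholders):

# nodes whose reported CPU load is above the (arbitrary) threshold of 1.0
BUSY=$(sinfo -h -N -o '%N %O' | awk '$2+0 > 1.0 {print $1}' | sort -u | paste -sd, -)
# run the job on the remaining CPUs only
srun -n73 ${BUSY:+--exclude=$BUSY} ./task

But this is exactly the bookkeeping I would expect the scheduler to do on
its own.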

Best regards,

Ketiw

On Sun, Mar 19, 2017 at 2:56 PM, Will French <w...@accre.vanderbilt.edu>
wrote:

> Just because the scheduler does not do what you want or expect by default
> does not make this a bug. A bug would imply some unexpected behavior due to
> an error or unanticipated condition within the SLURM source code. I can’t
> speak for the developers, but it might be that this default behavior you
> keep referring to as a “bug” was an intentional design decision for
> efficiency reasons. Job scheduling is an incredibly complex task and by
> almost all metrics SLURM is currently the most efficient.
>
> Also consider that SLURM was designed for massive HPC environments and
> that your setup is a significant departure from this. It is not
> unreasonable at all that you need to alter the default configuration of
> SLURM in order to run in a setup involving workstations, interactive use,
> processes unmanaged by SLURM, etc. I suspect that is a pretty massive
> departure from the use case SLURM was targeting when it was initially
> developed.
>
> Will
>
>
> On Mar 19, 2017, at 7:26 AM, kesim <ketiw...@gmail.com> wrote:
>
> I have 11 nodes and declared 7 CPUs per node. My setup is such that all the
> desktops belong to group members who use them mainly as graphics stations,
> so from time to time an application demands high CPU usage - Firefox can do
> it easily. We also have applications compiled with Intel MPI, and the whole
> setup is mainly for them. I would like my scheduler to fill nodes completely
> with tasks, but starting from the idling nodes. Say I look at the CPU load
> of my nodes (sinfo -N -o '%N %O %C' will do that) and two nodes have a load
> of ~2 (which more or less means that two of their processors are busy at
> 100%). Then I want to use 73 of the 77 available processors, and my
> simple-minded understanding is that on the nodes with CPU load ~2 those two
> processors should be the last to be allocated, even though they are
> technically available. This is what a scheduler should do on its own,
> without my intervention. Sadly, that is not what happens: if I request 73
> processors, the scheduler does not take the real CPU load into account and
> fills the nodes alphabetically. Since sinfo is aware of the CPU load, Slurm
> should take it into account when filling nodes, and I consider it a serious
> bug that it does not.
> I use Slurm 17.02.1-2 on Ubuntu 16.04.
>
>
>
> On Sun, Mar 19, 2017 at 11:26 AM, TO_Webmaster <luftha...@gmail.com>
> wrote:
>
>>
>> Please remember that this might lead to a huge waste of resources. Imagine
>> you have a cluster of 10 nodes with 10 cores each. Then somebody submits 10
>> jobs requesting 1 core per job. If I understand you correctly, you would
>> like to see one job per node. Now imagine someone else submits 9 jobs
>> requesting nodes exclusively. None of those 9 jobs can start, because there
>> is a one-core job on every node. If the first 10 jobs had been packed onto
>> one node, all 9 of the later jobs could have started immediately.
>>
>> 2017-03-18 19:06 GMT+01:00 kesim <ketiw...@gmail.com>:
>> > Dear John,
>> >
>> > Thank you for your answer. You are obviously right that I could run
>> > everything through Slurm and thus avoid the issue, and your points are
>> > taken. However, I still maintain that it is a serious bug for the
>> > scheduler not to take the actual CPU load into account when it places a
>> > job, regardless of whose fault it is that a non-Slurm job is running. I
>> > would not expect that from even the simplest scheduler, and had I known
>> > this beforehand I would not have invested so much time and effort in
>> > setting up Slurm.
>> > Best regards,
>> >
>> > Ketiw
>> >
>> > On Sat, Mar 18, 2017 at 5:42 PM, John Hearns <john.hea...@xma.co.uk>
>> wrote:
>> >>
>> >>
>> >> Kesim,
>> >>
>> >> what you are saying is that Slurm schedules tasks based on the number of
>> >> allocated CPUs rather than the actual load factor on the server.
>> >> As I recall, Gridengine actually used the load factor.
>> >>
>> >> However, you comment that "users run programs on the nodes" and "the
>> >> slurm is aware about the load of non-slurm jobs".
>> >> IMHO, in any well-run HPC setup, any user running jobs without going
>> >> through the scheduler would have their fingers broken, or at least
>> >> bruised with the clue stick.
>> >>
>> >> Seriously, three points (rough sketches of (a) and (b) follow below):
>> >>
>> >> a) Tell users to use 'salloc' and 'srun' to run interactive jobs. They
>> >> can easily open a Bash session on a compute node and do what they like,
>> >> under the Slurm scheduler.
>> >>
>> >> b) Implement the pam_slurm PAM module. It is a few minutes' work, and it
>> >> means your users cannot go behind the Slurm scheduler and log into the
>> >> nodes.
>> >>
>> >> c) On the Bright clusters which I configure, a healthcheck runs which
>> >> warns you when a user is detected logging in without using Slurm.
>> >>
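>> >> Rough sketches of (a) and (b) - adapt them to your own setup; exact
>> >> paths and module options vary by distribution:
>> >>
>> >> # (a) interactive work, run under the scheduler:
>> >> salloc -N1 -n4            # reserve 4 CPUs on one node
>> >> srun --pty bash -i        # open a shell inside the allocation
>> >>
>> >> # (b) in /etc/pam.d/sshd on each compute node, so that ssh logins are
>> >> #     only allowed for users who have a job running on that node:
>> >> account    required     pam_slurm.so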
>> >>
>> >> Seriously again. You have implemented an HPC infrastructure, and have
>> >> gone to the time and effort to implement a batch scheduling system.
>> >> A batch scheduler can be adapted to let your users do their jobs,
>> >> including interactive shell sessions and remote visualization sessions.
>> >> Do not let the users ride roughshod over you.
>> >>
>> >> ________________________________________
>> >> From: kesim [ketiw...@gmail.com]
>> >> Sent: 18 March 2017 16:16
>> >> To: slurm-dev
>> >> Subject: [slurm-dev] Re: Fwd: Scheduling jobs according to the CPU load
>> >>
>> >> Unbelievable, but it seems that nobody knows how to do this. It is
>> >> astonishing that such a sophisticated system fails at such a simple
>> >> problem. Slurm is aware of the CPU load from non-Slurm jobs, but it does
>> >> not use that information. My original understanding of LLN was
>> >> apparently correct. I can practically saturate the CPUs on a particular
>> >> node with non-Slurm tasks, yet Slurm will diligently submit 7 jobs to
>> >> that node while leaving others idling. I consider this a serious bug in
>> >> the program.
>> >>
>> >>
>> >> On Fri, Mar 17, 2017 at 10:32 AM, kesim
>> >> <ketiw...@gmail.com> wrote:
>> >> Dear All,
>> >> Yesterday I did some tests, and it seemed that the scheduling was
>> >> following the CPU load, but I was wrong.
>> >> My configuration is currently:
>> >> SelectType=select/cons_res
>> >> SelectTypeParameters=CR_CPU,CR_LLN
>> >>
>> >> Today I submitted 70 threaded jobs to the queue, and here is the
>> >> CPU_LOAD info:
>> >> node1    0.08   7/0/0/7
>> >> node2    0.01   7/0/0/7
>> >> node3    0.00   7/0/0/7
>> >> node4    2.97   7/0/0/7
>> >> node5    0.00   7/0/0/7
>> >> node6    0.01   7/0/0/7
>> >> node7    0.00   7/0/0/7
>> >> node8    0.05   7/0/0/7
>> >> node9    0.07   7/0/0/7
>> >> node10   0.38   7/0/0/7
>> >> node11   0.01   0/7/0/7
>> >> As you can see, it allocated 7 CPUs on node4, which has a CPU_LOAD of
>> >> 2.97, and 0 CPUs on the idling node11. Why is such a simple thing not
>> >> the default? What am I missing?
>> >>
>> >> On Thu, Mar 16, 2017 at 7:53 PM, kesim
>> >> <ketiw...@gmail.com> wrote:
>> >> Thank you for the great suggestion. It is working! However, the
>> >> description of CR_LLN is misleading: "Schedule resources to jobs on the
>> >> least loaded nodes (based upon the number of idle CPUs)". I understood
>> >> this to mean that if two nodes do not have all of their CPUs allocated,
>> >> the node with the smaller number of allocated CPUs takes precedence.
>> >> Therefore the bracketed comment should be removed from the description.
>> >>
>> >> On Thu, Mar 16, 2017 at 6:24 PM, Paul Edmon
>> >> <ped...@cfa.harvard.edu> wrote:
>> >>
>> >> You should look at LLN (least loaded nodes):
>> >>
>> >> https://slurm.schedmd.com/slurm.conf.html
>> >>
>> >> That should do what you want.
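>> >>
>> >> Roughly, it can be switched on either cluster-wide or per partition in
>> >> slurm.conf (a sketch only; the partition name is just an example):
>> >>
>> >> # cluster-wide:
>> >> SelectType=select/cons_res
>> >> SelectTypeParameters=CR_CPU,CR_LLN
>> >> # or for a single partition:
>> >> PartitionName=desktops Nodes=node[1-11] LLN=YES State=UP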
>> >>
>> >> -Paul Edmon-
>> >>
>> >> On 03/16/2017 12:54 PM, kesim wrote:
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: kesim <ketiw...@gmail.com>
>> >> Date: Thu, Mar 16, 2017 at 5:50 PM
>> >> Subject: Scheduling jobs according to the CPU load
>> >> To: slurm-dev@schedmd.com
>> >>
>> >>
>> >> Hi all,
>> >>
>> >> I am a new user, and I have created a small network of 11 nodes, 7 CPUs
>> >> per node, out of users' desktops.
>> >> I configured Slurm with:
>> >> SelectType=select/cons_res
>> >> SelectTypeParameters=CR_CPU
>> >> When I submit a task with "srun -n70 task", it fills 10 nodes with 7
>> >> tasks per node. However, I have no clue what algorithm is used to choose
>> >> the nodes. Users run programs on the nodes, and some nodes are busier
>> >> than others. It seems logical that the scheduler should submit the tasks
>> >> to the less busy nodes, but that is not the case.
>> >> In the output of sinfo -N -o '%N %O %C' I can see that jobs are
>> >> allocated to node11, which has a load of 2.06, while node4 is left
>> >> totally idling. That makes no sense to me.
>> >> node1    0.00   7/0/0/7
>> >> node2    0.26   7/0/0/7
>> >> node3    0.54   7/0/0/7
>> >> node4    0.07   0/7/0/7
>> >> node5    0.00   7/0/0/7
>> >> node6    0.01   7/0/0/7
>> >> node7    0.00   7/0/0/7
>> >> node8    0.01   7/0/0/7
>> >> node9    0.06   7/0/0/7
>> >> node10   0.11   7/0/0/7
>> >> node11   2.06   7/0/0/7
>> >> How can I configure Slurm so that it fills the nodes with the lowest
>> >> load first?
>> >>
>> >
>> >
>>
>
>
>
