Thank you all for your help and suggestions, that's really appreciated! I will give the cgroup plugins and the --exclusive option a try and experiment with Slurm to find out what's most appropriate for my server.

On 06/23/2015 02:59 PM, Morris Jette wrote:
See the task/cgroup plugin for constraining jobs to specific CPUs and memory. Also see the --exclusive option in srun.
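A minimal sketch of what that configuration might look like (the parameter names below exist in the Slurm documentation, but treat this as an assumption-laden illustration to check against your own Slurm version, not a drop-in config):

```conf
# slurm.conf -- switch task launching to the cgroup plugin (sketch)
TaskPlugin=task/cgroup

# cgroup.conf -- confine each job to the CPUs and memory it was allocated
ConstrainCores=yes
ConstrainRAMSpace=yes
```

With ConstrainCores=yes, a job that requests 8 CPUs but starts 10 processes still runs, but its processes are confined to the 8 allocated cores instead of spilling onto the rest of the node.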

On June 23, 2015 4:44:27 AM PDT, "Rémi Piatek" <[email protected]> wrote:

    I had considered this simple explanation, but it seemed unlikely to me,
    as it would imply that we have to rely entirely on users to correctly
    specify the number of CPUs they need. People use my server for
    CPU-intensive jobs, so it is important for me to make sure that
    resources are shared fairly. I was hoping Slurm would allow me to do
    this and prevent people from free-riding (as things stand, it would be
    easy to request a small number of CPUs and use a much larger number,
    thus slowing down the other users).

    I read that when jobs exceed the memory they requested and were
    allocated by Slurm, they are automatically killed. Is there nothing
    similar for CPU usage?

    Thanks for the help! Much appreciated.


    On 06/23/2015 12:05 PM, Loris Bennett wrote:

        Rémi Piatek <[email protected]> writes:

            Hello,

            I am getting started with SLURM and I am having a hard time
            understanding how it allocates CPUs to users depending on the
            resources they request. The problem I am facing can be
            summarized as follows. Consider a bash script test.sh that
            requests 8 CPUs but actually starts a job that uses 10 CPUs:

            #!/bin/sh
            #SBATCH --ntasks=8
            stress -c 10

            On a server with 32 CPUs, if I submit this script 5 times with
            sbatch test.sh, 4 of the jobs start running right away and the
            last one appears as pending, as shown by the squeue command:

            JOBID PARTITION     NAME USER ST  TIME NODES NODELIST(REASON)
                5      main  test.sh jack PD  0:00     1 (Resources)
                1      main  test.sh jack  R  0:08     1 server
                2      main  test.sh jack  R  0:08     1 server
                3      main  test.sh jack  R  0:05     1 server
                4      main  test.sh jack  R  0:05     1 server

            The problem is that these 4 running jobs are actually using 40
            CPUs and overload the server. I would on the contrary expect
            SLURM either not to start jobs that use more resources than the
            user requested, or to put them on hold until enough resources
            are available. How can I make sure that the users of my server
            do not start jobs that use too many CPUs?

            Some useful details about my slurm.conf file:

            # SCHEDULING
            #DefMemPerCPU=0
            FastSchedule=1
            #MaxMemPerCPU=0
            SchedulerType=sched/backfill
            SchedulerPort=7321
            SelectType=select/cons_res
            SelectTypeParameters=CR_CPU

            # COMPUTE NODES
            NodeName=server CPUs=32 RealMemory=10000 State=UNKNOWN

            # PARTITIONS
            PartitionName=main Nodes=server Default=YES Shared=YES MaxTime=INFINITE State=UP

            I am probably making a trivial mistake in the configuration
            file, or just misunderstanding a basic concept of SLURM. Any
            help or advice would be much appreciated. Many thanks in
            advance!
        Slurm just keeps track of how many cores have been assigned to
        running jobs - it doesn't check how many processes are
        actually started within a given job. So, it is up to the user
        to make sure she starts the correct number of processes.
        Cheers, Loris
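Loris's point can be illustrated with a short shell sketch (hedged: SLURM_CPUS_ON_NODE is a standard variable Slurm sets in the job environment, but this script is an illustration added here, not something from the thread):

```shell
#!/bin/sh
#SBATCH --ntasks=8
# Compare the CPU count Slurm accounted for with what the node exposes.
# Without a task/affinity or task/cgroup plugin, nothing confines the job,
# so it could happily spawn more processes than the 8 CPUs it requested.
echo "CPUs Slurm allocated to this job: ${SLURM_CPUS_ON_NODE:-unset}"
echo "CPUs visible on the node:         $(nproc)"
```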



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
