Are you specifying a memory limit for your jobs? If you haven't set a default
limit per CPU, Slurm will allocate all the memory of a node when nothing
else is specified.
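For reference, a per-CPU default can be set in slurm.conf so that jobs which don't request memory explicitly still get a bounded allocation (a minimal sketch; the 1000 MB value is only an illustrative placeholder, not a recommendation):

```
# slurm.conf — default memory (in MB) charged per allocated CPU when a job
# gives no --mem / --mem-per-cpu; adjust to your nodes' RAM-per-core ratio
DefMemPerCPU=1000
```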

Regards,
Carlos Fenoy

On Sun, 12 Feb 2017, 22:54 Travis DePrato, <trav...@umich.edu> wrote:

> Yep! Doing everything I can think of, running scontrol reconfigure,
> restarting all the relevant daemons, can't seem to get it to work.
>
> On Sun, Feb 12, 2017 at 4:38 PM Lachlan Musicman <data...@gmail.com>
> wrote:
>
> On 12 February 2017 at 16:06, Travis DePrato <trav...@umich.edu> wrote:
>
> I've tried multiple variations of the SelectTypeParameters option (before
> sending this mail) to no success.
>
> Currently it's http://pastebin.com/ATcsvvtQ with
> SelectTypeParameters=CR_CPU_Memory
>
> I'm running 10 jobs, each single-threaded/single-process/etc., just sitting
> on "sleep 1000", but I can never get more than 8 to run at a time, and I
> still can't specify a memory value other than --mem=1.
>
>
>
> I always ask the stupid questions: you are changing the conf, distributing
> that change to all nodes, restarting slurmctld, then running scontrol
> reconfigure?
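That checklist might look something like the following (a sketch only: it assumes systemd-managed daemons, passwordless SSH, and placeholder hostnames node01..node08 — substitute your own):

```shell
# Push the updated config to every compute node (paths and hosts are examples)
for node in node0{1..8}; do
    scp /etc/slurm/slurm.conf "$node:/etc/slurm/slurm.conf"
done

# Restart the controller so it picks up structural changes (e.g. SelectType),
# then tell the running daemons to re-read the configuration
sudo systemctl restart slurmctld
scontrol reconfigure
```

Note that some options (SelectType among them) only take effect on a full daemon restart, not a plain scontrol reconfigure.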
>
>
> cheers
> L.
>
>
> ------
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
>
>
>
>
> On Sat, Feb 11, 2017 at 3:26 AM Lachlan Musicman <data...@gmail.com>
> wrote:
>
> 1. As EV noted, to get Memory as a consumable resource, you will need to
> add it to the line that says CR_CPU - change to CR_CPU_Memory
> https://slurm.schedmd.com/slurm.conf.html
>
> 2. That's because of the CR_CPU combined with cons_res. Change to CR_CORE
> for per core or CR_SOCKET for per socket. For definitions of each, there's
> a hardware page:
>
> https://slurm.schedmd.com/cons_res.html
>
> but for the cpu/core/socket definition, I found the image at the top of
> this page very helpful
>
> https://slurm.schedmd.com/mc_support.html
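Taken together, the relevant slurm.conf lines would look roughly like this (a sketch; the CR_* value controls the scheduling granularity you want):

```
# Schedule individual CPUs instead of whole nodes, and track memory too
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory    # or CR_Core_Memory / CR_Socket_Memory
```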
>
> L.
>
>
> On 11 February 2017 at 07:31, E V <eliven...@gmail.com> wrote:
>
>
> Run "man slurm.conf" and search for cons_res; you need to change it from
> the default. Don't remember the details ATM, but that should get you
> started.
>
> On Fri, Feb 10, 2017 at 2:42 PM, Travis DePrato <trav...@umich.edu> wrote:
> > For reference, slurm.conf: http://pastebin.com/XT6TvQhh
> >
> > I've been tasked with setting up a small cluster for a research group
> where
> > I work, despite knowing relatively little about HPC or clusters in
> general.
> > I've installed slurm on the eight compute nodes and the login node, but,
> I'm
> > having two issues currently:
> >
> > 1. I cannot specify a memory requirement other than --mem=1
> > Sample submission output with --mem=2: http://pastebin.com/5PY9N6n4
> >
> > 2. I cannot get nodes to execute more than one job at a time. The 9th
> job is
> > always queued with reason Resources. I think this is related to the lines
> >
> > scontrol: Consumable Resources (CR) Node Selection plugin loaded with
> > argument 17
> > scontrol: Serial Job Resource Selection plugin loaded with argument 17
> > scontrol: Linear node selection plugin loaded with argument 17
> >
> > because it seems like slurm is only allocating whole nodes at a time.
> >
> > Sorry if this is basic setup, but I've tried googling to no end.
> > --
> > Travis DePrato
> > Computer Science & Engineering
> > Math and Music Minors
> > Student at University of Michigan
> > Computer Consultant at EECS DCO
>
>
>
