DefMemPerCPU in slurm.conf is what I use.
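
For a node like the ones described below (two 14-core sockets, 128 GB), that works out to roughly 4590 MB per core, which is the figure Cyrus lands on. A minimal sketch of the relevant slurm.conf lines, reusing the numbers from the thread:

  SelectType=select/cons_res
  SelectTypeParameters=CR_Socket_Memory
  DefMemPerCPU=4590    # default MB per allocated core (~128 GB / 28 cores)
  MaxMemPerCPU=4590    # cap per-core memory so one job can't claim the whole node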
On Wed, Mar 22, 2017 at 4:52 PM, Cyrus Proctor <cproc...@tacc.utexas.edu> wrote:
> Merlin, thanks for the insight.
>
> I set:
> #SBATCH --mem=1G
>
> That is all I needed to get it to share.
> How can I set the default to use only the memory attached to the particular
> socket the job is running on, and have the default memory set to that value
> (64 GB in my case)?
>
> I think I've done it (sort of) with:
> PartitionName=normal Nodes=d0[1,2] Default=YES OverSubscribe=FORCE:2
> SelectTypeParameters=CR_Socket_Memory QoS=part_shared MaxCPUsPerNode=28
> DefMemPerCPU=4590 MaxMemPerCPU=4590 MaxTime=48:00:00 State=UP
>
> On 03/22/2017 12:04 PM, Merlin Hartley wrote:
>> Hi Cyrus
>>
>> I think you should specify the memory requirements in your sbatch script;
>> the default is to allocate all of a node's memory, thus "filling" it even
>> with a one-CPU job:
>> #SBATCH --mem 1G
>>
>> Hope this helps!
>>
>> Merlin
>> --
>> Merlin Hartley
>> Computer Officer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 22 Mar 2017, at 16:20, Cyrus Proctor <cproc...@tacc.utexas.edu> wrote:
>>>
>>> Hi all,
>>>
>>> Any thoughts at all on this would be most helpful. I'm not sure where to
>>> go from here to get overcommitted nodes working properly.
>>>
>>> Thank you,
>>> Cyrus
>>>
>>> On 03/17/2017 11:39 AM, Cyrus Proctor wrote:
>>>> Hello,
>>>>
>>>> I currently have a small cluster for testing. Each compute node has two
>>>> sockets with 14 cores per socket and 128 GB of RAM in total. I would
>>>> like to set up Slurm so that two jobs can share one compute node
>>>> simultaneously, effectively giving each job one socket (with binding)
>>>> and half the total memory.
>>>>
>>>> I've tried several iterations of settings, to no avail. Whatever I try,
>>>> I am still only allowed to run one job per node (blocked for the
>>>> "resources" reason). I am running Slurm 17.02.1-2, and I am attaching my
>>>> slurm.conf and cgroup.conf files. System information:
>>>> # uname -r
>>>> 3.10.0-514.10.2.el7.x86_64
>>>> # cat /etc/centos-release
>>>> CentOS Linux release 7.3.1611 (Core)
>>>>
>>>> I am also attaching logs for slurmd (slurmd.d01.log) and slurmctld
>>>> (slurmctld.log) from submitting three jobs (batch.slurm) in rapid
>>>> succession. With two compute nodes available, I would hope that all
>>>> three start together. Instead, two begin and one waits until a node
>>>> becomes idle.
>>>>
>>>> There is likely extra "crud" in the config files left over from prior
>>>> failed attempts. I'm happy to take things out or reconfigure as
>>>> necessary; I'm just not sure what the right combination of settings is.
>>>> I'm hoping that's where you all can help.
>>>>
>>>> Thanks,
>>>> Cyrus
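
For the record, the submission side of a half-node job could look something like this sketch (the binary name is a placeholder, not from the thread):

  #!/bin/bash
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=14     # one socket's worth of cores
  #SBATCH --mem-per-cpu=4590     # MB; 14 x 4590 MB is roughly half of 128 GB
  #SBATCH --time=00:10:00

  srun ./my_app                  # placeholder executable

Note that with DefMemPerCPU=4590 in slurm.conf, the --mem-per-cpu line becomes redundant: a job that says nothing about memory gets the same per-core default, so a 14-core allocation defaults to roughly 64 GB, which is the behavior Cyrus asks about above.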
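
A quick way to check that two jobs really are sharing a node is with standard squeue/scontrol calls (the node name d01 is taken from the attached log file names):

  $ squeue -o '%.8i %.9P %.8u %.2t %.10M %R'
  $ scontrol show node d01 | grep -E 'CPUAlloc|AllocMem'

Once both jobs are running on d01, CPUAlloc should read 28 and AllocMem should be close to the node's total.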