DefMemPerCPU in slurm.conf is what I use.
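
For a node like the ones described below (two 14-core sockets, 128 GB), that works out to roughly 4590 MB per core, which is the figure Cyrus lands on. A minimal sketch of the relevant slurm.conf lines, reusing the numbers from the thread:

  SelectType=select/cons_res
  SelectTypeParameters=CR_Socket_Memory
  DefMemPerCPU=4590    # default MB per allocated core (~128 GB / 28 cores)
  MaxMemPerCPU=4590    # cap per-core memory so one job can't claim the whole node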
On Wed, Mar 22, 2017 at 4:52 PM, Cyrus Proctor <cproc...@tacc.utexas.edu> wrote:
> Merlin, thanks for the insight.
>
> I set:
> #SBATCH --mem=1G
>
> That is all I needed to get it to share.
> How can I set the default to use only the memory attached to the particular
> socket the job is running on, and have the default memory set to that value
> (64 GB in my case)?
>
> I think I've done it (sort of) with:
> PartitionName=normal Nodes=d0[1,2] Default=YES OverSubscribe=FORCE:2
> SelectTypeParameters=CR_Socket_Memory QoS=part_shared MaxCPUsPerNode=28
> DefMemPerCPU=4590 MaxMemPerCPU=4590 MaxTime=48:00:00 State=UP
>
> On 03/22/2017 12:04 PM, Merlin Hartley wrote:
>> Hi Cyrus
>>
>> I think you should specify the memory requirements in your sbatch script;
>> the default is to allocate all of a node's memory, thus "filling" it even
>> with a one-CPU job:
>> #SBATCH --mem 1G
>>
>> Hope this helps!
>>
>> Merlin
>> --
>> Merlin Hartley
>> Computer Officer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 22 Mar 2017, at 16:20, Cyrus Proctor <cproc...@tacc.utexas.edu> wrote:
>>>
>>> Hi all,
>>>
>>> Any thoughts at all on this would be most helpful. I'm not sure where to
>>> go from here to get overcommitted nodes working properly.
>>>
>>> Thank you,
>>> Cyrus
>>>
>>> On 03/17/2017 11:39 AM, Cyrus Proctor wrote:
>>>> Hello,
>>>>
>>>> I currently have a small cluster for testing. Each compute node has two
>>>> sockets with 14 cores per socket and 128 GB of RAM in total. I would
>>>> like to set up Slurm so that two jobs can share one compute node
>>>> simultaneously, effectively giving each job one socket (with binding)
>>>> and half the total memory.
>>>>
>>>> I've tried several iterations of settings, to no avail. Whatever I try,
>>>> I am still only allowed to run one job per node (blocked for the
>>>> "resources" reason). I am running Slurm 17.02.1-2, and I am attaching my
>>>> slurm.conf and cgroup.conf files. System information:
>>>> # uname -r
>>>> 3.10.0-514.10.2.el7.x86_64
>>>> # cat /etc/centos-release
>>>> CentOS Linux release 7.3.1611 (Core)
>>>>
>>>> I am also attaching logs for slurmd (slurmd.d01.log) and slurmctld
>>>> (slurmctld.log) from submitting three jobs (batch.slurm) in rapid
>>>> succession. With two compute nodes available, I would hope that all
>>>> three start together. Instead, two begin and one waits until a node
>>>> becomes idle.
>>>>
>>>> There is likely extra "crud" in the config files left over from prior
>>>> failed attempts. I'm happy to take things out or reconfigure as
>>>> necessary; I'm just not sure what the right combination of settings is.
>>>> I'm hoping that's where you all can help.
>>>>
>>>> Thanks,
>>>> Cyrus
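
For the record, the submission side of a half-node job could look something like this sketch (the binary name is a placeholder, not from the thread):

  #!/bin/bash
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=14     # one socket's worth of cores
  #SBATCH --mem-per-cpu=4590     # MB; 14 x 4590 MB is roughly half of 128 GB
  #SBATCH --time=00:10:00

  srun ./my_app                  # placeholder executable

Note that with DefMemPerCPU=4590 in slurm.conf, the --mem-per-cpu line becomes redundant: a job that says nothing about memory gets the same per-core default, so a 14-core allocation defaults to roughly 64 GB, which is the behavior Cyrus asks about above.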
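
A quick way to check that two jobs really are sharing a node is with standard squeue/scontrol calls (the node name d01 is taken from the attached log file names):

  $ squeue -o '%.8i %.9P %.8u %.2t %.10M %R'
  $ scontrol show node d01 | grep -E 'CPUAlloc|AllocMem'

Once both jobs are running on d01, CPUAlloc should read 28 and AllocMem should be close to the node's total.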