[slurm-dev] Specify GRES Type with MaxTRESPerNode

2017-05-23 Thread nico.faerber
Hi, is it possible to specify the GRES type with MaxTRESPerNode? E.g.: slurm.conf (…) GresType=gpu (…) NodeName=gpu[01,02] NodeAddr=10.2.1.[1,2] Gres=gpu:tesla:2,gpu:kepler:2 ... (…) PartitionName=gpu_one Nodes=gpu[01,02] QOS=part_gpu_one … PartitionName=gpu_two Nodes=gpu[01,02] QOS=part_gpu_two

[slurm-dev] knl_generic only for specific nodes

2017-05-19 Thread nico.faerber
Hi, I have a question regarding the knl_generic plugin. Only 4 of our nodes are KNL nodes (in their own partition), but it seems that the syscfg tool must also be installed on the non-KNL nodes; otherwise the log files fill up with messages like: error: node_features_p_node_state:

[slurm-dev] RealMemory setting for KNL nodes

2017-05-18 Thread nico.faerber
Hi, since KNL nodes can be rebooted into different MCDRAM modes (flat, cache, …), I am wondering what the recommended approach is for setting RealMemory in slurm.conf so that all the available memory on those nodes can always be used. Here is some background information: In flat mode,

[slurm-dev] Re: KNL node down after reboot

2017-05-17 Thread nico.faerber
Thank you for the input! @Ryan I didn’t check the code of the knl_generic plugin, but I would expect that in this case it should not represent an unexpected reboot from Slurm’s point of view. @Costin Increasing ResumeTimeout in slurm.conf worked. Although slurmd is still down after reboot,

[slurm-dev] KNL node down after reboot

2017-05-16 Thread nico.faerber
Hi, we want to introduce Intel Knights Landing (KNL) nodes into our cluster and have observed the following problem: a node reboots successfully with the desired NUMA and MCDRAM modes, but remains in state down. From slurmctl.log: (…) [2017-05-16T15:16:10.437] _update_node_avail_features: nodes

[slurm-dev] Enforcing partition limits

2016-09-14 Thread nico.faerber
Hi all, Background: I’m new to Slurm. I installed Slurm 16.05 (Munge auth, MariaDB managed by slurmdbd). Users are maintained in LDAP; no LDAP users have been imported into the Slurm DB so far. bash$ sacctmgr show user User Def Acct Admin - -- -