[slurm-dev] Re: Unexpected change between versions 14.11.3 and 14.11.6

2015-05-08 Thread Lennart Karlsson
On 05/07/2015 05:24 PM, Moe Jette wrote:
> Configure a DefMemPerCPU. The default memory size for a job is all memory on the node, which was previously not accounted for properly to prevent memory oversubscription.
Thank you very much for your quick answer! That fixed the problem for all new

[slurm-dev] Re: bug fix in task error reporting

2015-05-08 Thread Moe Jette
Thank you for your contribution to Slurm. A slight variation of your patch will be in version 14.11.7 when released in late May. The commit is here: https://github.com/SchedMD/slurm/commit/bf81e826a2f7bc752d3239ea724e35ce2867a052 Quoting Jonathon Nelson jdnel...@dyn.com: slurmstepd/task.c

[slurm-dev] bug fix in task error reporting

2015-05-08 Thread Jonathon Nelson
slurmstepd/task.c does not properly save errno and may overwrite it before using %m in error messages.
diff --git a/src/slurmd/slurmstepd/task.c b/src/slurmd/slurmstepd/task.c
index 186b6ad..a24d997 100644
--- a/src/slurmd/slurmstepd/task.c
+++ b/src/slurmd/slurmstepd/task.c
@@ -367,6 +367,7 @@

[slurm-dev] RE: launching a variable number of jobs in slurm

2015-05-08 Thread Scharfenberg, Buddy Lee
Trevor, It does depend a bit on the configuration of your cluster; however, it sounds like what you need to do is create a job submission file that requests enough resources for one of your jobs and then submit them as an array. Read the man page on sbatch to determine what switches you need
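As a rough sketch of the array approach suggested above, the range passed to sbatch's --array option can be derived from the number of parameter sets, so the number of tasks follows the input. The file name data.dat, the script name job.sh, and the helper array_range are assumptions made for illustration; none of them come from the thread.

```shell
#!/bin/bash
# Sketch only: data.dat and job.sh are hypothetical names, not from the thread.

# Print an array range "1-N", where N is the number of lines in the given file.
# Returns non-zero if the file is unreadable or empty.
array_range() {
    local file="$1" n
    [ -r "$file" ] || return 1
    # awk counts a final line even when it lacks a trailing newline.
    n=$(awk 'END { print NR }' "$file")
    [ "$n" -gt 0 ] || return 1
    echo "1-$n"
}

# Example submission (needs a Slurm cluster, so it is left commented out):
# sbatch --array="$(array_range ./data.dat)" job.sh
```

Each array task then runs job.sh with its own SLURM_ARRAY_TASK_ID, so a single submission covers all parameter sets. Sites often cap the size of a job array, so a very large input may need to be split across several submissions.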

[slurm-dev] launching a variable number of jobs in slurm

2015-05-08 Thread Trevor Gale
Hello everyone, I’m developing a piece of software that runs fault injection simulations on a cluster running slurm and am trying to figure out the ideal method for launching a potentially massive number of jobs. I’m not very familiar with slurm, and had a question about how slurm allocates

[slurm-dev] RE: launching a variable number of jobs in slurm

2015-05-08 Thread Scharfenberg, Buddy Lee
This is how I recommend that users of our cluster pass variable parameters to a slurm array. In your batch file, set a variable that reads one line of a data file for parameters, e.g.
PARAMETERS=$(awk -v line=${SLURM_ARRAY_TASK_ID} '{if (NR == line) { print $0; };}' ./data.dat)
What this does
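The line-selection idea above can be tried outside Slurm with a small sketch like the following. The helper name get_params, the DATA_FILE override, and the fallback task ID of 1 are assumptions added for illustration; inside a real array task, Slurm sets SLURM_ARRAY_TASK_ID itself.

```shell
#!/bin/bash
# Sketch only: get_params and DATA_FILE are hypothetical, not from the thread.

# Print line number $1 of file $2 (prints nothing if the line does not exist).
get_params() {
    local line="$1" file="$2"
    awk -v line="$line" 'NR == line { print; exit }' "$file"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside an array task; default to 1 here
# so the sketch can also be run by hand.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
data_file="${DATA_FILE:-./data.dat}"

if [ -r "$data_file" ]; then
    PARAMETERS=$(get_params "$task_id" "$data_file")
    echo "task $task_id parameters: $PARAMETERS"
fi
```

With this layout, line N of the data file holds the parameters for array task N, so the array indices should start at 1 and stop at the last line of the file.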

[slurm-dev] i/o and network limits and monitoring using cgroups

2015-05-08 Thread Igor Kozin
http://slurm.schedmd.com/pdfs/LCS_cgroups_BULL.pdf was an interesting read. I'd assume that cgroup/devices subsystem is fully functioning now and GPUs and Phi are supported. But what about i/o and network usage limits, monitoring and reporting? NFS or Lustre systems? Energy? Dr Igor Kozin |