Hi Carles,
On 02/28/2012 04:11 PM, Carles Fenoy wrote:
...
So, first, jobs 1586 and 1587 had NumCPUs=1. Then these jobs
finished in parallel on node ts-sl5slurm, their NumCPUs were
automatically increased to the maximum value (12 CPUs), and the
state of 1587 was changed to PD, because all CPUs were allocated to
the epilog of job 1586.
As far as I can see here, job 1587 waits until job 1586 finishes.
In the squeue output you added, 1586 is completing, probably
because its epilog is running. When it finishes, job 1587 starts. As
you have Shared=NO in your partition configuration, Slurm considers
the job to have used all the CPUs in the node and sets NumCPUs=12.
So the 2 jobs are NOT running in parallel but sequentially, because
of the non-shared partition.
You're right, the "issue" was in the Shared parameter.
Now the epilogs are running in parallel.
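
For anyone finding this in the archive, here is a rough sketch of the kind
of slurm.conf partition line involved. The partition name "debug" and the
other options are placeholders, and Shared=YES is only an assumed example
of a value other than NO; the exact setting is not quoted in this thread.
How it affects job placement also depends on the SelectType plugin in use.

```
# slurm.conf fragment (illustrative only; partition name is hypothetical)
# Before: Shared=NO, so a job is treated as owning every CPU on its node
#PartitionName=debug Nodes=ts-sl5slurm Default=YES Shared=NO State=UP
# After: an assumed value other than NO, allowing the node to be shared
PartitionName=debug Nodes=ts-sl5slurm Default=YES Shared=YES State=UP
```
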
Thanks!
--
Best regards,
Taras