somewhere got corrupted. I think the new mission will be to try to reset
the correct state file and try again or, if that fails - clean it with
fire! ;-)
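In case it helps anyone else, a minimal sketch of what I mean by resetting
the state, assuming it is the slurmctld state under StateSaveLocation that
is corrupted (the paths below are examples, not necessarily what this
cluster uses):

    systemctl stop slurmctld
    mv /var/spool/slurmctld /var/spool/slurmctld.broken    # keep a copy, just in case
    mkdir -p /var/spool/slurmctld && chown slurm: /var/spool/slurmctld
    systemctl start slurmctld

As a last resort, "slurmctld -c" starts the controller without reading any
saved state at all, but that also forgets queued jobs.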
Regards,
Alexander Åhman
On 2019-05-29 at 19:23, Alex Chekholko wrote:
I think this error usually means that on your node cn7 it has either
for-large-networks
Best regards,
Ole
On 5/29/19 3:00 PM, Alexander Åhman wrote:
Hi,
Have a very strange problem. The cluster has been working just fine
until one node died and now I can't submit jobs to 2 of the nodes
using srun from the login machine. Using sbatch works just fine. I can't
find any information about what the problem is or
what to do to resolve it.
Any kind soul out there who knows what to do next?
Regards,
Alexander Åhman
            0:0    1096K  171896K  billing=1,cpu=1,mem=2G,node=1
911081.0  OUT_OF_ME+  0:125  1216K  240648K  cpu=1,mem=2G,node=1
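(For context: this looks like sacct output with the TRES column wrapped
onto the next line. My guess at the kind of call that prints these fields,
with the job id taken from the second row:

    sacct -j 911081 --format=JobID,State,ExitCode,MaxRSS,MaxVMSize,AllocTRES

MaxRSS would then be the peak resident memory the accounting saw, while
AllocTRES shows the mem=2G that was actually requested.)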
Memory is complicated stuff...
Regards,
Alexander
On 03/05/2019 08:20, Janne Blomqvist wrote:
On 02/05/2019 17.53, Alexander Åhman wrote:
Hi,
Is it possible to configure slurm/cgroups in such a way that jobs that are
using more memory than they asked for are not killed if there is still
free memory available on the compute node? When free memory gets low,
these jobs can be killed as usual.
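A minimal sketch of the cgroup.conf knobs involved, just to illustrate
where the limits come from (values are examples, not a recommendation for
this cluster):

    ConstrainRAMSpace=yes    # enforce the requested memory as a cgroup limit
    AllowedRAMSpace=100      # percent of the request a job may use before the limit applies
    ConstrainSwapSpace=yes
    AllowedSwapSpace=0

Raising AllowedRAMSpace gives every job a fixed percentage of headroom, but
as far as I know the cgroup limit is always per job, so there is no mode
that only starts killing when the node as a whole runs out of memory.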
Today when a job has exceeded its