Hello all,

I found a "bug" in slurmctld. The problem is simple.

If an administrator sets MaxMemPerCPU to 0 in slurm.conf (perhaps intending
to disable the limit), and a user then submits a job with an explicit memory
requirement, for example: srun --mem=1000 , slurmctld crashes.

The kernel message is:
slurmctld[9152] trap divide error ip:43ea90 sp:7fa643bfa280 error:0 in
slurmctld[400000+174000]

It also generates a core dump, but in this case I think it's clear what is
happening.

Imho slurmctld should check for the divide-by-zero case and handle it
gracefully instead of crashing.
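
To illustrate the kind of check I mean, here is a minimal sketch. It is not taken from the Slurm source: the function name cpus_for_mem and the idea that the controller divides the requested memory by MaxMemPerCPU are my assumptions, based only on the divide error above. Treating 0 as "no limit" is likewise just one possible policy.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a real Slurm function).
 * Returns how many CPUs a job would need so that its per-CPU memory
 * stays within max_mem_per_cpu_mb. A limit of 0 is treated here as
 * "no limit", so the division is skipped and one CPU is returned.
 */
static uint32_t cpus_for_mem(uint64_t req_mem_mb, uint64_t max_mem_per_cpu_mb)
{
    if (max_mem_per_cpu_mb == 0)
        return 1; /* guard: avoid dividing by zero */

    /* Ceiling division: round up to a whole number of CPUs. */
    return (uint32_t)((req_mem_mb + max_mem_per_cpu_mb - 1) / max_mem_per_cpu_mb);
}
```

With this guard, a request like --mem=1000 against a limit of 0 returns 1 instead of trapping. Rejecting the 0 value when slurm.conf is parsed would be another reasonable option.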

Best regards.
