We're running SLURM 2.6.1 with the backfill scheduler enabled.
slurmctld is terminating with a SIGABRT every few hours when
_assert_bitstr_valid() fails due to a NULL part_ptr->node_bitmap.

A backtrace is below, and we have a core dump we can provide if it's useful.

--
Core was generated by `/usr/sbin/slurmctld'.
Program terminated with signal 6, Aborted.
#0  0x0000003b2a6328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install slurm-2.6.1-1.el6.x86_64
(gdb) bt
#0  0x0000003b2a6328a5 in raise () from /lib64/libc.so.6
#1  0x0000003b2a634085 in abort () from /lib64/libc.so.6
#2  0x0000003b2a62ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003b2a62bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00000000004a31bb in bit_and (b1=<value optimized out>, b2=<value optimized out>) at bitstring.c:546
#5  0x00007f1bbe233488 in _attempt_backfill () at backfill.c:804
#6  0x00007f1bbe2344cb in backfill_agent (args=<value optimized out>) at backfill.c:498
#7  0x0000003b2aa07851 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003b2a6e890d in clone () from /lib64/libc.so.6
--
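
To illustrate the failure mode, here is a minimal sketch. It is not SLURM
code: demo_bitmap and every demo_* function are hypothetical stand-ins for
bitstr_t and bit_and(), written only to show how an assert-style validity
check turns a NULL operand into an abort, and how a caller-side guard could
skip the operation instead. Where such a guard would belong in backfill.c,
if anywhere, is an open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bitmap type; NOT SLURM's bitstr_t. */
typedef struct {
    size_t nbits;
    uint64_t *words;
} demo_bitmap;

static size_t demo_nwords(size_t nbits)
{
    return (nbits + 63) / 64;
}

/* Allocate a zeroed bitmap; returns NULL for nbits == 0 or on failure. */
demo_bitmap *demo_bitmap_alloc(size_t nbits)
{
    demo_bitmap *b;

    if (nbits == 0)
        return NULL;
    b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->nbits = nbits;
    b->words = calloc(demo_nwords(nbits), sizeof(uint64_t));
    if (b->words == NULL) {
        free(b);
        return NULL;
    }
    return b;
}

void demo_bitmap_free(demo_bitmap *b)
{
    if (b != NULL) {
        free(b->words);
        free(b);
    }
}

void demo_bit_set(demo_bitmap *b, size_t bit)
{
    b->words[bit / 64] |= (uint64_t)1 << (bit % 64);
}

int demo_bit_test(const demo_bitmap *b, size_t bit)
{
    return (int)((b->words[bit / 64] >> (bit % 64)) & 1);
}

/* Asserting variant: a NULL operand trips assert() and raises SIGABRT,
 * which is the shape of the crash in the backtrace above. */
void demo_bit_and_asserting(demo_bitmap *dst, const demo_bitmap *src)
{
    assert(dst != NULL);
    assert(src != NULL);
    assert(dst->nbits == src->nbits);
    for (size_t i = 0; i < demo_nwords(dst->nbits); i++)
        dst->words[i] &= src->words[i];
}

/* Guarded variant: returns -1 and leaves dst untouched when an operand
 * is NULL or the sizes differ, so the caller can log and move on. */
int demo_bit_and_guarded(demo_bitmap *dst, const demo_bitmap *src)
{
    if (dst == NULL || src == NULL || dst->nbits != src->nbits)
        return -1;
    demo_bit_and_asserting(dst, src);
    return 0;
}
```

In this sketch, demo_bit_and_guarded(a, NULL) returns -1 and leaves a
unchanged, whereas demo_bit_and_asserting(a, NULL) would abort the process.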

-- 
John Morrissey          _o            /\         ----  __o
[email protected]        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__
