We're running SLURM 2.6.1 with the backfill scheduler enabled. slurmctld is terminating with a SIGABRT every few hours when _assert_bitstr_valid() fails due to a NULL part_ptr->node_bitmap.
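To make the failure mode concrete, here is a minimal sketch of how an asserting bitmap AND aborts the whole process on a NULL argument, next to a guarded variant that returns an error instead. The `toy_bitmap` type and both function names are hypothetical stand-ins of mine; this is not SLURM's actual bitstr_t layout or its bit_and() implementation, only an illustration of the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitstring; NOT SLURM's real bitstr_t. */
typedef struct {
    size_t nbits;        /* number of valid bits */
    uint64_t words[4];   /* storage for up to 256 bits */
} toy_bitmap;

/* Asserting AND: with assertions compiled in, a NULL argument makes
 * assert() call abort(), which raises SIGABRT and kills the process. */
static void toy_and_asserting(toy_bitmap *b1, const toy_bitmap *b2)
{
    assert(b1 != NULL);
    assert(b2 != NULL);
    assert(b1->nbits == b2->nbits);
    for (size_t i = 0; i < 4; i++)
        b1->words[i] &= b2->words[i];
}

/* Guarded AND: checks the same preconditions but reports failure with
 * a return code (-1), leaving b1 unmodified, instead of aborting. */
static int toy_and_guarded(toy_bitmap *b1, const toy_bitmap *b2)
{
    if (b1 == NULL || b2 == NULL || b1->nbits != b2->nbits)
        return -1;
    toy_and_asserting(b1, b2);
    return 0;
}
```

Whether a guard like this is the right fix is an open question: it would stop the daemon from dying, but it would also hide whatever leaves the partition's node bitmap unset in the first place.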
A backtrace is below, and we have a core dump we can provide if it's useful.

--
Core was generated by `/usr/sbin/slurmctld'.
Program terminated with signal 6, Aborted.
#0  0x0000003b2a6328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install slurm-2.6.1-1.el6.x86_64
(gdb) bt
#0  0x0000003b2a6328a5 in raise () from /lib64/libc.so.6
#1  0x0000003b2a634085 in abort () from /lib64/libc.so.6
#2  0x0000003b2a62ba1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003b2a62bae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00000000004a31bb in bit_and (b1=<value optimized out>, b2=<value optimized out>) at bitstring.c:546
#5  0x00007f1bbe233488 in _attempt_backfill () at backfill.c:804
#6  0x00007f1bbe2344cb in backfill_agent (args=<value optimized out>) at backfill.c:498
#7  0x0000003b2aa07851 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003b2a6e890d in clone () from /lib64/libc.so.6
--

-- 
John Morrissey          _o            /\         ----  __o
[email protected]        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__
