Hi

We're getting frequent, repeatable crashes when users submit jobs with
sbatch arguments like this: -p c2 -N 15-16 -c 6 --no-requeue -H

Here's a backtrace from the controller daemon.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa7839f9700 (LWP 8038)]
_pack_pending_job_details (detail_ptr=0x0, buffer=0x7fa784001e48, protocol_version=6912) at job_mgr.c:6969
6969            if (detail_ptr->share_res == 1)
(gdb) where
#0  _pack_pending_job_details (detail_ptr=0x0, buffer=0x7fa784001e48, protocol_version=6912) at job_mgr.c:6969
#1  0x00000000004432dd in pack_job (dump_job_ptr=0x7fa784001768, show_flags=<value optimized out>, buffer=0x7fa784001e48, protocol_version=<value optimized out>, uid=<value optimized out>) at job_mgr.c:6764
#2  0x000000000044390b in pack_all_jobs (buffer_ptr=0x7fa7839f8e10, buffer_size=0x7fa7839f8e48, show_flags=0, uid=1380, filter_uid=4294967294, protocol_version=6912) at job_mgr.c:6274
#3  0x0000000000470b92 in _slurm_rpc_dump_jobs (msg=0x7fa7840008c8) at proc_req.c:1076
#4  slurmctld_req (msg=0x7fa7840008c8) at proc_req.c:209
#5  0x000000000042fe58 in _service_connection (arg=0x7fa7c8000ca8) at controller.c:1075
#6  0x00000038c4c06ccb in start_thread () from /lib64/libpthread.so.0
#7  0x00000038c48e0c2d in clone () from /lib64/libc.so.6
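
Frame #0 shows detail_ptr=0x0 being dereferenced at job_mgr.c:6969, so
the crash looks like a plain NULL pointer dereference. Below is a minimal,
standalone sketch of that failure mode and a defensive guard. It is not
Slurm's code: the struct, the function names and the -1 return value are
all hypothetical, and only the share_res field name is taken from the
crashing line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the job details record; only the field that
 * appears in the crashing line (share_res) is modelled. */
struct details_sketch {
    int share_res;
};

/* Hypothetical guard: return -1 when the details pointer is NULL instead
 * of dereferencing it; otherwise return the stored share_res value. */
int read_share_res_sketch(const struct details_sketch *detail_ptr)
{
    if (detail_ptr == NULL)
        return -1;
    return detail_ptr->share_res;
}

/* Usage check: a populated record and a NULL record are both handled
 * without a segmentation fault. */
void demo_sketch(void)
{
    struct details_sketch d = { 1 };
    assert(read_share_res_sketch(&d) == 1);
    assert(read_share_res_sketch(NULL) == -1);
}
```

The trace alone does not say why the details pointer is NULL for these
jobs, or whether skipping the record is the right handling in slurmctld;
the sketch only illustrates checking the pointer before use.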


Cheers,
