Your patch looks good. It will be included in version 14.03.1, which
should be released within a few days. The commit is here:
https://github.com/SchedMD/slurm/commit/45c92639711e4ccb11274ae0c02bab319c4812bc
Thanks!
Quoting Franco Broi:
Here's a patch, only lightly tested.
--- slurm-14.03.0/src/slurmctld/job_mgr.c 2014-03-27 03:57:22.000000000 +0800
+++ slurm-14.03.patch/src/slurmctld/job_mgr.c 2014-04-11 16:18:22.234954705 +0800
@@ -6966,12 +6966,12 @@
 {
     uint16_t shared = 0;
-    if (detail_ptr->share_res == 1)
+    if(!detail_ptr)
+        shared = (uint16_t) NO_VAL;
+    else if (detail_ptr->share_res == 1)
         shared = 1;
     else if (detail_ptr->whole_node == 1)
         shared = 0;
-    else
-        shared = (uint16_t) NO_VAL;
     if (protocol_version >= SLURM_14_03_PROTOCOL_VERSION) {
         if (detail_ptr) {
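
For readers following the hunk above: as posted, it adds the NULL check but
also drops the trailing "else" branch, so a non-NULL detail_ptr with neither
flag set would leave shared at its initial 0 instead of NO_VAL. The thread
does not say whether that is intended. The following is only a sketch of a
variant that guards against NULL while keeping the original fall-through. The
struct, the helper name shared_for_details() and the NO_VAL value are
stand-ins assumed for illustration, not the Slurm definitions or the committed
fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for Slurm's NO_VAL sentinel (illustration only). */
#define NO_VAL 0xfffffffe

/* Hypothetical, trimmed-down stand-in for the job details structure;
 * only the two fields read by the hunk are modelled. */
struct job_details_sketch {
	uint16_t share_res;
	uint16_t whole_node;
};

/* Compute the "shared" value without dereferencing a NULL pointer,
 * keeping NO_VAL as the result when neither flag is set. */
static uint16_t shared_for_details(const struct job_details_sketch *detail_ptr)
{
	if (!detail_ptr)
		return (uint16_t) NO_VAL;	/* no details: nothing to report */
	if (detail_ptr->share_res == 1)
		return 1;
	if (detail_ptr->whole_node == 1)
		return 0;
	return (uint16_t) NO_VAL;		/* original fall-through case */
}
```

Calling shared_for_details(NULL) corresponds to the detail_ptr=0x0 case in
the traceback quoted below, and returns the sentinel instead of faulting.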
On Fri, 2014-04-11 at 00:50 -0700, Franco Broi wrote:
Hi
We're getting frequent, repeatable crashes when users submit jobs with
sbatch arguments like this: -p c2 -N 15-16 -c 6 --no-requeue -H
Here's a traceback of the controller daemon.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa7839f9700 (LWP 8038)]
_pack_pending_job_details (detail_ptr=0x0, buffer=0x7fa784001e48,
protocol_version=6912) at job_mgr.c:6969
6969 if (detail_ptr->share_res == 1)
(gdb) where
#0 _pack_pending_job_details (detail_ptr=0x0,
buffer=0x7fa784001e48, protocol_version=6912) at job_mgr.c:6969
#1 0x00000000004432dd in pack_job (dump_job_ptr=0x7fa784001768,
show_flags=<value optimized out>, buffer=0x7fa784001e48,
protocol_version=<value optimized out>, uid=<value optimized out>)
at job_mgr.c:6764
#2 0x000000000044390b in pack_all_jobs (buffer_ptr=0x7fa7839f8e10,
buffer_size=0x7fa7839f8e48, show_flags=0, uid=1380,
filter_uid=4294967294, protocol_version=6912) at job_mgr.c:6274
#3 0x0000000000470b92 in _slurm_rpc_dump_jobs (msg=0x7fa7840008c8)
at proc_req.c:1076
#4 slurmctld_req (msg=0x7fa7840008c8) at proc_req.c:209
#5 0x000000000042fe58 in _service_connection (arg=0x7fa7c8000ca8)
at controller.c:1075
#6 0x00000038c4c06ccb in start_thread () from /lib64/libpthread.so.0
#7 0x00000038c48e0c2d in clone () from /lib64/libc.so.6
Cheers,