Your patch looks good. It will be included in version 14.03.1, which should be released within a few days. The commit is here:
https://github.com/SchedMD/slurm/commit/45c92639711e4ccb11274ae0c02bab319c4812bc

Thanks!

Quoting Franco Broi <[email protected]>:

Here's a patch, only lightly tested.

--- slurm-14.03.0/src/slurmctld/job_mgr.c       2014-03-27 03:57:22.000000000 +0800
+++ slurm-14.03.patch/src/slurmctld/job_mgr.c 2014-04-11 16:18:22.234954705 +0800
@@ -6966,12 +6966,12 @@
 {
        uint16_t shared = 0;

-       if (detail_ptr->share_res == 1)
+       if(!detail_ptr)
+               shared = (uint16_t) NO_VAL;
+       else if (detail_ptr->share_res == 1)
                shared = 1;
        else if (detail_ptr->whole_node == 1)
                shared = 0;
-       else
-               shared = (uint16_t) NO_VAL;

        if (protocol_version >= SLURM_14_03_PROTOCOL_VERSION) {
                if (detail_ptr) {
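
For readers who want to try the guard outside of slurmctld, the patched branch order can be sketched as a standalone function. This is only an illustration under stated assumptions: the struct name, its field types, and the function name shared_value are hypothetical stand-ins, and NO_VAL is redefined locally with an assumed value so the snippet compiles without the Slurm headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel value, redefined here so the sketch builds on its own. */
#define NO_VAL 0xfffffffe

/* Hypothetical stand-in holding only the two fields the patch reads;
 * the real field types in Slurm may differ. */
struct job_details_sketch {
	uint16_t share_res;
	uint16_t whole_node;
};

/* Mirrors the patched branch order: test the pointer before any
 * dereference, then look at the two flags. */
uint16_t shared_value(const struct job_details_sketch *detail_ptr)
{
	uint16_t shared = 0;

	if (!detail_ptr)
		shared = (uint16_t) NO_VAL;	/* no details: report "not set" */
	else if (detail_ptr->share_res == 1)
		shared = 1;
	else if (detail_ptr->whole_node == 1)
		shared = 0;

	return shared;
}
```

Calling shared_value(NULL) returns the truncated sentinel instead of faulting. Note that, as in the patch, a non-NULL pointer with neither flag set now yields 0, where the removed else branch used to yield (uint16_t) NO_VAL.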



On Fri, 2014-04-11 at 00:50 -0700, Franco Broi wrote:
Hi

We're getting frequent, repeatable crashes when users submit jobs with
sbatch arguments like this: -p c2 -N 15-16 -c 6 --no-requeue -H

Here's a traceback of the controller daemon.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa7839f9700 (LWP 8038)]
_pack_pending_job_details (detail_ptr=0x0, buffer=0x7fa784001e48, protocol_version=6912) at job_mgr.c:6969
6969            if (detail_ptr->share_res == 1)
(gdb) where
#0  _pack_pending_job_details (detail_ptr=0x0, buffer=0x7fa784001e48, protocol_version=6912) at job_mgr.c:6969
#1  0x00000000004432dd in pack_job (dump_job_ptr=0x7fa784001768, show_flags=<value optimized out>, buffer=0x7fa784001e48, protocol_version=<value optimized out>, uid=<value optimized out>)
    at job_mgr.c:6764
#2  0x000000000044390b in pack_all_jobs (buffer_ptr=0x7fa7839f8e10, buffer_size=0x7fa7839f8e48, show_flags=0, uid=1380, filter_uid=4294967294, protocol_version=6912) at job_mgr.c:6274
#3  0x0000000000470b92 in _slurm_rpc_dump_jobs (msg=0x7fa7840008c8) at proc_req.c:1076
#4  slurmctld_req (msg=0x7fa7840008c8) at proc_req.c:209
#5  0x000000000042fe58 in _service_connection (arg=0x7fa7c8000ca8) at controller.c:1075
#6  0x00000038c4c06ccb in start_thread () from /lib64/libpthread.so.0
#7  0x00000038c48e0c2d in clone () from /lib64/libc.so.6


Cheers,
