Anyone? This is blocking job submission from one of our main nodes. Any ideas on 
what might cause this or how to debug it further are welcome. 
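
One guess, not confirmed by the backtrace below: if conf->crypto_type is still NULL when line 2836 runs, passing it to strcmp() is undefined behaviour and typically crashes. Running "print conf->crypto_type" in gdb at frame 0 might confirm or rule this out, assuming the variable has not been optimized out. A minimal sketch of a NULL-safe comparison follows; str_equal_safe is a hypothetical helper for illustration, not a Slurm function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Slurm: returns 1 if both strings are
 * non-NULL and equal, 0 otherwise. A NULL argument is treated as "no match"
 * instead of being handed to strcmp(), which must not receive NULL. */
int str_equal_safe(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return 0;
    return strcmp(a, b) == 0;
}
```

With a guard like this, an unset string would make the comparison false instead of crashing. Whether that matches what read_config.c is supposed to do is a separate question.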

On 07.05.2014, at 11:42, Mario Kadastik <[email protected]> wrote:

> 
> Hi,
> 
> yesterday I upgraded Slurm from 2.5.3 to 14.11 pre-1 (i.e. a git clone of the 
> current tree as of yesterday). The installation went fine once I had fixed the 
> spec file (rpmbuild doesn't like spaces in the Name: etc. definitions of the 
> spec file, so the placeholder "See META file" breaks the build). However, the 
> one SL5.7 node we have, which worked just fine with Slurm 2.5.3, now segfaults 
> on every Slurm command. 
> 
> Here's the backtrace:
> # gdb /usr/bin/squeue 
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5_7.1)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/squeue...done.
> (gdb) run
> Starting program: /usr/bin/squeue 
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffb000
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
> 2836          if ((strcmp(conf->crypto_type, "crypto/openssl") == 0) &&
> (gdb) bt
> #0  0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
> #1  _init_slurm_conf (file_name=<value optimized out>) at read_config.c:2475
> #2  0x000000000045b6fd in slurm_conf_init (file_name=0x0) at read_config.c:2528
> #3  0x000000000042419d in main (argc=1, argv=0x0) at squeue.c:78
> (gdb) 
> 
> We use AuthType auth/munge, so I'm not sure why it segfaults on the 
> conf->crypto_type comparison, and I understand even less why it does so 
> only on the SL5.7 node while it works fine on all the SL6 nodes. I used 
> the same tarball on SL6 and SL5 to create the RPMs with rpmbuild -ta 
> slurm-14-11.pre1.tgz. 
> 
> Ideas are welcome: the SL5.7 node is one of the main user nodes, where users 
> write code and submit jobs to the cluster, so it has to work even though the 
> rest of the cluster is fine. The config, by the way, is shared over NFS, so it 
> is identical on all nodes. 
> 
> Mario Kadastik, PhD
> Senior researcher
> 
> ---
>  "Physics is like sex, sure it may have practical reasons, but that's not why 
> we do it" 
>     -- Richard P. Feynman
