Hi,

Yesterday I upgraded Slurm from 2.5.3 to 14.11 pre-1 (i.e. yesterday's git 
clone). The installation went fine once I had fixed the spec file: rpmbuild 
does not accept spaces in the values of Name: and similar definitions, so the 
"See META file" placeholder breaks the build. However, our one SL5.7 node, 
which worked fine with Slurm 2.5.3, now segfaults on every Slurm command. 

Here's the backtrace:
# gdb /usr/bin/squeue 
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5_7.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/squeue...done.
(gdb) run
Starting program: /usr/bin/squeue 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffb000
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
2836            if ((strcmp(conf->crypto_type, "crypto/openssl") == 0) &&
(gdb) bt
#0  0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
#1  _init_slurm_conf (file_name=<value optimized out>) at read_config.c:2475
#2  0x000000000045b6fd in slurm_conf_init (file_name=0x0) at read_config.c:2528
#3  0x000000000042419d in main (argc=1, argv=0x0) at squeue.c:78
(gdb) 

We use AuthType auth/munge, so I'm not sure why it segfaults on the 
conf->crypto_type comparison, and I really cannot see why it does so only on 
the SL5.7 node while it works fine on all the SL6 nodes. I built the RPMs on 
both SL6 and SL5 from the same tarball with rpmbuild -ta 
slurm-14-11.pre1.tgz. 
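
A crash on that line would be consistent with conf->crypto_type being NULL 
when strcmp() is called, since passing NULL to strcmp() is undefined behaviour 
and commonly ends in a segfault. The backtrace alone does not confirm this, so 
treat it as a guess. The sketch below only illustrates a NULL-safe version of 
such a comparison; struct demo_conf, safe_strcmp() and uses_openssl() are 
hypothetical names made up for the example and are not taken from the Slurm 
source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the relevant field of the config struct. */
struct demo_conf {
    char *crypto_type;   /* may be NULL if the option was never set */
};

/* NULL-safe string comparison: NULL sorts before any non-NULL string. */
static int safe_strcmp(const char *a, const char *b)
{
    if (a == NULL && b == NULL)
        return 0;
    if (a == NULL)
        return -1;
    if (b == NULL)
        return 1;
    return strcmp(a, b);
}

/* Returns 1 if the config selects the OpenSSL crypto plugin, else 0. */
static int uses_openssl(const struct demo_conf *conf)
{
    return conf != NULL &&
           safe_strcmp(conf->crypto_type, "crypto/openssl") == 0;
}
```

With a guard like this, an unset crypto_type simply compares as "not 
crypto/openssl" instead of dereferencing a NULL pointer.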

Ideas are welcome: the SL5.7 node is one of the main user nodes, where users 
develop code and submit jobs to the cluster, so it has to work even though the 
rest of the cluster is fine. The config, by the way, is shared over NFS, so it 
is identical on all nodes. 

Mario Kadastik, PhD
Senior researcher

---
  "Physics is like sex, sure it may have practical reasons, but that's not why 
we do it" 
     -- Richard P. Feynman
