Hi, yesterday I upgraded Slurm from 2.5.3 to 14.11 pre-1 (i.e. yesterday's git clone). The installation went fine once I had put proper contents into the spec file. rpmbuild does not accept spaces in spec-file definitions such as Name:, so leaving the placeholder "See META file" in them breaks the rpmbuild. However, our one SL5.7 node, which worked fine with Slurm 2.5.3, now segfaults on every Slurm command.
Here's the backtrace:

# gdb /usr/bin/squeue
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5_7.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/squeue...done.
(gdb) run
Starting program: /usr/bin/squeue
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffb000
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
2836		if ((strcmp(conf->crypto_type, "crypto/openssl") == 0) &&
(gdb) bt
#0  0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
#1  _init_slurm_conf (file_name=<value optimized out>) at read_config.c:2475
#2  0x000000000045b6fd in slurm_conf_init (file_name=0x0) at read_config.c:2528
#3  0x000000000042419d in main (argc=1, argv=0x0) at squeue.c:78
(gdb)

We use AuthType auth/munge, so I'm not sure why it segfaults on the conf->crypto_type comparison, and I especially cannot see why it happens only on the SL5.7 node while everything works fine on all the SL6 nodes. I built the RPMs on both SL6 and SL5 from the same tarball, using rpmbuild -ta slurm-14-11.pre1.tgz.

Ideas are welcome: the SL5.7 node is one of the main user nodes, where people write code and submit jobs to the cluster, so it has to work even though the rest of the cluster is fine. The configuration, by the way, is shared over NFS, so it is identical on all nodes.
Mario Kadastik, PhD
Senior researcher

---
"Physics is like sex, sure it may have practical reasons, but that's not why we do it"
-- Richard P. Feynman
