On Wednesday, 07 May 2014, at 01:42:50 (-0700),
Mario Kadastik wrote:

> yesterday I upgraded Slurm from 2.5.3 to 14.11 pre-1 (i.e. the
> current git clone yesterday). The installation went just fine after
> I updated the spec file to the proper contents (rpmbuild doesn't
> like spaces in the spec file Name: etc definitions so having "See
> META file" will break the rpmbuild). However the one SL5.7 node we
> have and that worked just fine with slurm 2.5.3 now segfaults for
> every slurm command.
>
> Here's the backtrace:
> # gdb /usr/bin/squeue
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5_7.1)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/squeue...done.
> (gdb) run
> Starting program: /usr/bin/squeue
> warning: no loadable sections found in added symbol-file system-supplied DSO
> at 0x7ffff7ffb000
> [Thread debugging using libthread_db enabled]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized
> out>) at read_config.c:2836
> 2836 if ((strcmp(conf->crypto_type, "crypto/openssl") == 0) &&
> (gdb) bt
> #0 0x0000000000459159 in _validate_and_set_defaults (file_name=<value
> optimized out>) at read_config.c:2836
> #1 _init_slurm_conf (file_name=<value optimized out>) at read_config.c:2475
> #2 0x000000000045b6fd in slurm_conf_init (file_name=0x0) at
> read_config.c:2528
> #3 0x000000000042419d in main (argc=1, argv=0x0) at squeue.c:78
> (gdb)
>
> We use AuthType auth/munge so I'm not quite sure why it segfaults on
> the conf->crypto_type comparison and even more I cannot fathom why
> it does it only on the SL5.7 node while it works just fine on all
> the SL6 nodes. I used the same tarball on SL6 and SL5 to create the
> RPMs using rpmbuild -ta slurm-14-11.pre1.tgz.
>
> Ideas are welcome as the SL5.7 node is one of the main user nodes
> where they create code and submit to cluster so it has to work even
> though the full rest of the cluster works fine. The config btw is
> shared over NFS so it is identical on all nodes.

Try this:

    rpmbuild -D 'with_cflags CFLAGS="-O0 -g3"' -ta slurm-14-11.pre1.tgz

Michael

-- 
Michael Jennings <[email protected]>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615
