On Wednesday, 07 May 2014, at 01:42:50 (-0700),
Mario Kadastik wrote:

> Yesterday I upgraded Slurm from 2.5.3 to 14.11 pre-1 (i.e. a git
> clone from yesterday). The installation went just fine after I
> updated the spec file to the proper contents (rpmbuild doesn't like
> spaces in the spec file's Name: etc. definitions, so having "See
> META file" there breaks the rpmbuild). However, the one SL5.7 node
> we have, which worked just fine with Slurm 2.5.3, now segfaults on
> every Slurm command.
> 
> Here's the backtrace:
> # gdb /usr/bin/squeue 
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5_7.1)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/squeue...done.
> (gdb) run
> Starting program: /usr/bin/squeue 
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffb000
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
> 2836          if ((strcmp(conf->crypto_type, "crypto/openssl") == 0) &&
> (gdb) bt
> #0  0x0000000000459159 in _validate_and_set_defaults (file_name=<value optimized out>) at read_config.c:2836
> #1  _init_slurm_conf (file_name=<value optimized out>) at read_config.c:2475
> #2  0x000000000045b6fd in slurm_conf_init (file_name=0x0) at read_config.c:2528
> #3  0x000000000042419d in main (argc=1, argv=0x0) at squeue.c:78
> (gdb) 
> 
> We use AuthType=auth/munge, so I'm not quite sure why it segfaults
> on the conf->crypto_type comparison, and even less can I fathom why
> it does so only on the SL5.7 node while it works just fine on all
> the SL6 nodes. I used the same tarball on SL6 and SL5 to create the
> RPMs using rpmbuild -ta slurm-14-11.pre1.tgz.
> 
> Ideas are welcome, as the SL5.7 node is one of the main user nodes
> where people build code and submit jobs to the cluster, so it has
> to work even though the rest of the cluster is fine. The config, by
> the way, is shared over NFS, so it is identical on all nodes.

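My first guess from that backtrace is that conf->crypto_type ends up
NULL on the SL5.7 node ("<value optimized out>" hides the arguments,
but strcmp() faulting on that line usually means it was handed a NULL
pointer, and the "crypto/openssl" literal obviously isn't one). Why
the default would only fail to get filled in on SL5 I can't tell
without a readable backtrace. Purely to illustrate the failure mode I
suspect -- this is a standalone sketch, not the actual Slurm code,
and safe_strcmp() is a made-up helper:

#include <stdio.h>
#include <string.h>

/* NULL-safe comparison; plain strcmp() has no such guard and
 * dereferences whatever it is given */
static int safe_strcmp(const char *a, const char *b)
{
        if (a == NULL || b == NULL)
                return (a == b) ? 0 : 1;  /* treat NULL as "not equal" */
        return strcmp(a, b);
}

int main(void)
{
        /* assumption: this is what happens on the SL5.7 node */
        const char *crypto_type = NULL;

        /* strcmp(crypto_type, "crypto/openssl") would segfault right
         * here, just like the crash at read_config.c:2836 */
        if (safe_strcmp(crypto_type, "crypto/openssl") == 0)
                printf("crypto/openssl selected\n");
        else
                printf("crypto_type unset or something else\n");

        return 0;
}
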
Try this:

rpmbuild -D 'with_cflags CFLAGS="-O0 -g3"' -ta slurm-14-11.pre1.tgz
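
Building with -O0 -g3 turns off the optimization that produced the
"<value optimized out>" arguments, so after reinstalling that package
on the SL5.7 node you should be able to re-run squeue under gdb and
print conf->crypto_type at the faulting frame to see whether it
really is NULL there.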

Michael

-- 
Michael Jennings <[email protected]>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615
