[slurm-dev] Re: using gdb to debug slurm-15.08?

2016-04-27 Thread Michael Jennings

That should already be the case.  At least in 15.08.x, the spec file contains:

%define __os_install_post /usr/lib/rpm/brp-compress
%define debug_package %{nil}

The first line has the effect of disabling the stripping of binaries
(among other things).  The second line prevents the -debuginfo
packages from being generated.  With both of these in place, the end
result is that the final packaged binaries and libraries should keep
whatever symbol table info was built into them.  I can confirm that
the RPMs I build on my systems have this debugging info in them.

Michael, make sure when you build SLURM RPMs, you are supplying the
"--with debug" flag to rpmbuild.  This will ensure that the correct
debugging symbols are compiled into the final binaries so you can
debug them.  This will both (1) add debugging symbols ("-g") and (2)
turn off optimization ("-O0") to facilitate the use of gdb.
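
For example (a sketch only; the tarball name below is an illustration,
substitute the release you are actually building):

  $ rpmbuild -ta slurm-15.08.4.tar.bz2 --with debug

With that, the resulting slurmd/slurmctld binaries should be usable with
gdb directly, without any separate -debuginfo packages.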

HTH,
Michael



On Wed, Apr 27, 2016 at 7:51 AM, Andy Riebs  wrote:
> [Apologies to the list if someone has already responded to Michael, but I
> don't recall seeing it.]
>
> Hi Michael,
>
> By far the easiest way to debug Slurm problems is by doing your own, local
> build, outside the context of RPM. You can find appropriate ./configure
> arguments (at least to start) in the slurm.spec file, then do the standard
>
> $ ./configure {args }
> $ make
> $ make install
> (and, optionally,)
> $ make contrib-install
>
> Alternatively, if you really like the rpmbuild approach, there is probably
> an argument in the spec file to either
>
> Generate a debuginfo rpm
> Skip stripping the debug info
>
> HTH!
>
> Andy
>
>
> On 04/25/2016 12:20 PM, Michael Kit Gilbert wrote:
>
> Hello everyone,
>
> I'm trying to troubleshoot a problem with a local patch I'm writing for
> Slurm and can't seem to get gdb working properly. I've built the rpms with
> -D '%_with_cflags CFLAGS="-O0 -g3"' and slurmctld and slurmd appear to be
> starting normally. However, when I try to attach gdb to the slurmd process,
> I get an error "Missing separate debuginfos, use: debuginfo-install
> slurm-15.08.4-1.el6.x86_64". I haven't had any luck finding debuginfo rpms
> for slurm-15.
>
> Hoping it was just a minor issue, I continued with debugging and here's what
> I got:
>
> [root@head slurm_uid_patch]# gdb slurmd 26127
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from /usr/sbin/slurmd...done.
> Attaching to program: /usr/sbin/slurmd, process 26127
> Reading symbols from /lib64/libdl.so.2...Reading symbols from
> /usr/lib/debug/lib64/libdl-2.12.so.debug...done.
> done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...Reading symbols from
> /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
> [Thread debugging using libthread_db enabled]
> done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...Reading symbols from
> /usr/lib/debug/lib64/libc-2.12.so.debug...done.
> done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from
> /usr/lib/debug/lib64/ld-2.12.so.debug...done.
> done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib64/slurm/select_cons_res.so...done.
> Loaded symbols for /usr/lib64/slurm/select_cons_res.so
> Reading symbols from /usr/lib64/slurm/gres_nic.so...done.
> Loaded symbols for /usr/lib64/slurm/gres_nic.so
> Reading symbols from /usr/lib64/slurm/topology_none.so...done.
> Loaded symbols for /usr/lib64/slurm/topology_none.so
> Reading symbols from /usr/lib64/slurm/route_default.so...done.
> Loaded symbols for /usr/lib64/slurm/route_default.so
> Reading symbols from /usr/lib64/slurm/proctrack_cgroup.so...done.
> Loaded symbols for /usr/lib64/slurm/proctrack_cgroup.so
> Reading symbols from /usr/lib64/slurm/task_cgroup.so...done.
> Loaded symbols for /usr/lib64/slurm/task_cgroup.so
> Reading symbols from /usr/lib64/slurm/auth_munge.so...done.
> Loaded symbols for /usr/lib64/slurm/auth_munge.so
> Reading symbols from /usr/lib64/libmunge.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libmunge.so.2
> Reading symbols from /usr/lib64/slurm/crypto_munge.so...done.
> Loaded symbols for /usr/lib64/slurm/crypto_munge.so
> Reading symbols from /usr/lib64/slurm/jobacct_gather_none.so...done.
> Loaded symbols for /usr/lib64/slurm/jobacct_gather_none.so
> Reading symbols from /usr/lib64/slurm/job_container_none.so...done.
> Loaded symbols for 


[slurm-dev] Re: Saving job submissions

2016-04-27 Thread Paul Edmon
We use the python script below as a slurmctld prolog to save ours. Basically
it pulls all the info from the slurm hash files and copies it to a separate
filesystem.  We used to do it via MySQL, but the database got too large.
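
For reference, a slurmctld prolog like this is hooked in via slurm.conf
roughly as follows (the install path is only an assumption for illustration):

  PrologSlurmctld=/usr/local/sbin/save_jobscript.py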


We then use the attached get_jobscript script to actually query the job scripts.

-Paul Edmon-

On 04/27/2016 03:41 AM, Lennart Karlsson wrote:


On 04/27/2016 08:46 AM, Miguel Gila wrote:
Another option is to run (as SlurmUser, or root) when the job is 
still in the system (R or PD):


# scontrol show jobid=<jobid> -dd

and dump its output somewhere.

Miguel


Hi,

We are using this one, run as slurmctld.prolog,
saving the output for 30 days.

Cheers,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
   http://www.uppmax.uu.se/


#!/usr/bin/python -tt

import logging
import os
import sys
import syslog
import traceback
from glob import glob


def get_env_vars(job_id):
    #if not os.path.exists('/slurm/spool/job.%s/environment' % job_id):
    #    return []
    paths = glob('/slurm/spool/hash.*/job.%s/environment' % job_id)
    if len(paths) == 0:
        return []

    # Not sure how this would happen, but it would be weird
    if len(paths) > 1:
        return []

    # Skip the first four bytes, which are a uint32 indicating the
    # length of the following data.
    env_raw = open(paths[0]).read()[4:]

    env_vars = []
    for env in env_raw.split('\0'):
        if env in ['', ';']:
            continue

        name, value = env.split('=', 1)
        env_vars.append("%s='%s'" % (name, value.replace("'", r"'\''")))

    return env_vars

def get_job_script(job_id):
    #if not os.path.exists('/slurm/spool/job.%s/script' % job_id):
    #    return ''
    paths = glob('/slurm/spool/hash.*/job.%s/script' % job_id)
    if len(paths) == 0:
        return ''

    # Not sure how this would happen, but it would be weird
    if len(paths) > 1:
        return ''

    # The job script has a trailing NULL. o_O
    script_raw = open(paths[0]).read().rstrip('\0')
    shebang = rest = ''
    try:
        shebang, rest = script_raw.split('\n', 1)
    except ValueError:
        raise Exception('No lines in script file '
                        '/slurm/spool/hash.*/job.%s/script' % job_id)

    # Separate the leading #SBATCH block from the rest of the script.
    sbatch_lines = []
    rest_lines = []
    in_sbatch = True
    for line in rest.split('\n'):
        if in_sbatch:
            if line.startswith('#SBATCH') or line.strip() == '':
                sbatch_lines.append(line)
            else:
                in_sbatch = False
                rest_lines.append(line)
        else:
            rest_lines.append(line)

    return '%s\n%s\n\n%s\n BEGIN RUNTIME ENV \n%s' % (
        shebang, '\n'.join(sbatch_lines).strip(),
        '\n'.join(rest_lines), '\n'.join(get_env_vars(job_id)),
    )

def save_job_script(job_id):

    # The number of subdirectories the scripts will be distributed into;
    # job_id modulo SUBDIRCOUNT determines the parent directory for the job id
    SLURM_JOBSCRIPT_SUBDIRCOUNT = \
        int(os.environ.get("SLURM_JOBSCRIPT_SUBDIRCOUNT", "1000"))

    # The root path of the jobscripts
    SLURM_JOBSCRIPT_HOME = os.environ.get("SLURM_JOBSCRIPT_HOME", "/jobscripts")

    subdir = str(int(job_id) % SLURM_JOBSCRIPT_SUBDIRCOUNT)
    scriptdir = os.path.join(SLURM_JOBSCRIPT_HOME, subdir)
    if not os.path.exists(scriptdir):
        os.mkdir(scriptdir)

    scriptfile = os.path.join(scriptdir, job_id)
    with open(scriptfile, 'w') as f:
        f.write(get_job_script(job_id))


if __name__ == '__main__':
    try:
        save_job_script(os.environ['SLURM_JOB_ID'])
    except Exception:
        # Log any failure to syslog rather than breaking the prolog noisily.
        syslog.openlog(os.path.basename(sys.argv[0]), syslog.LOG_PID)
        for line in traceback.format_exc().split('\n'):
            syslog.syslog(syslog.LOG_ERR, line)
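
For a quick manual test of the prolog (a sketch; it assumes the script is
saved as save_jobscript.py, that the job id is still present under the
hash.* spool directories, and that the target directory already exists):

  # mkdir -p /tmp/jobscripts
  # SLURM_JOB_ID=12345 SLURM_JOBSCRIPT_HOME=/tmp/jobscripts python save_jobscript.py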

get_jobscript.sh
Description: Bourne shell script


[slurm-dev] General - has any site run slurm-16.05 on a Cray XC

2016-04-27 Thread Brian Gilmer
I would be interested in what CLE version it was tested on and if there
were any issues related to nodes returning to service.

-- 
Speak when you are angry--and you will make the best speech you'll ever
regret.
  - Laurence J. Peter


[slurm-dev] Re: Saving job submissions

2016-04-27 Thread Lennart Karlsson


On 04/27/2016 08:46 AM, Miguel Gila wrote:

Another option is to run (as SlurmUser, or root) when the job is still in the 
system (R or PD):

# scontrol show jobid=<jobid> -dd

and dump its output somewhere.

Miguel


Hi,

We are using this one, run as slurmctld.prolog,
saving the output for 30 days.

Cheers,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
   http://www.uppmax.uu.se/


[slurm-dev] Re: Saving job submissions

2016-04-27 Thread Miguel Gila

Hi, 

If you use Elasticsearch, you could use the excellent jobcomp plugin for it:

JobCompType = jobcomp/elasticsearch

https://github.com/SchedMD/slurm/tree/slurm-15-08-10-1/src/plugins/jobcomp/elasticsearch

It stores a record in your Elasticsearch instance with the details of the job, 
including the sbatch script used to submit it.
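
In slurm.conf this pairs with JobCompLoc pointing at your Elasticsearch
endpoint; a minimal sketch (the URL is an assumption, use your own server):

JobCompType=jobcomp/elasticsearch
JobCompLoc=http://your-elasticsearch-host:9200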

Another option is to run (as SlurmUser, or root) when the job is still in the 
system (R or PD):

# scontrol show jobid=<jobid> -dd

and dump its output somewhere.

Miguel


> On 27 Apr 2016, at 08:15, Hendryk Bockelmann  wrote:
> 
> Hi Gary,
> 
> we are using the slurm epilog to save data, which is located in 
> StateSaveLocation on the ControlMachine/ControlAddr (see slurm.conf) - this 
> is the batch script and the runtime environment.
> 
> Hendryk
> 
> On 27.04.2016 03:43, Skouson, Gary B wrote:
>> 
>> I'm interested in a way to save job submissions, including the job 
>> submission script, and options passed to the sbatch command.  I'd also like 
>> info like current working directory, stderr and stdout.  Is there an easy 
>> way to save that info each job as jobs are submitted?
>> 
>> -
>> Gary Skouson
> 

-- 
Miguel Gila
CSCS Swiss National Supercomputing Centre
HPC Operations
Via Trevano 131 | CH-6900 Lugano | Switzerland
mg [at] cscs.ch 


[slurm-dev] Re: Saving job submissions

2016-04-27 Thread Roche Ewan
Hello,
we also use an epilog to collect this information

#!/bin/sh

# Bail out if the cluster name is not set in the epilog environment.
if [ "x$SLURM_CLUSTER_NAME" = "x" ] ; then
  exit 0
fi

_DIR_=/home/slurm/${SLURM_CLUSTER_NAME}/${SLURM_JOB_USER}
if [ ! -d "${_DIR_}" ] ; then
  mkdir "${_DIR_}"
fi
if [ -d "${_DIR_}" ] ; then
  scontrol show jobid=$SLURM_JOB_ID -dd > "${_DIR_}/$SLURM_JOB_ID.job"
fi
exit 0

Obviously it’s not captured at submission time, but one could argue that an 
epilog is better anyway, since users can use scontrol to modify the options 
that were initially given to sbatch.
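
A minimal sketch of hooking it in via slurm.conf (the path is an assumption;
whether Epilog or EpilogSlurmctld fits better depends on where /home/slurm is
mounted and which user you want running scontrol):

Epilog=/etc/slurm/save_job_info.sh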

Ewan Roche

EPFL


On 27 Apr 2016, at 08:15, Hendryk Bockelmann wrote:

Hi Gary,

we are using the slurm epilog to save data, which is located in 
StateSaveLocation on the ControlMachine/ControlAddr (see slurm.conf) - this is 
the batch script and the runtime environment.

Hendryk

On 27.04.2016 03:43, Skouson, Gary B wrote:

I'm interested in a way to save job submissions, including the job submission 
script, and options passed to the sbatch command.  I'd also like info like 
current working directory, stderr and stdout.  Is there an easy way to save 
that info each job as jobs are submitted?

-
Gary Skouson




[slurm-dev] Re: Saving job submissions

2016-04-27 Thread Hendryk Bockelmann

Hi Gary,

we are using the slurm epilog to save data, which is located in 
StateSaveLocation on the ControlMachine/ControlAddr (see slurm.conf) - 
this is the batch script and the runtime environment.
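
As a sketch (the paths are assumptions; adjust StateSaveLocation and the
archive directory to your site), an EpilogSlurmctld could copy those files
like this:

  mkdir -p /archive/jobs/${SLURM_JOB_ID}
  cp /var/spool/slurmctld/hash.*/job.${SLURM_JOB_ID}/script \
     /var/spool/slurmctld/hash.*/job.${SLURM_JOB_ID}/environment \
     /archive/jobs/${SLURM_JOB_ID}/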


Hendryk

On 27.04.2016 03:43, Skouson, Gary B wrote:


I'm interested in a way to save job submissions, including the job submission 
script, and options passed to the sbatch command.  I'd also like info like 
current working directory, stderr and stdout.  Is there an easy way to save 
that info each job as jobs are submitted?

-
Gary Skouson



