[slurm-dev] Re: Job Accounting for sstat

2016-08-29 Thread Christopher Samuel

On 30/08/16 12:39, Lachlan Musicman wrote:

> Oh! Thanks.
> 
> I presume that includes sruns that are in an sbatch file.

Yup, that's right.
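For example, for a running job (job ID made up) something like this shows
the batch step and any srun-launched steps:

 sstat -a -j 1234567 -o JobID,MaxRSS,AveCPU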

cheers!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread Christopher Samuel

On 31/08/16 10:35, James Andrew Venning wrote:

> slurmctld: error: this host (slurm-master) not valid controller
> (144.6.230.71 or (null))

That looks like the main issue: for some reason Slurm doesn't think it's
running on the node you want it to.
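Worth comparing what slurm.conf says the controller is called with what
the host itself reports, e.g. (the config path here is the Debian/Ubuntu
one, adjust if yours differs):

 grep -iE 'controlmachine|controladdr' /etc/slurm-llnl/slurm.conf
 hostname -s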

> ​As an aside, I installed with sudo apt-get instal slurm-llnl. What's
> the difference between slurm-llnl and slurm?​

In Debian (& hence Ubuntu) slurm is a network load monitor.

https://github.com/mattthias/slurm/blob/upstream/README

In later Debian releases Slurm is called slurm-wlm (where wlm stands for
work load manager) and slurm-llnl is now a transitional package to move
existing installations to the new name.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: new CRIU plugin

2016-08-30 Thread Christopher Samuel

On 30/08/16 22:11, Manuel Rodríguez Pascual wrote:

> We hope that this can be useful for the Slurm community.

That's really pretty neat!

I can't test myself as we're stuck on RHEL6 for the moment but I do
wonder if you've considered doing the same for Open-MPI so that Slurm
can do checkpoint/resume for it in the same way it does for BLCR at the
moment?

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: SLURM v14 munge issues auth to SLURM v16 DBD

2016-09-11 Thread Christopher Samuel

On 07/09/16 01:51, Chad Cropper wrote:

> I have an external SLURM v16 just for providing a DBD/MySQL. I have a
> SLURM v15 already connected to this DBD. The v14 is our PRD system. The
> v15 is our TST system. We are using the TST system for upgrade testing
> for the PRD system. When starting the slurmctl on v14 it starts but has
> no access to the DBD. 

Slurm supports two older revisions as well as the current version, so
16.05 will talk to 15.08 and 14.11 but *NOT* 14.03 - so it's really
important which of the two major releases made in 2014 you're talking about.

That said, to my untutored eye that looks more like a munge problem than
anything else - you will want to check that your keys are the same and
that your clocks are in sync (NTP is your friend).
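A quick cross-host test of the former (substitute one of your compute
nodes):

 munge -n | ssh node001 unmunge

That should decode cleanly, and the ENCODE_TIME/DECODE_TIME lines will
show you how far the clocks are apart.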

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: external slurmdbd for multiple clusters

2016-09-25 Thread Christopher Samuel

On 23/09/16 19:12, Miguel Gila wrote:

> We also do this, run a number of Slurm clusters attached to a single
> slurmdbd. One issue with this setup is that once you decommission
> a cluster, it needs to be removed from the DB somehow, otherwise your
> DB grows beyond reasonable size...

For accounting reasons we can't do that, consequently our slurmdbd is
around 17GB in size.

To give you an idea the largest table is the job step table for our
BlueGene/Q which takes almost 7GB for pushing 19 million job steps.

The next largest is the job step table for one of our x86 clusters which
is around 3GB for 8 million job steps.

Neither causes us any issues these days (we used to have a problem when,
for complicated historical reasons, slurmdbd was running on a 32-bit VM
and could run out of memory).

Admittedly we do have beefy database servers. :-)

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Bug in node suspend/resume config code with scontrol reconfigure in 16.05.x (bugzilla #3078)

2016-09-25 Thread Christopher Samuel

Hi folks

A heads up for those using or looking to use node suspend/resume aka
power saving aka elastic computing in 16.05.x.

The slurmctld will lose your list of excluded nodes/partitions on an:

scontrol reconfigure

and then will treat all nodes as being eligible for power control,
putting them into a bad state. :-(
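For reference, the exclusions in question are the power-saving settings
in slurm.conf, roughly like this (program paths and node/partition names
here are invented):

 SuspendTime=600
 SuspendProgram=/usr/local/sbin/node_suspend
 ResumeProgram=/usr/local/sbin/node_resume
 SuspendExcNodes=login[1-2]
 SuspendExcParts=service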

This is Slurm bug:

https://bugs.schedmd.com/show_bug.cgi?id=3078

which has been hit separately by two friends of mine at different
places, one of whom I'm helping out with elastic computing/cloudburst.

Hopefully this saves someone else from losing sleep over this!

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: external slurmdbd for multiple clusters

2016-09-25 Thread Christopher Samuel

On 02/09/16 18:54, Paddy Doyle wrote:

> We currently have the dbd on 16.05.4 and have some 15.x clusters still
> pointing to it fine. I can't recall exactly, but in the past we may have
> even had 2 major releases behind pointing to an up-to-date dbd... I'm not
> sure how far back you can go, but I suspect 14.x talking to a 16.x dbd
> would be fine.

Slurm supports 2 major releases behind.

So a 16.05.x slurmdbd should talk to 15.08.x and 14.11.x
slurmctld's but *not* 14.03.x.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Send notification email

2016-10-04 Thread Christopher Samuel

On 03/10/16 23:39, Fanny Pagés Díaz wrote:

> I have a slurm running in the same HPC cluster server, but I need send
> all notification using my corporate mail server, which running in
> another server at my internal network. I not need use the local postfix
> installed at slurm server.

The most reliable solution will be to configure Postfix to send emails
via the corporate server.

All our clusters send using our own mail server quite deliberately.

We set:

relayhost (to say where to relay email via)
myorigin (to set the system name to its proper FQDN)
alias_maps (to add an LDAP lookup that rewrites a user's email to the
value stored in LDAP)
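A minimal main.cf sketch along those lines (relay host, domain and the
LDAP map file are placeholders, not our real config):

 relayhost = [mail.example.org]
 myorigin = cluster.example.org
 alias_maps = hash:/etc/aliases, ldap:/etc/postfix/ldap-aliases.cf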

But really this isn't a Slurm issue, it's a host config issue for Postfix.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Send notification email

2016-10-06 Thread Christopher Samuel

On 06/10/16 03:07, Fanny Pagés Díaz wrote:

> Oct  5 11:34:52 compute-0-3 postfix/smtp[6469]: connect to 
> 10.8.52.254[10.8.52.254]:25: Connection refused

So you are blocked from connecting to the mail server you are trying to
talk to on port 25/tcp (SMTP) - you need to get that opened up.
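A quick way to confirm that from the compute node (assuming nc is
installed; substitute your mail server's address):

 nc -vz 10.8.52.254 25

Until that succeeds no amount of Postfix tweaking will help.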

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Best way to control synchronized clocks in cluster?

2016-10-06 Thread Christopher Samuel

On 07/10/16 01:17, Per Lönnborg wrote:

> But what is the preferred way to check that the compute nodes on our
> cluster have correct time, and if not, see to it that Slurm doesn't
> allocate these nodes to perform tasks?

We run NTP everywhere - we have to, because GPFS depends on correct
clocks as well, and if they're out of step GPFS will stop working on the
node, making Slurm the least of your worries. :-)

So just run ntpd.
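To check a node really is in sync you can look at its peers and offsets
(assuming classic ntpd; chrony's equivalent is "chronyc sources"):

 ntpq -p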

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurm 15.08.12 - Issue after upgrading to 15.08 - only one job per node is running

2016-09-19 Thread Christopher Samuel

On 19/09/16 20:06, Diego Zuccato wrote:

> I've set the default to 200, so that users are *strongly* encouraged to
> specify the real amount of RAM they need. Or ask for exclusive access to
> a node.
> 
> Just a possible hint.

For our systems, 2 GB/core is half the actual RAM/core ratio of the
low-memory nodes on one system and an eighth of it on another, so making
the default lower doesn't buy us much, and 2 GB/core means most NAMD jobs
will run without issues.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: fwd_tree_thread ... failed to forward the message

2016-09-19 Thread Christopher Samuel

On 19/09/16 16:10, Ulf Markwardt wrote:

> We have not deleted the "node_state" file and thus ran into these troubles:

Thanks so much Ulf, you've just answered a puzzle I've been seeing on an
x86 cluster I'm helping out with!

They've been trying to set up cloudburst and were convinced it was
related to that, but this looks like the actual problem.

Now why slurmctld doesn't upgrade that information on an upgrade is
another matter altogether.

Thanks!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurm 15.08.12 - Issue after upgrading to 15.08 - only one job per node is running

2016-09-18 Thread Christopher Samuel

On 18/09/16 03:45, John DeSantis wrote:

> Try adding a "DefMemPerCPU" statement in your partition definitions, e.g

You can also set that globally.

# Global default for jobs - request 2GB per core wanted.
DefMemPerCPU=2048

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurm 15.08.12 - Issue after upgrading to 15.08 - only one job per node is running

2016-09-19 Thread Christopher Samuel

On 20/09/16 00:14, John DeSantis wrote:

> Nothing like the DRY principle in a config file!

Grin, I guess it is only a single parameter, just that it can have
different values depending on the context. :-)

> All of the times that I've read over the available parameters for
> slurm.conf, this one escaped me!

There's always new ones there, I swear they're breeding..

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: fwd_tree_thread ... failed to forward the message

2016-09-20 Thread Christopher Samuel

On 19/09/16 22:58, Christopher Samuel wrote:

> Thanks so much Ulf, you've just answered a puzzle I've been seeing on an
> x86 cluster I'm helping out with!

...and stopping slurmctld & slurmd's (slurmdbd was left going), moving
/var/spool/slurm/jobs/node_state* out of the way and starting everything
back up fixed it.

We saved the state of drained nodes beforehand and reapplied them after
of course.

Thanks so much Ulf!


-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: X11 plugin problems

2016-09-20 Thread Christopher Samuel

On 21/09/16 06:13, Simpson, Claire L wrote:

> srun: error: x11: unable to connect node node010

Can users ssh back from the compute node to the login node without being
prompted for a password/passphrase or to accept an ssh key?

That's usually the source of those issues in my experience.

SSH host based authentication within the cluster helps with that, along
with caching SSH keys in /etc/ssh/ssh_known_hosts.
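For the known_hosts part, something along these lines does the job (node
names are illustrative, run it wherever you build your image or config):

 ssh-keyscan -t rsa,ecdsa,ed25519 node001 node002 node003 >> /etc/ssh/ssh_known_hosts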

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurm array scheduling question

2016-09-21 Thread Christopher Samuel

On 21/09/16 14:15, Christopher benjamin Coffey wrote:

> I’d like to get some feedback on this please from other sites and the
> developers if possible.  Thank you!

The best I can offer is this from the job array documentation from Slurm:

# When a job array is submitted to Slurm, only one job record is
# created. Additional job records will only be created when the
# state of a task in the job array changes, typically when a task
# is allocated resources or its state is modified using the
# scontrol command.

In 2.6.x I think there was a record for each element, but this was
cleaned up in a later release (can't remember when sorry!).
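If it's just the display you're after, squeue can still expand the array
for you (job ID made up):

 squeue -r -j 1234567

which prints one line per array task regardless of how many job records
exist underneath.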

Hope that helps,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: strange going-ons with OpenMPI and Infiniband

2016-08-26 Thread Christopher Samuel

On 26/08/16 23:03, Michael Di Domenico wrote:

> is it off by default?  we're running the default openib stack in rhel
> 6.7.   i'm not even sure where to check for it being on/off, i've
> never had to specifically enable/disable UD before, i thought it was
> always the programs choice whether to run UD or RC

Our xCAT postinstall script for creating RAMdisk images for our compute
nodes has:

# Now needed on and off respectively for RHEL 6.6
chroot ${1} chkconfig rdma on
chroot ${1} chkconfig ibacm off

The chroot is because the script runs outside of the area the RAMdisk
image is being constructed inside - we've used this for stock RHEL 6.6,
6.7 and 6.8 with Open-MPI 1.6, 1.8 and 1.10 without issues.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Backfill scheduler should look at all jobs

2016-08-23 Thread Christopher Samuel

On 23/08/16 17:56, Ulf Markwardt wrote:

> Hello Christopher,
> 
> > Isn't this what bf_continue is for?
>
> Not really: bf_continue just lets the bf scheduler continue at the same
> position where it was interrupted (e.g. for job submissions). So it does
> not consider the new jobs but crawls down first. (These interrupts are
> controlled by bf_yield_interval and bf_yield_sleep.)

OK, then I'm missing something because I thought that was what you
wanted.  Sorry for the noise!

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-09-28 Thread Christopher Samuel

On 29/09/16 01:16, John DeSantis wrote:

> We get the same snippet when our logrotate takes action against the
> cltdlog:

Does your slurmctld restart then too?

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-09-27 Thread Christopher Samuel

On 26/09/16 17:48, Philippe wrote:

> [2016-09-26T08:02:16.582] Terminate signal (SIGINT or SIGTERM) received

So that's some external process sending one of those two signals to
slurmctld, it's not something it's choosing to do at all.  We've never
seen this.

One other question - you've got the shutdown log from slurmctld and the
start log of a slurmd - what happens when slurmctld starts up?

That might be your clue about why your jobs are getting killed.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Invalid Protocol Version

2016-09-28 Thread Christopher Samuel

On 28/09/16 16:25, Barbara Krasovec wrote:

> Yes, this worked! Thank you very much for your help!

My pleasure!

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: CGroups

2016-09-27 Thread Christopher Samuel

On 26/09/16 16:51, Lachlan Musicman wrote:

> Does this mean that it's now considered acceptable to run cgroups for
> ProcTrackType?

We've been running with that on all our x86 clusters since we switched
to Slurm, haven't seen an issue yet.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-09-27 Thread Christopher Samuel

On 26/09/16 17:48, Philippe wrote:

> [2016-09-26T08:01:44.792] debug:  slurmdbd: Issue with call
> DBD_CLUSTER_CPUS(1407): 4294967295(This cluster hasn't been added to
> accounting yet)

Not related - but it looks like whilst it's been told to talk to
slurmdbd you haven't added the cluster to slurmdbd with "sacctmgr" yet
so I suspect all your accounting info is getting lost.
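If that's the case the fix is simple - the cluster name being whatever
ClusterName is set to in your slurm.conf:

 sacctmgr add cluster mycluster

and then restart slurmctld so it registers with the DBD.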

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-09-27 Thread Christopher Samuel

On 27/09/16 17:40, Philippe wrote:

>   /usr/sbin/invoke-rc.d --quiet slurm-llnl reconfig >/dev/null

I think you want to check whether that's really restarting it or just
doing an "scontrol reconfigure" which won't (shouldn't) restart it.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Invalid Protocol Version

2016-09-27 Thread Christopher Samuel

On 27/09/16 23:54, Barbara Krasovec wrote:

> The version of the client and server is the same. I guess the problem is
> in the slurmctld state file, where the slurm protocol version of some
> worker nodes must be wrong.

I suspect this is bug 3050 - we hit it for frontend nodes on BlueGene/Q
and reported it against that, but I've seen the same symptom that
you've hit on x86 clusters too - as has Ulf from Dresden.

https://bugs.schedmd.com/show_bug.cgi?id=3050

Tim is checking the code for the generic case.

> Creating a clean state file would fix the problem but also kill the
> jobs. Is there another way to fix this? Enforce the correct version in
> node_mgr.c and job_mgr.c? Restarting the services doesn't help at all.

Actually you don't lose jobs - but you do lose the reason for nodes being 
offline..

What we did was:

1) save the state of offline nodes and make a script to restore via scontrol
2) shutdown slurmctld and all slurmds
3) move the node_stat* files out of the way
4) start up slurmd again
5) start up slurmctld
6) run the script created at step 1
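For step 1, something like this captures the drained nodes and their
reasons:

 sinfo -R -h -o "%n %E" > /tmp/node_reasons.txt

and the restore script is then a series of lines of the form:

 scontrol update NodeName=node001 State=DRAIN Reason="whatever was saved"

with node names and reasons filled in from the saved file.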

Hope that helps!

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Query number of cores allocated per node for a job

2016-10-25 Thread Christopher Samuel

Hi all,

I can't help but think I'm missing something blindingly obvious, but
does anyone know how to find out how Slurm has distributed a job in
terms of cores per node?

In other words, if I submit:

sbatch --ntasks=64 --wrap "sleep 60"

on a system with (say) 16-core nodes, where the nodes are already running
disparate numbers of jobs using varying numbers of cores, how do I see
which cores on which nodes Slurm has allocated to my running job?

I know I can go and poke around with cgroups, but is there a way to get
that out of squeue, sstat or sacct?
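The closest I've found so far is the detailed job view, which at least
lists the CPU_IDs per node (job ID made up):

 scontrol -d show job 1234567

but ideally I'd like it from the accounting side.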

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: How to restart a job "(launch failed requeued held)"

2016-10-27 Thread Christopher Samuel

On 28/10/16 08:44, Lachlan Musicman wrote:

> So I checked the system, noticed that one node was drained, resumed it.
> Then I tried both
> 
> scontrol requeue 230591
> scontrol resume 230591

What happens if you "scontrol hold" it first before "scontrol release"'ing it?

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: How to account how many cpus/gpus per node has been allocated to a specific job?

2016-11-08 Thread Christopher Samuel

On 09/11/16 12:15, Ran Du wrote:

>Thanks a lot for your reply. However, it's not what I want to
> get. For the example of Job 6449483, it is allocated with only one node,
> what if it was allocated with multiple nodes? I'd like to get the
> accounting statistics about how many CPUs/GPUs separately on each node,
> but not the sum number on all nodes.

Oh sorry, that's my fault, I completely misread what you were after
and managed to invert your request!

I don't know if that information is included in the accounting data.

I believe the allocation is uniform across the nodes, for instance:

$ sbatch --gres=mic:1 --mem=4g --nodes=2 --wrap /bin/true

resulted in:

$ sacct -j 6449484 -o jobid%20,jobname,alloctres%20,allocnodes,allocgres
               JobID    JobName            AllocTRES AllocNodes  AllocGRES
-------------------- ---------- -------------------- ---------- ----------
             6449484       wrap  cpu=2,mem=8G,node=2          2      mic:2
       6449484.batch      batch  cpu=1,mem=4G,node=1          1      mic:2
      6449484.extern     extern  cpu=2,mem=8G,node=2          2      mic:2

The only oddity there is that the batch step is of course
only on the first node, but it says it was allocated 2 GRES.
I suspect that's just a symptom of Slurm only keeping a total
number.

I don't think Slurm can give you an uneven GRES allocation, but
the SchedMD folks would need to confirm that I'm afraid.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Re:

2016-11-08 Thread Christopher Samuel

Hi there,

On 08/11/16 21:21, Alexandre Strube wrote:

> For example, on slurm 10.05.6, the example config file says:
> 
> StateSaveLocation=/tmp
> 
> Which is not the best place to write sensitive information, but it will
> for sure be there and will be writable by the slurm user.

Frankly using /tmp seems like a *really* bad idea to me.

The reason is that (depending on system configuration) it may be cleaned
up on a reboot (either by scripts or because it's a non-persistent tmpfs)
or periodically by tmpwatch or similar tools.

So if you've got jobs queued for any period of time that information
will be lost.

We build from source and use:

StateSaveLocation   = /var/spool/slurm/jobs

but the decision is yours where exactly to put it.

But /tmp is almost certainly the second worst place (after /dev/shm).

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Re:

2016-11-08 Thread Christopher Samuel

On 09/11/16 09:50, Lachlan Musicman wrote:

> I don't know Chris, I think that /dev/null would rate tbh. :)

Ah, but that's a file (OK character special device), not a directory. ;-)

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: sinfo man page

2016-11-07 Thread Christopher Samuel

On 08/11/16 11:42, Lachlan Musicman wrote:
> 
> %C    Number of CPUs by state in the format
> "allocated/idle/other/total". Do not use this with a node state
> option ("%t" or "%T") or the different node states will be placed on
> separate lines.
> 
> I presume I am doing something wrong?

I think it's a bug (possibly just in the manual page) as the -N option
says there:

   -N, --Node
          Print information in a node-oriented format with one
          line per node.  The default is to print information
          in a partition-oriented format.  This is ignored if
          the --format option is specified.

Except it's not being ignored when you use --format (-o).

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-25 Thread Christopher Samuel

On 25/10/16 10:05, Peixin Qiao wrote:
> 
> I installed slurm-llnl on Debian on one computer. When I ran slurmctld
> and slurmd, I got the error:
> slurm_load_partitions: Unable to contact slurm controller (connect failure).

Check your firewall rules to ensure that those connections aren't
getting blocked, and also check that the hostname correctly resolves.
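For instance (the config path is the Debian one, adjust to suit):

 scontrol ping
 grep -i controlmachine /etc/slurm-llnl/slurm.conf
 getent hosts $(hostname)

scontrol ping tells you whether the controller is reachable at all, and
the other two let you compare what slurm.conf thinks the controller is
called with what the resolver returns.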

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: sreport "duplicate" lines

2016-10-20 Thread Christopher Samuel

On 21/10/16 13:07, Andrew Elwell wrote:

> Yep, and for that particular account, not all of the members are
> showing twice - I can't work out what causes it

Looks like you've somehow created partition specific associations for
some people - not something we do at all.

I suspect that's what's triggering the different display in sreport, a
line per association/partition.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: sreport "duplicate" lines

2016-10-20 Thread Christopher Samuel

On 21/10/16 12:29, Andrew Elwell wrote:

> When running sreport (both 14.11 and 16.05) I'm seeing "duplicate"
> user info with different timings. Can someone say what's being added
> up separately here - it seems to be summing something differently for
> me and I can't work out what makes it split into two:

Not that it helps, but I don't see the same here for me with 16.05.5.

# sreport cluster AccountUtilizationByUser start=2016-07-01 end=2016-10-01 account=vlsci user=samuel cluster=avoca -t h

Cluster/Account/User Utilization 2016-07-01T00:00:00 - 2016-09-30T23:59:59 (7948800 secs)
Use reported in TRES Hours

  Cluster         Account      Login       Proper Name      Used   Energy
--------- --------------- ---------- ----------------- --------- --------
    avoca           vlsci     samuel   Christopher Sa+     15103        0



-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Gres issue

2016-11-16 Thread Christopher Samuel

On 17/11/16 11:31, Christopher Samuel wrote:

> It depends on the library used to pass options,

Oops - that should be parse, not pass.

Need more caffeine..

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Gres issue

2016-11-16 Thread Christopher Samuel

On 17/11/16 00:04, Michael Di Domenico wrote:

> this might be nothing, but i usually call --gres with an equals
> 
> srun --gres=gpu:k10:8
> 
> i'm not sure if the equals is optional or not

It depends on the library used to pass options, I'm used to it being
mandatory but apparently with Slurm it's not - just tested it out and using:

--gres mic

results in my job being scheduled on a Phi node with OFFLOAD_DEVICES=0
set in its environment.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] RE: Wrong behaviour of "--tasks-per-node" flag

2016-11-17 Thread Christopher Samuel

On 28/10/16 18:20, Manuel Rodríguez Pascual wrote:

> -bash-4.2$ sbatch --ntasks=16  --tasks-per-node=2  test.sh 

Could you post the content of your batch script too please?

We're not seeing this on 16.05.5, but I can't be sure I'm correctly
replicating what you are seeing.

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: How to account how many cpus/gpus per node has been allocated to a specific job?

2016-11-13 Thread Christopher Samuel

On 09/11/16 13:07, Ran Du wrote:

>However, the scheduler must have information about separate
> allocated number on each node, or they cannot track how many resources
> left on each node. The question is, if SLURM keep these separate numbers
> in files(e.g. log files or database), or just keep them in memory. I am
> going to read other docs info, to see if there is any lead.

I strongly suspect it's only held in memory by slurmctld whilst running
(and in its state files of course).

Unfortunately it doesn't even appear in "scontrol show job --detail"
from what I can see. :-(

Here's the lines from a test job of mine:

   TRES=cpu=2,mem=8G,node=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
 Nodes=barcoo[069-070] CPU_IDs=0 Mem=4096
   MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=0
   Features=(null) Gres=mic:1 Reservation=(null)

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Using slurm to control container images?

2016-11-15 Thread Christopher Samuel

On 16/11/16 12:23, Lachlan Musicman wrote:

> Has anyone tried Shifter out and has there been any movement on this? I
> presume the licensing issues remain.

We've got both Shifter and Singularity set up at VLSCI for users.

https://www.vlsci.org.au/documentation/running_jobs/shifter/

https://www.vlsci.org.au/documentation/running_jobs/singularity/

The important thing to recognise for both is that they are *NOT* Docker,
but they are both able to use Docker containers.

Shifter imports directly from the Docker registry (or from other
registries that you configure) and lives entirely on your HPC system.
Singularity needs to be installed and configured on the user's system,
but they can do the conversion there, and there is no central public repo
of converted containers (a plus if you need a private container, a minus
if you're going to end up with hundreds of copies).

Having private containers is on the roadmap for Shifter.

Shifter also integrates with Slurm.

All the best!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] RE: Wrong behaviour of "--tasks-per-node" flag

2016-11-20 Thread Christopher Samuel

On 19/11/16 03:38, Manuel Rodríguez Pascual wrote:

> sbatch --ntasks=16  --tasks-per-node=2  --wrap 'mpiexec ./helloWorldMPI'

If your MPI stack properly supports Slurm shouldn't that be:

sbatch --ntasks=16  --tasks-per-node=2  --wrap 'srun ./helloWorldMPI'

?

Otherwise you're at the mercy of what your mpiexec chooses to do.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurm versions 16.05.6 and 17.02.0-pre3 are now available

2016-10-30 Thread Christopher Samuel

On 29/10/16 00:58, Peixin Qiao wrote:

> Will the slurm version 16.05.6 support ubuntu 16.04?

If you build it from source I suspect any moderately recent version will
work there.

If you are asking about the Ubuntu packaged version, then that's a
question for Canonical, not SchedMD. :-)

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Passing binding information

2016-10-31 Thread Christopher Samuel

On 01/11/16 05:43, Andy Riebs wrote:

> Does anyone have any recent experience with this code who can answer the
> questions?

Unfortunately it looks like all SchedMD folks have dropped off the
mailing list (apart from posting announcements), presumably due to
workload.  You may want to contact them directly.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Passing binding information

2016-11-02 Thread Christopher Samuel

On 02/11/16 02:01, Riebs, Andy wrote:

> Interesting -- thanks for the info Chris.

No worries, it's a bit sad I think, but I can understand it.

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Christopher Samuel

On 16/12/16 10:33, Kilian Cavalotti wrote:

> I remember Danny recommending to use jobacct_gather/linux over
> jobacct_gather/cgroup, because "cgroup adds quite a bit of overhead
> with very little benefit".
> 
> Did that change?

We took that advice but reverted because of this issue (from memory).

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Christopher Samuel

On 16/12/16 02:15, Stefan Doerr wrote:

> If I check on "top" indeed it shows all processes using the same amount
> of memory. Hence if I spawn 10 processes and you sum usages it would
> look like 10x the memory usage.

Do you have:

JobAcctGatherType=jobacct_gather/linux

or:

JobAcctGatherType=jobacct_gather/cgroup

If the former, try the latter and see if it helps get better numbers (we
went to the former after suggestions from SchedMD but, if my highly
unreliable memory serves, had to revert due to similar issues to those
you are seeing).

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: job arrays, fifo queueing not wanted

2016-12-14 Thread Christopher Samuel

On 14/12/16 04:57, Michael Miller wrote:

> thank you for your answer. I do not need round-robin - I need some
> mechanism that allows both/multiple job arrays to share the resources.

We set CPU limits per association per cluster:

${SACCTMGR} -i modify account set grpcpus=256   where cluster=snowy

So no project can use more than 256 cores at once.

You can also do that for nodes, GrpCpuMins (product of cores and
runtime), etc.

This only makes sense if your cluster is going to be overcommitted most
of the time though, otherwise you may have jobs pending due to limits
with idle resources.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Prolog behavior with and without srun

2017-01-09 Thread Christopher Samuel

On 06/01/17 15:08, Vicker, Darby (JSC-EG311) wrote:

> Among other things we want the prolog and epilog scripts to clean up any 
> stray processes.

I would argue that a much better way to do that is to use Slurm's
cgroups support, that will contain a jobs processes into the cgroup
allowing it to kill off only those processes (and not miss any) when the
job ends.

If you are unlucky enough to have SSH based job launchers then you should
also look at the BYU-contributed pam_slurm_adopt, which will put those
tasks into the cgroup of that user's job on the node they are trying to
SSH into.  You do need PrologFlags=contain for that, to ensure that all
jobs get an "extern" step on job creation for these processes to
be adopted into.
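As a sketch, that combination is just (the PAM file location varies by
distro):

in slurm.conf:
 PrologFlags=contain

in /etc/pam.d/sshd:
 account    required    pam_slurm_adopt.so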

We use both here with great success.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Prolog behavior with and without srun

2017-01-09 Thread Christopher Samuel

On 10/01/17 10:57, Christopher Samuel wrote:

> If you are unlucky enough to have SSH based job launchers then you would
> also look at the BYU contributed pam_slurm_adopt

Actually this is useful even without that as it allows users to SSH into
a node they have a job on and not disturb the cores allocated to other
jobs on the node, just their own.

You could argue that this is more elegant though, to add an interactive
shell job step to a running job:

[samuel@barcoo ~]$ srun --jobid=6522365  --pty -u ${SHELL} -i -l
[samuel@barcoo010 ~]$
[samuel@barcoo010 ~]$ cat /proc/$$/cgroup
4:cpuacct:/slurm/uid_500/job_6522365/step_1/task_0
3:memory:/slurm/uid_500/job_6522365/step_1
2:cpuset:/slurm/uid_500/job_6522365/step_1
1:freezer:/slurm/uid_500/job_6522365/step_1


-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

2017-01-03 Thread Christopher Samuel

On 04/01/17 10:29, Koziol, Lucas wrote:

> I want to have 1 batch script, where I reserve a certain large number
> of CPUs, and then run multiple 1-CPU tasks from within this single
> script. The reason being that I do several cycles of these tasks, and
> I need to process the outputs between tasks.

OK, I'm not sure how Slurm will behave with multiple srun's and cons_res
and CR_LLN but it's still worth a shot.

Best of luck!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

2017-01-03 Thread Christopher Samuel

On 04/01/17 04:20, Koziol, Lucas wrote:

> The hope was that all 16 tasks would run on Node 1, and 16 tasks would
> run on Node 2. Unfortunately what happens is that all 32 jobs get
> assigned to Node 1. I thought –m cyclic was supposed to avoid this.

You're only running a single task at a time, so it's a bit hard for srun
to distribute 1 task over multiple nodes. :-)

The confusion is, I suspect, that job steps (an srun instance) are not
the same as tasks (individual processes launched in a job step).

The behaviour in the manual page is for things like MPI jobs where you
want to distribute the many ranks (tasks) over nodes/sockets/cores in a
particular way - in this instance a single srun might be launching 10's
through to 100,000's of tasks (or more) at once.

What might work better for you is to use a job array for your work
instead of a single Slurm job and then have this in your slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_LLN

This should get Slurm to distribute the job array elements across nodes
picking the least loaded (allocated) node in each case.

Job arrays are documented here:

https://slurm.schedmd.com/job_array.html
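For example (script name made up), submitting:

 sbatch --array=1-32 --ntasks=1 wrapper.sh

gives you 32 independent 1-CPU tasks, each of which can use
$SLURM_ARRAY_TASK_ID to pick its input.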

Hope this helps!

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

2017-01-03 Thread Christopher Samuel

On 04/01/17 10:19, Koziol, Lucas wrote:

> Will Slurm read a local slurm.conf file or in my home directory?

No, I'm afraid not, it's a global configuration thing.

> The default slurm.conf file I can't modify. I can ask the admins here
> to modify it I I have to though.

I strongly believe that will be necessary, sorry!

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: mail job status to user

2017-01-15 Thread Christopher Samuel

On 10/01/17 18:56, Ole Holm Nielsen wrote:

> For the record: Torque will always send mail if a job is aborted

It's been a few years since I've used Torque so I don't remember that
behaviour.

Thanks for the info!

-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: mail job status to user

2017-01-15 Thread Christopher Samuel

On 14/01/17 09:28, Steven Lo wrote:

> Is it true that there is no configurable parameter to achieve what we
> want to do and user need to specific in
> either the sbatch or the submission script?

Not that I'm aware of.

A submit filter would let you set that up though.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: mail job status to user

2017-01-15 Thread Christopher Samuel

On 16/01/17 15:56, Ryan Novosielski wrote:

> I think you are right actually. Might have also been configurable
> system-wide. The difference though, still, is that you don't have to
> provide an e-mail address, so you could share a script like this and it
> would work for anyone without modifying it. 

You don't need to provide an email address to Slurm either:

 --mail-user=<user>
        User to receive email notification of state
        changes as defined by --mail-type.  The default
        value is the submitting user.

Our Postfix config rewrites their username to their registered email
address that's stored in LDAP.

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Fwd: Scheduling jobs according to the CPU load

2017-03-19 Thread Christopher Samuel

On 19/03/17 23:25, kesim wrote:

> I have 11 nodes and declared 7 CPUs per node. My setup is such that all
> desktop belongs to group members who are using them mainly as graphics
> stations. Therefore from time to time an application is requesting high
> CPU usage.

In this case I would suggest you carve off 3 cores via cgroups for
interactive users and give Slurm the other 7 to parcel out to jobs by
ensuring that Slurm starts within a cgroup dedicated to those 7 cores..
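As a rough cgroup-v1 illustration of the idea (paths, core numbers and
the PID are examples only; do this before slurmd starts, or move its PID
in afterwards):

 mkdir /sys/fs/cgroup/cpuset/slurm_cores
 echo 3-9 > /sys/fs/cgroup/cpuset/slurm_cores/cpuset.cpus
 echo 0 > /sys/fs/cgroup/cpuset/slurm_cores/cpuset.mems
 echo 12345 > /sys/fs/cgroup/cpuset/slurm_cores/tasks   # slurmd's PID

leaving cores 0-2 for the interactive desktop users.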

This is similar to the "boot CPU set" concept that SGI came up with (at
least I've not come across people doing that before them).

To be fair this is not really Slurm's problem to solve, Linux gives you
the tools to do this already, it's just that people don't realise that
you can use cgroups to do this.

Your use case is valid, but it isn't really HPC, and you can't really
blame Slurm for not catering to this.  It can use cgroups to partition
cores to jobs precisely so it doesn't need to care what the load average
is - it knows the kernel is ensuring the cores the jobs want are not
being stomped on by other tasks.

Best of luck!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Scheduling jobs according to the CPU load

2017-03-21 Thread Christopher Samuel

On 22/03/17 08:35, kesim wrote:

> You are right. Many thanks for correcting.

Just note that load average is not necessarily the same as CPU load.

If you have tasks blocked for I/O they will contribute to load average
but will not be using much CPU at all.

So, for instance, on one of our compute nodes a Slurm job can ask for 1
core, start 100 tasks doing heavy I/O, they all use the same 1 core and
get the load average to 100 but the other 31 cores on the node are idle
and can quite safely be used for HPC work.

The manual page for "uptime" on RHEL7 describes it thus:

# System load averages is the average number of processes that
# are either in a runnable or uninterruptable state.  A process
# in a runnable state is either using the CPU or waiting to use
# the CPU.  A process in uninterruptable state is waiting for
# some I/O access, eg waiting for disk.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Randomly jobs failures

2017-04-11 Thread Christopher Samuel

On 11/04/17 17:42, Andrea del Monaco wrote:

> [2017-04-11T08:22:03+02:00] error: Error opening file
> /cm/shared/apps/slurm/var/cm/statesave/job.830332/script, No such file
> or directory
> [2017-04-11T08:22:03+02:00] error: Error opening file
> /cm/shared/apps/slurm/var/cm/statesave/job.830332/environment, No such
> file or directory

I would suggest that you are looking at transient NFS failures (which
may not be logged).

Are you using NFSv3 or v4 to talk to the NFS server and what are the
OS's you are using for both?
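nfsstat will show you what each mount is actually using (look for the
vers= option):

 nfsstat -m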

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Distinguishing past jobs that waited due to dependencies vs resources?

2017-04-11 Thread Christopher Samuel

Hi folks,

We're looking at wait times on our clusters historically but would like
to be able to distinguish jobs that had long wait times due to
dependencies rather than just waiting for resources (or because the user
had too many other jobs in the queue at that time).

A quick 'git grep' of the source code after reading 'man sacct' and not
finding anything (also running 'sacct -e' and not seeing anything useful
there either) doesn't offer much hope.

Anyone else dealing with this?

We're on 16.05.x at the moment with slurmdbd.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: LDAP required?

2017-04-12 Thread Christopher Samuel

On 11/04/17 16:05, Lachlan Musicman wrote:

> Our auth actually backs onto an Active Directory domain

You have my sympathies. That caused us no end of headaches when we tried
that on a cluster I help out on and in the end we gave up and fell back
to running our own LDAP to make things reliable again.

+1 for running your own LDAP.

I would seriously look at a cluster toolkit for running nodes,
especially if it supports making a single image that your compute nodes
then netboot.  That way you know everything is consistent.

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Jobs submitted simultaneously go on the same GPU

2017-04-11 Thread Christopher Samuel

On 10/04/17 21:08, Oliver Grant wrote:

> We did not have a gres.conf file. I've created one:
> cat /cm/shared/apps/slurm/var/etc/gres.conf
> # Configure support for our four GPU
> NodeName=node[001-018] Name=gpu File=/dev/nvidia[0-3]
> 
> I've read about "global" and "per-node" gres.conf, but I don't know how
> to implement them or if I need to?

Yes you do.

Here's an (anonymised) example from a cluster that I help with that has
both GPUs and MIC's on various nodes.

# We will have GPU & KNC nodes so add the GPU & MIC GresType to manage them
GresTypes=gpu,mic
# Node definitions for nodes with GPUs
NodeName=thing-gpu[001-005] Weight=3000 NodeAddr=thing-gpu[001-005] RealMemory=254000 CoresPerSocket=6 Sockets=2 Gres=gpu:k80:4
# Node definitions for nodes with Xeon Phi
NodeName=thing-knc[01-03] Weight=2000 NodeAddr=thing-knc[01-03] RealMemory=126000 CoresPerSocket=10 Sockets=2 ThreadsPerCore=2 Gres=mic:5110p:2

You'll also need to restart slurmctld & all slurmd's to pick up
this new config, I don't think "scontrol reconfigure" will deal
with this.

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: LDAP required?

2017-04-12 Thread Christopher Samuel

On 13/04/17 01:47, Jeff White wrote:

> +1 for Active Directory bashing. 

I wasn't intending to "bash" AD here, just that the AD that we were
trying to use (and I suspect that Lachlan might be talking to) has tens
of thousands of accounts in it and we just could not get the
Slurm->sssd->AD chain to work reliably enough to run a production
system.

This was with both sssd trying to enumerate the whole domain and also
(before that) trying to get Slurm to work without sssd enumeration.

Smaller AD domains might work more reliably, but that's not where we sit
so we fell back to using our own LDAP server with Karaage to manage
project/account applications, adding people to slurmdbd, etc.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: reporting used memory with job Accounting or Completion plugins?

2017-03-12 Thread Christopher Samuel

On 11/03/17 09:36, Chris Samuel wrote:

> If you use the slurmdbd accounting database then yes, you get information 
> about memory usage (both RSS and VM).
> 
> Have a look at the sacct manual page and look for MaxRSS and MaxVM. 

I should mention that for jobs that trigger job steps with srun you can
also monitor them as the job is going with 'sstat' (rather than just
post-mortem with sacct).
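For example, post-mortem (job ID made up):

 sacct -j 1234567 -o JobID,JobName,MaxRSS,MaxVMSize,Elapsed

and sstat takes a similar -o/--format list while the steps are still
running.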

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Storage of job submission and working directory paths

2017-03-07 Thread Christopher Samuel

On 08/03/17 08:15, Chad Cropper wrote:

> Am I missing something? Why is it that the DBD cannot store these 2
> pieces of information?

I suspect it's just not been requested, I'd suggest opening a feature
request at:

http://bugs.schedmd.com/

Report the bug ID back as it would be useful to us here too.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: LDAP required?

2017-04-19 Thread Christopher Samuel

On 13/04/17 18:32, Janne Blomqvist wrote:

> 15.08 should also work with enumeration disabled, except for
> AllowGroups/DenyGroups partition specifications.

I'm pretty sure this was what we got stuck on, and so had to drop AD.

> So how do you manage user accounts? Just curious if someone has a sane
> middle ground between integrating with the organization user account
> system (AD or whatever) and DIY.

We use some software that was developed at my previous employer called
Karaage to manage our projects, including allowing project leaders to
invite members, talking to LDAP and integrating with slurmdbd.

Sadly my previous employer shut down at the end of 2015 (long after I
left I hasten to add!) and the person who was doing a lot of that work
has moved on to other things and only tinkers with the code base.

That said there are 2 different HPC organisations inside the university
using it and the other group use it with Shibboleth integration so that
people with AAF (Australian Access Federation) credentials can auth to
the web interface with their institutional ID (though of course it still
creates a separate LDAP account for them).

https://github.com/Karaage-Cluster

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: CGroups, Threads as CPUs, TaskPlugins

2017-08-14 Thread Christopher Samuel

On 15/08/17 09:41, Lachlan Musicman wrote:

> I guess I'm not 100% sure what I'm looking for, but I do see that there
> is a
> 
> 1:name=systemd:/user.slice/user-0.slice/session-373.scope
> 
> in /proc/self/cgroup

Something is wrong in your config then. It should look something like:

4:cpuacct:/slurm/uid_3959/job_6779703/step_9/task_1
3:memory:/slurm/uid_3959/job_6779703/step_9/task_1
2:cpuset:/slurm/uid_3959/job_6779703/step_9
1:freezer:/slurm/uid_3959/job_6779703/step_9

for /proc/${PID_OF_PROC}/cgroup

I notice you have /proc/self - that will be the shell you are running in
for your SSH session and not the job!

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Proctrack cgroup; documentation bug

2017-08-13 Thread Christopher Samuel

On 14/08/17 08:55, Lachlan Musicman wrote:

> Was it here I read that proctrack/linuxproc was better than
> proctrack/cgroup?

I think you're thinking of JobAcctGatherType, but even then our
experience there was that jobacct_gather/cgroup was more accurate.

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Multifactor Priority Plugin for Small clusters

2017-07-03 Thread Christopher Samuel

On 03/07/17 16:02, Loris Bennett wrote:

> I don't think you can achieve what you want with Fairshare and
> Multifactor Priority.  Fairshare looks at distributing resources fairly
> between users over a *period* of time.  At any *point* in time it is
> perfectly possible for all the resources to be allocated to one user.

Loris is quite right about this, but it is possible to impose limits on
a project if you chose to use slurmdbd.

First you need to set up accounting:

https://slurm.schedmd.com/accounting.html

then you can set limits:

https://slurm.schedmd.com/resource_limits.html

Best of luck!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: RebootProgram - who uses it?

2017-08-08 Thread Christopher Samuel

On 07/08/17 14:08, Lachlan Musicman wrote:

> In slurm.conf, there is a RebootProgram - does this need to be a direct
> link to a bin or can it be a command?

We have:

RebootProgram = /sbin/reboot

Works for us.

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: RebootProgram - who uses it?

2017-08-08 Thread Christopher Samuel

On 07/08/17 17:57, Aaron Knister wrote:

> Good grief. "reboot" is a legacy tool?!?! I've about had enough of systemd.

FWIW reboot is provided by the init system implementation (for instance
on RHEL6 it's from upstart), and /sbin/reboot is only optional in the
FHS. Only /sbin/shutdown is required by the FHS.

http://www.pathname.com/fhs/2.2/fhs-3.14.html

On proprietary UNIX versions reboot (not guaranteed to be in /sbin, it
was /etc/reboot on Ultrix 4, /usr/sbin/reboot on Solaris) may not run
shutdown scripts either (eg Solaris), you'd want to use shutdown for that.

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Job ends successfully but spawned processes still run?

2017-05-23 Thread Christopher Samuel

Hiya,

On 24/05/17 13:10, Lachlan Musicman wrote:

> Occasionally I'll see a bunch of processes "running" (sleeping) on a
> node well after the job they are associated with has finished.
> 
> How does this happen - does slurm not make sure all processes spawned by
> a job have finished at completion?

Are you not using cgroups for enforcement?

Usually that picks everything up.

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Job ends successfully but spawned processes still run?

2017-05-23 Thread Christopher Samuel

On 24/05/17 13:45, Lachlan Musicman wrote:

> Not yet - that's part of the next update cycle :/

Ah well that might help, along with pam_slurm_adopt so that users
SSH'ing into nodes they have jobs on are put into a cgroup of theirs on
that node.

Helps catch any legacy SSH based MPI launchers (and other naughtiness).

Good luck!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545



[slurm-dev] Re: sinfo

2017-05-24 Thread Christopher Samuel

On 25/05/17 05:12, Will French wrote:

> We have an alias setup that shows free and allocated nodes grouped by feature:
> 
> sinfo --format="%55N %.35f %.6a %.10A"

Nice, here's an alternative that is more useful in our setup which
groups nodes by reason and GRES.

sinfo --format="%60N %.15G %.30E %.10A"

The reason can be quite long, but there doesn't seem to be a way to just
show the status as down/drain/idle/etc.
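
For what it's worth, wrapping that in a shell alias and adding sinfo's
%t field (compact node state) is one way to also get a short
down/drain/idle column; a sketch only, the alias name is made up and
field widths are arbitrary:

alias nodesummary='sinfo --format="%60N %.15G %.6t %.30E %.10A"'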

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-04 Thread Christopher Samuel

On 03/06/17 07:03, Jacob Chappell wrote:

> Sorry, that was a mouthful, but important. Does anyone know if Slurm can
> accomplish this for me. If so how?

This was how we used to run prior to switching to fair-share.

Basically you set:

PriorityDecayHalfLife=0

which stops the values decaying over time so once they hit their limit
that's it.

We also set:

PriorityUsageResetPeriod=QUARTERLY

so that limits would reset on the quarter boundaries.  This was because
we used to have fixed quarterly allocations for projects.
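
Putting those together, a minimal slurm.conf sketch of that hard-limit
style of setup (the accounting/enforcement lines are assumptions on my
part, adjust for your site):

PriorityType=priority/multifactor
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=QUARTERLY
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=limits,safe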

We went to fair-share because a change in our funding model removed the
previous rules; moving to fair-share gave us a massive improvement in
utilisation (compute nodes were no longer idle with jobs waiting but
unable to run because projects were out of quota).

NOTE: You can't have both fairshare and hard quotas at the same time.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: srun - replacement for --x11?

2017-06-06 Thread Christopher Samuel

On 06/06/17 23:46, Edward Walter wrote:

> Doesn't that functionality come from a spank plugin?
> https://github.com/hautreux/slurm-spank-x11

Yes, that's the one we use. Works nicely.

Provides the --x11 option for srun.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: How to get Qos limits

2017-06-06 Thread Christopher Samuel

On 07/06/17 03:08, Kowshik Thopalli wrote:

>  I wish to know the max number of jobs that as a user I can run. That
> is MaxJobsPerUser*.  *I will be thankful if you can actually provide the
> commands that I will have to execute.

You probably want:

sacctmgr list user ${USER} format=MaxJobsPerUser

For a more general view you would do:

sacctmgr list user ${USER} withassoc
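
If those format fields don't show what you expect on your version, a
couple of other hedged ways to see the same limits, via the association
and via any QOS-level limit:

sacctmgr show assoc where user=${USER} format=User,Account,MaxJobs
sacctmgr show qos format=Name,MaxJobsPerUser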

Hope this helps,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: discrepancy between node config and # of cpus found

2017-05-21 Thread Christopher Samuel

On 20/05/17 07:46, Jeff Avila wrote:

> Yes, I did give that a try, though it didn’t seem to make any difference to 
> the error messages I got.

Have you also set DefMemPerCPU and checked how much RAM is allocated to
the jobs?

Remember that you can have free cores but not free memory on a node and
then Slurm isn't going to put more jobs there (unless you tell it to
ignore memory, which is not likely to end well).
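
A quick way to check both sides of that (the job ID below is a
placeholder): set a default in slurm.conf, then look at what a job
actually asked for and used:

# slurm.conf
DefMemPerCPU=4096

# what the job requested / used
scontrol show job 12345 | grep -i mem
sacct -j 12345 --format=JobID,ReqMem,MaxRSS,Elapsed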

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Accounting using LDAP ?

2017-09-19 Thread Christopher Samuel

On 20/09/17 03:03, Carlos Lijeron wrote:

> I'm trying to enable accounting on our SLURM configuration, but our
> cluster is managed by Bright Management which has its own LDAP for users
> and groups.   When setting up SLURM accounting, I don't know how to make
> the connection between the users and groups from the LDAP as opposed to
> the local UNIX.

Slurm just uses the host's NSS config for that, so as long as the OS can
see the users and groups then slurmdbd will be able to see them too.

*However*, you _still_ need to manually create users in slurmdbd to
ensure that they can run jobs, but that's a separate issue to whether
slurmdbd can resolve users in LDAP.

I would hope that Bright would have the ability to do that for you
rather than having you handle it manually, but that's a question for Bright.
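
For reference, creating the slurmdbd records by hand looks roughly like
this; the cluster, account and user names here are made up:

sacctmgr add cluster mycluster
sacctmgr add account projecta Description="Project A" Organization=science
sacctmgr add user alice Account=projecta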

Best of luck,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Christopher Samuel

On 20/09/17 06:39, Jacob Chappell wrote:

> Thanks everyone who has replied. I am trying to get pam_slurm_adopt.so
> implemented. Does it work with batch jobs?

It does indeed, we use it as well.

Do you have:

PrologFlags=contain

set?  From slurm.conf:


  Contain At job allocation time, use the ProcTrack plugin to
  create a job  container  on  all  allocated compute
  nodes. This container may be used for user processes
  not launched under Slurm control, for example the PAM
  module may place processes launched through a direct
  user  login into this container. Setting the Contain
  implicitly sets the Alloc flag.
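
As a rough sketch of the two pieces involved (file paths and PAM
ordering are assumptions, check the pam_slurm_adopt documentation for
your distro):

# slurm.conf on all nodes
PrologFlags=contain

# /etc/pam.d/sshd on the compute nodes, after the other account entries
account    sufficient    pam_slurm_adopt.so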


-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Accounting using LDAP ?

2017-09-20 Thread Christopher Samuel

On 20/09/17 17:14, Loris Bennett wrote:

> Is the user management system homegrown or something more generally
> available?

Both, it was started as a project at $JOB-1 and open-sourced.

http://karaage.readthedocs.org/

The current main developer no longer works in HPC ($JOB-1 folded years
after I left for here), but he still looks after it, helping the
university out in his spare time.  He's currently moving from deploying
it via Debian packages to using Docker.

We have our own custom module for karaage for project creation as we
needed a lot more information from applicants than what it captures by
default, but that's the nice thing, it is modular.

Also includes Shibboleth support.

All the best!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-21 Thread Christopher Samuel

On 21/09/17 00:29, Jacob Chappell wrote:

> I still have one weird issue. I'm probably missing another setting
> somewhere. The cgroup that the SSH session is adopted into does not seem
> to include the /dev files.

That's something I can't help with I'm afraid, we're still on RHEL6.

In a job there I see:

$ cat /proc/$$/cgroup
4:cpuacct:/slurm/uid_500/job_6900206/step_0/task_0
3:memory:/slurm/uid_500/job_6900206/step_0
2:cpuset:/slurm/uid_500/job_6900206/step_0
1:freezer:/slurm/uid_500/job_6900206/step_0

and in my SSH session I see:

$ cat /proc/$$/cgroup
4:cpuacct:/slurm/uid_500/job_6900206/step_extern/task_0
3:memory:/slurm/uid_500/job_6900206/step_extern/task_0
2:cpuset:/slurm/uid_500/job_6900206/step_extern
1:freezer:/slurm/uid_500/job_6900206/step_extern


I'm about to start travelling for the Slurm User Group in a few hours,
so I'll be off-air for quite a while. Good luck!

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-17 Thread Christopher Samuel

On 14/09/17 16:04, Lachlan Musicman wrote:

> It's worth noting that before this change cgroups couldn't get down to
> the thread level. We would only consume at the core level - ie, all jobs
> would get an even number of cpus - jobs that requested an odd number of
> cpus (threads) would be rounded up to the next even number.

Did you have this set too (either explicitly or implicitly)?

  CR_ONE_TASK_PER_CORE
 Allocate one task per core by default.  Without this option,
 by default one task will be allocated per thread on nodes
 with more than one ThreadsPerCore configured.
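
That sits in SelectTypeParameters alongside the consumable resource
choice, e.g. (a sketch only):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE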


cheers!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-13 Thread Christopher Samuel

On 14/09/17 11:07, Lachlan Musicman wrote:

> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

OK, so this is saying that Slurm is seeing:

8 CPUs
1 board
1 socket per board
4 cores per socket
2 threads per core

which is what lscpu also describes the node as.

Whereas the config that it thinks it should have is:

8 CPUs
1 board
8 sockets per board
1 core per socket
1 thread per core

which to me looks like what you would expect with just CPUS=8 in the
config and nothing else.

I guess a couple of questions:

1) Have you restarted slurmctld and slurmd everywhere?

2) Can you confirm that slurm.conf is the same everywhere?

3) What does slurmd -C report? (illustrative output below)
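
For comparison, on a node where the hardware and slurm.conf agree,
slurmd -C prints the node definition Slurm would use; the values below
are purely illustrative:

$ slurmd -C
NodeName=node01 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=15879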

cheers!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-13 Thread Christopher Samuel

On 14/09/17 11:07, Lachlan Musicman wrote:

> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

Hmm, are you virtualised by some chance?

If so it might be that the VM layer is lying to the guest about the
actual hardware layout.

What does "lscpu" say?

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Christopher Samuel

On 13/09/17 04:53, Phil K wrote:

> I'm hoping someone can provide an explanation as to why slurm
> requires uid/gid consistency across nodes, with emphasis on the need
> for the 'SlurmUser' to be uid/gid-consistent.

I think this is a consequence of the use of Munge, rather than being
inherent in Slurm itself.

https://dun.github.io/munge/

# It allows a process to authenticate the UID and GID of another
# local or remote process within a group of hosts having common
# users and groups

Gory details are in the munged(8) manual page:

https://github.com/dun/munge/wiki/Man-8-munged

But I think the core of the matter is:

# When a credential is validated, munged first checks the
# message authentication code to ensure the credential has
# not been subsequently altered. Next, it checks the embedded
# UID/GID restrictions to determine whether the requesting
# client is allowed to decode it.

So if the UIDs & GIDs of the user differ across systems then it
appears it will not allow the receiver to validate the message.
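
A quick way to sanity-check that, from the munge documentation, is to
round-trip a credential locally and across hosts (the remote hostname
is a placeholder):

# local decode should succeed
munge -n | unmunge
# remote decode should also succeed if keys and clocks agree
munge -n | ssh somenode unmunge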

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Slurm 17.02.7 and PMIx

2017-10-04 Thread Christopher Samuel

Hi folks,

Just wondering if anyone here has had any success getting Slurm to
compile with PMIx support?

I'm trying 17.02.7 and I find that with PMIx I get either:

PMIX v1.2.2: Slurm complains and tells me it wants v2.

PMIX v2.0.1: Slurm can't find it because the header files are not
where it is looking for them, and when I do a symlink hack to make
PMIX detection work it then fails to compile, with:

/bin/sh ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. 
-I../../../.. -I../../../../slurm  -I../../../.. -I../../../../src/common 
-I/usr/include -I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2   -g -O0 
-pthread -Wall -g -O0 -fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo 
-MD -MP -MF .deps/mpi_pmix_v2_la-pmixp_client.Tpo -c -o 
mpi_pmix_v2_la-pmixp_client.lo `test -f 'pmixp_client.c' || echo 
'./'`pmixp_client.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../slurm 
-I../../../.. -I../../../../src/common -I/usr/include 
-I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2 -g -O0 -pthread -Wall -g -O0 
-fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo -MD -MP -MF 
.deps/mpi_pmix_v2_la-pmixp_client.Tpo -c pmixp_client.c  -fPIC -DPIC -o 
.libs/mpi_pmix_v2_la-pmixp_client.o
pmixp_client.c: In function ‘_set_procdatas’:
pmixp_client.c:468:24: error: request for member ‘size’ in something not a structure or union
   kvp->value.data.array.size = count;
pmixp_client.c:482:24: error: request for member ‘array’ in something not a structure or union
   kvp->value.data.array.array = (pmix_info_t *)info;
make[4]: *** [mpi_pmix_v2_la-pmixp_client.lo] Error 1


So I'm guessing that I'm missing something but the documentation
for PMIX in Slurm seems pretty much non-existent. :-(

Anyone had any luck with this?

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Christopher Samuel

On 04/10/17 17:12, Loris Bennett wrote:

> Ole's pages on Slurm are indeed very useful (Thanks, Ole!).  I just
> thought I point out that the limitation on only upgrading by 2 major
> versions is for the case that you are upgrading a production system and
> don't want to lose any running jobs. 

The on-disk format for spooled jobs may also change between releases
too, so you probably want to keep that in mind as well.

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: mysql job_table and step_table growth

2017-10-15 Thread Christopher Samuel

On 14/10/17 00:24, Doug Meyer wrote:

> The job_table.idb and step_table.idb do not clear as part of day-to-day
> slurmdbd.conf
> 
> Have slurmdbd.conf set to purge after 8 weeks but this does not appear
> to be working.

Anything in your slurmdbd logs?

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Exceeded job memory limit problem

2017-09-06 Thread Christopher Samuel

On 06/09/17 17:38, Sema Atasever wrote:

> I tried the line of code what you recommended but the code still
> generates an error unfortunately.

We've seen issues where using:

JobAcctGatherType=jobacct_gather/linux

gathers incorrect values for jobs (in our experience MPI ones).

We constrain jobs via cgroups and have found that using the cgroup
plugin for this results in jobs not getting killed incorrectly.

Using cgroups in Slurm is a definite win for us, so I would suggest
looking into it if you've not already done so.
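
For reference, the sort of cgroup-based setup meant here looks roughly
like the following; treat it as a sketch and check the slurm.conf and
cgroup.conf man pages for your version:

# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup

# cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes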

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Christopher Samuel

On 13/09/17 10:47, Lachlan Musicman wrote:

> Chris how does this sacrifice performance? If none of my software
> (bioinformatics/perl) is HT, surely I'm sacrificing capacity by leaving
> one thread unused as jobs take an entire core?

A hardware thread is not a core, so if you are running multiple processes
on a single core then you will have some form of extra contention; how
much of an impact that has will depend on your application mix and your
hardware generation.

As ever, benchmark and see if you gain more than you lose in that
method.  For HPC stuff which tends to be compute bound the usual advice
is to disable HT in the BIOS, but for I/O bound things you may not be so
badly off.

Hope that helps!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-12 Thread Christopher Samuel

On 13/09/17 07:22, Patrick Goetz wrote:

> All I have to say to this is: um, what?

My take has always been that ThreadsPerCore is really for HPC workloads
where you've decided not to disable HT outright, but still want to
allocate full cores to each task and let the code run 2 threads per
Slurm task (for HPC that's often the same as an MPI rank).

> So, moving to a specific
> implementation example, the question is should this configuration work
> properly?  I do what to include memory in the resource allocation
> calculations, if possible.  Hence:
> 
>   SelectType=select/cons_res
>   SelectTypeParameters=CR_CPU_Memory
>   NodeName=n[001-048] CPUs=16 RealMemory=61500 State=UNKNOWN
> 
> 
> Is this going to work as expected?

I would think so; basically you're saying you're willing to sacrifice
some performance and treat each HT unit as a core to run a job on.

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Fair resource scheduling

2017-08-27 Thread Christopher Samuel

On 25/08/17 03:03, Patrick Goetz wrote:

> 1. When users submit (say) 8 long running single core jobs, it doesn't
> appear that Slurm attempts to consolidate them on a single node (each of
> our nodes can accommodate 16 tasks).

How much memory have you configured for your nodes and how much memory
are these single CPU jobs asking for?

That's one thing that can make Slurm need to start jobs on other nodes.

You can also tell it to pack single CPU jobs onto nodes at the other end
of the cluster with this:

pack_serial_at_end
If used with the select/cons_res plugin then put serial jobs at
the end of the available nodes rather than using a best fit
algorithm. This may reduce resource fragmentation for some workloads.
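
That option goes into SchedulerParameters in slurm.conf, comma-separated
with anything already set there, e.g. (sketch):

SchedulerParameters=pack_serial_at_end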

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Delete jobs from slurmctld runtime database

2017-08-23 Thread Christopher Samuel

On 22/08/17 01:59, Selch, Brigitte (FIDF) wrote:

> That’s the reason for my question.

I'm not aware of any way to do that, and I would advise against mucking
around in the Slurm MySQL database directly.

The idea of slurmdbd is to have a comprehensive view of all jobs (within
its expiry parameters), and removing them will likely break its
statistics and probably do Bad Things(tm).

Here be dragons..

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Jobs cancelled "DUE TO TIME LIMIT" long before actual timelimit

2017-08-30 Thread Christopher Samuel

On 30/08/17 04:34, Brian W. Johanson wrote:

> Any idea on what would cause this?

It looks like the job *step* hit the timelimit, not the job itself.

Could you try the sacct command without the -X flag to see what the
timelimit for the step was according to Slurm please?

$ sacct -S 071417 -a --format JobID%20,State%20,timelimit,Elapsed,ExitCode -j 1695151

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Camacho Barranco, Roberto <rcamachobarra...@utep.edu> ssirimu...@utep.edu

2017-10-09 Thread Christopher Samuel

On 10/10/17 07:21, Suman Sirimulla wrote:

> We have installed and configured slurm on our cluster, but unable to
> start the slurmctld daemon. We followed the instructions
> (https://slurm.schedmd.com/troubleshoot.html) and tried to stop and
> restart it multiple times but still not working. Please see the error below.

Check your slurmctld.log, that should have hints about why it won't start.

cheers!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Tasks distribution

2017-10-09 Thread Christopher Samuel

On 09/10/17 22:11, Sysadmin CAOS wrote:

> Now, after that, should srun distribute correctly my tasks as mpirun
> does right?

No, srun will distribute the tasks how Slurm wants to; remember it's
the MPI implementation's job to listen to what the resource manager
tells it to do, not the other way around.

So the issue here is getting Slurm to allocate nodes in the way you
wish.  On my cluster I see:

srun: Warning: can't honor --ntasks-per-node set to 4 which doesn't
match the requested tasks 17 with the number of requested nodes 5.
Ignoring --ntasks-per-node.

That's Slurm 16.05.8.  Do you see the same?

Did you try both having CR_Pack_Nodes *and* specifying this?

-n 17 --ntasks-per-node=4
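
In other words something along these lines; a sketch only, and the
program name is a placeholder:

# slurm.conf
SelectTypeParameters=CR_Core_Memory,CR_Pack_Nodes

# then in the job
srun -n 17 --ntasks-per-node=4 ./my_mpi_program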

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Slurm 17.02.7 and PMIx

2017-10-09 Thread Christopher Samuel

On 05/10/17 11:27, Christopher Samuel wrote:

> PMIX v1.2.2: Slurm complains and tells me it wants v2.

I think that was due to a config issue on the system I was helping out
with; after installing some extra packages (like a C++ compiler) to get
other things working, I can no longer reproduce this issue.

So next outage they get we can add PMIx support to Slurm (my test build
compiled OK).

cheers,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Is PriorityUsageResetPeriod really required for hard limits?

2017-10-03 Thread Christopher Samuel

On 29/09/17 06:34, Jacob Chappell wrote:

> Hi all. The slurm.conf documentation says that if decayed usage is
> disabled, then PriorityUsageResetPeriod must be set to some value. Is
> this really true? What is the technical reason for this requirement if
> so? Can we set this period to sometime far into the future to have
> effectively an infinite period (no reset)?

Basically this is because once a user exceeds something like their
maximum CPU run-time limit, they will never be able to run jobs again
unless you either decay or reset usage.

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Setting up Environment Modules package

2017-10-04 Thread Christopher Samuel

On 05/10/17 03:11, Mike Cammilleri wrote:

> 2. Install Environment Modules packages in a location visible to the
> entire cluster (NFS or similar), including the compute nodes, and the
> user then includes their 'module load' commands in their actual slurm
> submit scripts since the command would be available on the compute
> nodes - loading software (either local or from network locations
> depending on what they're loading) visible to the nodes

This is what we do, the management node for the cluster exports its
/usr/local read-only to the rest of the cluster.

We also have in our taskprolog.sh:

echo export BASH_ENV=/etc/profile.d/module.sh

to try and ensure that bash shells have modules set up, just in case. :-)
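
For context, the task prolog is just a script named in slurm.conf whose
"export NAME=value" lines on stdout are added to each task's
environment; a minimal sketch (paths assumed):

# slurm.conf
TaskProlog=/etc/slurm/taskprolog.sh

# /etc/slurm/taskprolog.sh
#!/bin/bash
echo export BASH_ENV=/etc/profile.d/module.sh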

-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: Upgrading Slurm

2017-10-04 Thread Christopher Samuel

On 04/10/17 20:51, Gennaro Oliva wrote:

> If you are talking about Slurm I would backup the configuration files
> also.

Not directly Slurm related but don't forget to install and configure
etckeeper first.

It puts your /etc/ directory under git version control and will do
commits of changes before and after any package upgrade/install/removal
so you have a good history of changes made.

I'm assuming that the slurm config files in the Debian package are under
/etc so that will be helpful to you for this.

> Anyway there have been a lot of major changes in SLURM and in Debian since
> 2013 (Wheezy release date), so be prepared that it will be no picnic.

The Debian package name also changed from slurm-llnl to slurm-wlm at
some point too, so missing the intermediate release may result in that
not transitioning properly.

To be honest I would never use a distro's packages for Slurm; I'd always
install it centrally (NFS exported to the compute nodes) to keep things
simple.  That way you decouple your Slurm version from the OS and can
keep it up to date (or keep it on a known working version).
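
In practice that just means building with a versioned prefix on the
NFS-exported filesystem, roughly like this (paths are examples only):

./configure --prefix=/usr/local/slurm/17.02.7 --sysconfdir=/usr/local/slurm/etc
make && make install
# then point a stable symlink such as /usr/local/slurm/latest at the version in use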

All the best!
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545

