[slurm-dev] Re: Jobs allocated but don't run

2016-08-30 Thread James Andrew Venning
Just to add, squeue returns:
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    7  standard hostname   ubuntu  R       0:21      1 node0
James Venning
On 31 August 2016 at 12:57, James Andrew Venning wrote:
> Hi all again,
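
In this listing, job 7 is in state R (running) on node0 after 0:21. So from slurmctld's point of view the job has been allocated and started. To check that kind of listing from a script, here is a minimal Python sketch that splits the default squeue layout into one dict per job. It assumes the default column order and a job name without spaces. The names `parse_squeue` and `STATE_CODES` are hypothetical, and the state table is only a subset of Slurm's compact codes.

```python
# Sketch: read default-format `squeue` output into one dict per job.
# Assumes whitespace-separated columns and no spaces inside the job name;
# only the last column, NODELIST(REASON), may contain spaces, so each row is
# split at most (number of header columns - 1) times.

# Compact state codes shown in the ST column (subset only).
STATE_CODES = {
    "PD": "PENDING",
    "R": "RUNNING",
    "CG": "COMPLETING",
    "CD": "COMPLETED",
    "F": "FAILED",
}


def parse_squeue(text):
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        fields = line.split(maxsplit=len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


sample = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    7  standard hostname   ubuntu  R       0:21      1 node0
"""

for job in parse_squeue(sample):
    state = STATE_CODES.get(job["ST"], job["ST"])
    print(job["JOBID"], state, job["NODELIST(REASON)"])
# prints: 7 RUNNING node0
```

If a job shows R here but produces no output, the next place to look is the slurmd log on the listed node, not the scheduler.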

[slurm-dev] Jobs allocated but don't run

2016-08-30 Thread James Andrew Venning
Hi all again,
The initial problems are fixed, but there are more; thanks in advance for reading this! Ubuntu 14.04 with Slurm 2.6.5-1.
Master and controller seem to be talking to each other:
scontrol ping
Slurmctld(primary/backup) at slurm-master/(NULL) are UP/DOWN
(exact same message from each slave)
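
The ping line reads as "primary controller slurm-master is UP, backup is (NULL) and DOWN". The (NULL) presumably just means no backup controller is configured, so the DOWN half is expected rather than a fault. Below is a small Python sketch that pulls those four values out of the line. The regex is written against the message format quoted here, and other Slurm versions may word it differently. `parse_ping` is a hypothetical helper.

```python
import re

# Sketch: extract host names and states from a `scontrol ping` line.
# The pattern matches the message format quoted in this thread only.
PING_RE = re.compile(
    r"Slurmctld\(primary/backup\) at (\S+)/(\S+) are (\w+)/(\w+)"
)


def parse_ping(line):
    match = PING_RE.search(line)
    if match is None:
        return None
    primary_host, backup_host, primary_state, backup_state = match.groups()
    return {
        "primary": (primary_host, primary_state),
        "backup": (backup_host, backup_state),
    }


status = parse_ping("Slurmctld(primary/backup) at slurm-master/(NULL) are UP/DOWN")
print(status["primary"])  # ('slurm-master', 'UP')
```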

[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread Christopher Samuel
On 31/08/16 10:35, James Andrew Venning wrote:
> slurmctld: error: this host (slurm-master) not valid controller
> (144.6.230.71 or (null))
That looks like the main issue: for some reason Slurm doesn't think it's running on the node you want it to.
> As an aside, I installed with sudo apt-get
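
The error compares the local host name (slurm-master) with two configured values. 144.6.230.71 is what ControlMachine holds, and (null) is the unset BackupController. This suggests an IP address was put where slurm.conf expects the controller's short host name. The IP address belongs in ControlAddr instead. Below is a rough Python sketch of that comparison. It assumes a simplified slurm.conf parser (comments stripped, `Key=value` tokens matched case-insensitively), and the real check may have extra special cases. `conf_value` and `is_valid_controller` are hypothetical names.

```python
import socket


def conf_value(conf_text, key):
    """Return the last value set for `key` in slurm.conf-style text, or None.

    Simplified parser (assumption): strips '#' comments, splits each line on
    whitespace and matches `Key=value` tokens case-insensitively.
    """
    found = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0]
        for token in line.split():
            if "=" in token:
                name, value = token.split("=", 1)
                if name.lower() == key.lower():
                    found = value
    return found


def is_valid_controller(conf_text, hostname=None):
    """Roughly mimic the check behind "this host (...) not valid controller".

    The short host name (domain stripped) must equal ControlMachine or
    BackupController; an IP address in ControlMachine will not match.
    """
    short_name = (hostname or socket.gethostname()).split(".")[0]
    candidates = (
        conf_value(conf_text, "ControlMachine"),
        conf_value(conf_text, "BackupController"),
    )
    return short_name in candidates


broken = "ControlMachine=144.6.230.71\n"
fixed = "ControlMachine=slurm-master\nControlAddr=144.6.230.71\n"
print(is_valid_controller(broken, "slurm-master"))  # False
print(is_valid_controller(fixed, "slurm-master"))   # True
```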

[slurm-dev] Re: new CRIU plugin

2016-08-30 Thread Christopher Samuel
On 30/08/16 22:11, Manuel Rodríguez Pascual wrote: > We hope that this can be useful for the Slurm community. That's really pretty neat! I can't test myself as we're stuck on RHEL6 for the moment but I do wonder if you've considered doing the same for Open-MPI so that Slurm can do

[slurm-dev] Unable to run slurmdbd with mysql-client in Ubuntu 16.04LTS (Xenial)

2016-08-30 Thread Sunil Sandhu
Dear Slurm-Dev Admin,
In Ubuntu 16.04 LTS, slurmdbd (v15.08.7) [1] will not run against mysql-client (v5.7.13) [2] (both packages are from the Ubuntu repo). This is because MySQL v5.7.13 does not support the "IGNORE" clause when it is used with "ALTER" [3], and slurmdbd attempts to ALTER
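
Before pointing slurmdbd at a database server, an admin could flag this incompatibility up front. Below is a minimal Python pre-flight sketch. It assumes that upstream MySQL removed `ALTER IGNORE TABLE` in 5.7.4 (after deprecating it in 5.6.17), and it does not try to handle MariaDB version strings. The helper names are hypothetical.

```python
def parse_mysql_version(version):
    """Turn '5.7.13' or '5.7.13-0ubuntu0.16.04.2' into (5, 7, 13)."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split(".")[:3])


def alter_ignore_supported(version):
    # Assumption: upstream MySQL deprecated ALTER IGNORE TABLE in 5.6.17 and
    # removed it in 5.7.4. MariaDB version strings are not handled here.
    return parse_mysql_version(version) < (5, 7, 4)


print(alter_ignore_supported("5.7.13"))  # False
print(alter_ignore_supported("5.6.30"))  # True
```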

[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread James Andrew Venning
Thanks for the suggestions, guys. I fixed one of my problems (wrong IP address for the master, duh), but I still can't get it to run. I'm on Ubuntu 14.04, and the Slurm version is 2.6.5-1. When I run sudo slurmctld -Dvvv, I get:
slurmctld: pidfile not locked, assuming no running daemon
slurmctld: error:

[slurm-dev] new CRIU plugin

2016-08-30 Thread Manuel Rodríguez Pascual
Hi all,
After working together with the CRIU (https://criu.org/Main_Page) developers, my team at CIEMAT has developed a CRIU plugin for Slurm. This way, Slurm can employ this checkpoint/restart library to perform these operations. It is stored in my personal GitHub account,

[slurm-dev] Re: The canonical way to write to user's output (stderr) log file on end of job

2016-08-30 Thread Dr. Thomas Orgis
On Tue, 30 Aug 2016 03:12:54 -0700, Bjørn-Helge Mevik wrote:
> handler for the EXIT "signal", which prints out resource usage.
That is indeed an idea. But users might decide to install a handler of their own for cleanup tasks. On the other hand, since we indeed urge
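
Bjørn-Helge's suggestion is a shell handler on EXIT in the job script. If the job script is written in Python, the closest analogue is an `atexit` handler. That is a swapped-in mechanism, not what was proposed in the thread, and it shares the weakness Thomas raises: it lives in code the user can edit. Here is a minimal sketch with hypothetical `format_usage` and `report_usage` helpers that writes resource usage to stderr, and therefore to the job's error log, on exit. It is Unix-only because of the `resource` module.

```python
import atexit
import resource
import sys


def format_usage(label, usage):
    # ru_maxrss is reported in kilobytes on Linux (in bytes on macOS).
    return (
        f"{label}: user CPU {usage.ru_utime:.2f}s, "
        f"system CPU {usage.ru_stime:.2f}s, max RSS {usage.ru_maxrss}"
    )


def report_usage():
    # RUSAGE_SELF covers this process; RUSAGE_CHILDREN covers children that
    # have already terminated and been waited for (e.g. finished subprocesses).
    for label, who in (("self", resource.RUSAGE_SELF),
                       ("children", resource.RUSAGE_CHILDREN)):
        print(format_usage(label, resource.getrusage(who)), file=sys.stderr)


# Runs on normal interpreter exit, including sys.exit() and uncaught
# exceptions, but not on os._exit() or death by an unhandled signal
# (e.g. the default SIGTERM sent when a job hits its time limit).
atexit.register(report_usage)
```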

[slurm-dev] RE: Combating idle interactive sessions

2016-08-30 Thread Marcin Stolarek
Maybe you could create a partition/QOS for interactive jobs and use a job_submit plugin to force all interactive jobs to use it?
cheers,
Marcin
2016-08-30 8:09 GMT+02:00 John Hearns:
> I worked on the same problem in my last job, where engineers had
> interactive sessions
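
The routing rule suggested here can be modeled in a few lines. This Python sketch models only the decision: a real job_submit plugin is written in C or Lua against Slurm's job descriptor, not this dict. It assumes the common heuristic that a job submitted without a batch script (srun/salloc rather than sbatch) is interactive. The partition and QOS names, and `route_job`, are hypothetical.

```python
# Hypothetical model of the routing decision only. A real job_submit plugin
# is written in C or Lua against Slurm's job descriptor; this dict stands in
# for it, and the partition/QOS names are made up.
INTERACTIVE_PARTITION = "interactive"
INTERACTIVE_QOS = "interactive"


def route_job(job_desc):
    """Return a copy of job_desc with interactive jobs forced onto their own partition/QOS.

    Assumption: a job submitted without a batch script (srun/salloc rather
    than sbatch) is treated as interactive.
    """
    routed = dict(job_desc)
    if not routed.get("script"):
        routed["partition"] = INTERACTIVE_PARTITION
        routed["qos"] = INTERACTIVE_QOS
    return routed


print(route_job({"script": None, "partition": "standard"})["partition"])
# prints: interactive
```

With interactive jobs confined this way, limits such as a short time limit on that partition or QOS apply only to them.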

[slurm-dev] Re: The canonical way to write to user's output (stderr) log file on end of job

2016-08-30 Thread Bjørn-Helge Mevik
"Dr. Thomas Orgis" writes:
> On Sun, 21 Aug 2016 17:29:39 -0700,
> Christopher Samuel wrote:
>
>> Unfortunately you can only do this as part of the taskprolog, so
>> prepending to the user's stdout.
>
> Thanks for confirming the (for me)

[slurm-dev] Re: Multiple simultaneous jobs on a single node on SLURM 15.08.

2016-08-30 Thread Benjamin Redling
Hi,
I haven't seen an answer so far, so I'll try to reason it out:
On 08/29/2016 19:40, Luis Torres wrote:
> We have recently deployed SLURM v 15.08.7-build1 on Ubuntu 16.04
> submission and execution nodes with apt-get; we built and installed the
> source packages of the same release on Ubuntu 14.04 for

[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread David Ramírez
I fixed the systemctl daemon on CentOS 7.2 by adding
After=network.target slurmdbd.service
ConditionPathExists=/etc/slurm/slurm.conf
to /usr/lib/systemd/system/slurmctld.service, and now it works fine.
*DAVID RAMIREZ*
HPC

[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread Marcin Stolarek
Just run slurmctld in the foreground and check the output. If you still don't know the cause of the problem, paste a few lines here.
cheers,
Marcin

[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread Lachlan Musicman
James,
It would be great to know your OS and SLURM version. For instance, on CentOS 7/Debian 8/Ubuntu 16.04, you might be using
systemctl status/start/restart slurmctld (head node)
systemctl status/start/restart slurmd (worker nodes)
instead?
Cheers
L.
--
The most dangerous phrase in the
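
The point here is that on systemd-based releases the daemons are managed as units: slurmctld runs on the head node and slurmd on the workers. Below is a small Python sketch that builds the matching systemctl command for a node role. It assumes the packages install units named slurmctld and slurmd. `UNITS`, `systemctl_argv` and `run_systemctl` are hypothetical names.

```python
import subprocess

# Head node runs slurmctld, worker nodes run slurmd.
UNITS = {"head": "slurmctld", "worker": "slurmd"}
ACTIONS = ("status", "start", "restart")


def systemctl_argv(action, role):
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["systemctl", action, UNITS[role]]


def run_systemctl(action, role):
    # start/restart need root. check=False because `systemctl status`
    # exits non-zero for a stopped unit, which is information, not an error.
    return subprocess.run(systemctl_argv(action, role), check=False).returncode
```

For example, `run_systemctl("status", "head")` on the controller and `run_systemctl("status", "worker")` on each compute node.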

[slurm-dev] SLURM daemon doesn't start

2016-08-30 Thread James Andrew Venning
Hi all,
I'm trying to set up a Slurm system. I've been through the installation instructions, but I'm getting stuck on launching the daemon. If I run scontrol ping from the Slurm controller, it informs me:
Slurmctld(primary/backup) at 130.56.254.79/(NULL) are DOWN/DOWN
If I run it as sudo I