[slurm-dev] Query number of cores allocated per node for a job

2016-10-25 Thread Christopher Samuel
Hi all, I can't help but think I'm missing something blindingly obvious, but does anyone know how to find out how Slurm has distributed a job in terms of cores per node? In other words, if I submit: sbatch --ntasks=64 --wrap sleep 60 on a system with (say) 16-core nodes where nodes are already
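One way to look at the per-node distribution, sketched here under assumptions rather than taken from the thread: inside a running job, Slurm sets SLURM_JOB_CPUS_PER_NODE to a compressed list such as 16(x2),8, and scontrol show job -d <jobid> lists the CPU IDs allocated on each node. The helper name expand_cpus_per_node below is hypothetical; it only expands the compressed form into one count per node.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): expand the compressed form used by
# SLURM_JOB_CPUS_PER_NODE, e.g. "16(x2),8", into one CPU count per node.
expand_cpus_per_node() {
    local spec="$1" out="" item count reps i
    local IFS=,
    for item in $spec; do
        case $item in
            *"(x"*")")
                # "16(x2)" means 16 CPUs on each of 2 consecutive nodes
                count=${item%%"("*}
                reps=${item##*"(x"}
                reps=${reps%")"}
                ;;
            *)
                # a bare number applies to a single node
                count=$item
                reps=1
                ;;
        esac
        i=0
        while [ "$i" -lt "$reps" ]; do
            out="${out:+$out }$count"
            i=$((i + 1))
        done
    done
    echo "$out"
}

# Inside a job script one might call (assumes the variable is set by Slurm):
#   expand_cpus_per_node "$SLURM_JOB_CPUS_PER_NODE"
```

The counts come out in the same order as the nodes in the job's node list, so they can be paired with the output of scontrol show hostnames if node names are needed.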

[slurm-dev] Re: slurmd: fatal: Frontend not configured correctly in slurm.conf

2016-10-25 Thread Peixin Qiao
Hi Gennaro, My slurm.conf is as follows:
    # slurm.conf file generated by configurator easy.html.
    # Put this file on all nodes of your cluster.
    # See the slurm.conf man page for more information.
    #
    ControlMachine=peixin
    #ControlAddr=
    #
    AuthType=auth/none
    CacheGroups=0
    CryptoType=crypto/openssl

[slurm-dev] Re: slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-25 Thread Christopher Samuel
On 25/10/16 10:05, Peixin Qiao wrote:
> I installed slurm-llnl on Debian on one computer. When I ran slurmctld
> and slurmd, I got the error:
> slurm_load_partitions: Unable to contact slurm controller (connect failure).
Check your firewall rules to ensure that those connections aren't

[slurm-dev] RE: slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-25 Thread suprita.bothra
Hi, I have installed Slurm on a 2-node cluster. On the master node, when I run the sinfo command, I get the output below.
    sinfo
    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    debug*    up    infinite  2     idle  punehpcdl[01-02]
But on the compute node, the slurmd daemon is also running but it gives the

[slurm-dev] RE: slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-25 Thread champak dutta
Hi,
1) Disable SELinux.
2) Stop iptables.
3) Check the date and time on both machines; they should be the same.
4) Restart the slurm service on the controller and on the node.
Regards Champak
On 25 Oct 2016 11:43 am, wrote: Hi , I have installed slurm on a 2 node cluster. On the

[slurm-dev] RE: slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-25 Thread Benjamin Redling
Hi, are you both working on the same cluster as the OP?
On 10/25/2016 08:12, suprita.bot...@wipro.com wrote:
> I have installed slurm on a 2 node cluster.
>
> On the master node when I run sinfo command I get below output.
[...]
> But on compute node:Slurmd daemon is also running but it gives

[slurm-dev] Job steps facility as in LoadLeveler?

2016-10-25 Thread Patrice Peterson
Hello list, is there a built-in way to queue LoadLeveler-like job steps in SLURM? Something like this:
    #!/bin/bash
    #SBATCH --num-tasks=1
    echo "prepping data, simple stuff"
    #SBATCH --- END STEP
    #SBATCH --num-tasks=4
    echo "main run, needs lots of resources"
    srun
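One common way to approximate steps like these, offered as a sketch and not as the answer given in the thread, is to submit each step as its own job and chain them with sbatch --dependency=afterok:<jobid>, using sbatch --parsable to capture each job ID. The function name submit_chain and the script names prep.sh and main.sh are hypothetical.

```shell
#!/bin/bash
# Sketch: emulate sequential "steps" by chaining separate Slurm jobs.
# submit_chain SCRIPT... submits each script so that it starts only after
# the previous job completed successfully, and prints the job IDs in order.
submit_chain() {
    local prev="" jobid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
        jobid=${jobid%%;*}
        echo "$jobid"
        prev=$jobid
    done
}

# Example call (hypothetical script names, each with its own #SBATCH options):
#   submit_chain prep.sh main.sh
```

Each script carries its own resource request, so the cheap preparation job does not hold the resources needed by the main run while it waits.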

[slurm-dev] plan-based scheduler plugin

2016-10-25 Thread Peixin Qiao
Hello, I want to plug in a plan-based scheduler to replace the FCFS scheduler. I cannot find the FCFS and backfill API. Could you please tell me where I can find it? Best Regards, Peixin

[slurm-dev] Re: plan-based scheduler plugin

2016-10-25 Thread Peixin Qiao
I have found it; it is in /src/plugins/sched/. I read the Slurm Scheduler Plugin API documentation: http://slurm.schedmd.com/schedplugins.html Slurm scheduler plugins are Slurm plugins that implement the Slurm scheduler API described herein. They must conform to the Slurm Plugin API with the following