Re: [slurm-users] Partition "exclude"

2018-05-21 Thread Brian Andrus
Unless you specify a partition, the job goes to the partition defined as the default. Do you mean not to run on particular nodes? In that case, you can use the --exclude option: -x, --exclude=<node name list> Explicitly exclude certain nodes from the resources granted to the job. Brian Andrus
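A minimal sketch of the --exclude usage described above (the node names and script name are hypothetical):

```shell
# Keep the job off two specific nodes by name:
sbatch --exclude=node01,node02 job.sh

# The short form -x accepts the same hostlist syntax, including ranges:
sbatch -x node[01-04] job.sh
```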

[slurm-users] Partition "exclude"

2018-05-21 Thread Almon Gem Otanes
Is there a way for me to tell sbatch not to submit to a list of partitions?

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-21 Thread Matthieu Hautreux
Glad to hear that you made it work. Regards Matthieu 2018-05-21 21:21 GMT+02:00 Sean Caron : > Just wanted to follow up. In addition to passing all traffic to the SLURM > controller, I opened port 6818/TCP to all other compute nodes and this seems > to have resolved the issue.

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-21 Thread Sean Caron
Just wanted to follow up. In addition to passing all traffic to the SLURM controller, I opened port 6818/TCP to all other compute nodes and this seems to have resolved the issue. Thanks again, Matthieu! Best, Sean On Thu, May 17, 2018 at 8:06 PM, Sean Caron wrote: > Awesome
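For reference, a hedged sketch of the kind of iptables rules the fix above describes; 6818 is the default SlurmdPort and 6817 the default SlurmctldPort, and the 10.0.0.0/24 subnet is a placeholder for your cluster network:

```shell
# On every compute node: allow slurmd traffic (default SlurmdPort 6818)
# from the other cluster hosts.
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 6818 -j ACCEPT

# On the controller: allow slurmctld traffic (default SlurmctldPort 6817)
# from the compute nodes.
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 6817 -j ACCEPT
```

Both ports are configurable in slurm.conf (SlurmdPort, SlurmctldPort), so check there if your site uses non-default values.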

Re: [slurm-users] network/communication failure

2018-05-21 Thread Turner, Heath
Got it! It was the firewall... Thanks to all for all the suggestions. Heath Professor Graduate Coordinator Chemical and Biological Engineering http://che.eng.ua.edu   University of Alabama 3448 SEC, Box 870203 Tuscaloosa, AL  35487 (205) 348-1733 (phone) (205) 561-7450 (cell) (205) 348-7558 

Re: [slurm-users] network/communication failure

2018-05-21 Thread Eric F. Alemany
I had the same issue even though the system clocks were the same on the master and execute nodes. However, I was told to try configuring NTP (Network Time Protocol). That did the trick for me.
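A minimal sketch of enabling NTP as suggested above, assuming a systemd host with chrony available in the package repositories (the package manager line depends on your distribution):

```shell
# Install and start a time-sync daemon on each node:
yum install -y chrony          # or: apt-get install -y chrony
systemctl enable --now chronyd

# Verify that the node is actually synchronizing:
chronyc tracking
```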

Re: [slurm-users] network/communication failure

2018-05-21 Thread Turner, Heath
Attached are iptables from master/hosts. I think they look normal. No firewall (as far as I know). Heath

Re: [slurm-users] network/communication failure

2018-05-21 Thread Miguel Gutiérrez Páez
SELinux? What does getenforce report? On Mon., May 21, 2018, 17:17, Fulcomer, Samuel wrote: > Is there a firewall turned on? What does "iptables -L -v" report on the > three hosts? > > On Mon, May 21, 2018 at 11:05 AM, Turner, Heath > wrote: >
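To follow the suggestion above, a quick way to check (and temporarily rule out) SELinux on each host; setenforce requires root:

```shell
# Prints Enforcing, Permissive, or Disabled:
getenforce

# Temporarily switch to permissive mode to test whether SELinux is the
# culprit; revert with 'setenforce 1' when done. This does not survive
# a reboot -- permanent changes go in /etc/selinux/config.
setenforce 0
```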

Re: [slurm-users] network/communication failure

2018-05-21 Thread Andy Riebs
Do you have a firewall running? On 05/21/2018 11:05 AM, Turner, Heath wrote: If anyone has advice, I would really appreciate it... I am running (just installed) slurm-17.11.6, with a master + 2 hosts. It works locally on the master (controller + execution). However, I cannot establish

Re: [slurm-users] network/communication failure

2018-05-21 Thread Fulcomer, Samuel
Is there a firewall turned on? What does "iptables -L -v" report on the three hosts? On Mon, May 21, 2018 at 11:05 AM, Turner, Heath wrote: > If anyone has advice, I would really appreciate it... > > I am running (just installed) slurm-17.11.6, with a master + 2 hosts. It >

[slurm-users] network/communication failure

2018-05-21 Thread Turner, Heath
If anyone has advice, I would really appreciate it... I am running (just installed) slurm-17.11.6, with a master + 2 hosts. It works locally on the master (controller + execution). However, I cannot establish communication from the master [triumph01] with the 2 hosts [triumph02,triumph03]. Here is

Re: [slurm-users] GPU / cgroup challenges

2018-05-21 Thread R. Paul Wiegand
I am following up on this to first thank everyone for their suggestions and also let you know that, indeed, upgrading from 17.11.0 to 17.11.6 solved the problem. Our GPUs are now properly walled off via cgroups per our existing config. Thanks! Paul. > On May 5, 2018, at 9:04 AM, Chris Samuel
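For context, a minimal sketch of the kind of configuration that confines jobs to their granted GPUs via cgroups; the device paths are illustrative, not the poster's actual files:

```
# cgroup.conf (all nodes) -- ConstrainDevices is what walls off ungranted GPUs:
ConstrainDevices=yes

# gres.conf (on each GPU node; /dev paths are hypothetical):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

This also assumes slurm.conf selects the cgroup task plugin (TaskPlugin=task/cgroup) and lists the gpu GRES type.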

Re: [slurm-users] How to find user limit in SLURM

2018-05-21 Thread Sébastien VIGNERON
Hi, If it can help, I use these bash aliases for the same purpose (easier to read than the default output format): # show every association of every user. # if user=username is passed, show only associations for the specified user username # see "man sacctmgr" for more function cri_show_assoc () {
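The function body is cut off in the archive; a hedged sketch of a wrapper in the same spirit (the function name, arguments, and format fields here are illustrative, not the poster's actual alias):

```shell
# Show associations, optionally restricted to one user.
# Pass sacctmgr filters through, e.g.: show_assoc user=alice
show_assoc () {
    sacctmgr show associations "$@" \
        format=Cluster,Account,User,Partition,MaxJobs,MaxSubmit,MaxTRES
}

# Usage:
#   show_assoc                # every association of every user
#   show_assoc user=alice     # only alice's associations
```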

Re: [slurm-users] Slurm does not set CUDA_VISIBLE_DEVICES

2018-05-21 Thread Vladimir Goy
Dear Miguel, >Do you have --gres=gpu:0 on your job script? Is gres.conf properly configured? No. gres.conf worked well on Slurm 17.02.2. Here is my job file.
#!/bin/bash
#
#SBATCH --job-name=GPU
#SBATCH --partition=gpu
#
#SBATCH --ntasks=1
#SBATCH --mem=100
#SBATCH --time=10
#SBATCH

Re: [slurm-users] Slurm does not set CUDA_VISIBLE_DEVICES

2018-05-21 Thread Miguel Gila
Vova, Do you have --gres=gpu:0 on your job script? Is gres.conf properly configured? I think this is what sets the variable: https://github.com/SchedMD/slurm/blob/bcdd09d3386f4b4038ae9263b0e69d4d742988b2/src/plugins/gres/gpu/gres_gpu.c#L96 Cheers, Miguel > On 21 May 2018, at 08:28, Vladimir
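A minimal sketch of a job script that does request a GPU, so that Slurm assigns a device and sets CUDA_VISIBLE_DEVICES; the partition name and gres count are assumptions:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1   # without a --gres request, Slurm assigns no GPU
                       # and leaves CUDA_VISIBLE_DEVICES unset

# Print what Slurm handed us (e.g. a single device index for one GPU):
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
```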

[slurm-users] Slurm does not set CUDA_VISIBLE_DEVICES

2018-05-21 Thread Vladimir Goy
Hello, Please help me, I would like to ask you about the following bug. Why does Slurm not set the CUDA_VISIBLE_DEVICES variable before running the user application? I observe this bug after updating Slurm 17.02.2 -> 17.11.6. Who/how can fix this problem? Version 17.02.2 works well, but now I cannot downgrade,