Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Brian Andrus
Lou, Are you installing on the same machine you built? Are the nvidia libraries installed by RPM or a 'make install' on the box you compiled it on? Brian Andrus On 8/15/2019 7:53 AM, Lou Nicotra wrote: I have tried running ldconfig manually as suggested with slurm-19.05.1-2 and it fails

Re: [slurm-users] Slurm 19.05 --workdir non existent?

2019-08-15 Thread Christopher Benjamin Coffey
Ya, I saw that it was almost removed before 19.05. I didn't know about the NEWS file! Yep, it's right there, mea culpa; I'll check that in the future! Best, Chris -- Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 8/15/19, 11:08 AM, "slurm-users on

Re: [slurm-users] Slurm 19.05 --workdir non existent?

2019-08-15 Thread Christopher Samuel
On 8/15/19 11:02 AM, Mark Hahn wrote: it's in NEWS, if that counts. also, I note that at least in this commit, --chdir is added but --workdir is not removed from option parsing. It went away here: commit 9118a41e13c2dfb347c19b607bcce91dae70f8c6 Author: Tim Wickberg Date: Tue Mar 12

Re: [slurm-users] Slurm 19.05 --workdir non existent?

2019-08-15 Thread Mark Hahn
Looks like the commit is here: https://github.com/SchedMD/slurm/commit/fddc98533c1f3753e5e43ad6a16407c5cb8c8de8 Yet, no change log on it. Very frustrating. it's in NEWS, if that counts. also, I note that at least in this commit, --chdir is added but --workdir is not removed from option

Re: [slurm-users] AllocNodes on partition no longer working

2019-08-15 Thread Christopher Samuel
On 8/15/19 7:18 AM, Sajdak, Doris wrote: Thanks Chris! That worked. We'd tried IP address but not FQDN. Great to hear! -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

[slurm-users] First draft of patch to allow network interface selection

2019-08-15 Thread Noah Evans
I've attached a first draft of a patch to allow network interface selection for slurm RPCs. As part of the bring up for one of our systems here we've been wanting to switch between our management network and IP over IB. As far as I can tell Slurm doesn't allow the user to select a network

Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Philip Kovacs
> I have tried running ldconfig manually as suggested with slurm-19.05.1-2 and it fails the same way... error: Failed dependencies: libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
Lou, that's a packaging mistake on the part of the person who created that

Re: [slurm-users] Slurm 19.05 --workdir non existent?

2019-08-15 Thread Christopher Benjamin Coffey
Looks like the commit is here: https://github.com/SchedMD/slurm/commit/fddc98533c1f3753e5e43ad6a16407c5cb8c8de8 Yet, no change log on it. Very frustrating. Chris — Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 8/14/19, 1:30 PM, "slurm-users on

Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Lou Nicotra
I have tried running ldconfig manually as suggested with slurm-19.05.1-2 and it fails the same way... error: Failed dependencies: libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64 ldconfig -p shows: root@panther02 slurm# ldconfig -p|grep libnvidia-ml.

Re: [slurm-users] AllocNodes on partition no longer working

2019-08-15 Thread Sajdak, Doris
Thanks Chris! That worked. We'd tried IP address but not FQDN. Dori -Original Message- From: slurm-users On Behalf Of Christopher Samuel Sent: Wednesday, August 14, 2019 5:11 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] AllocNodes on partition no longer working On

[slurm-users] SLURM_JOB_IB for heterogeneous jobs

2019-08-15 Thread Hendryk Bockelmann
Hello, the documentation for heterogeneous jobs [1] says that the environment variable SLURM_JOB_ID should be different for each component. However, I cannot reproduce this on a fresh slurm-19.05.1 installation. $ salloc -pcompute -N1 : -pcompute2 -N1 [...] salloc: Granted job allocation 108453 [...] bash-4.1$

Re: [slurm-users] Slurm 19.05 --workdir non existent?

2019-08-15 Thread Bjørn-Helge Mevik
Christopher Benjamin Coffey writes: > It seems that --workdir= is no longer a valid option in batch jobs and > srun in 19.05, and has been replaced by --chdir. I didn't see a change > log about this, did I miss it? Going through the man pages it seems it > hasn't existed for some time now
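
For sites with existing batch scripts, the rename discussed in this thread (--workdir replaced by --chdir in 19.05) could be applied mechanically. The following is only a minimal sketch, not anything posted to the list: the function name migrate_workdir is hypothetical, and it assumes the option appears in its long form on #SBATCH directive lines. It does not touch srun or sbatch command lines inside the script body.

```python
import re

# Match the long option only as a whole token: not preceded by a word
# character or "-", and followed by "=", whitespace, or end of line.
_WORKDIR = re.compile(r"(?<![\w-])--workdir(?=[=\s]|$)")


def migrate_workdir(script_text: str) -> str:
    """Return script_text with --workdir renamed to --chdir on #SBATCH lines.

    Hypothetical helper for illustration; all other lines are left unchanged.
    """
    out = []
    for line in script_text.splitlines(keepends=True):
        # Only rewrite Slurm directive lines, never ordinary shell commands.
        if line.lstrip().startswith("#SBATCH"):
            line = _WORKDIR.sub("--chdir", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = "#!/bin/bash\n#SBATCH --workdir=/scratch/job\necho running\n"
    print(migrate_workdir(sample), end="")
```

Restricting the rewrite to directive lines is a deliberately conservative choice, so that unrelated programs that happen to take a --workdir flag are not altered.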