[slurm-dev] Re: Accounting using LDAP ?

2017-09-19 Thread Loris Bennett
Christopher Samuel writes: > On 20/09/17 03:03, Carlos Lijeron wrote: > >> I'm trying to enable accounting on our SLURM configuration, but our >> cluster is managed by Bright Management which has its own LDAP for users >> and groups. When setting up SLURM accounting, I

[slurm-dev] Re: Accounting using LDAP ?

2017-09-19 Thread Christopher Samuel
On 20/09/17 03:03, Carlos Lijeron wrote: > I'm trying to enable accounting on our SLURM configuration, but our > cluster is managed by Bright Management which has its own LDAP for users > and groups. When setting up SLURM accounting, I don't know how to make > the connection between the users
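Slurm's accounting database does not talk to LDAP directly; it is keyed on user names, which only need to resolve to system (LDAP) accounts on the controller. A minimal sketch of creating the associations with sacctmgr, using placeholder cluster, account and user names:

```
# Hypothetical names -- substitute your real cluster, account and LDAP user names.
sacctmgr add cluster mycluster
sacctmgr add account physics Description="Physics group" Organization=science
sacctmgr add user alice Account=physics

# Check the resulting associations:
sacctmgr show associations format=Cluster,Account,User
```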

[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Christopher Samuel
On 20/09/17 06:39, Jacob Chappell wrote: > Thanks everyone who has replied. I am trying to get pam_slurm_adopt.so > implemented. Does it work with batch jobs? It does indeed, we use it as well. Do you have: PrologFlags=contain set? From slurm.conf: Contain At job allocation time, use
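For reference, the slurm.conf pieces pam_slurm_adopt relies on, as a minimal sketch (cgroup-based process tracking plus the Contain prolog flag mentioned above; these are the commonly documented settings, not necessarily the original poster's):

```
# slurm.conf -- relevant lines only
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
PrologFlags=Contain   # create the "extern" step (and its cgroup) at job allocation time
```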

[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Jacob Chappell
Thanks everyone who has replied. I am trying to get pam_slurm_adopt.so implemented. Does it work with batch jobs? I keep getting errors, even though I have a job running on the node I'm trying to login to: jacob@condo:~$ sbatch nvidia-docker-test.sh Submitted batch job 41 jacob@condo:~$ squeue

[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Trafford, Tyler
> On 9/19/17, 11:49 AM, "Trafford, Tyler" wrote: > >>Have you looked at "pam_slurm_adopt.so"? >> >>We are using that successfully. It "adopts" the cgroup of the user's job. > > We also use pam_slurm_adopt.so, and I'm mostly happy with it. One caution > is that the doco

[slurm-dev] Re: systemd slurm not starting on boot

2017-09-19 Thread Kyle Mills
Hi Thomas and Ole, Here's an update on what fixed the problem, for future use. I'm not sure exactly what fixed the problem, but a combination of your replies inspired me to run: ```sudo systemctl enable NetworkManager-wait-online.service``` to make sure the network is connected before
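A sketch of that fix, plus one way to confirm the ordering after a reboot, assuming a NetworkManager-based Ubuntu system:

```
# Block the boot sequence until NetworkManager reports the network as online
sudo systemctl enable NetworkManager-wait-online.service

# After rebooting, inspect what slurmctld.service actually waited for
systemd-analyze critical-chain slurmctld.service
```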

[slurm-dev] Re: systemd slurm not starting on boot

2017-09-19 Thread Thomas HAMEL
Hello, Do you use CommunicationParameters=NoCtldInAddrAny? If so, that can be because slurmctld tries to start before the relevant network interface is available. That would explain why it works when started later. You can solve this by putting the right dependency on systemd (something like
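The systemd dependency being described could look like the following drop-in override, sketched with the standard unit and target names:

```
# Create a drop-in override for slurmctld (opens an editor):
sudo systemctl edit slurmctld

# Add these lines so the daemon only starts once the network is online:
#   [Unit]
#   Wants=network-online.target
#   After=network-online.target
```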

[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Trafford, Tyler
> I found an old mailing list discussion about this. I'm curious if any > progress has been made since and if there is a solution now? > > Is there a way to limit the SSH sessions of users to the cgroup defined by > their jobs? I'm using pam_slurm.so to limit SSH access > to only those users

[slurm-dev] Re: systemd slurm not starting on boot

2017-09-19 Thread Ole Holm Nielsen
Slurm 15.08.7 is really old; the current version is 17.02.7. Still, if you read my Wiki page about Slurm configuration, perhaps the missing item will turn up: https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration /Ole On 09/19/2017 05:17 PM, Kyle Mills wrote: Hi Ole, I'm using

[slurm-dev] Re: systemd slurm not starting on boot

2017-09-19 Thread Kyle Mills
Hi Ole, I'm using Ubuntu 16.04 on each head/compute node, and have installed slurm-wlm from the apt repositories. It is slurm 15.08.7. On Tue, Sep 19, 2017 at 11:07 AM, Ole Holm Nielsen < ole.h.niel...@fysik.dtu.dk> wrote: > > If your OS is CentOS/RHEL 7, you may want to consult my Wiki page

[slurm-dev] Re: systemd slurm not starting on boot

2017-09-19 Thread Ole Holm Nielsen
If your OS is CentOS/RHEL 7, you may want to consult my Wiki page about setting up Slurm: https://wiki.fysik.dtu.dk/niflheim/SLURM. If you do things correctly, there should be no problems :-) /Ole On 09/19/2017 05:02 PM, Kyle Mills wrote: Hello, I'm trying to get SLURM set up on a small

[slurm-dev] systemd slurm not starting on boot

2017-09-19 Thread Kyle Mills
Hello, I'm trying to get SLURM set up on a small cluster comprised of a head node and 4 compute nodes. On the head node, I have run ``` sudo systemctl enable slurmctld ``` but after a reboot SLURM is not running and `sudo systemctl status slurmctld` returns: ``` ● slurmctld.service - Slurm
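When a unit is enabled but not running after boot, the journal for that unit usually shows why; a quick check with standard systemd tools:

```
systemctl is-enabled slurmctld     # confirm it is set to start at boot
systemctl status slurmctld         # current state and the last few log lines
journalctl -b -u slurmctld         # full log for the unit from this boot
```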

[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Jeffrey Frey
>> Is there a way to limit the SSH sessions of users to the cgroup defined by >> their jobs? I'm using pam_slurm.so to limit SSH access to only those users >> with running jobs. However, if a user reserves say 2 GPUs on a 4 GPU system, >> the cgroups only give their job access to 2 GPUs. But,

[slurm-dev] Re: Limiting SSH sessions to cgroups?

2017-09-19 Thread Ole Holm Nielsen
On 09/19/2017 03:25 PM, Jacob Chappell wrote: I found an old mailing list discussion about this. I'm curious if any progress has been made since and if there is a solution now? There is now a better understanding of how to use slurm-pam_slurm with Slurm 17.02.2 or later for limiting SSH
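The PAM side of this setup is roughly the following; a sketch assuming the pam_slurm_adopt module shipped with Slurm and a stock /etc/pam.d/sshd, which varies by distribution:

```
# /etc/pam.d/sshd -- add near the end of the account stack
account    required    pam_slurm_adopt.so
# Users with no job on the node are denied; users with a running job are
# adopted into that job's cgroup, so the SSH session only sees the CPUs,
# GPUs and memory the job was allocated.
```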

[slurm-dev] Limiting SSH sessions to cgroups?

2017-09-19 Thread Jacob Chappell
All, I found an old mailing list discussion about this. I'm curious if any progress has been made since and if there is a solution now? Is there a way to limit the SSH sessions of users to the cgroup defined by their jobs? I'm using pam_slurm.so to limit SSH access to only those users with

[slurm-dev] Does powering down as suspend action still work?

2017-09-19 Thread Loris Bennett
Hi, Can anyone confirm that powering off nodes still works as a suspend action in 16.05 and/or 17.02? Cheers, Loris BTW, the example of slurmctld logging contains the line: [May 02 15:31:25] Power save mode 0 nodes Given that the code now reads if (((now - last_log) > 600) &&

[slurm-dev] Suspend stopped working - debug flag?

2017-09-19 Thread Loris Bennett
Hi, We have been powering down idle nodes for many years now. However, at some point recently, this seems to have stopped working. I can't pinpoint exactly when the problem started, as the cluster is usually full and so the situation in which nodes should be powered down doesn't occur very
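For context, the power-saving block in slurm.conf that drives this behaviour looks roughly like the following; the script paths and timings here are placeholders, not the original poster's actual settings:

```
# slurm.conf -- power saving (illustrative values only)
SuspendProgram=/usr/local/sbin/node_poweroff.sh   # hypothetical script
ResumeProgram=/usr/local/sbin/node_poweron.sh     # hypothetical script
SuspendTime=1800                 # seconds idle before a node is suspended
SuspendTimeout=60
ResumeTimeout=300
SuspendExcNodes=node[001-002]    # nodes that are never powered down
# Raising SlurmctldDebug (e.g. to "debug") is one way to get more
# power-save detail into slurmctld.log.
```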

[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-19 Thread Loris Bennett
Loris Bennett writes: > Hi, > > I have a node which is powered on and to which I have sent a job. The > output of sinfo is > > PARTITION AVAIL TIMELIMIT NODES STATE NODELIST > test up 7-00:00:00 1 mix~ node001 > > The output of squeue is >

[slurm-dev] Re: Behaviour of Partition setting MaxTime

2017-09-19 Thread Greg Wickham
> On 19 Sep 2017, at 8:50 AM, Loris Bennett wrote: > > > Hi Greg, > > Greg Wickham writes: > >> Hi, >> >> What is the behaviour when either root or the SlurmUser updates the >> duration of an unprivileged user's running job to exceed the
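The kind of privileged update being asked about uses the standard scontrol syntax; the job ID and limit below are placeholders:

```
# As root or SlurmUser: raise a running job's time limit, possibly past the partition MaxTime
scontrol update JobId=12345 TimeLimit=10-00:00:00

# Inspect the job's effective limit afterwards
scontrol show job 12345 | grep -E 'TimeLimit|Partition'
```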