[slurm-dev] Re: New User Creation Issue

2017-02-15 Thread Christopher Samuel

On 16/02/17 09:45, Lachlan Musicman wrote:

[partitions down in slurm.conf]
> What's the reasoning behind this? So that you can test the cluster still
> works with debug before jobs start getting submitted and failing?

Yeah, pretty much!

It's a continuation from when we were running Torque+Moab/Maui here, and at
VPAC before that - we would always start Moab paused so we could check what
impact any changes had on our queues & priorities before letting jobs start
running.

Measure twice, cut once.

cheers!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: New User Creation Issue

2017-02-15 Thread Lachlan Musicman
On 16 February 2017 at 09:36, Christopher Samuel wrote:

>
> We also have all our partitions (other than our debug one reserved for
> sysadmins) marked as "State=DOWN" in slurm.conf so that they won't start
> jobs when slurmctld is brought back up again.
>

Chris,

What's the reasoning behind this? So that you can test the cluster still
works with debug before jobs start getting submitted and failing?

cheers
L.

--
The most dangerous phrase in the language is, "We've always done it this
way."

- Grace Hopper


[slurm-dev] Re: New User Creation Issue

2017-02-15 Thread Christopher Samuel

On 16/02/17 07:09, Katsnelson, Joe wrote:

> Just curious is there a way to restart slurm to get the below working
> without impacting the current jobs that are running?

You should be able to restart slurmctld with running jobs quite safely.
If you are paranoid (like me) then just mark partitions down first, so
you know Slurm won't be trying to start any jobs just as you shut it down.

We also have all our partitions (other than our debug one reserved for
sysadmins) marked as "State=DOWN" in slurm.conf so that they won't start
jobs when slurmctld is brought back up again.
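
As a rough sketch of the approach described above (not the actual setup at VLSCI): the helper below only prints the `scontrol update` commands that would flip a list of partitions to a given state, so they can be reviewed before being piped to a shell. The function name `partition_state_cmds` and the partition names `main` and `long` are hypothetical; the slurm.conf equivalent is `State=DOWN` on each `PartitionName=` line.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the scontrol commands that would
# set each named partition to the given state (e.g. DOWN or UP).
partition_state_cmds() {
    state=$1
    shift
    for part in "$@"; do
        printf 'scontrol update PartitionName=%s State=%s\n' "$part" "$state"
    done
}

# Example: show what would be run before restarting slurmctld.
# "main" and "long" are made-up partition names.
partition_state_cmds DOWN main long
```

Once slurmctld is back and a test job on the debug partition checks out, the same helper with `UP` gives the commands to reopen the partitions; pipe the output to `sh` to execute them.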

All the best,
Chris


[slurm-dev] Re: New User Creation Issue

2017-02-15 Thread Christopher Samuel

Hi Joe,

On 27/01/17 09:01, Katsnelson, Joe wrote:

> sacctmgr list clusters

Sorry I missed this before!

Can you check that the machine where your slurmdbd is running can
connect to 172.16.0.1 on port 6817 please?

If it can't then that'll be the reason why you need to restart.
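
One way to run that check from the slurmdbd host, as a sketch: the function below attempts a TCP connection using bash's `/dev/tcp` redirection, wrapped in coreutils `timeout`. It assumes both `bash` and `timeout` are installed; `can_connect` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host ($1) on port ($2)
# can be opened within 5 seconds. Relies on bash's /dev/tcp feature and on
# coreutils timeout; any failure (refused, timed out, tool missing) returns
# non-zero.
can_connect() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Intended use on the machine hosting slurmdbd (not run here):
#   can_connect 172.16.0.1 6817 && echo "slurmctld port reachable"
```

A success only shows that the port accepts TCP connections from that host; it does not prove that slurmdbd's RPCs are accepted, but a failure here would fit the explanation above.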

cheers,
Chris


[slurm-dev] Re: New User Creation Issue

2017-02-15 Thread Katsnelson, Joe
Just curious: is there a way to restart Slurm to get the below working without
impacting the jobs that are currently running?

From: "Sarlo, Jeffrey S" 
Date: Monday, January 23, 2017 at 5:12 PM
To: "Katsnelson, Joe" 
Subject: RE: [slurm-dev] New User Creation Issue

We see this in the 14.x version of slurm, but not the newer versions.

Jeff
University of Houston
IT - HPC

From: Katsnelson, Joe [mailto:joe.katsnel...@nyumc.org]
Sent: Monday, January 23, 2017 3:33 PM
To: slurm-dev
Subject: [slurm-dev] New User Creation Issue

Hi,
 I’m having an issue creating new users on our cluster. After running the
commands below, Slurm has to be restarted before that user is able to
run sbatch. Otherwise they get an error:

sbatch test-job.sh
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

sacctmgr create user name=testuser account=isg
sacctmgr modify user testuser set qos+=full



This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is proprietary, 
confidential, and exempt from disclosure under applicable law. Any unauthorized 
review, use, disclosure, or distribution is prohibited. If you have received 
this email in error please notify the sender by return email and delete the 
original message. Please note, the recipient should check this email and any 
attachments for the presence of viruses. The organization accepts no liability 
for any damage caused by any virus transmitted by this email.


[slurm-dev] Re: New User Creation Issue

2017-01-26 Thread Katsnelson, Joe
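sacctmgr list clusters
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs GrpNodes GrpSubmit MaxJobs MaxNodes MaxSubmit MaxWall      QOS   Def QOS
---------- --------------- ------------ ----- --------- ------- -------- --------- ------- -------- --------- ------- -------- ---------
slurm_clu+      172.16.0.1         6817  7168         1                                                                 normal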



[slurm-dev] Re: New User Creation Issue

2017-01-24 Thread Christopher Samuel

On 24/01/17 08:34, Katsnelson, Joe wrote:

>  I’m having an issue creating new users on our cluster. After
> running the below commands slurm has to be restarted in order for that
> user to be able to run sbatch. Otherwise they get an error

When I've seen that before it's been because the slurmdbd cannot connect
back to slurmctld to send RPCs on the IP address that slurmctld has
registered with slurmdbd.

What does this say?

sacctmgr list clusters
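
For scripting this check, a sketch: `sacctmgr` can emit pipe-delimited output with `-P` and drop the header with `-n`, which is easier to parse than the default table. The function below reads such lines on stdin and prints `host:port` for the named cluster. `controller_addr` and the cluster name `mycluster` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "Cluster|ControlHost|ControlPort" lines on stdin
# and print "host:port" for the cluster named in $1.
controller_addr() {
    awk -F'|' -v name="$1" '$1 == name { print $2 ":" $3 }'
}

# Intended use on a live system (not run here):
#   sacctmgr -n -P list clusters format=Cluster,ControlHost,ControlPort \
#       | controller_addr mycluster
```

If ControlHost comes back empty, or as an address the slurmdbd host cannot reach, that would be consistent with the failure described above.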

cheers,
Chris


[slurm-dev] Re: New User Creation Issue

2017-01-23 Thread Katsnelson, Joe
We are planning to upgrade to a newer version of Bright, which comes with Slurm
16.05.8. We are currently running Slurm 14.11.11, so hopefully the newer
version will address this issue, as someone mentioned. (I checked the configs;
the users match in both.)
  One thing that Bright support asked me to ask: will the upgrade from
14.11.11 to 16.05.8 break anything, such as the QoS, since apparently the
database doesn't get touched during the upgrade?

On Jan 23, 2017, at 5:07 PM, Lachlan Musicman <data...@gmail.com> wrote:

Do your entries for SlurmUser match in slurm.conf and slurmdbd.conf? (I don't 
know the cause of your problem, but that's what I'd look at next).

Also, does the AccountingStorageUser in slurm.conf match the user in 
slurmdbd.conf?



[slurm-dev] Re: New User Creation Issue

2017-01-23 Thread Lachlan Musicman
Do your entries for SlurmUser match in slurm.conf and slurmdbd.conf? (I
don't know the cause of your problem, but that's what I'd look at next).

Also, does the AccountingStorageUser in slurm.conf match the user in
slurmdbd.conf?
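
A sketch for comparing the two files: the function below prints the value of a `Key=Value` setting from a Slurm-style config file, ignoring `#` comments and the case of the key. `conf_value` is a hypothetical name and the paths in the usage comment are assumptions; adjust them to wherever your slurm.conf and slurmdbd.conf live.

```shell
#!/bin/sh
# Hypothetical helper: print the value of key $1 from config file $2.
# Comments are stripped, keys are compared case-insensitively, and the
# last matching line wins. Prints an empty line if the key is absent.
conf_value() {
    awk -v key="$1" '
        {
            sub(/#.*/, "")                  # strip comments
            eq = index($0, "=")
            if (eq == 0) next               # not a Key=Value line
            k = substr($0, 1, eq - 1)
            gsub(/[ \t]/, "", k)
            if (tolower(k) == tolower(key)) {
                v = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                val = v                     # last occurrence wins
            }
        }
        END { print val }
    ' "$2"
}

# Intended use (paths are assumptions, not run here):
#   [ "$(conf_value SlurmUser /etc/slurm/slurm.conf)" = \
#     "$(conf_value SlurmUser /etc/slurm/slurmdbd.conf)" ] \
#       && echo "SlurmUser matches"
```

The same helper can be pointed at `AccountingStorageUser` in slurm.conf and the corresponding user setting in slurmdbd.conf for the second question.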

cheers
L.



[slurm-dev] Re: New User Creation Issue

2017-01-23 Thread Katsnelson, Joe
So I’m associating the user with an account, in this case called isg, and after
I restart Slurm they can submit jobs. Below is how the user looks even before I
restart Slurm:

sacctmgr show assoc format=account%20,user,cluster,partition where user=testuser
             Account       User    Cluster  Partition
-------------------- ---------- ---------- ----------
                 isg   testuser slurm_clu+

One thing I did notice: when I add the user, I see this error in the slurmctld
log:

[2017-01-23T16:47:34.351] error: Update Association request from non-super user uid=450

UID 450 happens to be the slurm user, but when I make the changes I’m logged in
as root.




[slurm-dev] Re: New User Creation Issue

2017-01-23 Thread Lachlan Musicman
Interesting. To the best of my knowledge, if you are using Accounting, all
users actually need to be in an association - i.e. having a user account is
insufficient.

An Association is a tuple consisting of: cluster, user, account and
(optional) partition.
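
To make the tuple concrete, a sketch: given pipe-delimited association rows (as produced by `sacctmgr -n -P show assoc format=cluster,account,user,partition`), the function below succeeds only if some row pairs the given user with the given account. `has_assoc` is a hypothetical name, and any cluster name fed to it in testing is made up.

```shell
#!/bin/sh
# Hypothetical helper: stdin is "cluster|account|user|partition" rows;
# exit 0 if any row matches user $1 and account $2, else exit 1.
has_assoc() {
    awk -F'|' -v user="$1" -v acct="$2" '
        $3 == user && $2 == acct { found = 1 }
        END { exit !found }
    '
}

# Intended use on a live system (not run here):
#   sacctmgr -n -P show assoc format=cluster,account,user,partition \
#       | has_assoc testuser isg && echo "association exists"
```

This only checks what the accounting database reports; it says nothing about whether slurmctld has picked up the new association yet.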

Is that the problem?

cheers
L.


On 24 January 2017 at 08:32, Katsnelson, Joe wrote:

> Hi,
>
> I’m having an issue creating new users on our cluster. After running
> the commands below, Slurm has to be restarted before that user is able
> to run sbatch. Otherwise they get an error:
>
> sbatch test-job.sh
> sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
>
> sacctmgr create user name=testuser account=isg
> sacctmgr modify user testuser set qos+=full