[slurm-dev] Re: New User Creation Issue
On 16/02/17 09:45, Lachlan Musicman wrote:

[partitions down in slurm.conf]

> What's the reasoning behind this? So that you can test the cluster still
> works with debug before jobs start getting submitted and failing?

Yeah, pretty much!

It's a continuation from when we were running Torque+Moab/Maui here, and at VPAC before that - we would always start Moab paused so we could check what impact any changes had on our queues & priorities before starting jobs running.

Measure twice, cut once.

cheers!
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: New User Creation Issue
On 16 February 2017 at 09:36, Christopher Samuel wrote:

> We also have all our partitions (other than our debug one reserved for
> sysadmins) marked as "State=DOWN" in slurm.conf so that they won't start
> jobs when slurmctld is brought back up again.

Chris,

What's the reasoning behind this? So that you can test the cluster still works with debug before jobs start getting submitted and failing?

cheers
L.

--
The most dangerous phrase in the language is, "We've always done it this way."
- Grace Hopper
[slurm-dev] Re: New User Creation Issue
On 16/02/17 07:09, Katsnelson, Joe wrote:

> Just curious is there a way to restart slurm to get the below working
> without impacting the current jobs that are running?

You should be able to restart slurmctld with running jobs quite safely. If you are paranoid (like me), just mark the partitions down first so you know Slurm won't be trying to start any jobs at the moment you shut it down.

We also have all our partitions (other than our debug one, reserved for sysadmins) marked as "State=DOWN" in slurm.conf so that they won't start jobs when slurmctld is brought back up again.

All the best,
Chris
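The "mark partitions down first" step can be scripted. What follows is a minimal sketch in Python, not a procedure taken from the thread: it assumes the usual `scontrol update PartitionName=<name> State=<state>` form, the partition names `batch` and `long` are hypothetical, and the helper names are made up for illustration. It only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def partition_state_commands(partitions, state):
    """Build one `scontrol update` argv per partition, setting its State.

    Only UP and DOWN are accepted here to keep the sketch small.
    """
    if state not in ("UP", "DOWN"):
        raise ValueError("state must be 'UP' or 'DOWN'")
    return [
        ["scontrol", "update", f"PartitionName={name}", f"State={state}"]
        for name in partitions
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "batch" and "long" are hypothetical partition names.
    run_commands(partition_state_commands(["batch", "long"], "DOWN"))
```

Once slurmctld is back up and has been checked, the same helper called with "UP" builds the commands that reopen the partitions.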
[slurm-dev] Re: New User Creation Issue
Hi Joe,

On 27/01/17 09:01, Katsnelson, Joe wrote:

> sacctmgr list clusters

Sorry I missed this before!

Can you check that the machine where your slurmdbd is running can connect to 172.16.0.1 on port 6817 please?

If it can't, then that'll be the reason why you need to restart.

cheers,
Chris
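One way to run that check from the slurmdbd host is a plain TCP connection attempt. This is a rough sketch, assuming Python 3 is available on that machine; the function name `can_connect` and the 3-second timeout are illustrative choices, and a successful connect only shows the port is reachable, not that slurmctld is answering RPCs correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For the values shown in this thread the call would be `can_connect("172.16.0.1", 6817)`, run on the machine hosting slurmdbd.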
[slurm-dev] Re: New User Creation Issue
Just curious, is there a way to restart Slurm to get the below working without impacting the jobs that are currently running?

From: "Sarlo, Jeffrey S"
Date: Monday, January 23, 2017 at 5:12 PM
To: "Katsnelson, Joe"
Subject: RE: [slurm-dev] New User Creation Issue

We see this in the 14.x version of slurm, but not the newer versions.

Jeff
University of Houston
IT - HPC

From: Katsnelson, Joe [mailto:joe.katsnel...@nyumc.org]
Sent: Monday, January 23, 2017 3:33 PM
To: slurm-dev
Subject: [slurm-dev] New User Creation Issue

Hi,

I’m having an issue creating new users on our cluster. After running the below commands slurm has to be restarted in order for that user to be able to run sbatch. Otherwise they get an error

( sbatch test-job.sh
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified)

sacctmgr create user name=testuser account=isg
sacctmgr modify user testuser set qos+=full
[slurm-dev] Re: New User Creation Issue
sacctmgr list clusters

   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs GrpNodes GrpSubmit MaxJobs MaxNodes MaxSubmit     MaxWall                  QOS   Def QOS
---------- --------------- ------------ ----- --------- ------- -------- --------- ------- -------- --------- ----------- -------------------- ---------
slurm_clu+      172.16.0.1         6817  7168         1                                                                                 normal

On 1/24/17, 6:50 PM, "Christopher Samuel" wrote:

> On 24/01/17 08:34, Katsnelson, Joe wrote:
>
> > I’m having an issue creating new users on our cluster. After
> > running the below commands slurm has to be restarted in order for that
> > user to be able to run sbatch. Otherwise they get an error
>
> When I've seen that before it's been because the slurmdbd cannot connect
> back to slurmctld to send RPCs on the IP address that slurmctld has
> registered with slurmdbd.
>
> What does this say?
>
> sacctmgr list clusters
>
> cheers,
> Chris
[slurm-dev] Re: New User Creation Issue
On 24/01/17 08:34, Katsnelson, Joe wrote:

> I’m having an issue creating new users on our cluster. After
> running the below commands slurm has to be restarted in order for that
> user to be able to run sbatch. Otherwise they get an error

When I've seen that before it's been because the slurmdbd cannot connect back to slurmctld to send RPCs on the IP address that slurmctld has registered with slurmdbd.

What does this say?

sacctmgr list clusters

cheers,
Chris
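To pull out just the address and port that slurmctld has registered with slurmdbd, the `sacctmgr` output can be parsed. The sketch below is an illustration only: it assumes `sacctmgr -n -P list clusters format=cluster,controlhost,controlport` (no header, "|"-separated fields), and the function names and the sample cluster name in the usage note are hypothetical.

```python
import subprocess

# Assumed invocation: -n drops the header line, -P prints "|"-separated fields.
SACCTMGR_CMD = [
    "sacctmgr", "-n", "-P", "list", "clusters",
    "format=cluster,controlhost,controlport",
]


def parse_clusters(output):
    """Turn 'name|host|port' lines into a list of dicts."""
    clusters = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, host, port = line.split("|")
        clusters.append({
            "cluster": name,
            "control_host": host,
            # An empty port field is treated as 0 (nothing registered).
            "control_port": int(port) if port else 0,
        })
    return clusters


def registered_clusters():
    """Run sacctmgr and return the parsed cluster registrations."""
    result = subprocess.run(
        SACCTMGR_CMD, check=True, capture_output=True, text=True
    )
    return parse_clusters(result.stdout)
```

For a line such as `mycluster|172.16.0.1|6817`, `parse_clusters` yields one dict with `control_host` set to `172.16.0.1` and `control_port` set to `6817`; that host and port are what slurmdbd needs to be able to reach.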
[slurm-dev] Re: New User Creation Issue
We are planning to upgrade to a newer version of Bright, which comes with Slurm 16.05.8. We are currently running Slurm 14.11.11, so hopefully the newer version will address this issue, as someone mentioned. (I checked the configs; the users match in both.)

The one thing Bright support asked me to ask is whether the upgrade from 14.11.11 to 16.05.8 will break anything, such as the QoS, since apparently the database doesn't get touched during the upgrade.

On Jan 23, 2017, at 5:07 PM, Lachlan Musicman <data...@gmail.com> wrote:

> Do your entries for SlurmUser match in slurm.conf and slurmdbd.conf? (I
> don't know the cause of your problem, but that's what I'd look at next).
>
> Also, does the AccountingStorageUser in slurm.conf match the user in
> slurmdbd.conf?
>
> cheers
> L.
[slurm-dev] Re: New User Creation Issue
Do your entries for SlurmUser match in slurm.conf and slurmdbd.conf? (I don't know the cause of your problem, but that's what I'd look at next.)

Also, does the AccountingStorageUser in slurm.conf match the user in slurmdbd.conf?

cheers
L.

On 24 January 2017 at 08:52, Katsnelson, Joe <joe.katsnel...@nyumc.org> wrote:

> So I’m associating the user with an account in this case called isg and
> after I restart slurm they can submit jobs. Below is how the user looks
> even before I restart slurm
>
> sacctmgr show assoc format=account%20,user,cluster,partition where
> user=testuser
>
>              Account       User    Cluster  Partition
> -------------------- ---------- ---------- ----------
>                  isg   testuser slurm_clu+
>
> one thing that I did notice when I add the user I see this error in the
> slurmctld log
>
> [2017-01-23T16:47:34.351] error: Update Association request from non-super
> user uid=450
>
> UID 450 happens to be the slurm user
>
> But when I make the changes I’m logged in as root.
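The SlurmUser comparison suggested above can be done by reading both files and comparing the setting. This is a minimal sketch under stated assumptions: both files use `Key=Value` lines with `#` comments, keys are matched case-insensitively, and the function names are hypothetical. It does not apply Slurm's defaults, so a file that leaves SlurmUser unset is reported as a mismatch.

```python
def conf_value(text, key):
    """Return the last value set for `key` in slurm.conf-style text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, val = line.split("=", 1)
        if name.strip().lower() == key.lower():
            value = val.strip()
    return value


def slurm_users_match(slurm_conf_text, slurmdbd_conf_text):
    """True only if both texts set SlurmUser explicitly to the same value."""
    a = conf_value(slurm_conf_text, "SlurmUser")
    b = conf_value(slurmdbd_conf_text, "SlurmUser")
    return a is not None and a == b
```

In use, the two arguments would be the contents of slurm.conf and slurmdbd.conf read from wherever the site keeps them; the paths vary by installation, so none is assumed here.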
[slurm-dev] Re: New User Creation Issue
So I’m associating the user with an account, in this case called isg, and after I restart Slurm they can submit jobs. Below is how the user looks even before I restart Slurm:

sacctmgr show assoc format=account%20,user,cluster,partition where user=testuser

             Account       User    Cluster  Partition
-------------------- ---------- ---------- ----------
                 isg   testuser slurm_clu+

One thing I did notice: when I add the user, I see this error in the slurmctld log:

[2017-01-23T16:47:34.351] error: Update Association request from non-super user uid=450

UID 450 happens to be the slurm user, but when I make the changes I’m logged in as root.

From: Lachlan Musicman <data...@gmail.com>
Reply-To: slurm-dev <slurm-dev@schedmd.com>
Date: Monday, January 23, 2017 at 4:43 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: New User Creation Issue

Interesting. To the best of my knowledge, if you are using Accounting, all users actually need to be in an association - ie having a user account is insufficient.

An Association is a tuple consisting of: cluster, user, account and (optional) partition.

Is that the problem?

cheers
L.
[slurm-dev] Re: New User Creation Issue
Interesting. To the best of my knowledge, if you are using Accounting, all users actually need to be in an association - i.e. having a user account is insufficient.

An Association is a tuple consisting of: cluster, user, account and (optional) partition.

Is that the problem?

cheers
L.

On 24 January 2017 at 08:32, Katsnelson, Joe wrote:

> Hi,
>
> I’m having an issue creating new users on our cluster. After running
> the below commands slurm has to be restarted in order for that user to be
> able to run sbatch. Otherwise they get an error ( sbatch test-job.sh
>
> sbatch: error: Batch job submission failed: Invalid account or
> account/partition combination specified)
>
> sacctmgr create user name=testuser account=isg
>
> sacctmgr modify user testuser set qos+=full
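The association tuple described above can be pictured as a small record type. The sketch below is a simplified model for illustration, not Slurm's implementation: the class and function names are hypothetical, the cluster name `cluster1` in the usage note is made up, and treating an association with no partition as valid for any partition is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Association:
    """One (cluster, user, account, optional partition) tuple."""
    cluster: str
    user: str
    account: str
    partition: Optional[str] = None  # None: not tied to one partition


def has_association(assocs, cluster, user, account, partition):
    """True if some association covers this submission (simplified rule)."""
    for assoc in assocs:
        if (assoc.cluster, assoc.user, assoc.account) != (cluster, user, account):
            continue
        # Assumption of this sketch: no partition means any partition.
        if assoc.partition is None or assoc.partition == partition:
            return True
    return False
```

With `Association("cluster1", "testuser", "isg")` in the list, a submission by `testuser` under account `isg` is covered, while the same user under any other account is not, which is the situation the "Invalid account or account/partition combination" error describes.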