Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-06 Thread Baker D . J .
Hello Mike et al,


This is a known bug in Slurm v18.08*. We installed the initial release a short 
while ago and came across this issue very quickly. We actually use this script 
at the end of the job epilog to report job efficiency to users, so it is a 
real shame that it is now broken! The good news is that I am assured by SchedMD 
that the bug has been fixed in v18.08.3. Having said that, we will probably live 
with this issue rather than disrupt users with another upgrade so soon. We 
have found a few other minor bugs in the new version of Slurm; however, I am 
glad to say that none of them are "life threatening".
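
For readers unfamiliar with what is being reported here: the headline figure seff prints is CPU efficiency, i.e. CPU time actually consumed divided by core-walltime (elapsed time multiplied by allocated cores). A minimal sketch of that arithmetic follows. It is not the seff source or the attached script; the function name cpu_efficiency and its arguments (all in seconds) are hypothetical. In practice the inputs would come from accounting fields such as sacct's TotalCPU, Elapsed and AllocCPUS, converted to seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: CPU efficiency as a percentage.
#   $1 = total CPU time consumed by the job, in seconds
#   $2 = elapsed wall-clock time, in seconds
#   $3 = number of allocated CPUs
cpu_efficiency() {
    local total_cpu_s=$1 elapsed_s=$2 ncpus=$3
    awk -v cpu="$total_cpu_s" -v wall="$elapsed_s" -v n="$ncpus" 'BEGIN {
        core_wall = wall * n            # core-walltime available to the job
        if (core_wall <= 0) {           # avoid dividing by zero for empty jobs
            print "0.00"
            exit
        }
        printf "%.2f\n", 100 * cpu / core_wall
    }'
}

# A job that used 1 hour of CPU time over 30 minutes on 4 cores.
cpu_efficiency 3600 1800 4
```

A job that keeps every allocated core busy for its whole runtime scores 100; a serial job on four cores tops out at 25.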


If you're keen to have a workaround in the meantime then please feel free to 
use our replacement script, "seff_new" -- a copy is attached to this email. 
It's not the most elegant of scripts; however, it does work.


Best regards,

David


From: slurm-users on behalf of Mike Cammilleri
Sent: 05 November 2018 21:39
To: slurm-us...@schedmd.com; Slurm User Community List
Subject: Re: [slurm-users] Seff error with Slurm-18.08.1


I'm also interested in this issue since I've come across the same error today. 
We built Slurm-18.08.1 with the contribs packages on Ubuntu Bionic and seff is 
also complaining with

$ /s/slurm/bin/seff 36
perl: error: plugin_load_from_file: dlopen(/s/slurm/lib/slurm/accounting_storage_slurmdbd.so): /s/slurm/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
perl: error: plugin_load_from_file: dlopen(/s/slurm/lib/slurm/accounting_storage_slurmdbd.so): /s/slurm/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
Job not found.
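
The dlopen failure above means the plugin references node_record_count but nothing loaded into the perl process provides it. One way to confirm that the plugin really does expect the symbol from its caller is to list its undefined dynamic symbols with nm. The sketch below assumes GNU binutils; the helper name needs_symbol is hypothetical, and the two sample nm lines are illustrative rather than captured output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given symbol appears as undefined ("U")
# in `nm -D --undefined-only` output read from stdin.
needs_symbol() {
    local symbol=$1
    awk -v sym="$symbol" '
        $1 == "U" && $2 == sym { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Real use, with the plugin path taken from the error message above:
#   nm -D --undefined-only /s/slurm/lib/slurm/accounting_storage_slurmdbd.so \
#       | needs_symbol node_record_count

# Illustrative input so the sketch runs without a Slurm installation.
printf '%s\n' \
    '                 U memcpy' \
    '                 U node_record_count' |
    needs_symbol node_record_count &&
    echo "plugin expects its caller to provide node_record_count"
```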





Mike Cammilleri

Systems Administrator

Department of Statistics | UW-Madison

1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu



From: slurm-users on behalf of Miguel A. Sánchez
Sent: Tuesday, October 23, 2018 10:26 AM
To: slurm-us...@schedmd.com
Subject: [slurm-users] Seff error with Slurm-18.08.1

Hi all

I have upgraded my Slurm from version 17.11.0 to 18.08.1. With
the previous version, 17.11.0, the seff tool was working
fine, but with 18.08.1, when I try to run the seff tool I
receive the following error message:

# ./seff 
perl: error: plugin_load_from_file: dlopen(/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so): /usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
perl: error: plugin_load_from_file: dlopen(/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so): /usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
Job not found.
#

Both Slurm installations have been compiled from source on the same
computer, but only the seff that was compiled in the 17.11.0 version
works fine. To compile the seff tool, from the Slurm source tree:

cd contribs

make

make install

I think the problem is in the perlapi. Could it be a bug? Any idea
how I can fix this problem? Thanks a lot.


--

Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)

Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelangel.sanc...@upf.edu




[Attachment: seff_new]


Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-06 Thread Chris Samuel

On 6/11/18 7:49 pm, Baker D.J. wrote:

> The good news is that I am assured by SchedMD that the bug has been fixed
> in v18.08.3.


Looks like it's fixed in this commit.

commit 3d85c8f9240542d9e6dfb727244e75e449430aac
Author: Danny Auble
Date:   Wed Oct 24 14:10:12 2018 -0600

    Handle symbol resolution errors in the 18.08 slurmdbd.

    Caused by b1ff43429f6426c when moving the slurmdbd agent internals.

    Bug 5882.


> Having said that we will probably live with this issue
> rather than disrupt users with another upgrade so soon.


An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though, 
should it?  We just flip a symlink and the users see the new binaries, 
libraries, etc. immediately; we can then restart daemons as and when we 
need to (in the right order of course: slurmdbd, slurmctld and then 
the slurmds).
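
The symlink flip and restart order described above can be sketched as a dry run that only prints the commands. Everything site-specific here is an assumption: the install paths are made up, the function name plan_upgrade is hypothetical, and the daemons are assumed to run as systemd units named slurmdbd, slurmctld and slurmd.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run planner: prints the commands, executes nothing.
#   $1 = directory of the newly installed Slurm version
#   $2 = symlink that users and daemons resolve Slurm through
plan_upgrade() {
    local new_prefix=$1 link=$2 daemon
    # Repoint the symlink so new invocations pick up the new binaries.
    echo "ln -sfn $new_prefix $link"
    # Restart order from the message above: database daemon first,
    # then the controller, then slurmd (on each compute node).
    for daemon in slurmdbd slurmctld slurmd; do
        echo "systemctl restart $daemon"
    done
}

plan_upgrade /opt/slurm-18.08.3 /opt/slurm
```

Printing the plan first makes it easy to review the order before running anything on a production cluster.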


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Accounting: set default account with no access

2018-11-06 Thread Yair Yarom
Hi,

You can set the maxsubmitjob=0 on that default account. That should prevent
anyone from using it, but it won't have a specific message like with the
lua plugin. E.g.
sacctmgr update account default set maxsubmitjob=0

Regards,
Yair.


On Tue, Nov 6, 2018 at 12:58 AM Renfro, Michael  wrote:

> From https://stackoverflow.com/a/46176694:
>
> >> I had the same requirement to force users to specify accounts and,
> >> after finding several ways to fulfill it with slurm, I decided to revive
> >> this post with the shortest/easiest solution.
> >>
> >> The slurm lua submit plugin sees the job description before the default
> >> account is applied. Hence, you can install the slurm-lua package, add
> >> "JobSubmitPlugins=lua" to the slurm.conf, restart the slurmctld, and
> >> directly test against whether the account was defined via the
> >> job_submit.lua script (create the script wherever you keep your slurm.conf;
> >> typically in /etc/slurm/):
> >>
> >> -- /etc/slurm/job_submit.lua to reject jobs with no account specified
> >>
> >> function slurm_job_submit(job_desc, part_list, submit_uid)
> >>     if job_desc.account == nil then
> >>         slurm.log_error("User %s did not specify an account.", job_desc.user_id)
> >>         slurm.log_user("You must specify an account!")
> >>         return slurm.ERROR
> >>     end
> >>     return slurm.SUCCESS
> >> end
> >>
> >> function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
> >>     return slurm.SUCCESS
> >> end
> >>
> >> return slurm.SUCCESS
>
> > On Nov 5, 2018, at 4:09 PM, Brian Andrus  wrote:
> >
> > All,
> >
> > I am trying to figure out the best way to require users to explicitly
> > specify an account when submitting jobs (--account= )
> >
> > What I was thinking was to create a default account for the users that
> > has no ability to submit any jobs, so if they don't specify, any submission
> > would fail.
> >
> > What I'm not seeing is how to set such an option on an account. I was
> > hoping to do something like cluster=none for its access, but that is not
> > allowed.
> >
> >
> > Is there a way to set an account to not have access to submit jobs?
> > Alternatively is there an easier way to require the --account= option
> > for jobs?
> >
> >
> > Brian Andrus
> >
> >
>
>
>


[slurm-users] constraints question

2018-11-06 Thread Tina Friedrich
Hello,

I hope this is a quick question.

The way I read the man page (srun/sbatch), I should be allowed a request like

--constraint="broadwell|haswell"

to get either a broadwell or a haswell node, or not? (I mean yes, assuming 
nodes with those features exist.)
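
To spell out the reading of the man page above: '|' means a node qualifies if it has any one of the listed features, while '&' means it must have all of them. The sketch below models only that much. It is not Slurm's matcher; it handles a constraint made of a single operator type, ignores counts and brackets, and the helper name node_satisfies and the feature E5-2640v3 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical model of simple feature constraints.
#   $1 = constraint using only '|' (OR) or only '&' (AND)
#   $2 = comma-separated feature list of one node
node_satisfies() {
    local constraint=$1 features=",$2,"
    local op='&' term
    [[ $constraint == *'|'* ]] && op='|'
    local IFS="$op"
    local -a terms
    read -r -a terms <<< "$constraint"      # split the constraint on the operator
    for term in "${terms[@]}"; do
        if [[ $features == *",$term,"* ]]; then
            [[ $op == '|' ]] && return 0    # OR: the first hit is enough
        else
            [[ $op == '&' ]] && return 1    # AND: the first miss rules the node out
        fi
    done
    [[ $op == '&' ]]                        # AND: all matched; OR: none matched
}

node_satisfies 'broadwell|haswell' 'haswell,E5-2640v3' && echo "OR matches a haswell node"
node_satisfies 'broadwell&E5-2640v4' 'broadwell,E5-2640v4' && echo "AND matches when both features are present"
```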

I can't get that to work; when I try, I get this:

srun -C "broadwell|haswell" --pty /bin/bash
srun: error: Unable to allocate resources: Invalid KNL configuration (MCDRAM or NUMA option)

Both 'srun -C "broadwell" ...' and 'srun -C "haswell" ...' work, i.e. I can 
request either a broadwell or a haswell node. 

I can get the '&' operator to work just fine - e.g. 

srun -C "broadwell&E5-2640v4" --pty /bin/bash

works as expected. 

So what am I doing wrong with the 'or'?

This is slurm 17.11.8, btw.

(And yes, I have the 'knl_generic' plugin enabled; there were some KNL nodes, 
although they are no longer there. Still, I'm not even requesting 'knl' here?)

Google didn't really yield anything, so I thought asking might be quicker.

Thanks!
Tina

-- 
Tina Friedrich, Snr HPC Systems Administrator, Advanced Research Computing
Research Computing and Support Services, Academic IT 
IT Services, University of Oxford 
http://www.arc.ox.ac.uk



Re: [slurm-users] Accounting: set default account with no access

2018-11-06 Thread Brian Andrus
But isn't that a user association setting and not an account setting? So 
I would have to set it for every user/default account association, no? 
Technically doable, but definitely more difficult to manage.


Brian Andrus


On 11/6/2018 3:58 AM, Yair Yarom wrote:

> Hi,
>
> You can set the maxsubmitjob=0 on that default account. That should
> prevent anyone from using it, but it won't have a specific message
> like with the lua plugin. E.g.
>
> sacctmgr update account default set maxsubmitjob=0
>
> Regards,
>     Yair.







Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-06 Thread Mike Cammilleri
Thanks for this. We'll try the workaround script. It is not mission-critical, 
but our users have gotten accustomed to seeing these metrics at the end of each 
run and it's nice to have. We are currently doing this in a test VM environment, 
so by the time we actually upgrade the cluster perhaps the fix will be 
available.


Mike Cammilleri

Systems Administrator

Department of Statistics | UW-Madison

1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu






Re: [slurm-users] Accounting: set default account with no access

2018-11-06 Thread Sam Hawarden
Hi Yair,


You can set maxsubmitjob=0 on an account.


The error message isn't helpful beyond the obvious though:


] salloc
salloc: error: AssocMaxSubmitJobLimit
salloc: error: Job submit/allocate failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

So the lua script is preferable.


Kind regards,

  Sam


From: slurm-users  on behalf of Yair 
Yarom 
Sent: Wednesday, 7 November 2018 00:58
To: Slurm User Community List
Subject: Re: [slurm-users] Accounting: set default account with no access

Hi,

You can set the maxsubmitjob=0 on that default account. That should prevent 
anyone from using it, but it won't have a specific message like with the lua 
plugin. E.g.
sacctmgr update account default set maxsubmitjob=0

Regards,
Yair.






[slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Brian Andrus
All,
Ok, I set up a few clusters in slurmdb. They are not federated.
I set up some accounts too. One primary for each cluster, plus a few child
accounts (project codes)

Something like:
sacctmgr add account DevOps Cluster=cluster1,cluster2
sacctmgr add account projectA Parent=DevOps
sacctmgr add account projectB Parent=DevOps
sacctmgr add account Prod Cluster=cluster3

Then I added my user:
sacctmgr add user andrus DefaultAccount=DevOps Account=projectA,projectB

I set up a Lua job_submit script that requires the --account option to submit
anything. That seems to work, but...

I am able to submit using account=projectB on cluster3. ???
Since 'projectB' is a child of account 'DevOps', which is only associated
with cluster1 and cluster2, shouldn't I be denied the ability to run using
that account on cluster3?
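
Before looking at enforcement, it can help to check which (cluster, account, user) associations the database actually holds. A minimal sketch, assuming the pipe-separated output of `sacctmgr -nP show assoc format=cluster,account,user`; the helper name has_assoc is hypothetical and the sample rows are illustrative, not output from this site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a cluster|account|user row exists on stdin.
# Cluster and account are compared case-insensitively to be tolerant of how
# the names were entered.
has_assoc() {
    local cluster=$1 account=$2 user=$3
    awk -F'|' -v c="$cluster" -v a="$account" -v u="$user" '
        tolower($1) == tolower(c) && tolower($2) == tolower(a) && $3 == u { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Real use:
#   sacctmgr -nP show assoc format=cluster,account,user | has_assoc cluster3 projectB andrus

# Illustrative rows so the sketch runs without a Slurm database.
sample='cluster1|devops|andrus
cluster1|projectb|andrus
cluster2|projectb|andrus'

if printf '%s\n' "$sample" | has_assoc cluster3 projectB andrus; then
    echo "association exists on cluster3"
else
    echo "no association on cluster3"
fi
```

If the row does exist on the unexpected cluster, the question becomes how the account was added; if it does not, the question is whether associations are being enforced at all.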

Brian Andrus


Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Christopher Samuel

On 7/11/18 7:35 am, Brian Andrus wrote:


> I am able to submit using account=projectB on cluster3. ???
> Since 'projectB' is a child of account 'DevOps', which is only
> associated with cluster1 and cluster2, shouldn't I be denied the ability
> to run using that account on cluster3?


What does this say for you?

scontrol show config | fgrep AccountingStorageEnforce
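
The same check can be scripted so that it answers a yes/no question about one specific flag. A small sketch, assuming the `Key = value` layout that `scontrol show config` prints; the helper name enforce_has is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given flag is listed in the
# AccountingStorageEnforce line of `scontrol show config` output on stdin.
enforce_has() {
    local flag=$1
    awk -v want="$flag" '
        $1 == "AccountingStorageEnforce" {
            n = split($3, flags, ",")           # value is a comma-separated list
            for (i = 1; i <= n; i++)
                if (flags[i] == want) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Real use:
#   scontrol show config | enforce_has associations

# Sample line so the sketch runs without a controller.
printf 'AccountingStorageEnforce = associations,limits\n' |
    enforce_has associations && echo "associations are listed"
```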

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Brian Andrus
Ah. I thought I had set that.
So I did and now it is:
AccountingStorageEnforce = associations,limits

But I am still able to request and get resources on cluster3 using projectA
as my account..
Heck, I just tried using a fake account (account=asdas) and it worked...

"That ain't right..." - Guy Fleegman (GalaxyQuest)

Brian Andrus



Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Christopher Samuel

On 7/11/18 1:57 pm, Brian Andrus wrote:


> Ah. I thought I had set that.
> So I did and now it is:
> AccountingStorageEnforce = associations,limits
>
> But I am still able to request and get resources on cluster3 using
> projectA as my account..
>
> Heck, I just tried using a fake account (account=asdas) and it worked...
>
> "That ain't right..." - Guy Fleegman (GalaxyQuest)


That's very odd, we have:

AccountingStorageEnforce = associations,limits,qos,safe

and for an account I'm not part of I get:

[csamuel@farnarkle1 ~]$ salloc -A oz015
salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Brian Andrus
Ah, just running 'scontrol reconfigure' doesn't actually make it take effect.
Restarting slurmctld did it.



Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Christopher Samuel

On 7/11/18 2:44 pm, Brian Andrus wrote:

> Ah, just running 'scontrol reconfigure' doesn't actually make it take effect.
> Restarting slurmctld did it.


Phew!  Glad to hear that's sorted out.. :-)

--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Brian Andrus
Hmm. OK, so using an unmatched account fails:
(on cluster1)
$ srun -n16 -A Prod --pty bash
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified

But using a valid account also fails:
$ srun -n16 -A projectA --pty bash
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified

So now I don't seem to be able to run anything...



Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-06 Thread Brian Andrus
Ah. I was getting ahead of myself. I used 'limits' and I have no limits
configured, only associations.
Changed it to just associations and all is good.
