[slurm-dev] Re: Slurm array scheduling question

2016-09-21 Thread Christopher Benjamin Coffey
Hi Janne,

> AFAIU the major optimization wrt. array job scheduling is that if the 
> scheduler finds that it cannot schedule a job in a job array, it skips over 
> all the rest of the jobs in the array.

If that’s true, then I think most of my worries should be nullified.  That 
would definitely make sense.

> Currently we have MaxJobCount=300k and MaxArraySize=100k (similar to your 
> case, we had some users that wanted to run huge array jobs). In order to 
> prevent individual users from hogging the entire cluster, we use the 
> GrpTRESRunMins limits (GrpCPURunMins if you're stuck on an older slurm 
> version).

We are running 16.05 and are using GrpTRESRunMins (CPU and memory); it’s 
working really well.  So with that in place I’m not worried about the 
researcher’s huge array clobbering the cluster.  I’ve been thinking more about 
the response time for users querying with squeue, as well as the scheduling 
overhead.  But it sounds like it’s working out well for your site!
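
For the archives, roughly what this looks like on our side (the account name and 
the numbers below are placeholders, not our real values; the mem part of 
GrpTRESRunMins is in MB-minutes):

    # cap a group's total outstanding run time in TRES-minutes
    sacctmgr modify account somegroup set GrpTRESRunMins=cpu=1000000,mem=4000000

    # and the slurm.conf knobs from Janne's numbers
    MaxJobCount=300000
    MaxArraySize=100000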

Thank you for your feedback.

Best,
Chris

> On Sep 20, 2016, at 11:32 PM, Blomqvist Janne <janne.blomqv...@aalto.fi> 
> wrote:
> 
> Hi,
> 
> AFAIU the major optimization wrt. array job scheduling is that if the 
> scheduler finds that it cannot schedule a job in a job array, it skips over 
> all the rest of the jobs in the array. There's also some memory benefits, 
> e.g. a pending job array is stored as a single object in the job queue, 
> rather than being broken up into a zillion separate jobs.
> 
> But, from the perspective of various limits (MaxJobCount, and the various job 
> number limits you can set per account/user if you use accounting, etc.), a 
> job array with N jobs counts as N jobs and not as one. 
> 
> Back in the slurm 2.something days, we found that we had to keep the 
> MaxJobCount somewhat reasonable (10k or such) lest the scheduler would bog 
> down, but current versions are a lot better in this respect. Currently we 
> have MaxJobCount=300k and MaxArraySize=100k (similar to your case, we had 
> some users that wanted to run huge array jobs). In order to prevent 
> individual users from hogging the entire cluster, we use the GrpTRESRunMins 
> limits (GrpCPURunMins if you're stuck on an older slurm version).
> 
> --
> Janne Blomqvist
> 
> 
> From: Christopher Benjamin Coffey [chris.cof...@nau.edu]
> Sent: Wednesday, September 21, 2016 7:14
> To: slurm-dev
> Subject: [slurm-dev] Slurm array scheduling question
> 
> Hi,
> 
> When slurm is considering jobs to schedule, including job arrays, out of all 
> pending jobs, does slurm consider only the job array itself, or does it also 
> consider the child jobs behind it?  I’m curious because I’ve so far limited 
> the size of job arrays to 4000 to be proportional to our max queue limit of 
> 13,000.  I’ve done this to keep the job queue depth at a reasonable size for 
> efficient slurm scheduling and backfilling (maybe not needed!).  But to date, 
> I and the folks using our cluster have been pleased with the scheduling and 
> speed on our cluster; I don’t want to change that! 
> ☺
> 
> I have a researcher now wanting to process 100K+ inputs with slurm arrays, 
> and my 4000 limit is becoming a burden; we’ve been looking into ways to work 
> around it.  I’ve started rethinking my original 4000 number and am now 
> wondering whether it’s necessary to keep the array size so low.
> 
> The man page for slurm.conf gives the impression that if I raise the array 
> size limit, the max queue size has to be raised as well.  That would suggest 
> the change could in fact impact scheduling significantly, since backfill 
> would potentially have to test many more jobs before starting them.
> 
> I’d like to get some feedback on this please from other sites and the 
> developers if possible.  Thank you!
> 
> Best,
> Chris
> 
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
> 



[slurm-dev] Re: set maximum CPU usage per user

2016-10-19 Thread Christopher Benjamin Coffey
Hi Steven,

If you are trying to restrict the cpus for a group, I believe you need to set 
the account value:

sacctmgr modify account normal set Grpcpus=300
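
To double-check what ended up on the association, something like this should 
show the limit (GrpTRES is the field name on recent versions; older versions 
report GrpCPUs instead):

    sacctmgr show assoc where account=normal format=Account,User,GrpTRES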

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 10/18/16, 4:04 PM, "Steven Lo"  wrote:



Hi,

We are trying to limit each user to 300 CPUs in our cluster.

We have tried:

sacctmgr modify qos normal set Grpcpus=300

and

sacctmgr modify user username set GrpCPUs=300


Both seem to allow a job asking for 308 CPUs to run.


Is there another way to implement this requirement?


Thanks in advance for your suggestion.


Steven.




[slurm-dev] Slurmdb access outside of slurm commands

2016-10-18 Thread Christopher Benjamin Coffey
Hi,

We are building a webapp which will use data stored in the slurm MySQL DB.  
Is there anything wrong with adding a read-only user that the app can use 
indirectly to cache statistics? I don’t see any issue with it, but I’m curious 
whether it would get in the way of any normal slurm operations.  Any considerations 
you can think of to prevent degradation of normal slurm performance?  Thank you!
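
For reference, what I have in mind is just a SELECT-only MySQL account along 
these lines (the database name below is the slurmdbd default, slurm_acct_db; the 
host and password are placeholders):

    mysql -u root -p -e "CREATE USER 'slurm_ro'@'webhost.example.edu' IDENTIFIED BY '********'; \
      GRANT SELECT ON slurm_acct_db.* TO 'slurm_ro'@'webhost.example.edu';"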

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167



[slurm-dev] Re: Requirement of no firewall on compute nodes?

2016-10-27 Thread Christopher Benjamin Coffey
Hi Ole,

I don’t see a reason for a firewall to exist on a compute node.  Is it a 
requirement on your new cluster?  If not, disable it.  I don’t read Moe’s 
statement as saying that you can’t have a firewall, just that if there is one, 
you should open it up to allow all slurm communication.
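
If you do need to keep a firewall on the nodes, something like the following 
should cover it (6817/6818 are the SlurmctldPort/SlurmdPort defaults; the srun 
range is only an example and has to match SrunPortRange in slurm.conf):

    firewall-cmd --permanent --add-port=6817-6818/tcp
    firewall-cmd --permanent --add-port=60001-63000/tcp
    firewall-cmd --reload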

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 10/27/16, 5:58 AM, "Ole Holm Nielsen"  wrote:


In the process of developing our new cluster using Slurm, I've been 
bitten by the firewall settings on the compute nodes preventing MPI jobs 
from spawning tasks on remote nodes.

I now believe that Slurm actually has a requirement that compute nodes 
must have their Linux firewall disabled.  I haven't been able to find 
any hint of this requirement in the official Slurm documentation.  I did 
find an old slurm-devel posting by Moe Jette (pretty authoritative!) in 2010
   https://groups.google.com/forum/#!topic/slurm-devel/wOHcXopbaXw
saying:

> Other communications (say between srun and the spawned tasks) are intended to 
> operate within a cluster and have no port restrictions. If there is a firewall 
> between nodes in your cluster (at least as a "cluster" is configured in SLURM), 
> then logic would need to be added to SLURM to provide the functionality you 
> describe.

Can anyone confirm that Moe's statement is still valid with the current 
Slurm version?

Conclusion: Compute nodes must have their Linux firewall disabled.

Thanks,
Ole




[slurm-dev] Re: Power user sstat rights

2017-03-14 Thread Christopher Benjamin Coffey
Hello, anyone know if this is possible? Thanks! ☺

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 3/8/17, 9:19 AM, "Christopher Benjamin Coffey" <chris.cof...@nau.edu> wrote:

Hello,

Is it possible to create a slurm account that has privileges to get sstat 
read access for all running jobs without giving modification privileges? Thank 
you.

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167





[slurm-dev] Re: Power user sstat rights

2017-04-18 Thread Christopher Benjamin Coffey
Thanks for taking a look at that, Doug.  Maybe we’ll write a feature patch and 
submit it.
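
In the meantime, as you suggest, a sudo rule scoped to sstat alone might do, 
e.g. (the group name and file path are hypothetical):

    # /etc/sudoers.d/sstat-powerusers
    %hpcstaff ALL=(root) NOPASSWD: /usr/bin/sstat

so the power users could run "sudo sstat -j <jobid>" without broader root access.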

Best,
Chris

On 4/17/17, 10:39 PM, "Douglas Jacobsen" <dmjacob...@lbl.gov> wrote:

Well, so looking at it, sstat generates an RPC, REQUEST_JOB_STEP_STAT, which is 
forwarded to all of the slurmds relevant to the job (see src/sstat/sstat.c).  
When slurmd receives REQUEST_JOB_STEP_STAT, it calls _rpc_stat_jobacct() in 
src/slurmd/slurmd/req.c 
(https://github.com/SchedMD/slurm/blob/834a9d6de6b945e6a3546e25a2b255d0ea7936f2/src/slurmd/slurmd/req.c#L3417)


It appears from there that the authorized users are:


    /*
     * check that requesting user ID is the SLURM UID or root
     */
    if ((req_uid != uid) && (!_slurm_authorized_user(req_uid))) {
            error("stat_jobacct from uid %ld for job %u "
                  "owned by uid %ld",

So either the user needs to be the original requesting user, or needs to be 
admitted by _slurm_authorized_user(), which is defined as:

    static bool
    _slurm_authorized_user(uid_t uid)
    {
            return ((uid == (uid_t) 0) || (uid == conf->slurm_user_id));
    }


So, only the requesting user, uid 0 (root), or the user running slurmctld 
(SlurmUser in slurm.conf) can run sstat.


I suppose if you want to add others you could consider allowing sudo access 
to sstat.  But no, this doesn't seem to be a bug, just the way it's written so 
far.



Doug Jacobsen, Ph.D.

NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>

dmjacob...@lbl.gov

    On Mon, Apr 17, 2017 at 3:54 PM, Christopher Benjamin Coffey
<chris.cof...@nau.edu> wrote:

Hello all,

In my attempt to create another “root” user, I’ve found that it is not 
possible to create another user with the ability to “sstat jobid” every job on 
the cluster. This must be a bug. Can anyone confirm this? Thanks!

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167 

On 3/14/17, 12:55 PM, "Christopher Benjamin Coffey" <chris.cof...@nau.edu> 
wrote:

Hello, anyone know if this is possible? Thanks! ☺

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
    928-523-1167 

On 3/8/17, 9:19 AM, "Christopher Benjamin Coffey" 
<chris.cof...@nau.edu> wrote:

Hello,

Is it possible to create a slurm account that has privileges to get 
sstat read access for all running jobs without giving modification privileges? 
Thank you.

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167



[slurm-dev] Power user sstat rights

2017-03-08 Thread Christopher Benjamin Coffey
Hello,

Is it possible to create a slurm account that has privileges to get sstat read 
access for all running jobs without giving modification privileges? Thank you.

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167



[slurm-dev] Re: Power user sstat rights

2017-04-17 Thread Christopher Benjamin Coffey
Hello all,

In my attempt to create another “root” user, I’ve found that it is not possible 
to create another user with the ability to “sstat jobid” every job on the 
cluster. This must be a bug. Can anyone confirm this? Thanks!

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 3/14/17, 12:55 PM, "Christopher Benjamin Coffey" <chris.cof...@nau.edu> 
wrote:

Hello, anyone know if this is possible? Thanks! ☺

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 3/8/17, 9:19 AM, "Christopher Benjamin Coffey" <chris.cof...@nau.edu> 
wrote:

Hello,

Is it possible to create a slurm account that has privileges to get 
sstat read access for all running jobs without giving modification privileges? 
Thank you.

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167