[slurm-dev] Re: Slurm array scheduling question

2016-09-21 Thread Christopher Benjamin Coffey
Hi Janne,

> AFAIU the major optimization wrt. array job scheduling is that if the 
> scheduler finds that it cannot schedule a job in a job array, it skips over 
> all the rest of the jobs in the array.

If that’s true, then I think most of my worries should be nullified.  That 
would definitely make sense.

> Currently we have MaxJobCount=300k and MaxArraySize=100k (similar to your 
> case, we had some users that wanted to run huge array jobs). In order to 
> prevent individual users from hogging the entire cluster, we use the 
> GrpTRESRunMins limits (GrpCPURunMins if you're stuck on an older slurm 
> version).

We are running 16.05 and are utilizing GrpTRESRunMins (cpu and mem); it’s 
working really well.  So with that in place I’m not worried about the 
researcher’s huge array clobbering the cluster.  I’ve been thinking more about 
the response time for users querying with squeue, as well as the scheduling 
overhead.  But it sounds like it’s working out well for your site!
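
In case it’s useful to anyone else reading along, the limits we apply look 
roughly like the following (the account name and numbers below are only 
placeholders, not our real values):

  # Cap the cpu-minutes and memory (MB)-minutes that an account's *running*
  # jobs may collectively occupy; jobs over the cap simply stay pending.
  sacctmgr modify account researchers set GrpTRESRunMins=cpu=1000000,mem=4000000

  # Confirm the association limit took effect
  sacctmgr show assoc where account=researchers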

Thank you for your feedback.

Best,
Chris

> On Sep 20, 2016, at 11:32 PM, Blomqvist Janne wrote:
> 
> Hi,
> 
> AFAIU the major optimization wrt. array job scheduling is that if the 
> scheduler finds that it cannot schedule a job in a job array, it skips over 
> all the rest of the jobs in the array. There are also some memory benefits, 
> e.g. a pending job array is stored as a single object in the job queue, 
> rather than being broken up into a zillion separate jobs.
> 
> But, from the perspective of various limits (MaxJobCount, and the various job 
> number limits you can set per account/user if you use accounting, etc.), a 
> job array with N jobs counts as N jobs and not as one. 
> 
> Back in the slurm 2.something days, we found that we had to keep the 
> MaxJobCount somewhat reasonable (10k or such) lest the scheduler would bog 
> down, but current versions are a lot better in this respect. Currently we 
> have MaxJobCount=300k and MaxArraySize=100k (similar to your case, we had 
> some users that wanted to run huge array jobs). In order to prevent 
> individual users from hogging the entire cluster, we use the GrpTRESRunMins 
> limits (GrpCPURunMins if you're stuck on an older slurm version).
> 
> --
> Janne Blomqvist
> 
> 
> From: Christopher Benjamin Coffey [chris.cof...@nau.edu]
> Sent: Wednesday, September 21, 2016 7:14
> To: slurm-dev
> Subject: [slurm-dev] Slurm array scheduling question
> 
> Hi,
> 
> When Slurm is considering which pending jobs to schedule, does it consider a 
> job array as a single entity, or does it consider each of the child jobs 
> behind it?  I’m curious because, to date, I’ve limited the size of job arrays 
> to 4,000 to be proportional with our max queue limit of 13,000.  I’ve done 
> this in order to keep the queue depth at a reasonable size for efficient 
> Slurm scheduling and backfilling (maybe not needed!).  But to date, I, and 
> the folks utilizing our cluster, have been pleased with the scheduling and 
> the speed on our cluster; I don’t want to change that! ☺
> 
> I now have a researcher wanting to process 100K+ inputs with Slurm arrays, 
> and my 4,000 limit is becoming a burden; we’ve been looking into ways to 
> work around it.  I’ve started rethinking my original 4,000 number and am now 
> wondering whether it’s necessary to keep the array size so low.
> 
> The slurm.conf man page gives the impression that if I raise the array size 
> limit, the max queue size has to be raised to a higher value as well.  That 
> suggests to me that this could impact scheduling significantly, since 
> backfill would potentially have to test many more jobs before starting them.
> 
> I’d like to get some feedback on this please from other sites and the 
> developers if possible.  Thank you!
> 
> Best,
> Chris
> 
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
> 



[slurm-dev] Re: Slurm array scheduling question

2016-09-20 Thread Christopher Samuel

On 21/09/16 14:15, Christopher Benjamin Coffey wrote:

> I’d like to get some feedback on this please from other sites and the
> developers if possible.  Thank you!

The best I can offer is this from the job array documentation from Slurm:

# When a job array is submitted to Slurm, only one job record is
# created. Additional job records will only be created when the
# state of a task in the job array changes, typically when a task
# is allocated resources or its state is modified using the
# scontrol command.

In 2.6.x I think there was a record for each element, but this was
cleaned up in a later release (can't remember when, sorry!).
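
A quick way to see that behaviour (the job script name, job ID and array
range below are just made up for illustration):

  # Submit a 1000-task array; slurmctld keeps a single pending job record
  sbatch --array=0-999 my_array_job.sh

  # While everything is still pending, squeue shows one line of the form
  #   12345_[0-999] ... PD
  # and per-task records only appear as tasks are allocated resources
  squeue -j 12345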

Hope that helps,
Chris
-- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] RE: Slurm array scheduling question

2016-09-20 Thread Blomqvist Janne
Hi,

AFAIU the major optimization wrt. array job scheduling is that if the scheduler 
finds that it cannot schedule a job in a job array, it skips over all the rest 
of the jobs in the array. There are also some memory benefits, e.g. a pending job 
array is stored as a single object in the job queue, rather than being broken 
up into a zillion separate jobs.

But, from the perspective of various limits (MaxJobCount, and the various job 
number limits you can set per account/user if you use accounting, etc.), a job 
array with N jobs counts as N jobs and not as one. 

Back in the slurm 2.something days, we found that we had to keep the 
MaxJobCount somewhat reasonable (10k or such) lest the scheduler would bog 
down, but current versions are a lot better in this respect. Currently we have 
MaxJobCount=300k and MaxArraySize=100k (similar to your case, we had some users 
that wanted to run huge array jobs). In order to prevent individual users from 
hogging the entire cluster, we use the GrpTRESRunMins limits (GrpCPURunMins if 
you're stuck on an older slurm version).
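
For reference, the relevant slurm.conf lines on our side look roughly like this 
(only these two settings shown, everything else omitted):

  # slurm.conf (excerpt)
  # Total number of jobs slurmctld will keep in memory at once
  MaxJobCount=300000
  # Maximum array task index is MaxArraySize-1, so this allows --array=0-99999
  MaxArraySize=100000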

--
Janne Blomqvist


From: Christopher Benjamin Coffey [chris.cof...@nau.edu]
Sent: Wednesday, September 21, 2016 7:14
To: slurm-dev
Subject: [slurm-dev] Slurm array scheduling question

Hi,

When Slurm is considering which pending jobs to schedule, does it consider a 
job array as a single entity, or does it consider each of the child jobs behind 
it?  I’m curious because, to date, I’ve limited the size of job arrays to 4,000 
to be proportional with our max queue limit of 13,000.  I’ve done this in order 
to keep the queue depth at a reasonable size for efficient Slurm scheduling and 
backfilling (maybe not needed!).  But to date, I, and the folks utilizing our 
cluster, have been pleased with the scheduling and the speed on our cluster; I 
don’t want to change that! ☺

I now have a researcher wanting to process 100K+ inputs with Slurm arrays, and 
my 4,000 limit is becoming a burden; we’ve been looking into ways to work 
around it.  I’ve started rethinking my original 4,000 number and am now 
wondering whether it’s necessary to keep the array size so low.
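
The workaround we’ve been experimenting with is simply chunking the inputs and 
submitting several smaller arrays with an offset, something along these lines 
(the script and variable names are only illustrative):

  # submit_chunks.sh - split 100000 inputs into arrays of at most 4000 tasks
  TOTAL=100000
  CHUNK=4000
  for OFFSET in $(seq 0 $CHUNK $((TOTAL - 1))); do
      END=$((CHUNK - 1))
      if [ $((OFFSET + END)) -ge $TOTAL ]; then END=$((TOTAL - OFFSET - 1)); fi
      # each task derives its real input index from OFFSET + SLURM_ARRAY_TASK_ID
      sbatch --array=0-$END --export=ALL,OFFSET=$OFFSET process_input.sh
  done

It works, but it’s exactly the kind of bookkeeping I’d rather not push onto the 
researcher.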

The slurm.conf man page gives the impression that if I raise the array size 
limit, the max queue size has to be raised to a higher value as well.  That 
suggests to me that this could impact scheduling significantly, since backfill 
would potentially have to test many more jobs before starting them.
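
For anyone comparing notes, the limits currently in effect on a cluster can be 
pulled straight from the controller rather than from a possibly stale copy of 
slurm.conf:

  scontrol show config | grep -iE 'maxarraysize|maxjobcount'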

I’d like to get some feedback on this please from other sites and the 
developers if possible.  Thank you!

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167