Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-17 Thread David Mathog
On Thu, 16 Jan 2020 23:24:56 "Lux, Jim (US 337K)" wrote: What I’m interested in is the idea of jobs that, if spread across many nodes (dozens) can complete in seconds (<1 minute) providing essentially “interactive” access, in the context of large jobs taking days to complete. It’s not clear to

Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-17 Thread Scott Atchley
Hi Jim, While we allow both batch and interactive, the scheduler handles them the same. The scheduler uses queue time, node count, requested wall time, project id, and others to determine when items run. We have backfill turned on so that when the scheduler allocates a large job and the time to
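
A minimal sketch of the backfill idea mentioned above, assuming a simplified model: idle nodes are waiting for a large job with a reserved start time, and a smaller job may start only if it fits on those nodes and its requested wall time ends before that start. The `Job` class, the `backfill` function, and all numbers are hypothetical and are not taken from any particular scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes requested
    walltime: int   # requested wall time, in minutes


def backfill(waiting, idle_nodes, minutes_until_reservation):
    """Return names of waiting jobs that can run without delaying the reserved large job."""
    started = []
    for job in waiting:  # assumed to already be in priority order
        fits_nodes = job.nodes <= idle_nodes
        fits_time = job.walltime <= minutes_until_reservation
        if fits_nodes and fits_time:
            started.append(job.name)
            idle_nodes -= job.nodes
    return started


# Hypothetical queue: only jobs that finish inside the 60-minute gap are eligible.
queue = [Job("short-a", 4, 30), Job("long-b", 2, 600), Job("short-c", 8, 45)]
picked = backfill(queue, idle_nodes=10, minutes_until_reservation=60)
print(picked)
```

In this model the requested wall time matters as much as the node count: a job that asks for far more time than it needs is harder to backfill.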

Re: [Beowulf] Interactive vs batch, and schedulers [EXT]

2020-01-17 Thread Skylar Thompson
In the Grid Engine world, we've worked around some of the resource fragmentation issues by assigning static sequence numbers to queue instances (a node publishing resources to a queue) and then having the scheduler fill nodes by sequence number rather than spreading jobs across the cluster. This
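
The difference between filling nodes by sequence number and spreading jobs can be illustrated with a toy placement model. This is only a sketch under stated assumptions: nodes are keyed by a hypothetical sequence number, every job needs a fixed number of slots, and the `place` function is invented for illustration rather than being Grid Engine's actual algorithm.

```python
def place(jobs, free_slots, policy):
    """Place jobs (slot counts) on nodes keyed by sequence number; return remaining free slots."""
    free = dict(free_slots)
    for slots in jobs:
        candidates = [seq for seq in free if free[seq] >= slots]
        if not candidates:
            continue  # job stays queued in this toy model
        if policy == "fill":
            node = min(candidates)  # lowest sequence number with room
        else:
            node = max(candidates, key=lambda seq: free[seq])  # node with most free slots
        free[node] -= slots
    return free


cluster = {1: 4, 2: 4, 3: 4}   # three nodes, four free slots each
small_jobs = [1, 1, 1, 1]      # four single-slot jobs

filled = place(small_jobs, cluster, "fill")
spread = place(small_jobs, cluster, "spread")

# Count nodes left completely empty, i.e. still usable by a whole-node job.
empty_after_fill = sum(1 for seq in cluster if filled[seq] == cluster[seq])
empty_after_spread = sum(1 for seq in cluster if spread[seq] == cluster[seq])
print(filled, empty_after_fill)
print(spread, empty_after_spread)
```

With the fill policy the small jobs pack onto the first node and leave the others whole; with the spread policy every node ends up partially occupied.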

Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-17 Thread Luc Vereecken
Hi Jim, Something like this can be done within traditional resource managers by using consumable generic resources, and oversubscribing your nodes. E.g. a 32-core node would be defined as a 64-core node, with 32 "batch" resources, and 32 "interactive" resources. Submitting a job to a batch queue
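
A rough sketch of the bookkeeping behind such an oversubscribed node, assuming two independent consumable pools that each equal the physical core count. The `Node` class and its `try_start` method are hypothetical, and how a real resource manager maps queues to these pools is an assumption here.

```python
class Node:
    """Toy node that advertises one consumable pool per job class."""

    def __init__(self, physical_cores):
        self.physical_cores = physical_cores
        # Each pool equals the physical core count, so the node advertises 2x its cores.
        self.free = {"batch": physical_cores, "interactive": physical_cores}

    def try_start(self, kind, cores):
        """Consume `cores` from the pool for `kind`; refuse if that pool is exhausted."""
        if self.free[kind] < cores:
            return False
        self.free[kind] -= cores
        return True


node = Node(32)
results = [
    node.try_start("batch", 32),        # batch pool is now fully consumed
    node.try_start("interactive", 8),   # still admitted: separate pool
    node.try_start("batch", 1),         # refused: no batch resources left
]
print(results, node.free)
```

In this model a node saturated with batch work still admits interactive jobs, at the cost of the two classes sharing the same physical cores while both run.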

Re: [Beowulf] Interactive vs batch, and schedulers [EXT]

2020-01-17 Thread Tim Cutts
Indeed, and you can quite easily get into a “boulders and sand” scheduling problem; if you allow the small interactive jobs (the sand) free access to everything, the scheduler tends to find them easy to schedule, partially fills nodes with them, and then finds it can’t find contiguous resources

Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-16 Thread Chris Samuel
On 16/1/20 3:24 pm, Lux, Jim (US 337K) via Beowulf wrote: What I’m interested in is the idea of jobs that, if spread across many nodes (dozens) can complete in seconds (<1 minute) providing essentially “interactive” access, in the context of large jobs taking days to complete. It’s not

Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-16 Thread Alex Chekholko via Beowulf
Hey Jim, There is an inverse relationship between latency and throughput. Most supercomputing centers aim to keep their overall utilization high, so the queue always needs to be full of jobs. If you can have 1000 nodes always idle and available, then your 1000 node jobs will usually take 10

[Beowulf] Interactive vs batch, and schedulers

2020-01-16 Thread Lux, Jim (US 337K) via Beowulf
Are there any references out there that discuss the tradeoffs between interactive and batch scheduling (perhaps some from the 60s and 70s?) – Most big HPC systems have a mix of giant jobs and smaller ones managed by some process like PBS or SLURM, with queues of various sized jobs. What I’m