On Thu, 16 Jan 2020 23:24:56 "Lux, Jim (US 337K)" wrote:
What I’m interested in is the idea of jobs that, if spread across many
nodes (dozens) can complete in seconds (<1 minute) providing
essentially “interactive” access, in the context of large jobs taking
days to complete. It’s not clear to
Hi Jim,
While we allow both batch and interactive jobs, the scheduler handles them the
same way. It uses queue time, node count, requested wall time, project ID,
and other factors to determine when jobs run. We have backfill turned
on so that when the scheduler allocates a large job and the time to
In the Grid Engine world, we've worked around some of the resource
fragmentation issues by assigning static sequence numbers to queue
instances (a node publishing resources to a queue) and then having the
scheduler fill nodes by sequence number rather than spreading jobs across
the cluster. This
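
To make the fill-by-sequence-number idea concrete, here is a minimal sketch in Python. It is not Grid Engine code: the class and function names (QueueInstance, place_fill, place_spread) are invented for illustration, and each node is assumed to publish 32 slots. My understanding is that the real setting is a per-queue-instance seq_no combined with sorting queues by sequence number in the scheduler configuration, but treat that as an assumption and check your own site's docs.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    """Hypothetical stand-in for a node publishing slots to a queue."""
    seq_no: int      # static sequence number assigned by the admin
    slots: int       # total slots the node publishes
    used: int = 0    # slots currently occupied

    @property
    def free(self) -> int:
        return self.slots - self.used


def place_fill(instances, slots_needed):
    """Fill policy: lowest sequence number with enough free slots wins."""
    for qi in sorted(instances, key=lambda q: q.seq_no):
        if qi.free >= slots_needed:
            qi.used += slots_needed
            return qi.seq_no
    return None  # nothing fits right now


def place_spread(instances, slots_needed):
    """Spread policy: least-loaded instance wins (ties go to lower seq_no)."""
    candidates = [q for q in instances if q.free >= slots_needed]
    if not candidates:
        return None
    qi = min(candidates, key=lambda q: (q.used, q.seq_no))
    qi.used += slots_needed
    return qi.seq_no


def empty_nodes(instances):
    """Count nodes with no slots in use, i.e. still free for a whole-node job."""
    return sum(1 for q in instances if q.used == 0)


def demo(policy):
    """Submit four 1-slot jobs to a 4-node cluster and count untouched nodes."""
    cluster = [QueueInstance(seq_no=i, slots=32) for i in range(4)]
    for _ in range(4):
        policy(cluster, 1)
    return empty_nodes(cluster)
```

With four single-slot jobs on four nodes, the fill policy packs everything onto the first node and keeps three nodes completely empty for a large job, while the spread policy touches every node.
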
Hi Jim,
Something like this can be done within traditional resource managers by
using consumable generic resources, and oversubscribing your nodes. E.g.
a 32-core node would be defined as a 64-core node, with 32 "batch"
resources, and 32 "interactive" resources. Submitting a job to a batch
queue
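
The oversubscription trick described above can be sketched as a toy model. This is plain Python, not any resource manager's configuration; the Node class and its method names are made up for illustration. The only parts taken from the description are the numbers: a 32-core node advertised as 64 cores, with 32 "batch" and 32 "interactive" consumable resources.

```python
class Node:
    """Toy model of a node oversubscribed 2x via two consumable pools."""

    def __init__(self, physical_cores):
        self.physical_cores = physical_cores
        # One pool per job class, each as large as the physical core count.
        self.free = {"batch": physical_cores, "interactive": physical_cores}

    @property
    def advertised_cores(self):
        # What the scheduler is told the node has.
        return 2 * self.physical_cores

    def try_start(self, kind, cores):
        """Consume `cores` tokens from the `kind` pool if enough are free."""
        if self.free[kind] < cores:
            return False
        self.free[kind] -= cores
        return True

    def finish(self, kind, cores):
        """Return tokens to the pool when a job ends."""
        self.free[kind] += cores


node = Node(32)
node.try_start("batch", 32)        # a batch job takes the whole batch pool
node.try_start("interactive", 8)   # interactive work still fits alongside it
```

The point of the two pools is that a node full of batch work never blocks an interactive job from starting, at the cost of the two classes sharing the physical cores while both are running.
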
Indeed, and you can quite easily get into a “boulders and sand” scheduling
problem; if you allow the small interactive jobs (the sand) free access to
everything, the scheduler finds them easy to place, partially fills
nodes with them, and then can’t find contiguous resources
Hey Jim,
There is an inverse relationship between latency and throughput. Most
supercomputing centers aim to keep their overall utilization high, so the
queue always needs to be full of jobs.
If you can have 1000 nodes always idle and available, then your 1000-node
jobs will usually take 10
Are there any references out there that discuss the tradeoffs between
interactive and batch scheduling (perhaps some from the 60s and 70s?) –
Most big HPC systems have a mix of giant jobs and smaller ones managed by some
process like PBS or SLURM, with queues of various sized jobs.
What I’m