Hi,
I have one more; it took me half a day to locate.
(Again, I'm using 2.6.3, but it's not fixed in newer versions.)
(I have a 3-node default partition, 10 CPUs per node.)
srun -c 2 -N 2 some-job
This will set job_ptr->details to:
min_cpus = 4 - this is c * 1 (default task_per_node) * N
And that works fine.
But now: srun -c 2 -N 2-3 some-job
min_cpus = 2 - this is c * 1 (default task_per_node) * 1 (because the
node count is not known yet, I guess?)
srun will return with the error: "srun: error: Unable to create job step:
More processors requested than permitted"
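
To make the two cases easier to compare, here is a minimal sketch of the arithmetic as I observed it. It is not the Slurm code: observed_min_cpus is a made-up helper, and treating a node range as a node count of 1 is only my guess from the values I saw.

```c
#include <assert.h>

/* Hypothetical helper, NOT a Slurm function: reproduces the min_cpus
 * values observed above. Assumption: when -N is given as a range
 * (min_nodes != max_nodes), the node count is treated as 1. */
int observed_min_cpus(int cpus_per_task, int min_nodes, int max_nodes)
{
    const int tasks_per_node = 1;  /* default when -n is not given */
    int nodes;

    assert(cpus_per_task > 0 && min_nodes > 0 && max_nodes >= min_nodes);
    nodes = (min_nodes == max_nodes) ? min_nodes : 1;
    return cpus_per_task * tasks_per_node * nodes;
}
```

With -c 2 -N 2 this gives 4, and with -c 2 -N 2-3 it gives 2, matching what I see in job_ptr->details.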
The issue is this:
1) cons_res/job_test.c line 2532
job_res is created and job_res->ncpus is set to job->min_cpus = 2
At this point the job_res->cpus (=cpu_count returned from
_select_nodes) is valid.
2) Line 2625 -> cr_dist is called, which in the process rebuilds the
job_res->cpus array
3) cons_res/dist_tasks.c, function _compute_c_b_task_dist, line 153
maxtasks = maxtasks (which is equal to job_res->ncpus) / job->cpus_per_task
So in the example, maxtasks = 1.
Now job_res->cpus[1] will be 0, because after copying the first entry,
task_count = 1 and the loop will terminate.
4) unknown location : apparently you cannot start a job with
job_res->cpus[x] == 0
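
The effect in step 3 can be reproduced with a simplified model of the loop. This is only a sketch based on my reading, not the code from _compute_c_b_task_dist: distribute_cyclic and its parameters are names I made up, and over-subscription handling is left out.

```c
#include <assert.h>
#include <string.h>

/* Simplified model of the cyclic distribution, NOT the Slurm source.
 * avail[n] : CPUs usable on node n (what node selection returned)
 * out[n]   : CPUs assigned to node n after the distribution
 * Returns the number of tasks that were placed. */
int distribute_cyclic(const int *avail, int *out, int nhosts,
                      int ncpus, int cpus_per_task)
{
    int maxtasks, want, placed = 0;

    assert(nhosts > 0 && cpus_per_task > 0);
    memset(out, 0, (size_t)nhosts * sizeof(*out));

    /* Same formula as in step 3: task limit derived from ncpus. */
    maxtasks = ncpus / cpus_per_task;

    /* Each pass tries to give every node one more task. */
    for (want = cpus_per_task; placed < maxtasks; want += cpus_per_task) {
        int progress = 0;

        for (int n = 0; n < nhosts && placed < maxtasks; n++) {
            if (want <= avail[n]) {
                out[n] += cpus_per_task;
                placed++;
                progress = 1;
            }
        }
        if (!progress)
            break;  /* no node has room for another task */
    }
    return placed;
}
```

For two nodes with 10 CPUs each, ncpus = 2 and cpus_per_task = 2, this places a single task and leaves out[] = {2, 0}, which is the job_res->cpus[1] == 0 situation. With ncpus = 4 both nodes get 2 CPUs.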
Unfortunately I don't know how to fix this, since it could be fixed in
several places and I don't know the whole system well enough.
Using '--ntasks-per-node' bypasses the problem, because then a
different formula is used for 'maxtasks'.
-------------
There is also another "bug" with srun -c X (without -n).
According to srun manual: "If -c is specified without -n, as many
tasks will be allocated per node as possible while satisfying the -c
restriction. For instance on a cluster with 8 CPUs per node, a job
request for 4 nodes and 3 CPUs per task may be allocated 3 or 6 CPUs
per node"
In the current implementation this cannot happen. You can allocate a
maximum of "min_cpus" per node, and that is equal to the "-c" value
(or c * N, but in that case the "min_cpus" are distributed over N
nodes, so it's still 'c' per node).
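
For comparison, this is how I read the manual's example, as a small sketch. max_tasks_per_node is a made-up helper, not something from Slurm.

```c
#include <assert.h>

/* Hypothetical helper, NOT a Slurm function: the largest number of
 * tasks needing cpus_per_task CPUs each that fit on a single node. */
int max_tasks_per_node(int cpus_per_node, int cpus_per_task)
{
    assert(cpus_per_node >= 0 && cpus_per_task > 0);
    return cpus_per_node / cpus_per_task;  /* integer division rounds down */
}
```

With 8 CPUs per node and -c 3 this gives 2 tasks, i.e. up to 6 CPUs per node as the manual says. On my 10-CPU nodes with -c 2 I would therefore expect up to 5 tasks (10 CPUs) per node, not just 2 CPUs.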
-------------
By the way: could I get an explanation of the procedure in the
cr_job_test function in "cons_res/job_test.c"?
There are 6 steps (plus a zero step).
The zero step checks whether the job can run on any resources.
The first step checks whether the job can run on free resources.
What are the rest for? We check jobs from other partitions, but as I
understand it you cannot stop a running job without preemption. What am
I missing here?
I would be grateful for any help.
-------------
Best regards,
Filip Skalski
P.S. Sorry for the lengthy post!