Hi,

I have one more bug report; this one took me half a day to locate.
(Again, I'm on 2.6.3, but as far as I can tell it is not fixed in newer versions.)

(I have a 3-node default partition, 10 CPUs per node.)
srun -c 2 -N 2 some-job

This sets job_ptr->details to:
min_cpus = 4, which is c * 1 (the default tasks per node) * N

And that works fine.

But now srun -c 2 -N 2-3 some-job

min_cpus = 2, which is c * 1 (the default tasks per node) * 1 (because the node count is not yet known, I guess?)

srun then returns with the error: "srun: error: Unable to create job step: More processors requested than permitted"
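To make the two computations explicit, here is a minimal sketch (not SLURM source; the function and variable names are my own) of how min_cpus appears to be derived in the two cases above:

/* Illustrative only: how min_cpus seems to be computed for -N 2 vs -N 2-3. */
#include <stdio.h>

static int min_cpus(int cpus_per_task, int ntasks_per_node, int node_multiplier)
{
    /* With an exact node count (-N 2) the count is multiplied in;
     * with a range (-N 2-3) the multiplier apparently falls back to 1. */
    return cpus_per_task * ntasks_per_node * node_multiplier;
}

int main(void)
{
    printf("-N 2   : min_cpus = %d\n", min_cpus(2, 1, 2)); /* prints 4 */
    printf("-N 2-3 : min_cpus = %d\n", min_cpus(2, 1, 1)); /* prints 2 */
    return 0;
}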

The issue is this:
1) cons_res/job_test.c, line 2532:
job_res is created and job_res->ncpus is set to job->min_cpus = 2.
At this point job_res->cpus (the cpu_count array returned from _select_nodes) is still valid.
2) Line 2625: cr_dist is called, which rebuilds that array in the process.
3) cons_res/dist_tasks.c, function _compute_c_b_task_dist, line 153:
maxtasks = maxtasks (which equals job_res->ncpus) / job->cpus_per_task
So in the example, maxtasks = 2 / 2 = 1.
Now job_res->cpus[1] will be 0, because after copying the first entry task_count = 1 and the loop terminates.
4) Unknown location: apparently you cannot start a job when some job_res->cpus[x] == 0.
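To make the failure mode concrete, here is a simplified, self-contained sketch of that arithmetic (not the actual plugin code; the loop and variable names are illustrative only):

/* Sketch of why job_res->cpus ends up with a zero entry for -c 2 -N 2-3. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned ncpus = 2;          /* job_res->ncpus, i.e. job->min_cpus      */
    unsigned cpus_per_task = 2;  /* job->cpus_per_task (-c 2)               */
    unsigned nhosts = 2;         /* nodes actually selected                 */
    unsigned avail[2] = {2, 2};  /* per-node CPUs returned by node selection */
    unsigned cpus[2];            /* rebuilt job_res->cpus                   */
    memset(cpus, 0, sizeof(cpus));

    unsigned maxtasks = ncpus / cpus_per_task;  /* 2 / 2 = 1 */

    /* Distribute one cpus_per_task block per node until maxtasks is hit. */
    unsigned task_count = 0;
    for (unsigned n = 0; n < nhosts && task_count < maxtasks; n++) {
        cpus[n] = (cpus_per_task <= avail[n]) ? cpus_per_task : avail[n];
        task_count++;
    }

    /* Prints cpus[0] = 2, cpus[1] = 0: the second node gets no CPUs,
     * which later makes the job step creation fail. */
    for (unsigned n = 0; n < nhosts; n++)
        printf("cpus[%u] = %u\n", n, cpus[n]);
    return 0;
}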

Unfortunately I don't know how to fix this, as it could be done in several places and I don't know the whole system well enough.

Using '--ntasks-per-node' bypasses the problem, because then a different formula is used for 'maxtasks'.

-------------

There is also another "bug" with srun -c X (without -n).
According to the srun manual: "If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction. For instance on a cluster with 8 CPUs per node, a job request for 4 nodes and 3 CPUs per task may be allocated 3 or 6 CPUs per node"

In the current implementation this cannot happen. You can allocate at most "min_cpus" CPUs per node, but min_cpus is equal to the "-c" value (or to c * N, but in that case min_cpus is distributed over the N nodes, so it is still only 'c' per node).

-------------

By the way: could I get an explanation of the procedure in the cr_job_test function in "cons_res/job_test.c"?

There are 6 steps (plus a zero step).
The zero step checks whether the job can run on any resources at all.
The first step checks whether the job can run on the currently free resources.

What are the remaining steps for? We check jobs from other partitions, but as I understand it, you cannot stop a running job without preemption. What am I missing here?

I will be grateful for any help.

-------------

Best regards,
Filip Skalski

P.S. Sorry for the lengthy post!
