Dear list,

In the latest slurm-14.11.3 and in the current git master, no_consume
resources do not work properly.

Consider the following node specification in slurm.conf:

NodeName=node1 [...] Gres=potion:no_consume:10000 [...]

and the following gres definition in gres.conf:

Name=potion Count=10000

When executing

$ srun -n 2 --gres=potion:10 --pty /bin/bash -i

the srun command hangs indefinitely. In the slurmctld log files the
following error appears:

[2015-01-23T14:10:26.269] error: gres/potion: step_test 75.4294967294 node offset invalid (0 >= 0)
[2015-01-23T14:10:26.269] error: gres/potion: step_test 75.4294967294 node offset invalid (0 >= 0)

The root cause of this issue is that the node_cnt variable in the job
gres data is zero, because the function _job_alloc() exits early if
node_gres_ptr->no_consume is true. With node_cnt at zero, even node
offset 0 is rejected as out of range, which is the "0 >= 0" in the log
above. The patch below fixes this issue by introducing a quick exit path
in _step_test() if node_cnt is zero. An alternative approach would be to
patch _job_alloc() to properly initialize node_cnt. In that case,
however, gres_cnt_step_alloc would also need to be allocated and
initialized.
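
To illustrate the failure mode outside of Slurm, here is a minimal
standalone sketch. It is not the code from src/common/gres.c: the structs
job_gres and step_gres, the function step_test_sketch() and its with_fix
switch are hypothetical simplifications, and the NO_VAL value and the
"return NO_VAL on success" convention are assumptions made for this
example only.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for Slurm's "value not set" marker. */
#define NO_VAL 0xfffffffeU

/* Hypothetical, stripped-down job gres state. */
struct job_gres {
    uint32_t gres_cnt_alloc; /* gres count granted to the job */
    uint32_t node_cnt;       /* stays 0 if the allocation code exits early */
};

/* Hypothetical, stripped-down step gres state. */
struct step_gres {
    uint32_t gres_cnt_alloc; /* gres count requested by the step */
};

/*
 * Simplified model of the step test.
 * Returns 0 if the step cannot run, NO_VAL if the gres does not limit it.
 * with_fix selects between the old condition and the patched one.
 */
static uint32_t step_test_sketch(const struct job_gres *job,
                                 const struct step_gres *step,
                                 uint32_t node_offset, int with_fix)
{
    /* Quick path: no per-node data to consult, compare the totals only. */
    if (node_offset == NO_VAL || (with_fix && job->node_cnt == 0)) {
        if (step->gres_cnt_alloc > job->gres_cnt_alloc)
            return 0;
        return NO_VAL;
    }

    /* Bounds check: with node_cnt == 0 this rejects every offset. */
    if (node_offset >= job->node_cnt) {
        fprintf(stderr, "node offset invalid (%u >= %u)\n",
                (unsigned) node_offset, (unsigned) job->node_cnt);
        return 0;
    }

    /* Per-node accounting is omitted in this sketch. */
    return NO_VAL;
}
```

With node_cnt == 0 and node_offset == 0, calling the function with
with_fix set to 0 reaches the bounds check and returns 0, so the step is
never considered runnable. With with_fix set to 1 the quick path is taken
and only the requested count is compared against the job's count.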

The patch applies to the current (as of this writing) master branch and
also to slurm-14.11.3.

Best regards,
Dorian

From 3e97e9986df898bf515ebb7b07b59cec7be7d7f8 Mon Sep 17 00:00:00 2001
From: Dorian Krause <[email protected]>
Date: Fri, 23 Jan 2015 14:12:13 +0100
Subject: [PATCH] Fix issue with no_consume resources

---
 src/common/gres.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/common/gres.c b/src/common/gres.c
index d47e9a9..03ed18e 100644
--- a/src/common/gres.c
+++ b/src/common/gres.c
@@ -4665,7 +4665,8 @@ static uint32_t _step_test(void *step_gres_data, void
        xassert(job_gres_ptr);
        xassert(step_gres_ptr);

-       if (node_offset == NO_VAL) {
+       if ((node_offset == NO_VAL) ||
+           (0 == job_gres_ptr->node_cnt)) {    /* no_consume */
                if (step_gres_ptr->gres_cnt_alloc >
                    job_gres_ptr->gres_cnt_alloc)
                        return 0;
--
1.9.3






