Dear list,
In the latest release, slurm-14.11.3, and in the current git master, no_consume resources do not work properly. Consider the following node specification in slurm.conf:

    NodeName=node1 [...] Gres=potion:no_consume:10000 [...]

and the following gres definition in gres.conf:

    Name=potion Count=10000

When executing

    $ srun -n 2 --gres=potion:10 --pty /bin/bash -i

the srun command hangs indefinitely, and the following error appears in the slurmctld log file:

    [2015-01-23T14:10:26.269] error: gres/potion: step_test 75.4294967294 node offset invalid (0 >= 0)
    [2015-01-23T14:10:26.269] error: gres/potion: step_test 75.4294967294 node offset invalid (0 >= 0)

The root cause is that the node_cnt variable in the job's gres data is zero, because _job_alloc() returns early when node_gres_ptr->no_consume is true. The patch below fixes this by adding a quick exit path to _step_test() when node_cnt is zero. An alternative approach would be to patch _job_alloc() to initialize node_cnt properly; in that case, however, gres_cnt_step_alloc would also need to be allocated and initialized.

The patch applies to the current (as of this writing) master branch and to slurm-14.11.3.
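To make the failure mode concrete, here is a minimal, self-contained C sketch of the decision the patch changes. It is not Slurm's code: the struct names sketch_job_gres and sketch_step_gres, the function sketch_step_test(), the SKETCH_ERR marker and the with_fix switch are all hypothetical. SKETCH_NO_VAL assumes Slurm's NO_VAL value of 0xfffffffe, which matches the step id 4294967294 in the log above. The sketch models only a no_consume job whose node_cnt was left at 0 by the early return in _job_alloc(), and a step tested at node offset 0.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror Slurm's NO_VAL sentinel (0xfffffffe). */
#define SKETCH_NO_VAL 0xfffffffeU
/* Hypothetical return value standing in for the "node offset invalid" error path. */
#define SKETCH_ERR 0xffffffffU

/* Hypothetical, heavily simplified job-level gres state (not Slurm's struct). */
struct sketch_job_gres {
	uint32_t gres_cnt_alloc; /* gres count allocated to the job */
	uint32_t node_cnt;       /* stays 0 if job allocation returned early */
};

/* Hypothetical, heavily simplified step-level gres state. */
struct sketch_step_gres {
	uint32_t gres_cnt_alloc; /* gres count requested by the step */
};

/*
 * Simplified model of the check in _step_test().
 * with_fix == 0 reproduces the old condition, with_fix == 1 the patched one.
 * Returns 0 if the step asks for more than the job holds, SKETCH_NO_VAL if
 * this gres imposes no limit, and SKETCH_ERR on an invalid node offset.
 */
static uint32_t sketch_step_test(const struct sketch_step_gres *step,
				 const struct sketch_job_gres *job,
				 uint32_t node_offset, int with_fix)
{
	assert(job != NULL);  /* mirrors xassert(job_gres_ptr) */
	assert(step != NULL); /* mirrors xassert(step_gres_ptr) */

	if ((node_offset == SKETCH_NO_VAL) ||
	    (with_fix && (job->node_cnt == 0))) {
		/* no_consume: only compare the step's request with the job's total */
		if (step->gres_cnt_alloc > job->gres_cnt_alloc)
			return 0;
		return SKETCH_NO_VAL;
	}
	if (node_offset >= job->node_cnt) {
		/* Without the fix, node_cnt == 0 makes every offset land here. */
		fprintf(stderr, "node offset invalid (%" PRIu32 " >= %" PRIu32 ")\n",
			node_offset, job->node_cnt);
		return SKETCH_ERR;
	}
	/* The real code would do per-node accounting here; omitted in this sketch. */
	return SKETCH_NO_VAL;
}
```

With with_fix set to 0, calling sketch_step_test() at node offset 0 for a job whose node_cnt is 0 takes the error path and prints "node offset invalid (0 >= 0)", the same message as in the slurmctld log. With with_fix set to 1, the zero node_cnt routes the call into the no_consume branch, which only compares the step's request with the job's total.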
Best regards,
Dorian

From 3e97e9986df898bf515ebb7b07b59cec7be7d7f8 Mon Sep 17 00:00:00 2001
From: Dorian Krause <[email protected]>
Date: Fri, 23 Jan 2015 14:12:13 +0100
Subject: [PATCH] Fix issue with no_consume resources

---
 src/common/gres.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/common/gres.c b/src/common/gres.c
index d47e9a9..03ed18e 100644
--- a/src/common/gres.c
+++ b/src/common/gres.c
@@ -4665,7 +4665,8 @@ static uint32_t _step_test(void *step_gres_data, void
 	xassert(job_gres_ptr);
 	xassert(step_gres_ptr);
 
-	if (node_offset == NO_VAL) {
+	if ((node_offset == NO_VAL) ||
+	    (0 == job_gres_ptr->node_cnt)) {
 		/* no_consume */
 		if (step_gres_ptr->gres_cnt_alloc > job_gres_ptr->gres_cnt_alloc)
 			return 0;
-- 
1.9.3

------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the district court of Dueren, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Management Board: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman), Karsten Beneke (Deputy Chairman),
Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
