Dear Slurmers,
after updating to 15.08 we made a curious observation: judging by "squeue -r |
wc -l", it seemed that not all jobs of large arrays were submitted anymore.
Let me illustrate:
$ sbatch
--array=25212,25213,25214,25216,25217,25218,25219,25220,25221,25222,25223,25268,25269,25270,25271,25273,25274,25275,25276,25338,25339,25340,25341,25347,25353,25354,25356,25357,25372,25374,25378,25379,25380,25381,25382
<<EOF
#!/bin/bash
sleep 200
EOF
Submitted batch job 428763
Afterwards, with 15.08 (my username is olifre):
$ squeue -r -u olifre
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
428763_0 normal sbatch olifre PD 0:00 1 (Resources)
428763_25212 normal sbatch olifre PD 0:00 1 (Resources)
428763_25213 normal sbatch olifre PD 0:00 1 (Resources)
428763_25214 normal sbatch olifre PD 0:00 1 (Resources)
428763_25216 normal sbatch olifre PD 0:00 1 (Resources)
428763_25217 normal sbatch olifre PD 0:00 1 (Resources)
428763_25218 normal sbatch olifre PD 0:00 1 (Resources)
428763_25219 normal sbatch olifre PD 0:00 1 (Resources)
428763_25220 normal sbatch olifre PD 0:00 1 (Resources)
428763_25221 normal sbatch olifre PD 0:00 1 (Resources)
428763_25222 normal sbatch olifre PD 0:00 1 (Resources)
428763_25223 normal sbatch olifre PD 0:00 1 (Resources)
428763_25268 normal sbatch olifre PD 0:00 1 (Resources)
428763_25269 normal sbatch olifre PD 0:00 1 (Resources)
428763_25270 normal sbatch olifre PD 0:00 1 (Resources)
428763_25271 normal sbatch olifre PD 0:00 1 (Resources)
428763_25273 normal sbatch olifre PD 0:00 1 (Resources)
428763_25274 normal sbatch olifre PD 0:00 1 (Resources)
428763_25275 normal sbatch olifre PD 0:00 1 (Resources)
428763_25276 normal sbatch olifre PD 0:00 1 (Resources)
428763_25338 normal sbatch olifre PD 0:00 1 (Resources)
428763_25339 normal sbatch olifre PD 0:00 1 (Resources)
428763_25340 normal sbatch olifre PD 0:00 1 (Resources)
428763_25341 normal sbatch olifre PD 0:00 1 (Resources)
The total job count does not match the 35 submitted tasks, and we observe a
strange JobID "428763_0".
After careful investigation, we found that:
- All jobs are still executed.
- This cutoff happens if the full array_task_str is "too long", i.e. when one
uses many non-consecutive task IDs (which happens often in our use case).
Using "scontrol show job 428763", we get:
JobId=428763 ArrayJobId=428763
ArrayTaskId=25212-25214,25216-25223,25268-25271,25273-25276,25338-25341,...
JobName=sbatch
...
Note the final ellipsis, which is perfectly fine for display purposes.
We then checked the Slurm code and found that the array_task_str is now decoded
client-side in slurm_protocol_pack.c (good!).
However, by default an arbitrary length limit of 64 characters is enforced on
the array_task_str, ellipsizing the rest. Since "squeue -r" parses the
array_task_str using strtok and atoi, atoi("...") is turned into a magic
task-ID "0". So at least squeue, and probably other Slurm tools which parse the
array_task_str internally, produce wrong output.
As expected (or, going by the documentation, rather unexpectedly), "export
SLURM_BITSTR_LEN=0" 'fixes' the situation and "squeue -r" works correctly
again.
The manpages only explain the effects of this environment variable on the
commands which display the array_task_str itself to the user; there, such a
limit is perfectly expected. However, breaking the "-r" functionality of squeue
(and probably functionality of other tools?) by also ellipsizing the
array_task_str when it is not meant for human consumption is completely
unexpected and a regression compared to previous Slurm versions.
Cheers,
Oliver