Dear Slurmers, 

After updating to 15.08 we made a funny observation: judging from the output of "squeue -r | 
wc -l", it seemed that not all jobs were being submitted anymore for large arrays. 

Let me illustrate: 
$ sbatch 
--array=25212,25213,25214,25216,25217,25218,25219,25220,25221,25222,25223,25268,25269,25270,25271,25273,25274,25275,25276,25338,25339,25340,25341,25347,25353,25354,25356,25357,25372,25374,25378,25379,25380,25381,25382
 <<EOF
#!/bin/bash
sleep 200
EOF
Submitted batch job 428763

Afterwards, with 15.08 (my username is olifre): 
 $ squeue -r -u olifre
             JOBID PARTITION     NAME     USER ST       TIME  NODES 
NODELIST(REASON)
          428763_0    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25212    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25213    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25214    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25216    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25217    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25218    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25219    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25220    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25221    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25222    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25223    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25268    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25269    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25270    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25271    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25273    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25274    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25275    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25276    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25338    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25339    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25340    normal   sbatch   olifre PD       0:00      1 (Resources)
      428763_25341    normal   sbatch   olifre PD       0:00      1 (Resources)

The total job count does not match, and we observe a strange JobID "428763_0". 

After careful investigation, we found that: 
- All jobs are still executed. 
- This cutoff happens if the full array_task_string is "too long", i.e. one 
uses many non-consecutive task-IDs (which happens often in our use case). 

Using "scontrol show job 428763", we get: 
JobId=428763 ArrayJobId=428763 
ArrayTaskId=25212-25214,25216-25223,25268-25271,25273-25276,25338-25341,... 
JobName=sbatch
...

Note the final ellipsis - perfectly fine for display purposes. 

We then checked the Slurm code and found that the array_task_str is now decoded 
client-side in slurm_protocol_pack.c (good!). 
However, by default an arbitrary limit of 64 characters is enforced on the 
array_task_str, ellipsizing the rest. Since "squeue -r" parses the 
array_task_str using strtok and atoi, atoi("...") is turned into a magic 
task-id "0". So at least squeue, and probably other Slurm tools, parse the 
array_task_str internally and thus produce wrong output. 

As expected (or, going by the documentation, rather unexpectedly...), "export 
SLURM_BITSTR_LEN=0" 'fixes' the situation and "squeue -r" works correctly 
again. 

The man pages only explain the effects of this environment variable on the 
commands which display the array_task_str itself to the user; there, such a 
limit is perfectly expected. However, breaking the "-r" functionality of 
squeue (and probably other functionality in other tools?) by also ellipsizing 
the array_task_str when it is not meant for human consumption is completely 
unexpected and a regression compared to previous Slurm versions. 

Cheers, 
        Oliver
