Dear slurm-dev,

We have a question about the value of gres_alloc that is reported to the slurm
accounting. Either we misunderstand the meaning of this parameter, or there
may be a bug.

The observation is the following:

When running job 3572543 with srun, using one GPU gres on a node, we observe in
slurm's accounting database that the job requested one GPU and was allocated
one GPU:


+------------+---------+-------------+-------------+----------+------------+-----------+
| job_db_inx | id_job  | nodelist    | nodes_alloc | gres_req | gres_alloc | gres_used |
+------------+---------+-------------+-------------+----------+------------+-----------+
|    3928247 | 3572543 | <--NODE1--> |           1 | gpu:1    | gpu:1      |           |
+------------+---------+-------------+-------------+----------+------------+-----------+


So far, so good. However, when submitting a second job 3572544 (also using one
GPU), we observe in the accounting that the second job requested one GPU
(correct) but was allocated two GPUs (wrong?):


+------------+---------+-------------+-------------+----------+------------+-----------+
| job_db_inx | id_job  | nodelist    | nodes_alloc | gres_req | gres_alloc | gres_used |
+------------+---------+-------------+-------------+----------+------------+-----------+
|    3928249 | 3572544 | <--NODE1--> |           1 | gpu:1    | gpu:2      |           |
+------------+---------+-------------+-------------+----------+------------+-----------+

Please note that the two jobs are running simultaneously on the same node under
the same user. Submitting a third job in the same way, we observe that its
gres_alloc value is "gpu:3".

When looking at the slurm code (function _build_gres_alloc_string in
node_scheduler.c), I do not see any dependency on the job when collecting the
allocated generic resources. Thus, I have two questions:


1) What exactly does the parameter gres_alloc describe?
2) Why is there no dependency on the job when collecting the value of 
gres_alloc?


Best regards,
Oliver

--
  Dr. Oliver Fortmeier
  Technical Analyst High-Performance Computing,
  Bull GmbH, Germany
  Phone: +49 (0) 2203 / 305 2465
  Mobile: +49 (0) 173 / 5887589
  E-mail: [email protected]


