Dear Moe Jette,

Thanks for the link to the patch.

Oliver

> -----Original Message-----
> From: Moe Jette [mailto:[email protected]]
> Sent: Wednesday, 27 November 2013 22:06
> To: slurm-dev
> Subject: [slurm-dev] Re: Questions on gres_alloc
>
>
> This bug in Slurm will be fixed in version 2.6.5 when released. You can get
> an early patch for this here:
> https://github.com/SchedMD/slurm/commit/1ae427dd88133ae62183dba0444d2c68afccb55c.patch
>
>
> Quoting Oliver Fortmeier <[email protected]>:
>
> > Dear slurm-dev,
> >
> > We have a question on the value of gres_alloc which is reported to
> > the slurm accounting. Either we do not understand the meaning of this
> > parameter, or there may be a bug.
> >
> > The observation is the following:
> >
> > When "srunning" a job 3572543 using one gres GPU on a node, we observe
> > in slurm's accounting database that the job has requested one GPU and
> > one GPU has been allocated:
> >
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > | job_db_inx | id_job  | nodelist    | nodes_alloc | gres_req | gres_alloc | gres_used |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > |    3928247 | 3572543 | <--NODE1--> |           1 | gpu:1    | gpu:1      |           |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> >
> > So far, so good. However, when submitting a second job 3572544
> > (using one GPU as well), we observe in the accounting that the
> > second job has requested one GPU (correct) but two GPUs are
> > allocated (wrong?):
> >
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > | job_db_inx | id_job  | nodelist    | nodes_alloc | gres_req | gres_alloc | gres_used |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > |    3928249 | 3572544 | <--NODE1--> |           1 | gpu:1    | gpu:2      |           |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> >
> > Please note that the two jobs are running simultaneously on the same
> > node as the same user. Following this approach and submitting a
> > third job, we observe that the value of gres_alloc is "gpu:3".
> >
> > When looking at the slurm code (function _build_gres_alloc_string in
> > node_scheduler.c), I do not see any dependency on the job when
> > collecting the allocated general resources. Thus, I have two
> > questions:
> >
> > 1) What exactly does the parameter gres_alloc describe?
> > 2) Why is there no dependency on the job when collecting the value
> >    of gres_alloc?
> >
> > Best regards,
> > Oliver
> >

Bull GmbH
Registered office: Cologne; Cologne Local Court, HR B 8173
VAT ID: DE 121965133, WEEE Reg. No. DE 64193985
Managing directors: Gerd-Lothar Leonhart, Michael Heinrichs, Philippe Miltin
Head office: 51149 Köln, Von-der-Wettern-Strasse 27
Phone: +49 (0) 2203 305-0
Fax: +49 (0) 2203 305-1699
http://www.bull.de
Bull, Architect of an Open World TM
** Follow us on Twitter: http://twitter.com/bull_de
** Bull company profile on XING: https://www.xing.com/companies/bullgmbh
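
For readers of the archive, the observation in the quoted message can be reproduced with a small toy model. This is only a sketch and not Slurm's code: the names Node, allocate, gres_alloc_node_wide and gres_alloc_per_job are hypothetical, and the third job id is made up because the original message does not give it. It assumes, as the report suggests, that the recorded value is built from the node-wide GPU count at allocation time rather than from the job's own share.

```python
# Toy model only: none of these names come from the Slurm source tree.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gpus_total: int
    # Maps job id -> number of GPUs that job currently holds on this node.
    gpus_by_job: dict = field(default_factory=dict)

    def allocate(self, job_id: int, count: int) -> None:
        """Give `count` GPUs to `job_id`, failing if the node is full."""
        in_use = sum(self.gpus_by_job.values())
        if in_use + count > self.gpus_total:
            raise RuntimeError("not enough free GPUs on " + self.name)
        self.gpus_by_job[job_id] = self.gpus_by_job.get(job_id, 0) + count


def gres_alloc_node_wide(node: Node, job_id: int) -> str:
    """Ignores job_id and reports every GPU allocated on the node.

    This mirrors the behaviour observed in the accounting table.
    """
    return f"gpu:{sum(node.gpus_by_job.values())}"


def gres_alloc_per_job(node: Node, job_id: int) -> str:
    """Reports only the GPUs held by this job (the value one would expect)."""
    return f"gpu:{node.gpus_by_job.get(job_id, 0)}"


if __name__ == "__main__":
    node = Node("NODE1", gpus_total=4)
    # 3572543 and 3572544 are from the report; 3572545 is a hypothetical id.
    for job_id in (3572543, 3572544, 3572545):
        node.allocate(job_id, 1)
        print(job_id,
              gres_alloc_node_wide(node, job_id),
              gres_alloc_per_job(node, job_id))
    # 3572543 gpu:1 gpu:1
    # 3572544 gpu:2 gpu:1
    # 3572545 gpu:3 gpu:1
```

The node-wide variant yields gpu:1, gpu:2, gpu:3 for three concurrent single-GPU jobs, matching the values reported above, while the per-job variant stays at gpu:1 for each job.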
