Dear Moe Jette,

Thanks for the link to the patch.

Oliver

> -----Original Message-----
> From: Moe Jette [mailto:[email protected]]
> Sent: Wednesday, 27 November 2013 22:06
> To: slurm-dev
> Subject: [slurm-dev] Re: Questions on gres_alloc
>
>
> This bug in Slurm will be fixed in version 2.6.5 when it is released. You can
> get an early patch for it here:
> https://github.com/SchedMD/slurm/commit/1ae427dd88133ae62183dba0444d2c68afccb55c.patch
>
>
> Quoting Oliver Fortmeier <[email protected]>:
>
> > Dear slurm-dev,
> >
> > We have a question about the value of gres_alloc that is reported to
> > Slurm accounting. Either we do not understand the meaning of this
> > parameter, or there may be a bug.
> >
> > The observation is the following:
> >
> > When running job 3572543 with srun, using one GPU gres on a node, we
> > observe in Slurm's accounting database that the job requested one GPU
> > and was allocated one GPU:
> >
> >
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > | job_db_inx | id_job  | nodelist    | nodes_alloc | gres_req | gres_alloc | gres_used |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > |    3928247 | 3572543 | <--NODE1--> |           1 | gpu:1    | gpu:1      |           |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> >
> >
> > So far, so good. However, when we submit a second job, 3572544
> > (also using one GPU), the accounting shows that the second job
> > requested one GPU (correct) but was allocated two GPUs (wrong?):
> >
> >
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > | job_db_inx | id_job  | nodelist    | nodes_alloc | gres_req | gres_alloc | gres_used |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> > |    3928249 | 3572544 | <--NODE1--> |           1 | gpu:1    | gpu:2      |           |
> > +------------+---------+-------------+-------------+----------+------------+-----------+
> >
> > Please note that the two jobs are running simultaneously on the same
> > node under the same user. Continuing this way and submitting a third
> > job, we observe that the value of gres_alloc becomes "gpu:3".
> >
> > Looking at the Slurm code (function _build_gres_alloc_string in
> > node_scheduler.c), I do not see any dependency on the job when it
> > collects the allocated generic resources. Thus, I have two
> > questions:
> >
> >
> > 1) What exactly does the parameter gres_alloc describe?
> > 2) Why is there no dependency on the job when collecting the value
> > of gres_alloc?
> >
> >
> > Best regards,
> > Oliver
> >
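The pattern reported above (gres_alloc reading gpu:1, then gpu:2, then gpu:3 as more concurrent jobs land on the same node) is what one would expect if the string were built from a node-wide running total instead of from the job's own allocation. The following C sketch illustrates that difference under clearly stated assumptions: the structs node_gres_t and job_gres_t and the functions allocate_gpus, build_alloc_string_node_level and build_alloc_string_per_job are hypothetical simplifications. They are not Slurm's real data structures, not the actual code of _build_gres_alloc_string, and not a description of what the linked patch changes.

```c
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL, simplified bookkeeping (not Slurm's real structs):
 * the node keeps a running total of GPUs handed out to all jobs,
 * while each job remembers only the GPUs it was given. */
typedef struct {
    uint64_t gpu_cnt_alloc;   /* sum over every job running on this node */
} node_gres_t;

typedef struct {
    uint64_t gpu_cnt_alloc;   /* GPUs allocated to this job only */
} job_gres_t;

/* Hypothetical helper: give `count` GPUs to a job and add them
 * to the node-wide total. */
void allocate_gpus(node_gres_t *node, job_gres_t *job, uint64_t count)
{
    job->gpu_cnt_alloc = count;
    node->gpu_cnt_alloc += count;
}

/* Node-level variant: formats the node-wide total, so the string
 * grows with every concurrent job (gpu:1, gpu:2, gpu:3, ...). */
void build_alloc_string_node_level(const node_gres_t *node,
                                   char *buf, size_t len)
{
    snprintf(buf, len, "gpu:%llu",
             (unsigned long long)node->gpu_cnt_alloc);
}

/* Per-job variant: formats only what this job was allocated. */
void build_alloc_string_per_job(const job_gres_t *job,
                                char *buf, size_t len)
{
    snprintf(buf, len, "gpu:%llu",
             (unsigned long long)job->gpu_cnt_alloc);
}
```

For example, after calling allocate_gpus once for each of two jobs on the same node with a count of 1, the node-level variant yields "gpu:2" for the second job, while the per-job variant yields "gpu:1", which matches that job's request.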
Bull GmbH
Registered office: Köln, Amtsgericht Köln (District Court), HR B 8173
VAT ID: DE 121965133, WEEE Reg. No.: DE 64193985
Managing Directors: Gerd-Lothar Leonhart, Michael Heinrichs, Philippe Miltin
Headquarters:
51149 Köln, Von-der-Wettern-Strasse 27
Phone: +49 (0) 2203 305-0
Fax: +49 (0) 2203 305-1699
http://www.bull.de

Bull, Architect of an Open World TM
** Follow us on Twitter: http://twitter.com/bull_de
** Bull company profile on XING: https://www.xing.com/companies/bullgmbh
