Hi all,

I am currently using slurm 2.2.7 but I am just about to start the cluster upgrade process
which includes upgrading to slurm 14.11.2.
In the current production version (2.2.7), I run 'scontrol wait_job $SLURM_JOB_ID' in the prolog script defined by the 'PrologSlurmctld' parameter. This waits until all the nodes that have just been powered up are ready, and then I check whether the Lustre file system is already mounted on all of them. If I try the same procedure with version 14.11.2, the 'scontrol wait_job $SLURM_JOB_ID' command never returns, even though all the nodes are up and ready. If I put the same command in an sbatch script or in a shell passed
to salloc, it works: the command returns once the nodes are up and ready.
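
To make the procedure concrete, here is a minimal sketch of such a prolog. It is not the production script. The function name lustre_mounted, the use of ssh to reach the nodes, and the reliance on SLURM_JOB_NODELIST being set in the PrologSlurmctld environment are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical PrologSlurmctld sketch: wait for the job's nodes, then
# verify that a Lustre file system is mounted on each of them.

# Return 0 if the mounts table given as a file argument (or on stdin when
# no argument is passed) contains an entry whose file system type is lustre.
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
lustre_mounted() {
    awk '$3 == "lustre" { found = 1 } END { exit !found }' "$@"
}

# Run the Slurm-dependent part only when called for a real job.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v scontrol >/dev/null 2>&1; then
    # Block until the nodes allocated to the job are ready for use.
    scontrol wait_job "$SLURM_JOB_ID" || exit 1

    # Expand the compact node list (assumed to be in SLURM_JOB_NODELIST)
    # into one hostname per word and check each node's mounts table.
    for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
        # ssh access from the controller to the nodes is an assumption.
        ssh -n "$node" cat /proc/mounts | lustre_mounted || exit 1
    done
fi
```

With 2.2.7 the 'scontrol wait_job' step in a script of this shape returns once the nodes are ready; with 14.11.2 it is the step that never returns.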

The question is:

Does the 'scontrol wait_job $SLURM_JOB_ID' command behave differently in versions 2.2.7 and 14.11.2 when it is run from the prolog script defined by the 'PrologSlurmctld' parameter?

--
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu

Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928

