Hi all,
I am currently running Slurm 2.2.7, but I am about to start a
cluster upgrade that includes moving to Slurm 14.11.2.
On the current production version (2.2.7), I call
'scontrol wait_job $SLURM_JOB_ID'
in the prolog script defined by the 'PrologSlurmctld' parameter. This
waits until all the nodes that have just been powered up are ready, and
then I check whether the Lustre file system is mounted on all of them.
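
For illustration, a minimal sketch of what such a prolog might look like.
This is not the production script: the passwordless ssh from the
controller to the compute nodes, the /proc/mounts check and the function
names 'has_lustre_mount' and 'check_nodes' are all assumptions made for
the example.

```shell
#!/bin/bash
# Sketch of a PrologSlurmctld script (it runs on the controller node).

# Read a mount table in /proc/mounts format on stdin and
# return 0 if at least one entry has filesystem type "lustre".
has_lustre_mount() {
    awk '$3 == "lustre" { found = 1 } END { exit !found }'
}

# Check every node allocated to the job.
# Assumption: the controller can ssh to the compute nodes without a password.
check_nodes() {
    local node
    for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
        ssh -n "$node" cat /proc/mounts | has_lustre_mount || return 1
    done
}

# Only run the checks when called by slurmctld for a real job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Block until the allocated nodes are powered up and ready.
    scontrol wait_job "$SLURM_JOB_ID" || exit 1
    # A non-zero exit status reports the failure to slurmctld.
    check_nodes || exit 1
fi
```

The mount check is kept in its own function reading from stdin so that
it can be tried by hand on any node, for example with
'has_lustre_mount < /proc/mounts'.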
If I try the same procedure with version 14.11.2, the 'scontrol
wait_job $SLURM_JOB_ID' command never returns,
even though all the nodes are up and ready. If I put the same command
in an sbatch script or in a shell passed
to salloc, it works: it returns as soon as the nodes are up and ready.
The question is:
Does 'scontrol wait_job $SLURM_JOB_ID' behave differently in
versions 2.2.7 and 14.11.2 when it is
called from the prolog script defined by the 'PrologSlurmctld'
parameter?
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928