Hello,
    I'm seeing some inconsistent behavior when using the new PrologFlags=Alloc config parameter.

1. In the first case, when using the following script:

######################
#!/bin/bash
#SBATCH -p debug
#SBATCH -N 2
#SBATCH -t 00:15:00
#SBATCH -J jobname

sleep 10m
#######################

Regardless of whether Alloc is set or not, prolog only runs on the first allocated node.



2. In the second case, a simple:

srun -N 2 hostname

If Alloc is NOT set, prolog runs on both nodes. If Alloc IS set, prolog only runs on the first node.




My expectation was that with Alloc set, prolog would be run on both nodes at allocation time. But I'm seeing that with Alloc set, prolog never runs on the second node no matter how I launch the job.

I've simplified my test case down to the attached slurm.conf, and the prolog just does:

echo $SLURM_JOB_ID >> /tmp/slurm-prolog_info.txt
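
Since the one-line prolog above records only the job ID, it may be easier to tell which node actually ran it if each entry also carries the node name and a timestamp. The following is only a sketch of such a variant, not what I am running: the function name log_prolog and the PROLOG_LOG override are made up for illustration, and the default log path is the same file as above.

```shell
#!/bin/bash
# Hypothetical prolog variant: append one line per invocation with a
# timestamp, the node name, and the job ID.

# log_prolog is an illustrative helper name, not part of Slurm.
log_prolog() {
    local logfile="${1:-/tmp/slurm-prolog_info.txt}"
    # uname -n prints the node's hostname; SLURM_JOB_ID is the same
    # variable the original one-line prolog writes out.
    echo "$(date '+%Y-%m-%dT%H:%M:%S') host=$(uname -n) job=${SLURM_JOB_ID:-unknown}" >> "$logfile"
}

# PROLOG_LOG is an assumed override for testing; by default the entry
# goes to the same file as the original prolog.
log_prolog "${PROLOG_LOG:-/tmp/slurm-prolog_info.txt}"
```

With this in place, a missing entry for a given job ID in a node's log would show directly that the prolog did not run there for that job.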

Thanks

Martins
#
# Example slurm.conf file. Please run configurator.html
# (in doc/html) to build a configuration file customized
# for your environment.
#
#
# slurm.conf file generated by configurator.html.
#
# See the slurm.conf man page for more information.
#
ClusterName=testing
ControlMachine=slurmcontrol
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
StateSaveLocation=/tmp
SlurmdSpoolDir=/tmp/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
CacheGroups=0
ReturnToService=0
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/linear
FastSchedule=1
#Prolog
PrologFlags=Alloc
Prolog=/usr/local/bin/slurm-prolog
# LOGGING
SlurmctldDebug=3
SlurmdDebug=3
SlurmSchedLogFile=/tmp/sched.log
SlurmctldLogFile=/tmp/slurmctld.log
SlurmdLogFile=/tmp/slurmd.log
JobCompType=jobcomp/none
# COMPUTE NODES
NodeName=node0[1-2] Procs=1 State=UNKNOWN
PartitionName=debug Nodes=node0[1-2] Default=YES MaxTime=INFINITE State=UP
