If we run the simple, contrived Perl script below directly, we get the behaviour we expect (based on our understanding of the man pages). That is, each jobid.jobstep.out file contains the output of one hostname command.
#!/usr/bin/env perl
use strict;
use warnings;
# Launch 100 srun commands in the background.
# In the -o pattern, %J expands to jobid.stepid.
for my $i (1..100) {
    my $cmd = "srun -J job${i} -o %J.out /bin/hostname >> ./srun.out 2>&1 &";
    system($cmd);
}
However, if we run
% sbatch -w "awarnach[1-10]" -o batch.out test.pl
some jobid.jobstep.out files will have many lines of output each showing
a different hostname, while others will have no output. Based on our
configuration, we expected the same results as if we ran the script
directly.
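
The difference between the two runs can be tallied by counting the lines in each step output file. The sketch below is only an illustration: count_lines is a hypothetical helper, not part of our setup, and the usage line assumes the %J output files sit in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print the line
# count of each file given as an argument, one "count<TAB>name" per line.
count_lines() {
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip patterns that matched nothing
        n=$(wc -l < "$f" | tr -d ' ')    # strip the padding some wc versions add
        printf '%s\t%s\n' "$n" "$f"
    done
}

# Usage (assumption: step output files are named like 1234.0.out):
#   count_lines [0-9]*.out
```

A count of 1 per file would match the behaviour we see when running the script directly; counts of 0 or more than 1 correspond to what we see under sbatch.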
Also, if we remove the loop in the Perl script so that it contains only
one srun command, we typically see no output.
Are we confusing some fundamental concept? Is our configuration somehow
broken?
This is Slurm 14.11.3 on FreeBSD 10.1 using the slurm.conf below.
Regards,
Joseph
AuthType=auth/munge
CacheGroups=0
ClusterName=awarnach
ControlMachine=awarnach
DebugFlags=NO_CONF_HASH
MailProg=/usr/bin/mail
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=0
SlurmUser=slurm
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/run/slurm
SwitchType=switch/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=300
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
JobCompType=jobcomp/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
NodeName=awarnach[1-16],awarnach[18-19] Procs=4 State=UNKNOWN
NodeName=awarnach[20] Procs=48 State=UNKNOWN
PartitionName=all Nodes=awarnach[1-16],awarnach[18-20] Default=YES MaxTime=INFINITE State=UP
