Running Slurm 2.2.7, I'm seeing my logs filled with the likes of:
[2011-10-24T11:33:17] _slurm_rpc_job_step_create: SlurmctldProlog is still running
The prolog script (below) is very small and doesn't even execute for
most people. The jobs which are apparently generating this error
don't belong to the configured user list. Running explicit tests of
it, both as listed users and as unlisted ones, it does the right
thing. Is there something obvious I'm missing here? The specified
file system is mounted on the root node and all permissions are
correct too.
Thanks in advance for any advice or opinion,
Jeff Katcher
---- prolog script as configured in slurm.conf ----
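#!/bin/sh
# Per-job scratch directories live under this root.
SCRATCH_ROOT=/mnt/ruminant/scratch

# create_scratch JOB_ID USER: make the job's scratch dir, owned by USER.
create_scratch()
{
    SCRATCH_DIR="$SCRATCH_ROOT/$1"
    sudo /bin/mkdir -p "$SCRATCH_DIR"
    sudo /bin/chown "$2" "$SCRATCH_DIR"
}

# Only the listed users get a scratch directory; all others fall through.
case "$SLURM_JOB_USER" in
    curly|larry|moe|shemp)
        create_scratch "$SLURM_JOB_ID" "$SLURM_JOB_USER"
        ;;
    *)
        #echo "nothing happening here, return to your homes"
        ;;
esac
exit 0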
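---- sketch: dry run of the user matching (not part of slurm.conf) ----
The explicit tests mentioned above can be approximated without Slurm,
sudo, or the scratch mount. This is only a rough sketch under stated
assumptions: create_scratch is replaced by a stand-in that prints what
it would do, and the run_prolog_logic wrapper, the job IDs 1001/1002,
and the unlisted user name "jeff" are hypothetical. The case pattern
is copied from the script above.

```shell
#!/bin/sh
# Dry-run harness for the prolog's user matching. Nothing is created.
SCRATCH_ROOT=/mnt/ruminant/scratch

# Stand-in for the real create_scratch: report instead of mkdir/chown.
create_scratch()
{
    echo "would create $SCRATCH_ROOT/$1 owned by $2"
}

# Hypothetical wrapper: $1 stands in for SLURM_JOB_ID, $2 for SLURM_JOB_USER.
run_prolog_logic()
{
    case "$2" in
        curly|larry|moe|shemp)
            create_scratch "$1" "$2"
            ;;
        *)
            echo "no scratch for $2"
            ;;
    esac
}

run_prolog_logic 1001 curly   # listed user
run_prolog_logic 1002 jeff    # unlisted user
```

This only checks which branch a given user name takes. It says nothing
about how long the real script runs under slurmctld, which is what the
log message is about.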