Hi Jeff,
I believe this is just the result of a race condition. While the
script below will execute very quickly, there is usually not much
time between a job create request and a job step create request
either. The attempt to create a job step will be made again and
everything should run as expected. I am just going to change the
logging of this case from info to debug so that you will not see
this message by default in version 2.3.2 or higher. A patch similar
to this should work with version 2.2.7 also.
diff --git a/src/slurmctld/proc_req.c b/src/slurmctld/proc_req.c
index e62d7af..990e80d 100644
--- a/src/slurmctld/proc_req.c
+++ b/src/slurmctld/proc_req.c
@@ -1666,8 +1666,13 @@ static void _slurm_rpc_job_step_create(slurm_msg_t * msg)
 	/* return result */
 	if (error_code) {
 		unlock_slurmctld(job_write_lock);
-		info("_slurm_rpc_job_step_create for job %u: %s",
-		     req_step_msg->job_id, slurm_strerror(error_code));
+		if (error_code == ESLURM_PROLOG_RUNNING) {
+			debug("_slurm_rpc_job_step_create for job %u: %s",
+			      req_step_msg->job_id, slurm_strerror(error_code));
+		} else {
+			info("_slurm_rpc_job_step_create for job %u: %s",
+			     req_step_msg->job_id, slurm_strerror(error_code));
+		}
 		slurm_send_rc_msg(msg, error_code);
 	} else {
 		slurm_step_layout_t *layout = step_rec->step_layout;
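As a rough illustration of the race (a simulation only, not Slurm
code): the sketch below pretends the prolog is still running for the
first two step-create attempts, and the requester simply tries again
until it succeeds. The names PROLOG_POLLS_LEFT, try_create_step and
create_step_with_retry are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Illustration only: simulates a step-create request racing a prolog.
# All names here are hypothetical; none of this is Slurm's own code.

PROLOG_POLLS_LEFT=2	# pretend the prolog is still running for 2 attempts

# Returns 1 while the simulated prolog is "running", 0 once it is done.
try_create_step()
{
	if [ "$PROLOG_POLLS_LEFT" -gt 0 ]; then
		PROLOG_POLLS_LEFT=$((PROLOG_POLLS_LEFT - 1))
		return 1
	fi
	return 0
}

# Retries until the step is created or $1 attempts have been made.
# Leaves the number of attempts made in ATTEMPTS.
create_step_with_retry()
{
	max=$1
	ATTEMPTS=0
	while [ "$ATTEMPTS" -lt "$max" ]; do
		ATTEMPTS=$((ATTEMPTS + 1))
		if try_create_step; then
			return 0
		fi
		# a real requester would wait briefly here before retrying
	done
	return 1
}

create_step_with_retry 5 && echo "step created after $ATTEMPTS attempts"
```

The early failures are harmless, which is why logging them at debug
rather than info seems reasonable.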
Quoting [email protected]:
Running Slurm 2.2.7, I'm seeing my logs filled with the likes of:
[2011-10-24T11:33:17] _slurm_rpc_job_step_create: SlurmctldProlog is still running
The prolog script (below) is very small and doesn't even execute for
most people. The jobs which are apparently generating this error
don't belong to the configured user list. Running explicit tests of
this as listed users and as not, it does do the right thing. Is
there something obvious I'm missing here? The specified file system
is mounted on the root node and all permissions are correct too.
Thanks in advance for any advice or opinion,
Jeff Katcher
---- prolog script as configured in slurm.conf ----
#!/bin/sh
SCRATCH_ROOT=/mnt/ruminant/scratch
create_scratch()
{
	SCRATCH_DIR="$SCRATCH_ROOT/$1"
	sudo /bin/mkdir -p "$SCRATCH_DIR"
	sudo /bin/chown "$2" "$SCRATCH_DIR"
}
case "$SLURM_JOB_USER" in
curly|larry|moe|shemp)
	create_scratch "$SLURM_JOB_ID" "$SLURM_JOB_USER"
	;;
*)
	#echo "nothing happening here, return to your homes"
	;;
esac
exit 0
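One way to check the user matching on its own, without sudo and
without touching the file system, is a dry run of the same case
statement. This is only a sketch: scratch_action is a hypothetical
helper that is not part of the script above, and the user "jeff" is
just a stand-in for anyone not on the list.

```shell
#!/bin/sh
# Dry-run check of the prolog's user matching.
# scratch_action is a hypothetical helper; it only reports what the
# prolog would do for a given user and creates nothing.
scratch_action()
{
	case "$1" in
	curly|larry|moe|shemp)
		echo "create"
		;;
	*)
		echo "skip"
		;;
	esac
}

# Try a couple of listed users, an unlisted one, and an empty value
# (as if SLURM_JOB_USER were not set).
for user in curly shemp jeff ""; do
	printf '%s -> %s\n' "${user:-<unset>}" "$(scratch_action "$user")"
done
```

Listed users should report "create" and everyone else, including an
empty user name, should report "skip".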