Hi Jeff,

I believe this is just the result of a race condition. While the script below will execute very quickly, there is usually not much time between a job create request and a job step create request either. The attempt to create a job step will be made again and everything should run as expected. I am just going to change the logging so that you will not see this message by default in version 2.3.2 or higher. A patch similar to this should work with version 2.2.7 also.

diff --git a/src/slurmctld/proc_req.c b/src/slurmctld/proc_req.c
index e62d7af..990e80d 100644
--- a/src/slurmctld/proc_req.c
+++ b/src/slurmctld/proc_req.c
@@ -1666,8 +1666,13 @@ static void _slurm_rpc_job_step_create(slurm_msg_t * msg)
        /* return result */
        if (error_code) {
                unlock_slurmctld(job_write_lock);
-               info("_slurm_rpc_job_step_create for job %u: %s",
-                    req_step_msg->job_id, slurm_strerror(error_code));
+               if (error_code == ESLURM_PROLOG_RUNNING) {
+                       debug("_slurm_rpc_job_step_create for job %u: %s",
+                             req_step_msg->job_id, slurm_strerror(error_code));
+               } else {
+                       info("_slurm_rpc_job_step_create for job %u: %s",
+                            req_step_msg->job_id, slurm_strerror(error_code));
+               }
                slurm_send_rc_msg(msg, error_code);
        } else {
                slurm_step_layout_t *layout = step_rec->step_layout;
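
To illustrate why the message is harmless, here is a rough shell sketch of the retry behavior described above. It is not Slurm code: try_step_create, retry_step_create, and PROLOG_ATTEMPTS_LEFT are hypothetical names that only simulate a step create request being refused while the prolog is still running and then succeeding on a later attempt.

```shell
#!/bin/sh
# Simulation only: number of step create requests that will arrive
# while the (pretend) prolog is still running.
PROLOG_ATTEMPTS_LEFT=2

# Hypothetical stand-in for one step create request.
# Returns non-zero while the simulated prolog is still running.
try_step_create()
{
        if [ "$PROLOG_ATTEMPTS_LEFT" -gt 0 ]; then
                PROLOG_ATTEMPTS_LEFT=$((PROLOG_ATTEMPTS_LEFT - 1))
                return 1
        fi
        return 0
}

# Repeat the request until it succeeds or $1 attempts have been made.
retry_step_create()
{
        max_attempts=$1
        attempt=1
        while [ "$attempt" -le "$max_attempts" ]; do
                if try_step_create; then
                        echo "step created on attempt $attempt"
                        return 0
                fi
                attempt=$((attempt + 1))
        done
        echo "gave up after $max_attempts attempts"
        return 1
}
```

Calling retry_step_create 5 with the settings above reports success on the third attempt; the two refused attempts correspond to the log messages you are seeing. A real client would also wait between attempts, which this sketch leaves out.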


Quoting [email protected]:

Running Slurm 2.2.7, I'm seeing my logs filled with the likes of:
[2011-10-24T11:33:17] _slurm_rpc_job_step_create: SlurmctldProlog is
still running

The prolog script (below) is very small and doesn't even execute for
most people.  The jobs which are apparently generating this error
don't belong to the configured user list.  Running explicit tests of
it, both as listed users and as others, it does do the right thing.  Is
there something obvious I'm missing here?  The specified file system
is mounted on the root node and all permissions are correct too.

Thanks in advance for any advice or opinion,
Jeff Katcher

---- prolog script as configured in slurm.conf ----
#!/bin/sh

SCRATCH_ROOT=/mnt/ruminant/scratch

create_scratch()
{
        SCRATCH_DIR="$SCRATCH_ROOT/$1"
        sudo /bin/mkdir -p $SCRATCH_DIR
        sudo /bin/chown $2 $SCRATCH_DIR
}

case "$SLURM_JOB_USER" in
        curly|larry|moe|shemp)
                create_scratch $SLURM_JOB_ID $SLURM_JOB_USER
                ;;
        *)
                #echo "nothing happening here, return to your homes"
                ;;
esac

exit 0



