We are running a BlueGene SLURM configuration with two front end nodes
(bgqfen1, bgqfen2).

When we run this:
for x in {1..50}; do sbatch --nodelist=bgq0001 --nodes=32 ./rbscratch.sh;
done

We see that 25 jobs run on bgqfen1 and 25 run on bgqfen2.

Picking through the slurm source code we found:

 */
extern front_end_record_t *assign_front_end(struct job_record *job_ptr)
{
#ifdef HAVE_FRONT_END
    static int last_assigned = -1;
    front_end_record_t *front_end_ptr;
    uint16_t state_flags;
    int i;
    for (i = 0; i < front_end_node_cnt; i++) {
        last_assigned = (last_assigned + 1) % front_end_node_cnt;
        front_end_ptr = front_end_nodes + last_assigned;
        if (job_ptr->batch_host) {   /* Find specific front-end node */
            if (strcmp(job_ptr->batch_host, front_end_ptr->name))
                continue;
            if (!_front_end_access(front_end_ptr, job_ptr))
                break;
        } else {        /* Find some usable front-end node */
            if (IS_NODE_DOWN(front_end_ptr) ||
                IS_NODE_DRAIN(front_end_ptr) ||
                IS_NODE_NO_RESPOND(front_end_ptr))
                continue;
            if (!_front_end_access(front_end_ptr, job_ptr))
                continue;
        }
        state_flags = front_end_nodes[last_assigned].node_state &
                  NODE_STATE_FLAGS;
        front_end_nodes[last_assigned].node_state =
                NODE_STATE_ALLOCATED | state_flags;
        front_end_nodes[last_assigned].job_cnt_run++;
        return front_end_ptr;
    }
    if (job_ptr->batch_host) {  /* Find specific front-end node */
        error("assign_front_end: front end node %s not found",
              job_ptr->batch_host);
    } else {        /* Find some usable front-end node */
        error("assign_front_end: no available front end nodes found");
    }
#endif
    return NULL;
}

This code appears to imply that, if the field "batch_host" in the structure
job_record is set, the job is assigned to that specific front end node;
otherwise the next usable front end node in rotation is picked.
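
To check our reading of the loop in assign_front_end quoted above, here is
a minimal standalone sketch of the selection logic. It is not SLURM code:
fe_names, pick_front_end and count_assignments are hypothetical names made
up for illustration, and the node state and _front_end_access checks are
left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SLURM's front-end table (not the real structure). */
const char *fe_names[] = { "bgqfen1", "bgqfen2" };
const int fe_count = 2;

/*
 * Simplified model of the selection loop.
 * Returns the index of the chosen front end, or -1 if none matches.
 *   batch_host == NULL : take the next node in rotation (round-robin)
 *   batch_host != NULL : take only the node whose name matches
 * The state and access checks of the real function are omitted.
 */
int pick_front_end(const char *batch_host)
{
    static int last_assigned = -1;   /* persists across calls, as in the original */
    int i;

    for (i = 0; i < fe_count; i++) {
        last_assigned = (last_assigned + 1) % fe_count;
        if (batch_host && strcmp(batch_host, fe_names[last_assigned]))
            continue;                /* name does not match, try the next node */
        return last_assigned;
    }
    return -1;
}

/* Assign n jobs and count how many land on each front end. */
void count_assignments(int n, const char *batch_host, int *counts)
{
    int j, idx;

    for (j = 0; j < fe_count; j++)
        counts[j] = 0;
    for (j = 0; j < n; j++) {
        idx = pick_front_end(batch_host);
        if (idx >= 0)
            counts[idx]++;
    }
}
```

In this sketch, count_assignments(50, NULL, counts) splits the jobs evenly
between the two entries, which matches the 25/25 split we observe, while
count_assignments(50, "bgqfen2", counts) puts all 50 on bgqfen2.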

Does anyone know of an sbatch command line option, or of prolog script
code, that would explicitly set this field so that the job runs on the
same front end node it was queued from?




Ralph Bellofatto
IBM TJ Watson Research
1-914-945-3321
[email protected]
