Chris,

This one-line patch seems to fix the problem:
diff --git a/src/slurmctld/node_scheduler.c b/src/slurmctld/node_scheduler.c
index 1ce9158..1ef5cf1 100644
--- a/src/slurmctld/node_scheduler.c
+++ b/src/slurmctld/node_scheduler.c
@@ -137,6 +137,7 @@ extern void allocate_nodes(struct job_record *job_ptr)
 	job_ptr->batch_host = xstrdup(job_ptr->front_end_ptr->name);
 #endif
 
+	xfree(job_ptr->batch_host);
 	for (i = 0; i < node_record_count; i++) {
 		if (!bit_test(job_ptr->node_bitmap, i))
 			continue;

Quoting Chris Scheller <sche...@pobox.com>:
> I am seeing a lot of errors in my logs from slurmstepd with the message
> 'error: Invalid host_index -1 for job ', after which sacct shows those
> jobs as failed with NODE_FAIL. It seems to happen to jobs that have been
> preempted and requeued. Any idea what could be going on here?
>
> --
> Chris Scheller | http://www.pobox.com/~schelcj
> ----------------------------------------------
> "By the time they had diminished from 50 to 8, the other dwarves began
> to suspect 'Hungry' ..."
> -- Gary Larson, "The Far Side"