Chris,
This one-line patch seems to fix the problem.
diff --git a/src/slurmctld/node_scheduler.c b/src/slurmctld/node_scheduler.c
index 1ce9158..1ef5cf1 100644
--- a/src/slurmctld/node_scheduler.c
+++ b/src/slurmctld/node_scheduler.c
@@ -137,6 +137,7 @@ extern void allocate_nodes(struct job_record *job_ptr)
 	job_ptr->batch_host = xstrdup(job_ptr->front_end_ptr->name);
 #endif
+	xfree(job_ptr->batch_host);
 	for (i = 0; i < node_record_count; i++) {
 		if (!bit_test(job_ptr->node_bitmap, i))
 			continue;
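
To illustrate why clearing batch_host at allocation time might matter for requeued jobs, here is a toy sketch. It is not the slurmctld code: the struct, the helper names, and the rule that the batch host is only assigned while the field is still NULL are all assumptions made up for the illustration. Under those assumptions, a job that is requeued onto different nodes keeps its old batch host unless the field is freed first, and a lookup of that host in the new allocation then fails with -1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_NODES 4

/* Hypothetical, stripped-down stand-in for a job record. */
struct toy_job {
	char *batch_host;                 /* host that runs the batch script */
	const char *nodes[TOY_MAX_NODES]; /* nodes in the current allocation */
	int node_cnt;
};

/* Portable string copy (strdup is POSIX, not ISO C before C23). */
static char *dup_str(const char *s)
{
	char *p = malloc(strlen(s) + 1);
	if (p)
		strcpy(p, s);
	return p;
}

/* Index of batch_host within the current allocation, or -1 if absent. */
static int host_index(const struct toy_job *job)
{
	if (!job->batch_host)
		return -1;
	for (int i = 0; i < job->node_cnt; i++) {
		if (strcmp(job->nodes[i], job->batch_host) == 0)
			return i;
	}
	return -1;
}

/* Assumed allocation rule: batch_host is set only if it is still NULL.
 * clear_first mimics freeing the old value before the node loop. */
static void toy_allocate(struct toy_job *job, const char **nodes, int cnt,
			 int clear_first)
{
	assert(cnt <= TOY_MAX_NODES);
	if (clear_first) {
		free(job->batch_host);
		job->batch_host = NULL;
	}
	job->node_cnt = cnt;
	for (int i = 0; i < cnt; i++) {
		job->nodes[i] = nodes[i];
		if (!job->batch_host)
			job->batch_host = dup_str(nodes[i]);
	}
}

/* Allocate, then "requeue" onto different nodes; report the host index. */
static int requeue_host_index(int clear_first)
{
	const char *first[] = { "n1", "n2" };
	const char *second[] = { "n3", "n4" };
	struct toy_job job = { 0 };

	toy_allocate(&job, first, 2, clear_first);
	toy_allocate(&job, second, 2, clear_first); /* after preempt/requeue */

	int idx = host_index(&job);
	free(job.batch_host);
	return idx;
}
```

In this toy model, requeue_host_index(0) returns -1 because the stale "n1" is not part of the second allocation, while requeue_host_index(1) returns 0 because the batch host is reassigned from the new node list.
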
Quoting Chris Scheller:
I am seeing a lot of errors in my logs from slurmstepd
with the message 'error: Invalid host_index -1 for job ' which then
shows the jobs as failing with a NODE_FAIL in sacct. Seems to happen
to jobs that have been preempted and requeued. Any idea what could be
going on here?
--
Chris Scheller | http://www.pobox.com/~schelcj
----------------------------------------------
"By the time they had diminished from 50 to 8, the other dwarves began
to suspect 'Hungry' ..."
-- Gary Larson, "The Far Side"