Chris,

This one-line patch seems to fix the problem.

diff --git a/src/slurmctld/node_scheduler.c b/src/slurmctld/node_scheduler.c
index 1ce9158..1ef5cf1 100644
--- a/src/slurmctld/node_scheduler.c
+++ b/src/slurmctld/node_scheduler.c
@@ -137,6 +137,7 @@ extern void allocate_nodes(struct job_record *job_ptr)
        job_ptr->batch_host = xstrdup(job_ptr->front_end_ptr->name);
 #endif

+       xfree(job_ptr->batch_host);
        for (i = 0; i < node_record_count; i++) {
                if (!bit_test(job_ptr->node_bitmap, i))
                        continue;
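
To illustrate why clearing the old value matters, here is a minimal standalone sketch, not Slurm code. It assumes the loop only assigns batch_host when it is still NULL, so a requeued job would otherwise keep the host from its previous allocation, which is presumably what slurmstepd then fails to find among the job's new nodes. The names demo_job, demo_strdup, demo_allocate and demo_requeue_ok are hypothetical, and free() plus an explicit NULL assignment stands in for xfree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the job record; only the field of interest. */
struct demo_job {
	char *batch_host;
};

/* Small portable replacement for strdup(). */
static char *demo_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	assert(p != NULL);
	memcpy(p, s, n);
	return p;
}

/*
 * Assumed behaviour of the allocation loop: the first allocated node
 * becomes the batch host, but only if batch_host is still NULL.
 */
static void demo_allocate(struct demo_job *job, const char *const *nodes,
			  size_t node_cnt, bool clear_first)
{
	assert(job != NULL);
	if (clear_first) {
		free(job->batch_host);	/* effect of the added xfree() */
		job->batch_host = NULL;	/* xfree() also NULLs the pointer */
	}
	for (size_t i = 0; i < node_cnt; i++) {
		if (job->batch_host == NULL)
			job->batch_host = demo_strdup(nodes[i]);
	}
}

/*
 * Allocate, then "requeue" onto different nodes. Returns true if the
 * batch host ends up inside the second allocation.
 */
static bool demo_requeue_ok(bool clear_first)
{
	const char *const first[] = { "node01", "node02" };
	const char *const second[] = { "node07", "node08" };
	struct demo_job job = { NULL };
	bool ok;

	demo_allocate(&job, first, 2, clear_first);
	demo_allocate(&job, second, 2, clear_first);
	ok = (strcmp(job.batch_host, "node07") == 0);
	free(job.batch_host);
	return ok;
}
```

In this sketch demo_requeue_ok(true) reports a batch host from the new allocation, while demo_requeue_ok(false) leaves the stale "node01" in place.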


Quoting Chris Scheller <sche...@pobox.com>:

I am seeing a lot of errors in my logs from slurmstepd
with the message 'error: Invalid host_index -1 for job ', and sacct then
shows those jobs as failed with NODE_FAIL. It seems to happen
to jobs that have been preempted and requeued. Any idea what could be
going on here?

--
Chris Scheller | http://www.pobox.com/~schelcj
----------------------------------------------
"By the time they had diminished from 50 to 8, the other dwarves began
to suspect 'Hungry' ..."
                -- Gary Larson, "The Far Side"



