I am seeing a lot of errors in my logs from slurmstepd
with the message 'error: Invalid host_index -1 for job '. The affected
jobs then show up as failed with a NODE_FAIL state in sacct. This seems
to happen to jobs that have been preempted and requeued. Any idea what
could be going on here?

-- 
Chris Scheller | http://www.pobox.com/~schelcj
----------------------------------------------
"By the time they had diminished from 50 to 8, the other dwarves began
to suspect 'Hungry' ..."
                -- Gary Larson, "The Far Side"
