I am seeing a lot of errors from slurmstepd in my logs with the message 'error: Invalid host_index -1 for job '. The affected jobs then show up as failed with state NODE_FAIL in sacct. This seems to happen to jobs that have been preempted and requeued. Any idea what could be going on here?
--
Chris Scheller | http://www.pobox.com/~schelcj
----------------------------------------------
"By the time they had diminished from 50 to 8, the other dwarves began to suspect 'Hungry' ..." -- Gary Larson, "The Far Side"