While investigating jobs that have been running for far too long, I found that qhost shows dead nodes with "alive stats" such as load, memuse and swapus. qstat also shows them processing jobs in state "r", as if the nodes were up and working.
I'm used to seeing qhost display "-" under these "alive stats" when nodes are missing or not properly configured. Also, qstat usually displays an error state if a job is stuck on such a node. What could cause qhost not to reflect reality? It would be annoying to have to add a script to confirm that the nodes returned by qhost are still alive. Any tips for pinging them?

Thanks,
Mich
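
P.S. For context, this is roughly the kind of cross-check I would rather not have to maintain. It is only a sketch under a few assumptions: that plain qhost output starts with a HOSTNAME header row and a dashed separator, followed by a "global" pseudo-host row; that ping is the Linux iputils version (-c for count, -W for the reply timeout in seconds); and that the function names are made up for illustration. A host answering ping also does not prove its execd is healthy, so this only catches nodes that are completely gone.

```python
import subprocess


def parse_qhost_hostnames(qhost_output):
    """Return the host names listed in plain `qhost` output."""
    hosts = []
    for line in qhost_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        # Skip the header row, the dashed separator and the "global" pseudo-host
        # (assumed layout of plain qhost output).
        if name == "HOSTNAME" or name == "global" or set(name) == {"-"}:
            continue
        hosts.append(name)
    return hosts


def is_reachable(host, timeout_s=2):
    """Send one ICMP echo request; True if `ping` exits with status 0."""
    # Assumes Linux iputils ping: -c is the packet count, -W the reply timeout (s).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_hosts(qhost_output, probe=is_reachable):
    """Hosts that qhost lists but that fail the given probe."""
    return [h for h in parse_qhost_hostnames(qhost_output) if not probe(h)]


def main():
    """Run qhost and print every listed host that does not answer ping."""
    out = subprocess.run(
        ["qhost"], capture_output=True, text=True, check=True
    ).stdout
    for host in unreachable_hosts(out):
        print(host)
```

Calling main() on a submit host would print one line per node that qhost still lists but that does not answer a single ping. The probe argument is there so a different check (an ssh connection attempt, for example) could be swapped in without touching the parsing.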
