While investigating jobs that have been running for far too long, I
found that qhost shows dead nodes with "alive stats" such as load,
memuse and swapus. qstat also shows them processing jobs in state "r",
as if the node were up and working.

I'm used to seeing qhost display "-" under "alive stats" when nodes are
missing or not properly configured. Also, qstat usually displays an
error state if a job is stuck on such a node.

What can cause qhost not to reflect reality? It would be annoying to
have to add a script just to confirm that the nodes returned by the
qhost command are still alive. Any tips for pinging them?
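
For reference, the kind of check I have in mind is roughly the sketch
below. It is only a sketch under assumptions: it assumes plain qhost
output with the host name in the first column, preceded by a header row,
a dashed separator and a "global" line, and it assumes a Linux-style
ping where -c is the packet count and -W the timeout in seconds. The
function names parse_qhost_hostnames and is_alive are made up for
illustration.

```python
import subprocess


def parse_qhost_hostnames(qhost_output):
    """Return host names taken from the first column of plain `qhost` output."""
    hosts = []
    for line in qhost_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        # Skip the header row, the dashed separator and the "global" pseudo-host.
        if name == "HOSTNAME" or name.startswith("-") or name == "global":
            continue
        hosts.append(name)
    return hosts


def is_alive(host, timeout_s=2):
    """Send one ICMP echo request; return True if ping exits with status 0."""
    # Assumption: Linux iputils ping (-c count, -W timeout in seconds).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Run qhost, then ping every host it lists.
    output = subprocess.run(
        ["qhost"], capture_output=True, text=True, check=True
    ).stdout
    for host in parse_qhost_hostnames(output):
        status = "alive" if is_alive(host) else "NO PING REPLY"
        print(f"{host}: {status}")
```

A ping reply only shows that the machine answers on the network, not
that its execution daemon is responding, so this would be a first
filter at best.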

Thanks
Mich
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
