We discovered a host where the InfiniBand connection was playing up.
Our normal procedure for this is to remove the host from the
hostgroups it is normally in and add it to a hostgroup associated with
queues that only accept single-node jobs (i.e. serial jobs and PEs with
an allocation rule of $pe_slots) until we can investigate.  However,
rather than removing it from our parallel queues, this mysteriously
caused queue instances for every configured queue in the cluster to
appear on the host (as evidenced by the output of qstat -f).
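
For concreteness, the hostgroup change described above amounts to
roughly the following sketch.  The function name is made up for
illustration, and the host and hostgroup names in the usage line
(node042, @singlenode, @ib_parallel) are placeholders rather than our
real ones.

```shell
# Sketch only: move a host out of its usual hostgroups and into a
# hostgroup used by single-node queues.
# Usage: move_to_single_node HOST SINGLE_NODE_GROUP OLD_GROUP...
move_to_single_node() {
    host=$1
    single=$2
    shift 2
    # Drop the host from each hostgroup it normally belongs to.
    for grp in "$@"; do
        qconf -dattr hostgroup hostlist "$host" "$grp"
    done
    # Add the host to the single-node hostgroup.
    qconf -aattr hostgroup hostlist "$host" "$single"
}
```

It would be called as, for example,
move_to_single_node node042 @singlenode @ib_parallel
and the result checked afterwards with qstat -f.
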
i) Neither the hostgroup nor the host is referenced, directly or
indirectly, from any queues bar the single-node ones, as far as I can
tell.
ii) Removing the host from the hostgroup causes the queue instances to
disappear.  Adding it back causes them to reappear.
iii) Other hosts in this hostgroup have only the queues they are
supposed to.
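
A check along the lines of (i) can be sketched as below.  This is an
illustration, not necessarily the exact check I ran: the function name
is hypothetical, and it only finds names that appear literally in a
queue's configuration, so an indirect reference through a parent
hostgroup needs the same search repeated with the parent's name.

```shell
# Sketch only: print every cluster queue whose configuration mentions
# NAME (a host or an @hostgroup) anywhere, e.g. in hostlist or in a
# per-host override.  Matching is by plain substring, so @grp would
# also match @grp2.
queues_mentioning() {
    name=$1
    for q in $(qconf -sql); do
        if qconf -sq "$q" | grep -qF -- "$name"; then
            echo "$q"
        fi
    done
}
```

For example, queues_mentioning @singlenode should list only the
single-node queues if nothing else refers to that hostgroup directly.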

William