We discovered a host where the InfiniBand connection was playing up. Our normal procedure for this is to remove the host from the hostgroups it is normally in and add it to a hostgroup associated with queues that only accept single-node jobs (i.e. serial jobs and PEs with an allocation_rule of $pe_slots) until we can investigate. However, rather than just removing it from our parallel queues, this mysteriously caused a queue instance for every configured queue in the cluster to appear on the host (as shown by the output of qstat -f).

i) Neither the single-node hostgroup nor the host is referenced, directly or indirectly, by any queue bar the single-node ones, AFAICT.
ii) Removing the host from the hostgroup makes the queue instances disappear; adding it back makes them reappear.
iii) Other hosts in this hostgroup have only the queues they are supposed to.
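One way to double-check point i) is to resolve hostgroup membership recursively, since a hostgroup's hostlist may name other hostgroups and an indirect reference is easy to miss by eye. The Python sketch below is a hypothetical helper, not part of Grid Engine. It assumes the hostgroup definitions have been captured as text in the style of qconf -shgrp output (a group_name line and a hostlist line, with a trailing backslash continuing the list) and that each queue's hostlist has been copied by hand into a dictionary. All group, queue and host names are made up.

```python
"""Hypothetical helper: resolve (possibly nested) Grid Engine hostgroups and
report which queues' hostlists would include a given host."""

from typing import Dict, List, Optional, Set, Tuple


def parse_hostgroup(text: str) -> Tuple[str, List[str]]:
    """Parse text assumed to look like `qconf -shgrp @name` output."""
    # Join backslash-continued lines so the whole hostlist is on one line.
    joined = text.replace("\\\n", " ")
    name = ""
    members: List[str] = []
    for line in joined.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "group_name":
            name = fields[1]
        elif fields[0] == "hostlist":
            # Treat the placeholder NONE as an empty hostlist.
            members = [m for m in fields[1:] if m != "NONE"]
    return name, members


def expand(entry: str, groups: Dict[str, List[str]],
           seen: Optional[Set[str]] = None) -> Set[str]:
    """Recursively expand a host name or @hostgroup into a set of hosts."""
    if not entry.startswith("@"):
        return {entry}
    seen = set() if seen is None else seen
    if entry in seen:  # already expanded (or a cyclic definition)
        return set()
    seen.add(entry)
    hosts: Set[str] = set()
    # Unknown groups contribute nothing.
    for member in groups.get(entry, []):
        hosts |= expand(member, groups, seen)
    return hosts


def queues_containing(host: str, queue_hostlists: Dict[str, List[str]],
                      groups: Dict[str, List[str]]) -> List[str]:
    """Return, sorted by name, the queues whose hostlist resolves to `host`."""
    result = []
    for queue, entries in sorted(queue_hostlists.items()):
        resolved: Set[str] = set()
        for entry in entries:
            resolved |= expand(entry, groups)
        if host in resolved:
            result.append(queue)
    return result


# Made-up example data.
groups = dict(parse_hostgroup(t) for t in [
    "group_name @allhosts\nhostlist @parallel @serial",
    "group_name @parallel\nhostlist node01 node02",
    "group_name @serial\nhostlist node03 \\\n         node04",
])
queues = {"parallel.q": ["@parallel"], "serial.q": ["@serial"],
          "all.q": ["@allhosts"]}
print(queues_containing("node04", queues, groups))  # ['all.q', 'serial.q']
```

In this invented data node04 is never named in all.q's hostlist, yet it still resolves into all.q because @allhosts includes @serial. Running the same check against the real definitions would show whether any such nesting exists.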
William

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
