Can you run strace on the command while it's failing? Something like "strace -e trace=open,connect qstat > /dev/null" should at least give a pointer to where the failure is occurring. My first thought is that nscd is caching a negative response for a few minutes, rather than retrying.
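Since the error says it is the server host that cannot resolve the client, it may also help to check the forward and reverse lookups from the qmaster while a client is failing. Below is a rough sketch of such a check in Python. The function name check_host and the name-matching rule (exact name or matching short name) are my own assumptions for illustration, not how commlib itself compares host names.

```python
import socket
import sys


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> address -> name and report where the round trip breaks.

    check_host is a hypothetical helper; the resolver functions are
    parameters only so the logic can be exercised without a network.
    """
    try:
        addr = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return {"name": name, "ok": False, "stage": "forward", "detail": str(exc)}

    try:
        canonical, aliases, _addresses = reverse(addr)
    except OSError as exc:
        return {"name": name, "ok": False, "stage": "reverse",
                "address": addr, "detail": str(exc)}

    # Assumption: treat the lookup as consistent if the original name, or its
    # short (first-label) form, appears among the names the reverse lookup gave.
    known = {canonical.lower()} | {alias.lower() for alias in aliases}
    short_known = {k.split(".")[0] for k in known}
    wanted = name.lower()
    ok = wanted in known or wanted.split(".")[0] in short_known
    return {"name": name, "ok": ok, "stage": "match",
            "address": addr, "canonical": canonical}


if __name__ == "__main__":
    # Pass one or more client host names on the command line.
    for host in sys.argv[1:]:
        print(check_host(host))
```

Running this in a loop on the master against one of the affected clients, during and after an episode, should show whether the forward lookup, the reverse lookup, or the name match is what changes. If the result flips without any DNS change, a cached negative answer becomes more plausible.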

On Fri, Aug 24, 2018 at 10:25:42AM -0400, Valerio Luccio wrote:
> Hello all,
>
> we have a rather old installation of SGE that has been running for years
> without any problems. In the last 2-3 weeks I've been experiencing an
> odd problem: when issuing any command (qsub, qstat, qping, etc) I get
> the following error:
>
> error: commlib error: access denied (server host resolves destination
> host "<server address>" as "(HOST_NOT_RESOLVABLE)")
>
> error: unable to contact qmaster using port 6444 on host "<server
> address>"
>
> There are several odd things about this:
>
> * Nothing has changed on the server or the clients in the months
>   before the error started appearing.
> * This happens from most of the clients, but not all.
> * The error persists for 5-10 minutes, and then everything works fine.
> * Both gethostbyname and gethostbyaddr return the correct values from
>   the client while the error occurs (I haven't had a chance to try
>   them from the master during these episodes).
>
> I get a feeling that this has something to do with DNS and reverse
> lookup, but I don't know where to start debugging it.
>
> Anyone have any clue what I should look at ?
>
> Thanks,
>
> --
> Valerio Luccio (212) 998-8736
> Center for Brain Imaging 4 Washington Place, Room 157
> New York University New York, NY 10003
>
> "In an open world, who needs windows or gates ?"
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine