After some investigation, I've narrowed the problem down somewhat. But first, 
some system information:

CentOS 5.8, kernel 2.6.18-308.el5, 64-bit Intel system. We're using dnsmasq and 
nscd to cache DNS queries. We have an internal BIND server, which xCAT manages.

Once the qmaster starts up, I see this in the messages file:
10/31/2012 09:48:11|  main|srvname|W|local configuration srvname not defined - using global configuration
I'm not entirely sure what that means, but some searches pointed to 
resolver/hostname issues.

This led me to check name resolution, so I set the hosts entry in 
nsswitch.conf to "files" only and added everything needed to /etc/hosts. I 
also turned off nscd and dnsmasq. After that, the qmaster no longer segfaults 
and everything seems to work as it should.
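
To double-check which sources the resolver is actually configured to use, 
the hosts entry can be read back programmatically. The following is only a 
minimal sketch, assuming Python 3 is available (the stock CentOS 5 Python is 
older); the function name hosts_sources is made up for illustration.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Skip action items such as [NOTFOUND=return].
            return [tok for tok in tokens if not tok.startswith("[")]
    # No hosts entry found in the given text.
    return []
```

Passing it the contents of /etc/nsswitch.conf, for example 
hosts_sources(open("/etc/nsswitch.conf").read()), should give ["files"] for 
the working setup and ["files", "dns"] for the one that segfaults.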

Has anyone seen anything like this before? Even if the DNS has bogus or faulty 
information (although I can't see that it has), the qmaster shouldn't segfault, 
should it? As soon as I change back from pure "files" to "files dns", it takes 
2-3 minutes and the qmaster segfaults again. It might be worth noting that this 
host is usually an SGE 6.2u5 qmaster, and with the original resolver 
configuration that version works without problems.
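
One way to test whether the DNS data is consistent is to compare forward and 
reverse lookups for each cluster host. The following is a minimal sketch, 
again assuming Python 3; check_host and its return strings are hypothetical 
names, and the lookup functions are parameters so they can be swapped out.

```python
import socket


def check_host(name, forward=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Resolve name to an address and back, and classify the result."""
    try:
        addr = forward(name)
    except OSError:
        # socket.gaierror and socket.herror are OSError subclasses.
        return "forward-failed"
    try:
        primary, aliases, _addrs = reverse(addr)
    except OSError:
        return "reverse-failed"
    # Collect every name the reverse lookup reports for this address.
    known = {n.lower() for n in [primary] + list(aliases)}
    return "ok" if name.lower() in known else "mismatch"
```

Running check_host() over the host names the qmaster knows about, once with 
"files" and once with "files dns", would show whether any name resolves 
differently between the two setups. A "mismatch" here only means the name 
that comes back differs from the one that went in (for example short name 
versus fully qualified name); it is not proof of the cause of the segfault.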

Any ideas of what could lead to this rather strange behaviour, or how to dig up 
more information to further narrow it down?

Wbr
Andreas





