On Thu, Sep 01, 2011 at 11:32:32AM +0300, Matti Rintala wrote:
> We are running Samba on a Solaris 10 cluster as an HA
> service. There are two nodes in the cluster and the Samba
> versions are 3.5.8 on one node and 3.5.5 on the other.
> The Samba build is the one that ships with Solaris 10. We are
> using Sun (Oracle) LDAP for user account data, so passwd
> and group database information is retrieved from
> there. Authentication is done against Windows 2008 AD.
>
> This Samba service is serving users' home directories. The same
> data is also shared using NFS. We have over 11000 user
> accounts. During the summer this new service was working
> nicely, but now that the user count has increased we are
> experiencing severe problems. When the smbd process count hits
> about 500, Samba just stops responding and we have to
> restart it. Usually Oracle Solaris Cluster does the restart,
> but it fails because one smbd process won't die even with
> -9 signal. Nothing really crashes, and at least for some
> time the mother smbd keeps forking new children, so the process
> count keeps increasing.

Not being able to kill a process with -9 is a kernel problem.
You need to find out what the process is doing, although I'm
not sure how to do that under Solaris. Can truss or some other
tool inspect a process that is stuck?

> We have opened a support case with Oracle and together with
> them we have speculated that this issue might be caused by
> a naming service and/or LDAP issue. So we disabled nscd, but
> that didn't have any effect. We have also switched the hosts'
> ldap_cachemgr to use a more efficient LDAP server, without
> success.

Nah, not being able to kill -9 is VERY unlikely to be an LDAP
issue. That's mostly user space, except that Sun might have some
door implementation of nss.

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
