> -----Original Message-----
> From: Thomas Nau [mailto:[email protected]]
> Sent: 1 September 2011 13:36
> To: Matti Rintala
> Cc: [email protected]
> Subject: Re: [Samba] Samba 3.5.8 (and 3.5.5) shipped with Solaris 10
> keeps crashing when smbd process count hits about 500-600
>
> On 09/01/2011 10:32 AM, Matti Rintala wrote:
> > Hi,
> >
> > We are running Samba on a Solaris 10 cluster as an HA service. There
> > are two nodes in the cluster, and the Samba versions are 3.5.8 on one
> > node and 3.5.5 on the other. The Samba build is the one that ships
> > with Solaris 10. We are using Sun (Oracle) LDAP for user account
> > data, so passwd and group database information is retrieved from
> > there. Authentication is done against Windows 2008 AD.
> >
> > This Samba service serves users' home directories. The same data is
> > also shared over NFS. We have over 11000 user accounts. During the
> > summer this new service worked nicely, but now that the user count
> > has increased we are experiencing severe problems. When the smbd
> > process count hits about 500, Samba just stops responding and we
> > have to restart it. Usually Oracle Solaris Cluster tries to restart
> > it, but that fails because one smbd process won't die even with
> > signal -9. Nothing really crashes, and at least for some time the
> > parent smbd keeps forking new children, so the process count keeps
> > increasing.
>
> I'm not sure if any of the p* commands or truss will be of much help in
> that state. Nevertheless, you could check the call stack and open files
> using pstack and pfiles.
>
> If those don't help, one idea that comes to mind is to use dtrace.

Thanks for the hints.

> >
> > We have opened a support case with Oracle, and together with them we
> > have speculated that this issue might be caused by a naming service
> > and/or LDAP problem. So we disabled nscd, but that didn't have any
> > effect. We have also switched the hosts' ldap_cachemgr to use a more
> > efficient LDAP server, without success.
>
> I doubt that, as those are not kernel related and the "kill -9" issue
> points to some kernel "problem".

I'm currently installing the recommended patches on one of the cluster
nodes to rule out kernel-related issues or other known bugs.

Matti

> >
> > Any ideas what could be wrong, or any ideas how to debug the problem,
> > please? We are still continuing investigations with Oracle too.
>
> Thomas
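
For anyone wanting to follow Thomas's suggestion across many smbd processes at once, here is a rough sketch that counts the smbd processes and collects pstack and pfiles output for each one. It is only an illustration under some assumptions that are not from this thread: Python 3.7 or newer is available on the node, `ps -e -o pid,comm` prints a header line followed by "PID COMMAND" rows, and the helper names `smbd_pids` and `snapshot` are made up for this example. As Thomas notes, the proc tools may not return anything useful for a process stuck in the kernel, so each call is given a timeout.

```python
import subprocess


def smbd_pids(ps_output, name="smbd"):
    """Return the PIDs whose command name is `name`.

    Expects text shaped like the output of `ps -e -o pid,comm`:
    one header line, then one "PID COMMAND" row per process.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        pid, comm = parts
        # The command may be reported as a full path, so compare the basename.
        if pid.isdigit() and comm.strip().rsplit("/", 1)[-1] == name:
            pids.append(int(pid))
    return pids


def snapshot(pid, tools=("pstack", "pfiles"), timeout=30):
    """Run each tool against `pid` and return a {tool: output} dict.

    A tool that is missing or does not finish within `timeout` seconds
    is recorded as "failed: ..." instead of raising.
    """
    results = {}
    for tool in tools:
        try:
            proc = subprocess.run(
                [tool, str(pid)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            results[tool] = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[tool] = "failed: %s" % exc
    return results


def main():
    try:
        ps = subprocess.run(
            ["ps", "-e", "-o", "pid,comm"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run ps:", exc)
        return
    pids = smbd_pids(ps.stdout)
    print("smbd processes:", len(pids))
    for pid in pids:
        for tool, output in snapshot(pid).items():
            print("== %s %d ==" % (tool, pid))
            print(output)


if __name__ == "__main__":
    main()
```

Run as root and redirect the output to a file while the process count is climbing. With several hundred processes the output gets large, so it may be more practical to call `snapshot()` only on the one PID that refuses to die.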
