For several months now, we've had smbd processes that 'lock' and climb to 99% CPU utilization. This effectively locks the end user out entirely and hangs their client machine. It happens almost exclusively while the user is saving an MS Word or Excel file, and only for a couple of specific users.

We've tried various patches offered by members of the Samba team here on the list. Over the past few versions of Samba these have helped greatly (thanks, guys), but the problem has never gone away entirely. Admittedly, our network was in rather poor shape and not well suited to debugging.

Recently we got the chance to change that, when a nice thunderstorm took out three of our existing switches. We have since replaced the network hardware in both the main server room and the branch of the network where all the affected users sit. The network now consists of NetGear Layer 2 managed switches: one 12-port SFP switch in the server room running at 1000 Mbit full duplex, with two independent fiber links to two 24-port 10/100 switches that have 1000 Mbit fiber uplinks via GBICs. Our thinking was that perhaps the network really was disconnecting users. That would leave stale smbd processes behind, still locking the files the users had open and spinning at 99% CPU in some wayward loop of code somewhere...

Now things are running a lot faster, but the problem seems to be getting trickier. Users are running into a problem similar to before, except that now the first smbd process belonging to a specific client locks up without escalating to high CPU utilization. Essentially, I get something like this:

  (wmpoff25 is the client machine in question in this case; the user
   usually calls to say 'my machine is locked up'):

wmptwo# /server/bin/samba-3.0.13/bin/net status sessions | grep wmpoff25
10135   cboakes       shop          wmpoff25     (10.0.0.27)
10015   cboakes       shop          wmpoff25     (10.0.0.27)

A simple 'kill 10015' does nothing. Repeating it... still nothing. Finally, 'kill -9 10015', and poof: the end user's system comes back to them, and all runs well until the next time they call us.

So the problem is the same as before, and so is our workaround, except that now the process does not climb to high CPU utilization.

In my despair I started to think the issue might be with the LDAP tree, because the slapd process cannot exit cleanly on our systems (it seems to be a bug in the OpenLDAP/FreeBSD-amd64/threads combination). So I have since recompiled OpenLDAP without thread support and re-created the tree from a 'slapcat' backup. This cripples our setup a little, as slurpd will not compile or run without threading support, to say nothing of the obvious performance cost of running a non-threaded slapd. But for now, at least, slapd starts, runs, and exits cleanly. We depend on LDAP not only for our Samba user database but also for our Unix user base, via pam_ldap and nss_ldap, across multiple servers and even a few *nix workstations.

So here I am again, at a loss. I tried compiling samba-3.0.20. It compiles fine and smbd starts, but nobody's home for some reason. Admittedly, I have not had the time or the means to properly debug or roll out 3.0.20, because these servers are in production running a slightly hacked copy of 3.0.13. I cannot take our systems down to 'try' 3.0.20, and I don't have a test machine capable of running FreeBSD/amd64 that isn't already in use. Our servers are all dual AMD Opteron boxes with dual-homed gigabit Ethernet connections (one link to the main network, and one between the servers themselves).

Aside from 'try 3.0.20', does anyone have any suggestions? I will be setting up a test server shortly to try to get 3.0.20 running cleanly, but I figured it was worth posting now to see if anyone has other ideas. Any and all constructive feedback would be greatly appreciated.

We're running FreeBSD 5.3-RELEASE/AMD64, OpenLDAP 2.2.26 (without thread support), and samba-3.0.13 (with one server running 3.0.7 as a print server, with no errors thus far).

--
Nathan Vidican
[EMAIL PROTECTED]
Windsor Match Plate & Tool Ltd.
http://www.wmptl.com/