[Samba] smbd hung processes - Samba 3.0.7
I have the very same problem with Samba 3.0.14a running on Debian Sarge and have found no solution yet. Has anyone found a solution in the meantime? Could it perhaps be due to a failure in our networking hardware?

> We've seen Samba crash and burn twice in the last 48 hours - it just
> started happening, and we have no idea what might be causing it. I'm
> hoping that someone will recognize this problem.
>
> Platform: we are running RedHat Enterprise Server, with Samba 3.0.7.
> We're using security=domain in an old-style NT4 domain environment.
>
> The symptom that we're seeing is that the number of smbd processes
> suddenly begins to increase. We normally run with between 100 and 150 smb
> processes, but when Samba fails, the number starts to increase quickly,
> and users start to have problems accessing files.
>
> smbstatus reports approximately the right number of clients (133), but ps
> shows a much larger number of smbd processes active (680). Smbstatus
> reports a list of active smbd processes - this list includes the oldest
> processes and the newest processes, but there is a block of smbd processes
> in the middle that are not in the smbstatus report. What we THINK is
> happening is that the smbd processes begin to hang, the clients time out,
> they initiate a new session with Samba server, which respawns another smbd
> server process (leaving the old, hung process running). This keeps
> happening over and over until we kill samba. The hung processes need to
> be kill -9'ed.

--
To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/listinfo/samba
Re: [Samba] smbd hung processes - Samba 3.0.7
We're still experiencing this issue. I've observed a couple of things during the latest event.

I mentioned before that netstat -a shows many smb processes in CLOSE_WAIT state when this problem occurs. I happened to run strace on a process that was stuck in "recvfrom(1315,". When I killed this process, many of the hung smb processes terminated and the Samba service was responding again.

I've noticed the following error in the messages file:

Dec 13 18:16:20 valhalla smbd[18005]: [2004/12/13 18:16:20, 0] tdb/tdbutil.c:tdb_log(725)
Dec 13 18:16:20 valhalla smbd[18005]: tdb(/var/cache/samba/locking.tdb): tdb_lock failed on list 99 ltype =0 (Resource deadlock avoided)

This particular error showed up prior to Samba being restarted. When the service is restarted, many of these show up in the log.

I also notice that many times during an event the messages file contains the following errors:

Dec 13 19:29:33 valhalla smbd[28820]: domain_client_validate: Domain password server not available.
Dec 13 19:29:33 valhalla smbd[28820]: [2004/12/13 19:29:33, 0] auth/auth_domain.c:domain_client_validate(170)

Not sure if this is a coincidence.
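To judge whether the tdb_lock errors are rare or pile up during an event, they can be tallied from the messages file. The following is only a sketch in Python: the regular expression is modelled on the single log line quoted above (other Samba versions may word it differently), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one tdb_lock line quoted in this message.
# Assumption: other Samba builds may format the message differently.
TDB_LOCK_RE = re.compile(
    r"smbd\[(?P<pid>\d+)\]: tdb\((?P<path>[^)]+)\): tdb_lock failed "
    r"on list (?P<list>\d+) ltype\s*=\s*(?P<ltype>\d+) \((?P<error>[^)]*)\)"
)


def summarize_tdb_lock_failures(lines):
    """Count tdb_lock failures per (tdb file, error text) pair."""
    counts = Counter()
    for line in lines:
        match = TDB_LOCK_RE.search(line)
        if match:
            counts[(match.group("path"), match.group("error"))] += 1
    return counts
```

Passing an open file object for the messages file (the path depends on the syslog setup) returns a Counter keyed by tdb file and error text, which shows at a glance which database the failed locks concentrate on.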
Re: [Samba] smbd hung processes - Samba 3.0.7
This time things behaved a little differently. Samba was not replying to mount requests. An strace of the parent smbd process showed that it attempted to spawn a child for each incoming request, but these were failing. I observed this behavior when we had processes hanging and the spawned processes appeared to be hung on a lock. We've since removed strict locking, and now the spawned processes die immediately after being spawned.

clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb72142e8) = 29287
close(22) = 0
select(1024, [18 19 20], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, NULL, WNOHANG) = 29284
waitpid(-1, NULL, WNOHANG) = 0
sigreturn() = ? (mask now [FPE USR2 PIPE])
select(1024, [18 19 20], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, NULL, WNOHANG) = 29285
waitpid(-1, NULL, WNOHANG) = 0
sigreturn() = ? (mask now [FPE USR2 PIPE])
select(1024, [18 19 20], NULL, NULL, NULL) = 1 (in [18])
time(NULL) = 1102529684
accept(18, {sa_family=AF_INET, sin_port=htons(1865), sin_addr=inet_addr("131.101.53.201")}, [16]) = 22
fcntl64(22, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(22, F_SETFL, O_RDWR) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb72142e8) = 29288
close(22) = 0
select(1024, [18 19 20], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, NULL, WNOHANG) = 29286
waitpid(-1, NULL, WNOHANG) = 0
sigreturn() = ? (mask now [FPE USR2 PIPE])
select(1024, [18 19 20], NULL, NULL, NULL) = 1 (in [18])
time(NULL) = 1102529684
accept(18, {sa_family=AF_INET, sin_port=htons(1304), sin_addr=inet_addr("131.101.18.20")}, [16]) = 22
fcntl64(22, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(22, F_SETFL, O_RDWR) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb72142e8) = 29289
close(22) = 0
select(1024, [18 19 20], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, NULL, WNOHANG) = 29287
waitpid(-1, NULL, WNOHANG) = 0
sigreturn() = ? (mask now [FPE USR2 PIPE])
select(1024, [18 19 20], NULL, NULL, NULL) = 1 (in [18])
time(NULL) = 1102529684
accept(18, {sa_family=AF_INET, sin_port=htons(2064), sin_addr=inet_addr("131.101.185.75")}, [16]) = 22
fcntl64(22, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(22, F_SETFL, O_RDWR) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb72142e8) = 29290
close(22) = 0
select(1024, [18 19 20], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---

"Gerald (Jerry) Carter" <[EMAIL PROTECTED]>
12/08/2004 11:53 AM
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: [Samba] smbd hung processes - Samba 3.0.7

[EMAIL PROTECTED] wrote:
| Hmmm. So do you think turning off strict locking will
| help or is there something "wrong" with the tdb records
| that we can clear?

First we need to find out what file that fd is associated with. Then we can start working backwards to find root cause.

cheers, jerry
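A trace like the one in this message is easier to read once the clone() and waitpid() results are pulled out, since they show how quickly each child is reaped. The Python sketch below assumes one syscall per line, as strace normally prints it; the function name is invented for illustration.

```python
import re

# Match completed clone()/waitpid() calls and capture the returned PID.
CLONE_RE = re.compile(r"^clone\(.*\)\s*=\s*(\d+)\s*$")
WAITPID_RE = re.compile(r"^waitpid\(.*\)\s*=\s*(\d+)\s*$")


def spawned_and_reaped(trace_lines):
    """Return (spawned, reaped) PID lists in the order they appear."""
    spawned, reaped = [], []
    for raw in trace_lines:
        line = raw.strip()
        match = CLONE_RE.match(line)
        if match:
            spawned.append(int(match.group(1)))
            continue
        match = WAITPID_RE.match(line)
        # waitpid(..., WNOHANG) returning 0 means nothing was reaped.
        if match and int(match.group(1)) > 0:
            reaped.append(int(match.group(1)))
    return spawned, reaped
```

PIDs that appear in both lists were created and collected within the traced window, which is what "die immediately after being spawned" looks like in the parent's trace.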
Re: [Samba] smbd hung processes - Samba 3.0.7
Looks like it's a link to /var/cache/samba/gencache.tdb.

-John

"Gerald (Jerry) Carter" <[EMAIL PROTECTED]>
12/08/2004 11:53 AM
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: [Samba] smbd hung processes - Samba 3.0.7

[EMAIL PROTECTED] wrote:
| Hmmm. So do you think turning off strict locking will
| help or is there something "wrong" with the tdb records
| that we can clear?

First we need to find out what file that fd is associated with. Then we can start working backwards to find root cause.

cheers, jerry
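Mapping a blocked descriptor back to a file, as discussed in this exchange, can be scripted on Linux, where each entry under /proc/PID/fd is a symlink to the open file. A minimal Python sketch follows; the function names and the proc_root parameter (which only exists so the code can be pointed at a test directory) are illustrative, not part of any Samba tool.

```python
import os


def fd_target(pid, fd, proc_root="/proc"):
    """Return the path that descriptor `fd` of process `pid` points to."""
    return os.readlink(os.path.join(proc_root, str(pid), "fd", str(fd)))


def open_tdb_files(pid, proc_root="/proc"):
    """Map descriptor number -> path for every open *.tdb file of `pid`."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    found = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
        if target.endswith(".tdb"):
            found[int(name)] = target
    return found
```

For a hung smbd one would call fd_target(pid, fd) with the PID given to strace and the first argument of the blocked fcntl64() call. Reading another user's /proc entries generally needs root.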
Re: [Samba] smbd hung processes - Samba 3.0.7
[EMAIL PROTECTED] wrote:
| Hmmm. So do you think turning off strict locking will
| help or is there something "wrong" with the tdb records
| that we can clear?

First we need to find out what file that fd is associated with. Then we can start working backwards to find root cause.

cheers, jerry

Alleviating the pain of Windows(tm) --- http://www.samba.org
GnuPG Key- http://www.plainjoe.org/gpg_public.asc
"If we're adding to the noise, turn off this song"--Switchfoot (2003)
Re: [Samba] smbd hung processes - Samba 3.0.7
Hi Jerry,

Thanks for the reply. I'll check this if it recurs. We've turned off strict locking to see if this helps. This was on a hunch that it was a lock issue.

To answer your question, the access to the main share on this server is via the automounter to a local directory. For example the automount map /hwnet/ccvobs mounts /export/vobs on this server. The share [vobs] is mapped to /hwnet/vobs. The default timeout is 60 seconds and we do see the automounter expire and remount this mount point frequently. While we're not re-exporting this file system, there are certainly times when the automounter will apparently unmount and remount it. Note that during the "event" the filesystem is available both locally and via the automounter.

-John

"Gerald (Jerry) Carter" <[EMAIL PROTECTED]>
12/08/2004 11:04 AM
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: [Samba] smbd hung processes - Samba 3.0.7

[EMAIL PROTECTED] wrote:
| We have upgraded to the 3.0.7-1.3E.1 RH Samba update
| and this problem still occurs. Has anyone else experienced
| this or does anyone have any ideas on what's causing this?
|
| -John
|
| [EMAIL PROTECTED] wrote:
|
|> We've seen Samba crash and burn twice in the last 48 hours
|> - it just started happening, and we have no idea what
|> might be causing it. I'm hoping that someone will
|> recognize this problem.

Are you reexporting NFS shares by chance?

|> in the middle that are not in the smbstatus report.
|> What we THINK is happening is that the smbd processes
|> begin to hang, the clients time out,

A good theory (which would be true if re-exporting NFS shares and the NFS server got stuck).

|> # strace -p 20403
|> Process 20403 attached - interrupt to quit
|> fcntl64(13, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=280,
|> len=1}
|>

look in /proc/
Re: [Samba] smbd hung processes - Samba 3.0.7
[EMAIL PROTECTED] wrote:
| We have upgraded to the 3.0.7-1.3E.1 RH Samba update
| and this problem still occurs. Has anyone else experienced
| this or does anyone have any ideas on what's causing this?
|
| -John
|
| [EMAIL PROTECTED] wrote:
|
|> We've seen Samba crash and burn twice in the last 48 hours
|> - it just started happening, and we have no idea what
|> might be causing it. I'm hoping that someone will
|> recognize this problem.

Are you reexporting NFS shares by chance?

|> in the middle that are not in the smbstatus report.
|> What we THINK is happening is that the smbd processes
|> begin to hang, the clients time out,

A good theory (which would be true if re-exporting NFS shares and the NFS server got stuck).

|> # strace -p 20403
|> Process 20403 attached - interrupt to quit
|> fcntl64(13, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=280,
|> len=1}
|>

look in /proc/

cheers, jerry
Re: [Samba] smbd hung processes - Samba 3.0.7
We have upgraded to the 3.0.7-1.3E.1 RH Samba update and this problem still occurs. Has anyone else experienced this or does anyone have any ideas on what's causing this?

-John

[EMAIL PROTECTED] wrote:
>We've seen Samba crash and burn twice in the last 48 hours - it just
>started happening, and we have no idea what might be causing it. I'm
>hoping that someone will recognize this problem.
>
>[...]

Is this by any chance with the 3.0.7-1.3E.1 RH Samba update that was just recently released or one of the previous 3.0.7 RH packages?

Christian
Re: [Samba] smbd hung processes - Samba 3.0.7
>Is this by any chance with the 3.0.7-1.3E.1 RH Samba update that was
>just recently released or one of the previous 3.0.7 RH packages?

Sorry, I probably should have been more specific. It's 3.0.7-1.3E (i.e. NOT the latest security patch version).

Do you believe that 3.0.7-1.3E.1 would fix the problem? I didn't see anything in the update release notes that would indicate that it might address this issue.
Re: [Samba] smbd hung processes - Samba 3.0.7
[EMAIL PROTECTED] wrote:
> We've seen Samba crash and burn twice in the last 48 hours - it just
> started happening, and we have no idea what might be causing it. I'm
> hoping that someone will recognize this problem.
>
> Platform: we are running RedHat Enterprise Server, with Samba 3.0.7.
> We're using security=domain in an old-style NT4 domain environment.
>
> [...]

Is this by any chance with the 3.0.7-1.3E.1 RH Samba update that was just recently released or one of the previous 3.0.7 RH packages?

Christian
[Samba] smbd hung processes - Samba 3.0.7
We've seen Samba crash and burn twice in the last 48 hours - it just started happening, and we have no idea what might be causing it. I'm hoping that someone will recognize this problem.

Platform: we are running RedHat Enterprise Server, with Samba 3.0.7. We're using security=domain in an old-style NT4 domain environment.

The symptom that we're seeing is that the number of smbd processes suddenly begins to increase. We normally run with between 100 and 150 smb processes, but when Samba fails, the number starts to increase quickly, and users start to have problems accessing files.

smbstatus reports approximately the right number of clients (133), but ps shows a much larger number of smbd processes active (680). Smbstatus reports a list of active smbd processes - this list includes the oldest processes and the newest processes, but there is a block of smbd processes in the middle that are not in the smbstatus report. What we THINK is happening is that the smbd processes begin to hang, the clients time out, they initiate a new session with the Samba server, which respawns another smbd server process (leaving the old, hung process running). This keeps happening over and over until we kill samba. The hung processes need to be kill -9'ed.

If you do a "strace" on these apparently hung processes, you see this:

# strace -p 20403
Process 20403 attached - interrupt to quit
fcntl64(13, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=280, len=1}

I'm not sure if it's relevant, but netstat -a reports a large number of sockets in the CLOSE_WAIT state (I've included a small sample):

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        1      0 valhalla:microsoft-ds   army39:1455             CLOSE_WAIT
tcp        1      0 valhalla:microsoft-ds   131.101.40.174:2531     CLOSE_WAIT
tcp       54      0 valhalla:microsoft-ds   army39:1435             CLOSE_WAIT
tcp       54      0 valhalla:microsoft-ds   131.101.40.174:2512     CLOSE_WAIT

In this log, valhalla is the Samba server, and microsoft-ds is port 445 (the CIFS port).
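To see which clients account for the CLOSE_WAIT sockets, the netstat output can be grouped by foreign host. This Python sketch assumes the six-column layout shown in the sample above (netstat -a without -p); the function name and the default service names are illustrative.

```python
from collections import Counter


def close_wait_by_client(netstat_lines, service_names=("microsoft-ds", "445")):
    """Count CLOSE_WAIT sockets on the SMB port, grouped by foreign host."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, _recvq, _sendq, local, foreign, state = fields
        if state != "CLOSE_WAIT":
            continue
        if local.rsplit(":", 1)[-1] not in service_names:
            continue
        counts[foreign.rsplit(":", 1)[0]] += 1
    return counts
```

A handful of hosts holding most of the CLOSE_WAIT sockets would point at particular clients retrying, while an even spread would fit the theory that every client is timing out against hung smbd processes.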
There doesn't seem to be anything relevant in the smbd log files (we were using log level 1). We've increased the log level to 3 in the hope that we'll get more information the next time Samba goes wild.

Our smb.conf file isn't complicated - the global section looks like this:

[global]
    workgroup = ICD
    netbios name = VALHALLA
    security = domain
    password server = *
    wins server = nn.nn.nn.nn mm.mm.mm.mm
    server string = Linux ClearCase Server %v %h
    log file = /var/log/samba/%m.log
    log level = 3
    max log size = 4000
    username map = /etc/samba/smbusers
    read raw = no
    oplocks = no
    kernel oplocks = no
    level2 oplocks = no
    create mask = 0774
    directory mask = 0775
    map archive = No
    preserve case = yes
    deadtime = 0
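The gap between what ps shows and what smbstatus reports, described earlier in this message, can be turned into a list of suspect PIDs by comparing the two outputs. The Python sketch below only assumes that each relevant line starts with a PID; how the two listings are produced (for instance "ps -C smbd -o pid=" and "smbstatus -p") is an assumption to check against the installed versions, and the function names are invented.

```python
def parse_pids(text):
    """Collect the leading integer of each line, skipping headers and blanks."""
    pids = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.add(int(fields[0]))
    return pids


def unreported_smbd_pids(ps_output, smbstatus_output):
    """Return sorted PIDs that ps lists but smbstatus does not."""
    return sorted(parse_pids(ps_output) - parse_pids(smbstatus_output))
```

The parent smbd may well show up in the result too, since it is not serving a client session, so the list is a starting point for strace rather than a kill list.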