Re: [Samba] Tall tale of woe....
On Mon, 15 Dec 2003, Gerald (Jerry) Carter wrote:

> The kernel should log the oops in /var/log/messages.

Yeah, it's not there. The log stops at 11:29:07 and the next entry is at 11:47, when it's booting.

> We can't be blamed for a kernel oops. If a user space app can cause
> the kernel to die, then that's a kernel bug. I would start pursuing
> this with RedHat (if you have support), or logging it in
> bugzilla.redhat.com.

Not trying to apportion blame here, just trying to get the good old stable server back :/ I was wondering if anyone else has had anything like this before? I will contact RedHat and see if they can offer any suggestions.

Many thanks
Ross McInnes

--
To unsubscribe from this list go to the following URL and read the instructions: http://lists.samba.org/mailman/listinfo/samba
Re: [Samba] Tall tale of woe....
Ross McInnes (Systems) wrote:

| not trying to apportion blame here. Just trying to get
| the good old stable server back :/ was wondering if anyone
| else has had anything like this before?

I wasn't on the defensive. Just stating that it would have to be a kernel bug in this case (one that I've not seen come up before). It is possible that a hardware component is failing (e.g. RAM).

cheers, jerry
Re: [Samba] Tall tale of woe....
On Tue, 16 Dec 2003, Gerald (Jerry) Carter wrote:

> I wasn't on the defensive. Just stating that it would have to be a
> kernel bug in this case (one that I've not seen come up before). It
> is possible that a hardware component is failing (e.g. RAM).

Sorry, I didn't mean it to come across like that. If it's something that's not been seen before then it must be a hardware/kernel issue.

Many thanks
Ross McInnes
Re: [Samba] Tall tale of woe....
Jerry...

> It logs to stdout.

Ah OK, so redirecting to another file will be in order.

> I think the key will be figuring out which tdb the runaway smbd is
> reading.

> Probably. Does ifconfig show an abnormal amount of errors? If not,
> then you are probably ok wrt duplex settings, et al. And to clarify,
> when the smbd starts sucking up CPU, check which client it is
> connected to and look at the traffic pattern from that client to see
> if the smbd process is doing real work on behalf of the client.

No, it's fine, so that's one less thing to worry about.. or not.

    eth0  Link encap:Ethernet  HWaddr 00:06:5B:F2:89:25
          inet addr:172.16.128.254  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1106496 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1078245 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:228665930 (218.0 Mb)  TX bytes:768785456 (733.1 Mb)
          Interrupt:28 Base address:0xdce0 Memory:fe8e-fe90

Halfway through writing this reply the server just panicked and halted. On the screen was (or thereabouts):

    smbd process PID 19579, stackpage = f300f000
    calltrace [c013e86b] __kmem_cache_alloc

followed by

    e1000_alloc_rx_buffers
    e1000_alloc_rx_irq

That might shed some light on it. Don't suppose you know where RH writes panics to? I can't seem to find it.

When I look at the samba.log there is nothing untoward:

    [2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)
      m6-1 (172.16.175.10) connect to service dmn01 as user dmn01 (uid=1269, gid=102) (pid 18746)
    [2003/12/15 11:29:07, 0] lib/util_sock.c:read_data(436)
      read_data: read failure for 4. Error = Connection reset by peer
    [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
      m5-3 (172.16.142.30) closed connection to service exams
    [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
      m5-3 (172.16.142.30) closed connection to service shared
    [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
      m5-3 (172.16.142.30) closed connection to service intranet
    [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
      m5-3 (172.16.142.30) closed connection to service winfiles
    [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
      m5-3 (172.16.142.30) closed connection to service netlogon
    [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
      m5-3 (172.16.142.30) closed connection to service ab02
    [2003/12/15 11:46:24, 1] smbd/service.c:make_connection(636)
      premises (172.16.180.10) connect to service rsmith as user rsmith (uid=1029, gid=101) (pid 890)

    m6-8 (172.16.175.80) connect to service pn02 as user pn02 (uid=2906, gid=102) (pid 19579)
    [2003/12/15 11:27:49, 1] smbd/service.c:make_connection(636)

is the offending user/pid. There was nothing untoward in his account or in the network traffic to or from his computer at the time.

Unfortunately I was unaware of the slowdown/problems, so I was unable to run strace on the pid.

I'm guessing it panics when the offending pid is left alone, and not kill -9'd like I normally do.

Many thanks
A perturbed Ross McInnes
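To tie a runaway smbd pid back to its client without scrolling through samba.log by hand, something like the following could help. It is only a sketch in Python: it assumes the two-line entry layout shown in the excerpt above (a bracketed header line, then the message), and the names `connections_by_pid`, `HEADER`, `CONNECT` and `SAMPLE` are made up for illustration, not part of Samba.

```python
import re

# Header line, e.g. "[2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)"
HEADER = re.compile(r"^\[(?P<ts>[\d/]+ [\d:]+), *\d+\] \S+$")
# Message written by make_connection, as seen in the excerpt above.
CONNECT = re.compile(
    r"(?P<host>\S+) \((?P<ip>[\d.]+)\) connect to service (?P<service>\S+) "
    r"as user (?P<user>\S+) \(uid=\d+, gid=\d+\) \(pid (?P<pid>\d+)\)"
)


def connections_by_pid(lines):
    """Map each smbd pid to a list of (timestamp, host, ip, service, user)."""
    connections = {}
    timestamp = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # Remember the timestamp; the message follows on the next line.
            timestamp = header.group("ts")
            continue
        connect = CONNECT.search(line)
        if connect:
            entry = (timestamp, connect.group("host"), connect.group("ip"),
                     connect.group("service"), connect.group("user"))
            connections.setdefault(int(connect.group("pid")), []).append(entry)
    return connections


# Entries copied from the log excerpt in this message.
SAMPLE = """\
[2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)
  m6-1 (172.16.175.10) connect to service dmn01 as user dmn01 (uid=1269, gid=102) (pid 18746)
[2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
  m5-3 (172.16.142.30) closed connection to service exams
[2003/12/15 11:46:24, 1] smbd/service.c:make_connection(636)
  premises (172.16.180.10) connect to service rsmith as user rsmith (uid=1029, gid=101) (pid 890)
""".splitlines()

for pid, entries in sorted(connections_by_pid(SAMPLE).items()):
    print(pid, entries)
```

With the pid from top in hand, looking it up in the returned dictionary gives the client host and IP whose traffic is worth watching.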
Re: [Samba] Tall tale of woe....
Ross McInnes (Systems) wrote:

| halfway through writing this reply the server just panicked and halted.
| on the screen was (or thereabouts)
| smbd process PID 19579, stackpage = f300f000
| calltrace [c013e86b] __kmem_cache_alloc
| followed by
| e1000_alloc_rx_buffers
| e1000_alloc_rx_irq
|
| might put some light onto it.
| dont suppose you know where RH writes panics
| to? i cant seem to find it.

The kernel should log the oops in /var/log/messages.

| when i look at the samba.log there is nothing untoward
|
| [2003/12/15 11:29:06, 1] smbd/service.c:make_connection(636)
|   m6-1 (172.16.175.10) connect to service dmn01 as user dmn01 (uid=1269,
|   gid=102) (pid 18746)
| [2003/12/15 11:29:07, 0] lib/util_sock.c:read_data(436)
|   read_data: read failure for 4. Error = Connection reset by peer
| [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
|   m5-3 (172.16.142.30) closed connection to service exams
| [2003/12/15 11:29:07, 1] smbd/service.c:close_cnum(677)
|   m5-3 (172.16.142.30) closed connection to service shared
...
|
| m6-8 (172.16.175.80) connect to service pn02 as user pn02
| (uid=2906, gid=102) (pid 19579)
| [2003/12/15 11:27:49, 1] smbd/service.c:make_connection(636)
|
| is the offending user/pid nothing untoward in his account or network
| traffic to or from his computer at the time.
|
| unfortunately i was unaware of the slowdown/problems so i was unable to
| perform strace on the pid.
|
| im guessing it panics when the offending pid is left alone, and not kill
| -9 'd like i normally do.

We can't be blamed for a kernel oops. If a user space app can cause the kernel to die, then that's a kernel bug. I would start pursuing this with RedHat (if you have support), or logging it in bugzilla.redhat.com.

cheers, jerry
--
Hewlett-Packard -- http://www.hp.com
SAMBA Team -- http://www.samba.org
GnuPG Key -- http://www.plainjoe.org/gpg_public.asc
If we're adding to the noise, turn off this song --Switchfoot (2003)
Re: [Samba] Tall tale of woe....
Many thanks for your reply Jerry, it's certainly shed some light on all of this. In answer to your questions:

> I'm assuming that you are running version 2.2.x (included with RH8).
> Have you tested 3.0 (wait until 3.0.1 if you haven't yet since there
> are a lot of bug fixes in it).

I'm running 2.2.8a with no immediate plans to upgrade, unless everything else I've tried fails.

> What is the smbd process doing? Try running strace or get a
> backtrace in gdb to find out where it is spending its time.

When and if it happens again I will try and get an strace. I'm assuming it's simply strace -p PID. Does it log the results somewhere, or do I redirect it to a log file? I was thinking just in case it was a lot of information.

> Probably fcntl() calls when looking up data in a tdb. Find out which
> tdb (either look in /proc/pid/fd to match the file descriptor or use
> lsof). Also check the network traffic at this point.

Very useful command, that lsof. Again, when it happens I will definitely have a look.

> Are you serving printers by chance? If so have you set
> 'disable spoolss = yes'? I've seen high CPU utilization cases in
> relation to this param.

Yes, I am serving printers. I've just checked the config and I don't have 'disable spoolss = yes'.

> use mii-tool and check the duplex settings. And any hardware can
> have problems no matter what the price tag says :-) Check your
> routers. Maybe they are getting overloaded or are dropping packets.

Ah, yes, network traffic. I ran mii-tool and it reports

    eth0: negotiated 100baseTx-FD flow-control, link ok

However, it's a gigabit card and, according to the switch, linked at a gigabit. I'm hoping mii-tool is wrong. I've just enabled monitoring on the switch and will keep an eye on this. We also run an admin network here, which is on the same equipment. Never during these outages have they complained about it being slow or unusable. However, that's not to say it has nothing to do with it, since it could be just that server and the port it's on.

I'd just like to say I really appreciate your help in this matter.

Many thanks
Ross McInnes
Re: [Samba] Tall tale of woe....
Ross McInnes (Systems) wrote:

| When and if it happens again i will try and get an strace
| im assuming its simply strace -p PID
| does it log the results somewhere? or do i to a log file?
| was thinking just in case it was a lot of information.

It logs to stdout.

| Are you serving printers by chance? If so have you
| set 'disable spoolss = yes' ? I've seen high CPU utilization
| cases in relation to this param.
|
| yes i am serving printers.. ive just checked the config
| and i dont have 'disable spoolss = yes'

I think the key will be figuring out which tdb the runaway smbd is reading.

| use mii-tool and check the duplex settings. And any
| hardware can have problems no matter what the price tag
| says :-) Check your routers. Maybe they are getting
| overloaded or are dropping packets.
|
| Ah, yes network traffic. i ran mii-tool and it reports
| eth0: negotiated 100baseTx-FD flow-control, link ok
|
| However, its a GB card and according to the switch linked at a GB. im
| hoping mii-tool is wrong.

Probably. Does ifconfig show an abnormal amount of errors? If not, then you are probably ok wrt duplex settings, et al.

And to clarify, when the smbd starts sucking up CPU, check which client it is connected to and look at the traffic pattern from that client to see if the smbd process is doing real work on behalf of the client.

cheers, jerry
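The ifconfig check suggested above can be scripted so the counters are compared the same way each time. This is a rough Python sketch that assumes the older net-tools layout (`RX packets:N errors:N dropped:N overruns:N ...`); newer ifconfig versions print the counters differently and would need another pattern. The names `error_summary` and `SAMPLE` are illustrative only, and the sample counters are the ones posted elsewhere in this thread.

```python
import re

# Matches the RX and TX counter lines of old-style ifconfig output.
COUNTERS = re.compile(
    r"(?P<direction>RX|TX) packets:(?P<packets>\d+) errors:(?P<errors>\d+) "
    r"dropped:(?P<dropped>\d+) overruns:(?P<overruns>\d+)"
)


def error_summary(ifconfig_text):
    """Return packets, bad-packet count and bad ratio for RX and TX."""
    summary = {}
    for match in COUNTERS.finditer(ifconfig_text):
        packets = int(match.group("packets"))
        bad = sum(int(match.group(key))
                  for key in ("errors", "dropped", "overruns"))
        summary[match.group("direction")] = {
            "packets": packets,
            "bad": bad,
            "ratio": bad / packets if packets else 0.0,
        }
    return summary


# Counter lines as posted for eth0 in this thread.
SAMPLE = """\
RX packets:1106496 errors:0 dropped:0 overruns:0 frame:0
TX packets:1078245 errors:0 dropped:0 overruns:0 carrier:0
"""

for direction, stats in error_summary(SAMPLE).items():
    print(direction, stats)
```

A ratio that stays at or near zero on both directions is consistent with the duplex settings being fine; a steadily climbing error or overrun count would point back at the link.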
Re: [Samba] Tall tale of woe....
Ross McInnes (Systems) wrote:

| The main one is that a smbd process which belongs to a
| user logging in will appear in top (a cpu monitor program)
| using massive amounts of CPU etc. although the system says
| it still has about 10-15% idle, this generally stops
| everyone logging in.

I'm assuming that you are running version 2.2.x (included with RH8). Have you tested 3.0 (wait until 3.0.1 if you haven't yet since there are a lot of bug fixes in it).

What is the smbd process doing? Try running strace or get a backtrace in gdb to find out where it is spending its time.

| When this problem occurs it pushes system up to 50-80%!!!

Probably fcntl() calls when looking up data in a tdb. Find out which tdb (either look in /proc/pid/fd to match the file descriptor or use lsof). Also check the network traffic at this point.

| Now i have read other peoples emails and gone through
| the archives about this and read about failure for 4. Error =
| No route to host, lib/util_sock.c:read_data(436)
| and oplocking problems as they all appear to be more
| pronounced around the time of this high CPU/rogue smbd
| process.

Are you serving printers by chance? If so have you set 'disable spoolss = yes'? I've seen high CPU utilization cases in relation to this param.

| However it would seem a lot of the oplocking problems
| seem to be hardware related. I use decent 3com kit here
| with a 4950 as a core and 4400's at edge (i.e not cheap
| and cheerful netgear/dlink/etc stuff) so im wondering if
| anyone else has had these problems with this kit. or if its
| not the kit what can it actually be?

use mii-tool and check the duplex settings. And any hardware can have problems no matter what the price tag says :-)

| any ideas on the read_data(436) and failure for 4. Error = No route to
| host ?

Check your routers. Maybe they are getting overloaded or are dropping packets.

cheers, jerry
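The /proc/pid/fd lookup mentioned above can be done by hand with ls -l, but a small script makes it repeatable when the problem strikes. Below is a minimal Python sketch, assuming a Linux /proc layout where each entry under /proc/PID/fd is a symlink to the open file; `open_tdb_files` and its `proc_root` parameter are hypothetical names chosen here, and the script has to run as root or as the owner of the smbd process to read the links.

```python
import os
import sys


def open_tdb_files(pid, proc_root="/proc"):
    """Return {fd_number: path} for every file `pid` has open ending in .tdb."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    found = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.endswith(".tdb"):
            found[int(name)] = target
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tdbfds.py PID
    for fd, path in sorted(open_tdb_files(int(sys.argv[1])).items()):
        print(fd, path)
```

The descriptor numbers it prints can then be matched against the first argument of the fcntl() calls that strace shows for the busy process.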
[Samba] Tall tale of woe....
For the last year or so I have been having problems in general with samba (various versions) on the same box: a Dell 2500, Xeon 1.8, with 2 GB of RAM running Red Hat 8.

What will happen from time to time (although it's now happened 3 times in the last 5 days, hence this email) is that people will be slow to log in, if they can at all. Several things appear to happen. The main one is that an smbd process which belongs to a user logging in will appear in top (a CPU monitor program) using massive amounts of CPU. Although the system says it still has about 10-15% idle, this generally stops everyone logging in.

Now, as part of top on RH (it doesn't look the same on BSD) there is a "system" entry with the % of CPU given over to it. System basically means anything I/O or kernel related. Since the kernel governs resources, some system time isn't uncommon. During a period of 4 hours I monitored this system and it never went above 10%, and even then only for a matter of seconds. When this problem occurs it pushes system up to 50-80%!!! I look at the server and the disks are pretty much idle, so it's not disk related. I am at a loss to find out what it is actually doing to cause this. However, once I kill off this process it seems to slowly get back to normal.

Now, I have read other people's emails and gone through the archives about this, and read about "failure for 4. Error = No route to host", lib/util_sock.c:read_data(436) and oplocking problems, as they all appear to be more pronounced around the time of this high CPU/rogue smbd process.

However, it would seem a lot of the oplocking problems are hardware related. I use decent 3com kit here with a 4950 as a core and 4400's at the edge (i.e. not cheap and cheerful netgear/dlink/etc stuff), so I'm wondering if anyone else has had these problems with this kit. Or, if it's not the kit, what can it actually be?

I've tried turning oplocks on and off to no avail. It still has this issue.

Any ideas on the read_data(436) and "failure for 4. Error = No route to host"?

Any help offered very gratefully received.

With thanks
Ross McInnes
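The "system" figure that top shows can also be sampled straight from /proc/stat, which makes it easy to log over time and spot the jump to 50-80%. The Python sketch below makes one loud assumption: it uses only the first four counters of the aggregate `cpu` line (user, nice, system, idle). Later kernels append further columns such as iowait, which this sketch ignores. The function names and the sample numbers are made up for illustration.

```python
def cpu_shares(before, after):
    """Percentage of CPU time per state between two 'cpu' lines of /proc/stat."""
    names = ("user", "nice", "system", "idle")
    first = [int(value) for value in before.split()[1:5]]
    second = [int(value) for value in after.split()[1:5]]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        # No ticks elapsed between the samples; avoid dividing by zero.
        return dict.fromkeys(names, 0.0)
    return {name: 100.0 * delta / total for name, delta in zip(names, deltas)}


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as stat:
        return stat.readline()


# Made-up samples taken some seconds apart, not real measurements.
shares = cpu_shares("cpu  100 0 50 850", "cpu  150 0 400 1450")
print("system: %.1f%%" % shares["system"])
```

On the live server the two inputs would come from calling `read_cpu_line()` twice with a pause in between, for example from a cron job that appends the result to a file.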
RE: [Samba] Tall tale of woe....
Next time it happens, running an strace on the offending process

    strace -p process_id

can provide some insight as to what it's beating around on, especially if it's system related. That might help pinpoint a spot in the code where it's having problems.

Eric

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Ross McInnes (Systems)
Sent: Tuesday, December 09, 2003 9:34 AM
To: [EMAIL PROTECTED]
Subject: [Samba] Tall tale of woe
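Since the runaway process may produce a lot of output, it can help to capture the trace in a file for a fixed period rather than watching it scroll by. The following Python sketch wraps strace with its `-o` option (write the trace to a file), `-f` (follow child processes) and `-tt` (timestamp each call). The helper names, the 30-second default and the example file path are arbitrary choices made here, and attaching requires root or ownership of the target process.

```python
import signal
import subprocess


def strace_command(pid, logfile):
    """Build the strace argument list for attaching to a running process."""
    return ["strace", "-f", "-tt", "-p", str(pid), "-o", logfile]


def trace_process(pid, logfile, seconds=30):
    """Attach strace to `pid` for up to `seconds`, then detach again."""
    tracer = subprocess.Popen(strace_command(pid, logfile))
    try:
        tracer.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Interrupting strace makes it detach and leave the process running.
        tracer.send_signal(signal.SIGINT)
        tracer.wait()
    return tracer.returncode
```

For example, `trace_process(19579, "/tmp/smbd-19579.trace")` would leave roughly thirty seconds of system calls in that file for later inspection.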
AW: [Samba] Tall tale of woe....
Hello Ross,

I've had a similar problem with my server for the past week. I'm running RH 8.0 on a Xeon 1.4 GHz with 2 GB RAM, too. My CPU usage was pretty low (90% idle) but the load average was at 14. For now I'm waiting for the next time it happens to get some more information.

Best regards,
Reiner

-----Original Message-----
From: Ross McInnes (Systems) [EMAIL PROTECTED]
Sent: Tuesday, 9 December 2003 16:34
To: [EMAIL PROTECTED]
Subject: [Samba] Tall tale of woe