Re: Server slowdown...
In article <[EMAIL PROTECTED]> you wrote:
> It sounds like a slow memory leak to me. I had the
> same problem years and years ago... it finally sorted
> itself out with another upgrade a year later. They
> can be the devil to find.

For this it is good to look at /proc/slabinfo and, of course, at the block in/out (bi/bo) and swap in/out (si/so) counters of vmstat. When the slowdown occurs, I would run these commands (make sure to run them under normal load too, so you have something to compare against), then try unplugging the network, and as a last step try "init 1" instead of a reboot.

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/
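As a rough illustration of the /proc/slabinfo comparison suggested above, here is a minimal Python sketch that diffs two snapshots and lists the slab caches that grew the most. It assumes each data line starts with the cache name followed by the active object count, which holds for the slabinfo versions I know of but should be checked against your kernel. The sample snapshots and all numbers in them are invented for the example.

```python
def parse_slabinfo(text):
    """Map each slab cache name to its active object count."""
    counts = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        # Assumed layout: <name> <active_objs> <num_objs> ...
        if len(fields) >= 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def slab_growth(before, after, top=5):
    """Return the caches whose active object count grew most, largest first."""
    grown = []
    for name, count in after.items():
        delta = count - before.get(name, 0)
        if delta > 0:
            grown.append((name, delta))
    grown.sort(key=lambda item: item[1], reverse=True)
    return grown[:top]


# Hypothetical snapshots: one taken under normal load, one during the slowdown.
SLAB_BEFORE = (
    "slabinfo - version: 1.1\n"
    "inode_cache 1000 1200\n"
    "dentry_cache 500 600\n"
    "buffer_head 300 400\n"
)
SLAB_AFTER = (
    "slabinfo - version: 1.1\n"
    "inode_cache 5000 5200\n"
    "dentry_cache 550 600\n"
    "buffer_head 280 400\n"
)

for name, delta in slab_growth(parse_slabinfo(SLAB_BEFORE), parse_slabinfo(SLAB_AFTER)):
    print(name, "+%d objects" % delta)
```

On a real system the two strings would come from reading /proc/slabinfo at different times (this may need root). A cache that keeps growing across many snapshots is a candidate for the leak.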
Re: Server slowdown...
On Wed, Apr 14, 2004 at 11:20:49PM +0200, Jaroslaw Tabor wrote:
> I'm almost sure that this is a software problem. The machine has been working
> without hardware changes for years, and it didn't happen before.
> The only changes I made are software updates (from debian-security)
> and a kernel upgrade after the last holes were discovered.

It sounds like a slow memory leak to me. I had the same problem years and years ago... it finally sorted itself out with another upgrade a year later. They can be the devil to find.

Are you getting any disk thrashing as it approaches 'death'?

-- 
Dale Amon [EMAIL PROTECTED] +44-7802-188325
International linux systems consultancy
Hardware & software system design, security and networking, systems programming and Admin
"Have Laptop, Will Travel"
Re: Server slowdown...
Hello!

On Mon, 12-04-2004, at 02:00, Joe Bouchard wrote:
> In a meeting at work (I'm part of the IT group at a large corporation) someone
> mentioned a particular kind of network hardware which would stop working
> correctly after a while. We have a pretty busy network with broadcasts and
> what not, and apparently this device would croak after "x number of packets",
> perhaps 2^32 or something. The time frame was a few weeks for the device to
> get to that point.

I'm almost sure that this is a software problem. The machine has been working without hardware changes for years, and it didn't happen before. The only changes I made are software updates (from debian-security) and a kernel upgrade after the last holes were discovered.

regards
JT.
Re: Server slowdown...
In article <[EMAIL PROTECTED]> you wrote:
> On shared media (such as 10base2) accidentally leave an interface in
> promiscuous mode (there used to be a bug in tcpdump whereby running two
> copies of it at the same time could cause the interface to remain in
> promiscuous mode after both copies had exited). A moderately busy 10base2
> could destroy the performance of a decent 1995 server machine if an interface
> was in promiscuous mode, and as the CPU use occurred in interrupt context
> none of the usual tools would tell you what was happening.
>
> Send lots of minimal size packets to a server or to the media broadcast
> address.

Both can be seen in the software (si) and hardware (hi) interrupt times in recent top output, as well as in the interrupt counts for the various IRQs. You can also see the number of context switches, which is useful to monitor.

So "vmstat 1" is a good first aid: the "in" column is the number of interrupts per second, which is usually below 1020 on an idle system (with 2.6.4), and the number of context switches (cs) is between 200 and 350. On a network-loaded system you can see at least 20% hi/si CPU time and 5000 in/cs in vmstat.

Use "cat /proc/interrupts" to identify the hardware that is causing the interrupts. On an idle system, the timer interrupts are about 20 times as many as any other interrupts (on 2.6 kernels).

You normally have to compare the numbers with those from a "normal" workload; otherwise they are not easy to interpret.

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/
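To make the /proc/interrupts comparison above concrete, here is a small Python sketch that takes two snapshots of the file and reports interrupts per second for each source, busiest first. It assumes the usual layout (a header row with one CPUn column per processor, then "label: counts... description" rows). The sample snapshots, the 10 second interval and the device names are invented for illustration.

```python
def parse_interrupts(text):
    """Map each IRQ label to (total count over all CPUs, description)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: one "CPUn" column per processor
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def interrupt_rates(before, after, seconds):
    """Return (label, description, interrupts per second), busiest first."""
    rates = []
    for label, (total, description) in after.items():
        earlier = before.get(label, (0, ""))[0]
        per_second = (total - earlier) / seconds
        if per_second > 0:
            rates.append((label, description, per_second))
    rates.sort(key=lambda row: row[2], reverse=True)
    return rates


# Hypothetical snapshots taken 10 seconds apart on a single-CPU machine.
IRQ_BEFORE = (
    "           CPU0\n"
    "  0:    1000000          XT-PIC  timer\n"
    "  1:        500          XT-PIC  keyboard\n"
    " 11:      20000          XT-PIC  eth0\n"
    "NMI:          0\n"
)
IRQ_AFTER = (
    "           CPU0\n"
    "  0:    1010000          XT-PIC  timer\n"
    "  1:        500          XT-PIC  keyboard\n"
    " 11:      70000          XT-PIC  eth0\n"
    "NMI:          0\n"
)

for label, description, rate in interrupt_rates(
        parse_interrupts(IRQ_BEFORE), parse_interrupts(IRQ_AFTER), 10.0):
    print("%4s  %-20s %8.0f/s" % (label, description, rate))
```

In this made-up sample the network card generates five times as many interrupts as the timer, which is the kind of imbalance the message describes for a network-loaded system.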
Re: Server slowdown...
On Mon, 12 Apr 2004 10:00, Joe Bouchard <[EMAIL PROTECTED]> wrote:
> In a meeting at work (I'm part of the IT group at a large corporation)
> someone mentioned a particular kind of network hardware which would stop
> working correctly after a while.

Here are some ways that network issues can slow down a server:

On shared media (such as 10base2) accidentally leave an interface in promiscuous mode (there used to be a bug in tcpdump whereby running two copies of it at the same time could cause the interface to remain in promiscuous mode after both copies had exited). A moderately busy 10base2 could destroy the performance of a decent 1995 server machine if an interface was in promiscuous mode, and as the CPU use occurred in interrupt context none of the usual tools would tell you what was happening.

Send lots of minimal size packets to a server or to the media broadcast address. Until recently minimal size packets on 10Mb media could destroy the performance of most systems. Now with Gig-E even using 1500 byte packets you can destroy the performance of most systems.

If you had a router break and repeatedly send a single IP datagram to your server on a Gig-E link then the likely result would be a dramatic loss of performance. If you suspect this then the best thing to do is run a program to measure system performance on the console and unplug the network cables.

-- 
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
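The comparison between minimal packets on 10Mb media and 1500 byte packets on Gig-E can be checked with some back-of-the-envelope arithmetic. The Python sketch below computes the maximum number of frames per second at line rate, assuming standard Ethernet framing (a 64 byte minimum frame, 1518 bytes for a frame carrying a 1500 byte payload, plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire). It is only an upper bound for a fully saturated link, not a measurement.

```python
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames


def frames_per_second(link_bits_per_second, frame_bytes):
    """Upper bound on frames per second for a saturated Ethernet link."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bits_per_second / wire_bits


# Minimal 64 byte frames on 10 Mbit/s Ethernet.
slow_small = frames_per_second(10_000_000, 64)
# Full-size frames (1500 byte payload, 1518 bytes with header and FCS) on Gig-E.
fast_large = frames_per_second(1_000_000_000, 1518)

print("10 Mbit/s, 64 byte frames:  %.0f frames/s" % slow_small)
print("Gig-E, 1518 byte frames:    %.0f frames/s" % fast_large)
print("ratio: %.1f" % (fast_large / slow_small))
```

Under these assumptions a saturated Gig-E link delivers roughly five times as many full-size frames per second as a saturated 10Mb link delivers minimal ones, so the per-packet interrupt load can be higher even with large packets.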
Re: Server slowdown...
On Sun, Apr 11, 2004 at 12:28:31AM +0200, Jaroslaw Tabor wrote:
> Hello!
>
> I have a strange problem with one of my servers. From time to time (once
> every 2-3 months), something strange happens, and the server starts working
> very slowly. What is strange is that the CPU load (from top) is about 5%, but
> the response time for network services is extremely high. Requests usually
> time out.
> After a reboot, everything works perfectly. The question is where to
> start the investigation. Can someone suggest a tool to record statistics
> of CPU, network and I/O (drives) in correlation with processes?
> Since the problem occurs for all services, I suspect a kernel
> (2.2.26) problem, but how do I isolate it?
> I see that 2.2.27pre1 has some fixes for a TCP keepalive bug and a TCP
> sequence number wrapping bug. Can it be related?

I'll throw this out, I don't know if it is true or urban legend . . .

In a meeting at work (I'm part of the IT group at a large corporation) someone mentioned a particular kind of network hardware which would stop working correctly after a while. We have a pretty busy network with broadcasts and what not, and apparently this device would croak after "x number of packets", perhaps 2^32 or something. The time frame was a few weeks for the device to get to that point.

Then someone else said some of the Dell office PCs had NICs with the same affliction, to which I joked "That's what the sticker `made for Windows XX' means, they expect it to be rebooted frequently enough so you don't get to that point." :-)

At any rate, that story bears some similarity to your situation. That's all I'll say. You might try to find out if your particular NIC has any sort of limitation like this.

-- 
Thank you,
Joe Bouchard
Powered by Debian GNU/Linux
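Whether a packet counter wrap could match these time frames is easy to estimate. The following Python sketch is purely illustrative: the 2^32 limit is only the guess from the story above, and the packet rates are assumed averages, not measurements from any real device.

```python
SECONDS_PER_DAY = 86400


def days_to_wrap(packets_per_second, counter_bits=32):
    """Days until a counter of the given width overflows at a steady packet rate."""
    return 2 ** counter_bits / packets_per_second / SECONDS_PER_DAY


def rate_for_wrap(days, counter_bits=32):
    """Average packets per second needed to overflow the counter in `days` days."""
    return 2 ** counter_bits / (days * SECONDS_PER_DAY)


# Average rate that would wrap a 32-bit counter in three weeks.
print("wrap in 21 days needs about %.0f packets/s" % rate_for_wrap(21))
# Average rate that would wrap it in 75 days (the middle of "2-3 months").
print("wrap in 75 days needs about %.0f packets/s" % rate_for_wrap(75))
# How long a device would last at a steady 14880 packets/s.
print("at 14880 packets/s the counter wraps in %.1f days" % days_to_wrap(14880))
```

The rates that come out are modest for a busy network, so a wrap after weeks or months is at least arithmetically plausible. That says nothing about whether a given NIC actually has such a bug.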
Re: Server slowdown...
On Sun, Apr 11, 2004 at 12:28:31AM +0200, Jaroslaw Tabor wrote:
(..)
> After reboot, everything is working perfect. The question is where to
> start investigation. Can someone suggest some tool, to record statistics
> of CPU, Network, IO(drives) in correlation with processes ?

Use sysstat. As soon as you install it, it will start logging data to /var/log/sysstat (which can be analysed with sar(1) as well as with some other utilities).

Regards

Javier
Re: Server slowdown...
On the 12519th day after Epoch, Jaroslaw Tabor wrote:
> Hello!
>
> I have a strange problem with one of my servers. From time to time (once
> every 2-3 months), something strange happens, and the server starts working
> very slowly. What is strange is that the CPU load (from top) is about 5%, but
> the response time for network services is extremely high. Requests usually
> time out.
> After a reboot, everything works perfectly. The question is where to
> start the investigation. Can someone suggest a tool to record statistics
> of CPU, network and I/O (drives) in correlation with processes?
> Since the problem occurs for all services, I suspect a kernel
> (2.2.26) problem, but how do I isolate it?
> I see that 2.2.27pre1 has some fixes for a TCP keepalive bug and a TCP
> sequence number wrapping bug. Can it be related?

Check memory usage and swap usage. Check CPU usage on the system side rather than the user side. A slowdown of your system can be caused by swapping, or by memory used by some processes.

Check the response time of other processes: ls or find for disk usage, others for memory/CPU usage.

-- 
Nice boy, but about as sharp as a sack of wet mice.
		-- Foghorn Leghorn
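One way to log the memory and swap figures mentioned above is to read /proc/meminfo periodically. The Python sketch below is a minimal example that parses the "Key: value kB" lines and computes the fraction of swap in use. It assumes the MemTotal, MemFree, SwapTotal and SwapFree fields are present, and it deliberately skips the multi-column summary rows that older kernels print at the top. The sample text is invented.

```python
def parse_meminfo(text):
    """Map each 'Key: value [kB]' line of /proc/meminfo to an integer value."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        # Accept "1234 kB" or a bare "1234"; skip multi-column summary rows.
        if 1 <= len(fields) <= 2 and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_fraction(info):
    """Fraction of swap space in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


# Hypothetical sample; on a real system read the text from /proc/meminfo.
MEMINFO_SAMPLE = (
    "MemTotal:       256000 kB\n"
    "MemFree:          4000 kB\n"
    "SwapTotal:      512000 kB\n"
    "SwapFree:       128000 kB\n"
)

info = parse_meminfo(MEMINFO_SAMPLE)
print("free memory: %d kB of %d kB" % (info["MemFree"], info["MemTotal"]))
print("swap in use: %.0f%%" % (100 * swap_used_fraction(info)))
```

A low MemFree value alone is normal on Linux because of caching. Swap usage that keeps climbing between samples is the more telling sign.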
Server slowdown...
Hello!

I have a strange problem with one of my servers. From time to time (once every 2-3 months), something strange happens, and the server starts working very slowly. What is strange is that the CPU load (from top) is about 5%, but the response time for network services is extremely high. Requests usually time out.

After a reboot, everything works perfectly. The question is where to start the investigation. Can someone suggest a tool to record statistics of CPU, network and I/O (drives) in correlation with processes?

Since the problem occurs for all services, I suspect a kernel (2.2.26) problem, but how do I isolate it? I see that 2.2.27pre1 has some fixes for a TCP keepalive bug and a TCP sequence number wrapping bug. Can it be related?

regards
JT