Re: Server slowdown...

2004-04-15 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> It sounds like a slow memory leak to me. I had the
> same problem years and years ago... it finally sorted
> itself out with another upgrade a year later. They
> can be the devil to find.

For this it is worth looking at /proc/slabinfo and, of course, at the
block in/out (bi/bo) and swap in/out (si/so) counters of vmstat.

When the slowdown occurs, I would run those commands (make sure to run them
under normal load, too, so you have something to compare against), then try
unplugging the network, or try an init 1 instead of going straight to a reboot.
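
To make the slabinfo comparison concrete, here is a minimal sketch in Python. It assumes the 2.x /proc/slabinfo layout (name, active_objs, num_objs, objsize, ...); older kernels such as 2.2 print different columns, so the field positions would need adjusting there. The function names slab_usage and growing_caches are made up for this illustration.

```python
def slab_usage(slabinfo_text):
    """Return {cache_name: approx_bytes} from /proc/slabinfo text.

    Assumes the 2.x layout: name, active_objs, num_objs, objsize, ...
    Older kernels (such as 2.2) use different columns.
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version banner and the column header.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def growing_caches(before, after, limit=5):
    """Compare two slab_usage() results; list the caches that grew most."""
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(name, delta) for name, delta in ranked[:limit] if delta > 0]
```

Take one snapshot under normal load with slab_usage(open("/proc/slabinfo").read()), another when the machine is slow, and pass both to growing_caches. A cache that keeps growing between snapshots is a candidate for the leak.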

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Server slowdown...

2004-04-14 Thread Dale Amon
On Wed, Apr 14, 2004 at 11:20:49PM +0200, Jaroslaw Tabor wrote:
> I'm almost sure that this is a software problem. The machine has been working
> without hardware changes for years, and this didn't happen before.
> The only changes I made are software updates (from debian-security)
> and a kernel upgrade after the latest holes were discovered.

It sounds like a slow memory leak to me. I had the
same problem years and years ago... it finally sorted
itself out with another upgrade a year later. They
can be the devil to find.

Are you getting any disk thrashing as it approaches
'death'?

-- 
--
   Dale Amon [EMAIL PROTECTED] +44-7802-188325
   International linux systems consultancy
 Hardware & software system design, security
and networking, systems programming and Admin
  "Have Laptop, Will Travel"
--



Re: Server slowdown...

2004-04-14 Thread Jaroslaw Tabor
Hello!

On Mon, 12-04-2004, at 02:00, Joe Bouchard wrote:
> In a meeting at work (I'm part of the IT group at a large corporation) someone
> mentioned a particular kind of network hardware which would stop working
> correctly after a while.  We have a pretty busy network with broadcasts and what
> not, and apparently this device would croak after "x number of packets", perhaps
> 2^32 or something.  The time frame was a few weeks for the device to get to that
> point.

I'm almost sure that this is a software problem. The machine has been working
without hardware changes for years, and this didn't happen before.
The only changes I made are software updates (from debian-security)
and a kernel upgrade after the latest holes were discovered.

regards
JT.



Re: Server slowdown...

2004-04-12 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> On shared media (such as 10base2) accidentally leave an interface in 
> promiscuous mode (there used to be a bug in tcpdump whereby running two 
> copies of it at the same time could cause the interface to remain in 
> promiscuous mode after both copies had exited).  A moderately busy 10base2 
> could destroy the performance of a decent 1995 server machine if an interface 
> was in promiscuous mode, and as the CPU use occurred in interrupt context 
> none of the usual tools would tell you what was happening.
> 
> Send lots of minimal size packets to a server or to the media broadcast 
> address.

Both can be seen in the software (si) and hardware (hi) interrupt times in
recent top output, as well as in the interrupt count for the various IRQs. You
can also see the number of context switches, which is useful to monitor.

So "vmstat 1" is a good first step: the "in" column shows the interrupts
per second, which is usually below 1020 on an idle system (with 2.6.4). The
number of context switches (cs) is between 200 and 350. On a network-loaded
system you can see at least 20% hi/si CPU time and 5000 in/cs in vmstat.

Use "cat /proc/interrupts" to identify the hardware which is causing
the interrupts. On an idle system, the timer interrupts are about 20 times
as many as any other interrupts (on 2.6 kernels).

You normally have to compare it with a "normal" workload, otherwise the
numbers are not very easy to decipher.
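
As a rough illustration of that comparison, the following Python sketch diffs two snapshots of /proc/interrupts and ranks the IRQs by how many interrupts arrived in between. The function names are invented for this example, and the parser assumes the usual layout (a CPU header line, then one row per IRQ with per-CPU counters followed by a description).

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header looks like "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Per-CPU counters come first; rows such as ERR may have fewer.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest_irqs(before, after, limit=3):
    """Rank IRQs by how many interrupts arrived between two snapshots."""
    deltas = [
        (after[irq][0] - before.get(irq, (0, ""))[0], irq, after[irq][1])
        for irq in after
    ]
    deltas.sort(reverse=True)
    return deltas[:limit]
```

Read /proc/interrupts twice a few seconds apart, parse both, and pass them to busiest_irqs. If a network card's IRQ dominates the timer during the slowdown but not under normal load, that points at the network.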

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/



Re: Server slowdown...

2004-04-12 Thread Russell Coker
On Mon, 12 Apr 2004 10:00, Joe Bouchard <[EMAIL PROTECTED]> wrote:
> In a meeting at work (I'm part of the IT group at a large corporation)
> someone mentioned a particular kind of network hardware which would stop
> working correctly after a while.

Here are some ways that network issues can slow down a server:

On shared media (such as 10base2) accidentally leave an interface in 
promiscuous mode (there used to be a bug in tcpdump whereby running two 
copies of it at the same time could cause the interface to remain in 
promiscuous mode after both copies had exited).  A moderately busy 10base2 
could destroy the performance of a decent 1995 server machine if an interface 
was in promiscuous mode, and as the CPU use occurred in interrupt context 
none of the usual tools would tell you what was happening.

Send lots of minimal size packets to a server or to the media broadcast 
address.  Until recently minimal size packets on 10Mb media could destroy the 
performance of most systems.  Now with Gig-E even using 1500 byte packets you 
can destroy the performance of most systems.

If you had a router break and repeatedly send a single IP datagram to your 
server on a Gig-E link then the likely result would be a dramatic loss of 
performance.

If you suspect this then the best thing to do is run a program to measure 
system performance on the console and unplug the network cables.
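
Before pulling the cables, one way to check for a flood of small packets is to sample /proc/net/dev twice and look at the receive rate and average packet size. The Python below is only a sketch: the function names are made up, it looks at the receive counters alone, and it does not handle counter wrap-around between samples.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, rx_packets)} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 2:
            continue
        stats[name.strip()] = (int(fields[0]), int(fields[1]))
    return stats


def rx_rate(before, after, seconds):
    """Return {interface: (packets_per_second, avg_packet_bytes)}."""
    rates = {}
    for name, (rx_bytes, rx_packets) in after.items():
        old_bytes, old_packets = before.get(name, (0, 0))
        packets = rx_packets - old_packets
        # Average size of the packets received between the two samples.
        avg_size = (rx_bytes - old_bytes) / packets if packets > 0 else 0.0
        rates[name] = (packets / seconds, avg_size)
    return rates
```

A very high packets-per-second figure combined with an average size near the minimum frame size would fit the small-packet scenario described above.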

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page



Re: Server slowdown...

2004-04-11 Thread Joe Bouchard
On Sun, Apr 11, 2004 at 12:28:31AM +0200, Jaroslaw Tabor wrote:
> Hello!
> 
>   I have a strange problem with one of my servers. From time to time (once
> every 2-3 months), something strange happens, and the server starts working
> very slowly. What is strange is that the CPU load (from top) is about 5%, but
> the response time for network services is extremely high; requests usually
> time out.
>   After a reboot, everything works perfectly. The question is where to
> start investigating. Can someone suggest a tool to record statistics
> for CPU, network, and I/O (drives) in correlation with processes?
> Because the problem occurs for all services, I suspect a kernel
> (2.2.26) problem, but how do I isolate it?
> I see that 2.2.27pre1 has some fixes for a TCP keepalive bug and a TCP
> sequence number wrapping bug. Could it be related?

I'll throw this out, I don't know if it is true or urban legend . . .

In a meeting at work (I'm part of the IT group at a large corporation) someone
mentioned a particular kind of network hardware which would stop working
correctly after a while.  We have a pretty busy network with broadcasts and what
not, and apparently this device would croak after "x number of packets", perhaps
2^32 or something.  The time frame was a few weeks for the device to get to that
point.  Then someone else said some of the Dell office PC's had NICs with the
same affliction, to which I joked "That's what the sticker `made for Windows XX'
means, they expect it to be rebooted frequently enough so you don't get to that
point."  :-)

At any rate, that story bears some similarity to your situation.  That's all
I'll say.  You might try to find out if your particular NIC has any sort of
limitation like this.
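
For a sense of scale, the arithmetic behind the "2^32 packets" story can be sketched in a few lines of Python. The 32-bit counter is the assumption from the anecdote, not a known property of any particular NIC, and the function names are illustrative.

```python
COUNTER_LIMIT = 2 ** 32  # size of a 32-bit packet counter (assumed)


def days_until_wrap(packets_per_second):
    """Days until a 32-bit packet counter reaches 2^32 at a steady rate."""
    return COUNTER_LIMIT / packets_per_second / 86400


def rate_for_wrap(days):
    """Steady packets-per-second rate that reaches 2^32 in the given days."""
    return COUNTER_LIMIT / (days * 86400)
```

At a steady 1000 packets per second such a counter fills in a little under 50 days, so "a few weeks" on a busy network and "2-3 months" on a quieter server are both plausible time frames for this kind of limit.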

-- 

Thank you,
Joe Bouchard

Powered by Debian GNU/Linux



Re: Server slowdown...

2004-04-11 Thread Javier Fernández-Sanguino Peña
On Sun, Apr 11, 2004 at 12:28:31AM +0200, Jaroslaw Tabor wrote:
(..)
>   After a reboot, everything works perfectly. The question is where to
> start investigating. Can someone suggest a tool to record statistics
> for CPU, network, and I/O (drives) in correlation with processes?

Use sysstat. As soon as you install it, it will start logging data to
/var/log/sysstat (which can be analysed with sar(1) as well as with some
other utilities).

Regards

Javier




Re: Server slowdown...

2004-04-10 Thread François TOURDE
On the 12519th day after Epoch,
Jaroslaw Tabor wrote:

> Hello!
>
>   I have a strange problem with one of my servers. From time to time (once
> every 2-3 months), something strange happens, and the server starts working
> very slowly. What is strange is that the CPU load (from top) is about 5%, but
> the response time for network services is extremely high; requests usually
> time out.
>   After a reboot, everything works perfectly. The question is where to
> start investigating. Can someone suggest a tool to record statistics
> for CPU, network, and I/O (drives) in correlation with processes?
> Because the problem occurs for all services, I suspect a kernel
> (2.2.26) problem, but how do I isolate it?
> I see that 2.2.27pre1 has some fixes for a TCP keepalive bug and a TCP
> sequence number wrapping bug. Could it be related?

Check memory usage and swap usage. Check CPU usage on the system side
rather than the user side.

A slowdown of your system can be caused by swapping, or by memory used by
some processes.

Check the response time of other commands, like ls or find for disk
activity, or others for memory/CPU usage.
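
A small Python sketch of the swap check, assuming /proc/meminfo provides the usual "SwapTotal" and "SwapFree" lines in kB. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Return {field: kilobytes} from the "Name: value kB" lines of /proc/meminfo."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Name:   12345 kB" rows; older kernels also print a
        # summary table at the top, which this skips.
        if sep and len(fields) == 2 and fields[0].isdigit() and fields[1] == "kB":
            info[name.strip()] = int(fields[0])
    return info


def swap_used_percent(info):
    """Percentage of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total
```

Logging swap_used_percent(parse_meminfo(open("/proc/meminfo").read())) at intervals would show whether swap use creeps up over the weeks before a slowdown.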

-- 
Nice boy, but about as sharp as a sack of wet mice.
-- Foghorn Leghorn



Server slowdown...

2004-04-10 Thread Jaroslaw Tabor
Hello!

I have a strange problem with one of my servers. From time to time (once
every 2-3 months), something strange happens, and the server starts working
very slowly. What is strange is that the CPU load (from top) is about 5%, but
the response time for network services is extremely high; requests usually
time out.
After a reboot, everything works perfectly. The question is where to
start investigating. Can someone suggest a tool to record statistics
for CPU, network, and I/O (drives) in correlation with processes?
Because the problem occurs for all services, I suspect a kernel
(2.2.26) problem, but how do I isolate it?
I see that 2.2.27pre1 has some fixes for a TCP keepalive bug and a TCP
sequence number wrapping bug. Could it be related?

regards
JT


