Re: kernel go-slow

2003-02-06 Thread Alexander Lyamin
Mon, Feb 03, 2003 at 12:27:40AM +0100, Russell Coker wrote:
 I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.
 
 One problem that has started occurring is that periodically some of the 
 machines will go really slow for a while.  It's as if the CPU speed has just 
 dropped to 1% of its regular speed.  Then after 10 minutes or so it will 
 continue as normal.

when it slows down, please check with vmstat for I/O, or watch your
LED for disk activity. that's the simple and stupid method.

but there's no really good way to understand what's going on in the kernel
if you are in userland yourself. so go into the kernel with profiling and see
where it spends its precious time. slightly more complicated than the
method above, but much more effective.

 
 Has anyone heard of such things before?
 
 I am asking here first because the ReiserFS patch is the most significant 
 kernel patch I've applied on what is otherwise a stock 2.4.20 kernel.
 
 Interestingly the machines that have the problems are not the most active in 
 the file system (mail store), but the mail spool machines.  The mail spool 
 machines do a good amount of file access (but well below the limits of the 
 hardware) and also use more memory and have large load spikes on occasion 
 (virus and spam scanning).

-- 
Cache remedies via multi-variable logic shorts will leave you crying.(cl)
Lex Lyamin



Re: kernel go-slow

2003-02-06 Thread Alexander Lyamin
Thu, Feb 06, 2003 at 02:26:49PM +0300, Alexander Lyamin wrote:
 Mon, Feb 03, 2003 at 12:27:40AM +0100, Russell Coker wrote:
  I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.
  
  One problem that has started occurring is that periodically some of the 
  machines will go really slow for a while.  It's as if the CPU speed has just 
  dropped to 1% of its regular speed.  Then after 10 minutes or so it will 
  continue as normal.
 
 when it slows down, please check with vmstat for I/O, or watch your

i think i wasn't clear enough.
so - first, if it goes slow during disk activity, chances are good
that it is caused by the FS or the VM or their misunderstandings.

but there are possible situations that will not generate disk activity
yet may still cause your system to go slow. if you see some
unusual I/O numbers while disk activity is moderate to low,
it is most likely the same sweet pair.

but Oleg Drokin pointed out situations where even I/O will not indicate
what's going on :)

so the advice is still the same - if you are having slowdowns, profiling might help
you much better than the witchy methods described above.

 LED for disk activity. that's the simple and stupid method.
 
 but there's no really good way to understand what's going on in the kernel
 if you are in userland yourself. so go into the kernel with profiling and see
 where it spends its precious time. slightly more complicated than the
 method above, but much more effective.
 
  
  Has anyone heard of such things before?
  
  I am asking here first because the ReiserFS patch is the most significant 
  kernel patch I've applied on what is otherwise a stock 2.4.20 kernel.
  
  Interestingly the machines that have the problems are not the most active in 
  the file system (mail store), but the mail spool machines.  The mail spool 
  machines do a good amount of file access (but well below the limits of the 
  hardware) and also use more memory and have large load spikes on occasion 
  (virus and spam scanning).
talking about virus/spam scanning - what do you use, and how is it integrated into
your SMTP MTA?

-- 
Cache remedies via multi-variable logic shorts will leave you crying.(cl)
Lex Lyamin



Re: kernel go-slow

2003-02-06 Thread Russell Coker
On Thu, 6 Feb 2003 17:32, Alexander Lyamin wrote:
   One problem that has started occurring is that periodically some of the
   machines will go really slow for a while.  It's as if the CPU speed has
   just dropped to 1% of its regular speed.  Then after 10 minutes or so
   it will continue as normal.
 
  when it slows down, please check with vmstat for I/O, or watch your

 i think i wasn't clear enough.
 so - first, if it goes slow during disk activity, chances are good
 that it is caused by the FS or the VM or their misunderstandings.

vmstat doesn't work properly.  CPU time is 99% system, which suggests that one 
CPU is spending all its time in kernel space (for both threads of a 
hyper-threaded CPU) or that both CPUs have each got one thread locked in 
kernel space.
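
When vmstat itself misbehaves, the system-time share can be estimated directly from the counters in /proc/stat. The following is only a minimal sketch, not something from this thread: the function names are made up for illustration, and it assumes the aggregate "cpu" line lists user, nice, system and idle ticks in that order (later kernels append more fields, so the figure is approximate there).

```python
def parse_cpu_line(line):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def system_share(before, after):
    """Percentage of elapsed ticks spent in system mode between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no ticks elapsed between the two samples
    # Assumed field order: user, nice, system, idle; index 2 is system time.
    return 100.0 * deltas[2] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line from a live Linux system."""
    with open(path) as stat_file:
        for line in stat_file:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found")
```

Calling read_cpu_line() twice a few seconds apart and passing both results to system_share() gives a figure comparable to the "sy" column of vmstat.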

It's not disk related, those machines don't have a huge disk access.  The 
machines with the serious disk activity don't have any problems.

 but there are possible situations that will not generate disk activity
 yet may still cause your system to go slow. if you see some
 unusual I/O numbers while disk activity is moderate to low,
 it is most likely the same sweet pair.

The problem is that sar etc. produce jumbled results.  Profiling the kernel may 
help, but may also hide the error, and it's not something I can easily do.

The servers are locked in a managed server room on the other side of the city 
so seeing the blinken lights is not an option.

I've put the aa1 kernel on half the machines and now I'll wait to see what 
happens.  If the aa1 machines don't have the problem but the others do then 
I'll go all aa1.

   Interestingly the machines that have the problems are not the most
   active in the file system (mail store), but the mail spool machines. 
   The mail spool machines do a good amount of file access (but well below
   the limits of the hardware) and also use more memory and have large
   load spikes on occasion (virus and spam scanning).

 talking about virus/spam scanning - what do you use, and how is it integrated
 into your SMTP MTA?

RAV.  I'm not sure of the details, I think it runs as a daemon that qmail 
talks to.  I try to avoid the anti-virus stuff.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page




Re: kernel go-slow

2003-02-06 Thread Oleg Drokin
Hello!

On Thu, Feb 06, 2003 at 05:41:46PM +0100, Russell Coker wrote:

  but there are possible situations that will not generate disk activity
  yet may still cause your system to go slow. if you see some
  unusual I/O numbers while disk activity is moderate to low,
  it is most likely the same sweet pair.
 The problem is that sar etc. produce jumbled results.  Profiling the kernel may 
 help, but may also hide the error, and it's not something I can easily do.

Well, you can do it very easily.
Reboot with the profile=2 kernel option.
When the 100% sys CPU situation starts, execute readprofile -r
When it is finished, execute readprofile -m /path/to/System.map > somefile
Then sort somefile and you are done: you can now see where most of the time
is spent.
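
The sorting step can also be done with a short script. This is a rough sketch only: it assumes each line of readprofile output starts with a tick count followed by a symbol name, and that the summary row is named "total". The function name and the sample data (func_a and so on) are invented for illustration and are not real profile results.

```python
def top_symbols(profile_text, limit=10):
    """Return (ticks, symbol) pairs with the highest tick counts first."""
    rows = []
    for line in profile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        ticks, symbol = int(parts[0]), parts[1]
        if symbol == "total":
            continue  # summary row, not a kernel function
        rows.append((ticks, symbol))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Invented sample in the assumed "ticks symbol load" layout.
SAMPLE = """\
    12 func_a                    0.0100
  9500 func_b                    9.2000
   300 func_c                    0.2500
  9812 total                     0.0070
"""

busiest = top_symbols(SAMPLE, limit=2)
```

With a real capture, the contents of somefile would be passed in place of SAMPLE.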

 The servers are locked in a managed server room on the other side of the city 
 so seeing the blinken lights is not an option.

;)
<humour>webcam</humour>

 I've put the aa1 kernel on half the machines and now I'll wait to see what 
 happens.  If the aa1 machines don't have the problem but the others do then 
 I'll go all aa1.

Ah, if your problem is due to highmem I/O support not being present, then that might actually help.

Bye,
Oleg



Re: kernel go-slow

2003-02-06 Thread Hans Reiser
Russell Coker wrote:


On Thu, 6 Feb 2003 17:32, Alexander Lyamin wrote:
 

One problem that has started occurring is that periodically some of the
machines will go really slow for a while.  It's as if the CPU speed has
just dropped to 1% of its regular speed.  Then after 10 minutes or so
it will continue as normal.
   

when it slows down, please check with vmstat for I/O, or watch your
 

i think i wasn't clear enough.
so - first, if it goes slow during disk activity, chances are good
that it is caused by the FS or the VM or their misunderstandings.
   


vmstat doesn't work properly.  CPU time is 99% system, which suggests that one 
CPU is spending all its time in kernel space (for both threads of a 
hyper-threaded CPU) or that both CPUs have each got one thread locked in 
kernel space.

 

I propose that you try reversing the datalogging patch for long enough 
to know whether it is our new code that is buggy.

If it is not our code, and it matters enough to justify the cost, we can 
log in remotely and analyze the kernel for you for an hourly fee.  Probably the fee 
you charge them is good enough for us too. ;-)

--
Hans




kernel go-slow

2003-02-02 Thread Russell Coker
I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.

One problem that has started occurring is that periodically some of the 
machines will go really slow for a while.  It's as if the CPU speed has just 
dropped to 1% of its regular speed.  Then after 10 minutes or so it will 
continue as normal.

Has anyone heard of such things before?

I am asking here first because the ReiserFS patch is the most significant 
kernel patch I've applied on what is otherwise a stock 2.4.20 kernel.

Interestingly the machines that have the problems are not the most active in 
the file system (mail store), but the mail spool machines.  The mail spool 
machines do a good amount of file access (but well below the limits of the 
hardware) and also use more memory and have large load spikes on occasion 
(virus and spam scanning).

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page




Re: kernel go-slow

2003-02-02 Thread Rudy L. Zijlstra
Russell Coker wrote:


I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.

One problem that has started occurring is that periodically some of the 
machines will go really slow for a while.  It's as if the CPU speed has just 
dropped to 1% of its regular speed.  Then after 10 minutes or so it will 
continue as normal.

Has anyone heard of such things before?

 

Russell,

I am (was) running a vanilla 2.4.20 kernel and experienced a slow-down 
each night during the virus scan. The system would stop responding to HTTP at 
unpredictable moments. It was quite repeatable each night, though each time at 
a different moment during the night. I've just rebooted into 2.4.19 to 
check whether it's 2.4.20 or a result of the hardware modification I did 2 
weeks ago. The system is lightly loaded. File systems in use are mostly ReiserFS 
and a spattering of left-over ext2.

Cheers,

Rudy



Re: kernel go-slow

2003-02-02 Thread Ookhoi
Russell Coker wrote (ao):
 I'm running a number of machines with 2.4.20 and the ReiserFS journal
 patches.

 One problem that has started occurring is that periodically some of the
 machines will go really slow for a while. It's as if the CPU speed has
 just dropped to 1% of its regular speed. Then after 10 minutes or so
 it will continue as normal.

 Has anyone heard of such things before?

It seems there is a 'bug' in 2.4.20 which causes the stall. (don't know
the details, but you're not the only one).

Maybe a -pre fixes it, though in your case I would wait for .21 I think.