Re: system slowdown - vnode related

2003-06-16 Thread Masachika ISHIZUKA
   I still have a vnode problem in 4.8-stable with /sys/kern/vfs_subr.c
 1.249.2.30.
 
 Ishizuka-san, could you possibly try the following command line
 repeatedly while the slowdown is being observed?
 
   % vmstat -m | grep '^ *vfscache'
 
 If the third number of its output is approaching or hitting the fourth,
 the chances are your kernel is running out of memory for namecache,
 which was actually the case on my machines.
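
A minimal sh loop for repeating that check (a sketch only; the 10-second
interval is arbitrary) would be:

  #!/bin/sh
  # Print the vfscache line from vmstat -m every 10 seconds so the
  # third number can be watched against the fourth (the limit).
  while :; do
      vmstat -m | grep '^ *vfscache'
      sleep 10
  done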

  Hi, nagao-san.

  I stopped 310.locate in the weekly cron, and the slowdown no longer occurs
so often. The slowdown just occurred as follows on a dual Xeon machine
(Xeon 2.4 GHz x 2, 2 GB of RAM, 4.8-STABLE with the SMP and HTT options).

% sysctl -a|grep vnodes
kern.maxvnodes: 14
kern.minvnodes: 33722
debug.numvnodes: 140025
debug.wantfreevnodes: 25
debug.freevnodes: 76
% vmstat -m | grep '^ *vfscache'
 vfscache  818445  52184K  72819K  102400K  197586220  0  64,128,256,512K

  It seems that the third number is still well below the fourth.
  I typed 'sysctl kern.maxvnodes=15' and the machine recovered.
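
For reference, the runtime bump and a persistent setting look roughly like
this (the 150000 figure below is purely illustrative, not the value actually
used above):

  # Raise the limit on the running system:
  sysctl kern.maxvnodes=150000

  # To have a higher limit applied at every boot, the same assignment can
  # go into /etc/sysctl.conf:
  #   kern.maxvnodes=150000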

-- 
[EMAIL PROTECTED]


Re: system slowdown - vnode related

2003-06-09 Thread Masachika ISHIZUKA
   I still have a vnode problem in 4.8-stable with /sys/kern/vfs_subr.c
 1.249.2.30.
 
  (1) #1 machine (Celeron 466 with 256 MB of RAM)
 
   % sysctl kern.maxvnodes
   kern.maxvnodes: 17979
   % sysctl vm.zone | grep VNODE
   VNODE:   192,0,  18004,122,18004
 
 This looks pretty normal to me for a quiescent system.

  Hi, David-san.
  Thank you for your mail.
  I think the used count (18004) exceeds maxvnodes (17979), doesn't it?

 I would actually suggest raising maxvnodes if you have lots of
 little files.  Does the number of vnodes shoot up when 310.locate
 runs?

  The values shown above were taken while the system was slowed down by
310.locate. The number of used vnodes stays low from boot until 310.locate
is invoked.

 Did you get a backtrace from the panics?

  It's too hard for me. Is there any way to do it?

-- 
[EMAIL PROTECTED]


Re: system slowdown - vnode related

2003-06-09 Thread David Schultz
On Mon, Jun 09, 2003, Masachika ISHIZUKA wrote:
I still have a vnode problem in 4.8-stable with /sys/kern/vfs_subr.c
  1.249.2.30.
  
  (1) #1 machine (Celeron 466 with 256 MB of RAM)
  
% sysctl kern.maxvnodes
kern.maxvnodes: 17979
% sysctl vm.zone | grep VNODE
VNODE:   192,0,  18004,122,18004
  
  This looks pretty normal to me for a quiescent system.
 
   Hi, David-san.
   Thank you for your mail.
   I think the used count (18004) exceeds maxvnodes (17979), doesn't it?

Only by a little bit.  maxvnodes isn't a hard limit, since making
it a hard limit would lead to deadlocks.  Instead, the system
garbage collects vnodes to keep the number roughly in line with
maxvnodes.  Judging by the numbers above, it's doing a pretty good
job, but that's probably because, from the looks of it, you
just booted the system.

The reason it might make sense to increase maxvnodes is that
having vnlru work overtime to keep your vnode count low may
result in vnodes being freed that are still needed, e.g. by the
buffer cache.  This would cause the slowdown you were mentioning.

(As a disclaimer, Tor Egge and Matt Dillon know far more about
this than I do.)
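
A quick way to watch how well that garbage collection keeps up is to compare
the counters against the limit from time to time, e.g.:

  % sysctl kern.maxvnodes debug.numvnodes debug.freevnodes
  % sysctl vm.zone | grep VNODE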

  I would actually suggest raising maxvnodes if you have lots of
  little files.  Does the number of vnodes shoot up when 310.locate
  runs?
 
   The value shown above is the value at slow down time of 310.locate.
 The number of used vnodes is low at boot up until 310.locate invoked.
 
  Did you get a backtrace from the panics?
 
   It's too hard for me. Is there any way to do it ?

The panics might be unrelated to the number of vnodes, so it's
important that we have additional information.  See:

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html
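
For 4.x the usual sequence is roughly the following (the kernel name and dump
device below are illustrative; the handbook chapter above has the details):

  # In the kernel config file, build with debug symbols:
  #   makeoptions DEBUG=-g
  # In /etc/rc.conf, point crash dumps at a swap partition, e.g.:
  #   dumpdev="/dev/ad0s1b"
  # After the next panic and reboot, savecore(8) writes the dump to /var/crash.
  # Then pull a backtrace out of it:
  % gdb -k /sys/compile/MYKERNEL/kernel.debug /var/crash/vmcore.0
  (kgdb) where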


Re: system slowdown - vnode related

2003-06-09 Thread Masachika ISHIZUKA
   I still have a vnode problem in 4.8-stable with /sys/kern/vfs_subr.c
 1.249.2.30.
 
   % sysctl kern.maxvnodes
   kern.maxvnodes: 17979
   % sysctl vm.zone | grep VNODE
   VNODE:   192,0,  18004,122,18004
 
 This looks pretty normal to me for a quiescent system.

   I think the used count (18004) exceeds maxvnodes (17979), doesn't it?
 
 Only by a little bit.  maxvnodes isn't a hard limit, since making
 it a hard limit would lead to deadlocks.  Instead, the system
 garbage collects vnodes to keep the number roughly in line with
 maxvnodes.  Judging by the numbers above, it's doing a pretty good
 job, but that's probably because, from the looks of it, you
 just booted the system.

  Hi, David-san.
  Thank you for your mail.
  I understand now.

 The reason it might make sense to increase maxvnodes is that
 having vnlru work overtime to keep your vnode count low may
 result in vnodes being freed that are still needed, e.g. by the
 buffer cache.  This would cause the slowdown you were mentioning.

  I will try increasing kern.maxvnodes when the machine slows down,
but I cannot yet reproduce the slowdown in an experimental environment.
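
Since the slowdowns here have followed 310.locate, one way to try to provoke
it on a test box (assuming the stock periodic(8) layout) is simply to run
that script by hand while watching the vnode counters from another terminal:

  # as root, kick off the weekly locate database rebuild:
  /etc/periodic/weekly/310.locate

  # meanwhile, in another terminal:
  sysctl kern.maxvnodes debug.numvnodes debug.freevnodes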

 Did you get a backtrace from the panics?
 
   It's too hard for me. Is there any way to do it ?
 
 The panics might be unrelated to the number of vnodes, so it's
 important that we have additional information.  See:
 
 http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

  I'll try.  Thank you very much.

-- 
[EMAIL PROTECTED]


Re: system slowdown - vnode related

2003-05-27 Thread Marc G. Fournier
On Mon, 25 May 2003, Mike Harding wrote:

 I'm running a very recent RELENG_4 - but I had a suspicion that this was
 unionfs related, so I unmounted the /usr/ports union mounts under a jail
 in case this was causing the problem, and haven't seen the problem
 since.  It's possible I accidentally reverted to 4.8 when I built a
 release, but I don't see how...

'K, I wouldn't touch anything less than 4.8-STABLE ... the last set of
vnode-related patches that I'm aware of were made *post* 4.8-RELEASE,
which, I believe, won't be included in RELENG_4 ...



Re: system slowdown - vnode related

2003-05-27 Thread Marc G. Fournier
On Mon, 26 May 2003, Mike Harding wrote:

 Er - are any changes made to RELENG_4_8 that aren't made to RELENG_4?  I
 thought it was the other way around - that 4_8 only got _some_ of the
 changes to RELENG_4...

Ack, my fault ... sorry, wasn't thinking :(  RELENG_4 is correct ... I
should have confirmed my settings before blathering on ...

One of the scripts I used extensively while debugging this ... a quite
simple one .. was:

#!/bin/tcsh
while ( 1 )
  echo `sysctl debug.numvnodes` - `sysctl debug.freevnodes` - `sysctl debug.vnlru_nowhere` - `ps auxl | grep vnlru | grep -v grep | awk '{print $20}'`
  sleep 10
end

which outputs this:

debug.numvnodes: 463421 - debug.freevnodes: 220349 - debug.vnlru_nowhere: 3 - vlruwt

I have my maxvnodes set to 512k right now ... now, when the server hung,
the output would look something like (this would be with 'default' vnodes):

debug.numvnodes: 199252 - debug.freevnodes: 23 - debug.vnlru_nowhere: 12 - vlrup

with the critical bit being the vlruwt -> vlrup change ...

with unionfs, you are using two vnodes per file, instead of one in
non-union mode, which is why I went to 512k vs the default of ~256k vnodes
... it doesn't *fix* the problem, it only reduces its occurrence ...


Re: system slowdown - vnode related

2003-05-27 Thread Marc G. Fournier

Ack, only 512Meg of memory? :(  k, now you are beyond me on this one ...
I'm running 4Gig on our server, with 2Gig allocated to kernel memory ...

only thing I can suggest is to try slowly incrementing your maxvnodes and
see if that helps, but I'm not sure where your upper threshold is with
512Meg of RAM ...
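
A sketch of that incremental approach (step size, ceiling, and delay are all
arbitrary; stop early if the box starts misbehaving):

  #!/bin/sh
  # Raise kern.maxvnodes in 10000-vnode steps up to a chosen ceiling,
  # pausing after each bump to see how many vnodes are actually in use.
  step=10000
  ceiling=100000
  cur=`sysctl -n kern.maxvnodes`
  while [ $cur -lt $ceiling ]; do
      cur=`expr $cur + $step`
      sysctl kern.maxvnodes=$cur
      sysctl debug.numvnodes debug.freevnodes
      sleep 600
  done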

On Mon, 26 May 2003, Mike Harding wrote:

 On my system, with 512 meg of memory, I have the following (default)
 vnode related values:

 bash-2.05b$ sysctl -a | grep vnode
 kern.maxvnodes: 36079
 kern.minvnodes: 9019
 vm.stats.vm.v_vnodein: 140817
 vm.stats.vm.v_vnodeout: 0
 vm.stats.vm.v_vnodepgsin: 543264
 vm.stats.vm.v_vnodepgsout: 0
 debug.sizeof.vnode: 168
 debug.numvnodes: 33711
 debug.wantfreevnodes: 25
 debug.freevnodes: 5823

 ...is this really low?  Is this something that should go into
 tuning(7)?  I searched on Google and found basically nothing related to
 adjusting vnodes - although I am admittedly flogging the system - I have
 leafnode+ running, a mirrored CVS tree, an experimental CVS tree,
 mount_union'd /usr/ports in a jail, and so on.  Damn those $1 a
 gigabyte drives!

 On Mon, 2003-05-26 at 09:12, Marc G. Fournier wrote:
  On Mon, 26 May 2003, Mike Harding wrote:
 
   Er - are any changes made to RELENG_4_8 that aren't made to RELENG_4?  I
   thought it was the other way around - that 4_8 only got _some_ of the
   changes to RELENG_4...
 
  Ack, my fault ... sorry, wasn't thinking :(  RELENG_4 is correct ... I
  should have confirmed my settings before blathering on ...
 
  One of the scripts I used extensively while debugging this ... a quite
  simple one .. was:
 
  #!/bin/tcsh
  while ( 1 )
echo `sysctl debug.numvnodes` - `sysctl debug.freevnodes` - `sysctl 
  debug.vnlru_nowhere` - `ps auxl | grep vnlru | grep -v grep | awk '{print $20}'`
sleep 10
  end
 
  which outputs this:
 
  debug.numvnodes: 463421 - debug.freevnodes: 220349 - debug.vnlru_nowhere: 3 - 
  vlruwt
 
  I have my maxvnodes set to 512k right now ... now, when the server hung,
  the output would look something like (this would be with 'default' vnodes):
 
  debug.numvnodes: 199252 - debug.freevnodes: 23 - debug.vnlru_nowhere: 12 - vlrup
 
  with the critical bit being the vlruwt - vlrup change ...
 
  with unionfs, you are using two vnodes per file, instead of one in
  non-union mode, which is why I went to 512k vs the default of ~256k vnodes
  ... it doesn't *fix* the problem, it only reduces its occurance ...




Re: system slowdown - vnode related

2003-05-27 Thread Matthew Dillon
I'm a little confused.  What state is the vnlru kernel thread in?  It
sounds like vnlru must be stuck.

Note that you can gdb the live kernel and get a stack backtrace of any
stuck process.

gdb -k /kernel.debug /dev/mem   (or whatever)
proc N  (e.g. vnlru's pid)
back
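
Spelled out a bit (the pid below is illustrative), a live-kernel session
would look like:

  ps -axl | grep vnlru            # note vnlru's pid and wait channel
  gdb -k /kernel.debug /dev/mem
  (kgdb) proc 12                  # substitute vnlru's actual pid
  (kgdb) back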

All the processes stuck in 'inode' are likely associated with the 
problem, but if that is what is causing vnlru to be stuck I would expect
vnlru itself to be stuck in 'inode'.

unionfs is probably responsible.  I would not be surprised at all if 
unionfs is causing a deadlock somewhere which is creating a chain of
processes stuck in 'inode' which is in turn causing vnlru to get stuck.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:
:On Mon, 26 May 2003, Mike Harding wrote:
:
: Er - are any changes made to RELENG_4_8 that aren't made to RELENG_4?  I
: thought it was the other way around - that 4_8 only got _some_ of the
: changes to RELENG_4...
:
:Ack, my fault ... sorry, wasn't thinking :(  RELENG_4 is correct ... I
:should have confirmed my settings before blathering on ...
:
:One of the scripts I used extensively while debugging this ... a quite
:simple one .. was:
:
:#!/bin/tcsh
:while ( 1 )
:  echo `sysctl debug.numvnodes` - `sysctl debug.freevnodes` - `sysctl 
debug.vnlru_nowhere` - `ps auxl | grep vnlru | grep -v grep | awk '{print $20}'`
:  sleep 10
:end
:
:which outputs this:
:
:debug.numvnodes: 463421 - debug.freevnodes: 220349 - debug.vnlru_nowhere: 3 - vlruwt
:
:I have my maxvnodes set to 512k right now ... now, when the server hung,
:the output would look something like (this would be with 'default' vnodes):
:
:debug.numvnodes: 199252 - debug.freevnodes: 23 - debug.vnlru_nowhere: 12 - vlrup
:
:with the critical bit being the vlruwt - vlrup change ...
:
:with unionfs, you are using two vnodes per file, instead of one in
:non-union mode, which is why I went to 512k vs the default of ~256k vnodes
:... it doesn't *fix* the problem, it only reduces its occurance ...



Re: system slowdown - vnode related

2003-05-27 Thread Mike Harding
I'll try this if I can tickle the bug again.

I may have just run out of freevnodes - I only have about 1-2000 free
right now.  I was just surprised because I have never seen a reference
to tuning this sysctl.

- Mike H.

On Tue, 2003-05-27 at 11:09, Matthew Dillon wrote:
 I'm a little confused.  What state is the vnlru kernel thread in?  It
 sounds like vnlru must be stuck.
 
 Note that you can gdb the live kernel and get a stack backtrace of any
 stuck process.
 
 gdb -k /kernel.debug /dev/mem (or whatever)
 proc N(e.g. vnlru's pid)
 back
 
 All the processes stuck in 'inode' are likely associated with the 
 problem, but if that is what is causing vnlru to be stuck I would expect
 vnlru itself to be stuck in 'inode'.
 
 unionfs is probably responsible.  I would not be surprised at all if 
 unionfs is causing a deadlock somewhere which is creating a chain of
 processes stuck in 'inode' which is in turn causing vnlru to get stuck.
 
   -Matt
   Matthew Dillon 
   [EMAIL PROTECTED]
 
 :
 :On Mon, 26 May 2003, Mike Harding wrote:
 :
 : Er - are any changes made to RELENG_4_8 that aren't made to RELENG_4?  I
 : thought it was the other way around - that 4_8 only got _some_ of the
 : changes to RELENG_4...
 :
 :Ack, my fault ... sorry, wasn't thinking :(  RELENG_4 is correct ... I
 :should have confirmed my settings before blathering on ...
 :
 :One of the scripts I used extensively while debugging this ... a quite
 :simple one .. was:
 :
 :#!/bin/tcsh
 :while ( 1 )
 :  echo `sysctl debug.numvnodes` - `sysctl debug.freevnodes` - `sysctl 
 debug.vnlru_nowhere` - `ps auxl | grep vnlru | grep -v grep | awk '{print $20}'`
 :  sleep 10
 :end
 :
 :which outputs this:
 :
 :debug.numvnodes: 463421 - debug.freevnodes: 220349 - debug.vnlru_nowhere: 3 - vlruwt
 :
 :I have my maxvnodes set to 512k right now ... now, when the server hung,
 :the output would look something like (this would be with 'default' vnodes):
 :
 :debug.numvnodes: 199252 - debug.freevnodes: 23 - debug.vnlru_nowhere: 12 - vlrup
 :
 :with the critical bit being the vlruwt - vlrup change ...
 :
 :with unionfs, you are using two vnodes per file, instead of one in
 :non-union mode, which is why I went to 512k vs the default of ~256k vnodes
 :... it doesn't *fix* the problem, it only reduces its occurance ...



Re: system slowdown - vnode related

2003-05-27 Thread Matthew Dillon

:I'll try this if I can tickle the bug again.
:
:I may have just run out of freevnodes - I only have about 1-2000 free
:right now.  I was just surprised because I have never seen a reference
:to tuning this sysctl.
:
:- Mike H.

The vnode subsystem is *VERY* sensitive to running out of KVM, meaning
that setting too high a kern.maxvnodes value is virtually guaranteed to
lock up the system under certain circumstances.  If you can reliably
reproduce the lockup with maxvnodes set fairly low (e.g. less than
100,000) then it ought to be easier to track the deadlock down.

Historically speaking, systems did not have enough physical memory to
actually run out of vnodes; they would run out of physical memory
first, which would cause VM pages to be reused and their underlying
vnodes deallocated when the last page went away.  Hence the amount of
KVM being used to manage vnodes (vnode and inode structures) was kept
under control.

But today's Intel systems have far more physical memory relative to
available KVM and it is possible for the vnode management to run
out of KVM before the VM system runs out of physical memory.
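
One rough way to see how much kernel memory the vnode and inode structures
are taking on a running 4.x box (the zone and malloc-type names below are
the usual ones, but may vary between releases):

  % sysctl vm.zone | grep VNODE
  % vmstat -m | egrep 'vfscache|FFS node'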

The vnlru kernel thread is an attempt to control this problem, but it
has had only mixed success in complex vnode management situations
like unionfs, where an operation on a vnode may cause accesses to
additional underlying vnodes.  In other words, vnlru can potentially
shoot itself in the foot in such situations while trying to flush out
vnodes.

-Matt
