Re: dogfooding over in clusteradm land

2012-01-03 Thread Don Lewis
On  2 Jan, Don Lewis wrote:
 On  2 Jan, Don Lewis wrote:
 On  2 Jan, Florian Smeets wrote:
 
 This does not make a difference. I tried on 32K/4K with/without journal
 and on 16K/2K all exhibit the same problem. At some point during the
 cvs2svn conversion the syncer starts to use 100% CPU. The whole process
 hangs at that point sometimes for hours, from time to time it does
 continue doing some work, but really really slow. It's usually between
 revision 21 and 22, when the resulting svn file gets bigger than
 about 11-12Gb. At that point an ls in the target dir hangs in state ufs.
 
 I broke into ddb and ran all commands which I thought could be useful.
 The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
 
 Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
 cpustop_handler() at cpustop_handler+0x2b
 ipi_nmi_handler() at ipi_nmi_handler+0x50
 trap() at trap+0x1a8
 nmi_calltrap() at nmi_calltrap+0x8
 --- trap 0x13, rip = 0x8082ba43, rsp = 0xff8000270fe0, rbp = 
 0xff88c97829a0 ---
 _mtx_assert() at _mtx_assert+0x13
 pmap_remove_write() at pmap_remove_write+0x38
 vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
 vm_object_page_clean() at vm_object_page_clean+0x14d
 vfs_msync() at vfs_msync+0xf1
 sync_fsync() at sync_fsync+0x12a
 sync_vnode() at sync_vnode+0x157
 sched_sync() at sched_sync+0x1d1
 fork_exit() at fork_exit+0x135
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xff88c9782d00, rbp = 0 ---
 
 I think this explains why the r228838 patch seems to help the problem.
 Instead of an application call to msync(), you're getting bitten by the
 syncer doing the equivalent.  I don't know why the syncer is CPU bound,
 though.  From my understanding of the patch it only optimizes the I/O.
 Without the patch, I would expect that the syncer would just spend a lot
 of time waiting on I/O.  My guess is that this is actually a vm problem.
 There are nested loops in vm_object_page_clean() and
 vm_object_page_remove_write(), so you could be doing something that's
 causing lots of looping in that code.
 
 Does the machine recover if you suspend cvs2svn?  I think what is
 happening is that cvs2svn is continuing to dirty pages while the syncer
 is trying to sync the file.  From my limited understanding of this code,
 it looks to me like every time cvs2svn dirties a page, it will trigger a
 call to vm_object_set_writeable_dirty(), which will increment
 object->generation.  Whenever vm_object_page_clean() detects a change in
 the generation count, it restarts its scan of the pages associated with
 the object.  This is probably not optimal ...

Since the syncer is only trying to flush out pages that have been dirty
for the last 30 seconds, I think that vm_object_page_clean() should just
make one pass through the object, ignoring generation, and then return
when it is called from the syncer.  That should keep
vm_object_page_clean() from looping over the object again and again if
another process is actively dirtying the object.
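
For reference, the loop being discussed looks roughly like the sketch
below.  It is a simplified paraphrase of the vm_object_page_clean() scan
in the 9.x sources, written from memory rather than copied, with the
locking, pager-flag handling and dirty/valid bookkeeping mostly left out:

rescan:
        curgeneration = object->generation;
        for (p = vm_page_find_least(object, start); p != NULL; p = np) {
                pi = p->pindex;
                if (pi >= tend)
                        break;
                np = TAILQ_NEXT(p, listq);
                if (p->valid == 0)
                        continue;
                if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
                        /* We slept waiting for a busy page. */
                        if (object->generation != curgeneration)
                                goto rescan;            /* start over */
                        np = vm_page_find_least(object, pi);
                        continue;                       /* retry this pindex */
                }
                /* ... dirty checks elided ... */
                n = vm_object_page_collect_flush(object, p, pagerflags,
                    flags, clearobjflags);
                if (object->generation != curgeneration)
                        goto rescan;                    /* start over */
                np = vm_page_find_least(object, pi + n);
        }

With cvs2svn actively dirtying pages, object->generation keeps changing,
so the scan keeps taking the goto rescan path and re-walking the object,
which would make the syncer CPU bound without doing much extra I/O.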




Re: dogfooding over in clusteradm land

2012-01-03 Thread Kostik Belousov
On Tue, Jan 03, 2012 at 12:02:22AM -0800, Don Lewis wrote:
 On  2 Jan, Don Lewis wrote:
  On  2 Jan, Don Lewis wrote:
  On  2 Jan, Florian Smeets wrote:
  
  This does not make a difference. I tried on 32K/4K with/without journal
  and on 16K/2K all exhibit the same problem. At some point during the
  cvs2svn conversion the syncer starts to use 100% CPU. The whole process
  hangs at that point sometimes for hours, from time to time it does
  continue doing some work, but really really slow. It's usually between
  revision 21 and 22, when the resulting svn file gets bigger than
  about 11-12Gb. At that point an ls in the target dir hangs in state ufs.
  
  I broke into ddb and ran all commands which I thought could be useful.
  The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
  
  Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
  cpustop_handler() at cpustop_handler+0x2b
  ipi_nmi_handler() at ipi_nmi_handler+0x50
  trap() at trap+0x1a8
  nmi_calltrap() at nmi_calltrap+0x8
  --- trap 0x13, rip = 0x8082ba43, rsp = 0xff8000270fe0, rbp = 
  0xff88c97829a0 ---
  _mtx_assert() at _mtx_assert+0x13
  pmap_remove_write() at pmap_remove_write+0x38
  vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
  vm_object_page_clean() at vm_object_page_clean+0x14d
  vfs_msync() at vfs_msync+0xf1
  sync_fsync() at sync_fsync+0x12a
  sync_vnode() at sync_vnode+0x157
  sched_sync() at sched_sync+0x1d1
  fork_exit() at fork_exit+0x135
  fork_trampoline() at fork_trampoline+0xe
  --- trap 0, rip = 0, rsp = 0xff88c9782d00, rbp = 0 ---
  
  I think this explains why the r228838 patch seems to help the problem.
  Instead of an application call to msync(), you're getting bitten by the
  syncer doing the equivalent.  I don't know why the syncer is CPU bound,
  though.  From my understanding of the patch it only optimizes the I/O.
  Without the patch, I would expect that the syncer would just spend a lot
  of time waiting on I/O.  My guess is that this is actually a vm problem.
  There are nested loops in vm_object_page_clean() and
  vm_object_page_remove_write(), so you could be doing something that's
  causing lots of looping in that code.
  
  Does the machine recover if you suspend cvs2svn?  I think what is
  happening is that cvs2svn is continuing to dirty pages while the syncer
  is trying to sync the file.  From my limited understanding of this code,
  it looks to me like every time cvs2svn dirties a page, it will trigger a
  call to vm_object_set_writeable_dirty(), which will increment
  object->generation.  Whenever vm_object_page_clean() detects a change in
  the generation count, it restarts its scan of the pages associated with
  the object.  This is probably not optimal ...
 
 Since the syncer is only trying to flush out pages that have been dirty
 for the last 30 seconds, I think that vm_object_page_clean() should just
 make one pass through the object, ignoring generation, and then return
 when it is called from the syncer.  That should keep
 vm_object_page_clean() from looping over the object again and
 again if another process is actively dirtying the object.
 
This sounds very plausible. I think that there is no sense in restarting
the scan if it is requested in async mode at all. See below.

Would be thrilled if this finally solves the cvs2svn issues.

commit 41aaafe5e3be5387949f303b8766da64ee4a521f
Author: Kostik Belousov <kostik@sirion>
Date:   Tue Jan 3 11:16:30 2012 +0200

Do not restart the scan in vm_object_page_clean() if requested
mode is async.

Proposed by:  truckman

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 716916f..52fc08b 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
                 if (p->valid == 0)
                         continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-                        if (object->generation != curgeneration)
+                        if ((flags & OBJPC_SYNC) != 0 &&
+                            object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;
@@ -851,7 +852,8 @@ rescan:
 
                 n = vm_object_page_collect_flush(object, p, pagerflags,
                     flags, clearobjflags);
-                if (object->generation != curgeneration)
+                if ((flags & OBJPC_SYNC) != 0 &&
+                    object->generation != curgeneration)
                         goto rescan;
 
                 /*




Re: dogfooding over in clusteradm land

2012-01-03 Thread Don Lewis
On  3 Jan, Kostik Belousov wrote:

 This sounds very plausible. I think that there is no sense in restarting
 the scan if it is requested in async mode at all. See below.
 
 Would be thrilled if this finally solves the cvs2svn issues.
 
 commit 41aaafe5e3be5387949f303b8766da64ee4a521f
 Author: Kostik Belousov <kostik@sirion>
 Date:   Tue Jan 3 11:16:30 2012 +0200
 
 Do not restart the scan in vm_object_page_clean() if requested
 mode is async.
 
 Proposed by:  truckman
 
 diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
 index 716916f..52fc08b 100644
 --- a/sys/vm/vm_object.c
 +++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
                 if (p->valid == 0)
                         continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-                        if (object->generation != curgeneration)
+                        if ((flags & OBJPC_SYNC) != 0 &&
+                            object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;

I wonder if it would make more sense to just skip the busy pages in
async mode instead of sleeping ...
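
Roughly, that would look something like the sketch below (untested, just
to illustrate the idea; the VPO_BUSY/busy fields are the ones that
vm_page_sleep_if_busy() itself checks in the 9.x code):

                 if (p->valid == 0)
                         continue;
+                /*
+                 * Async (syncer) case: don't sleep on a busy page at all,
+                 * just leave it for the next syncer pass.
+                 */
+                if ((flags & OBJPC_SYNC) == 0 &&
+                    ((p->oflags & VPO_BUSY) != 0 || p->busy != 0))
+                        continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
                         if (object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;
                 }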




Re: dogfooding over in clusteradm land

2012-01-03 Thread Kostik Belousov
On Tue, Jan 03, 2012 at 01:45:26AM -0800, Don Lewis wrote:
 On  3 Jan, Kostik Belousov wrote:
 
  This sounds very plausible. I think that there is no sense in restarting
  the scan if it is requested in async mode at all. See below.
  
  Would be thrilled if this finally solves the cvs2svn issues.
  
  commit 41aaafe5e3be5387949f303b8766da64ee4a521f
  Author: Kostik Belousov <kostik@sirion>
  Date:   Tue Jan 3 11:16:30 2012 +0200
  
  Do not restart the scan in vm_object_page_clean() if requested
  mode is async.
  
  Proposed by:  truckman
  
  diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
  index 716916f..52fc08b 100644
  --- a/sys/vm/vm_object.c
  +++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
                 if (p->valid == 0)
                         continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-                        if (object->generation != curgeneration)
+                        if ((flags & OBJPC_SYNC) != 0 &&
+                            object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;
 
 I wonder if it would make more sense to just skip the busy pages in
 async mode instead of sleeping ...
 
It would weaken the vfs_msync(MNT_NOWAIT) guarantee too much to not
write such pages, IMO. A busy state indeed means that the page is most
likely undergoing I/O, but if it is not, we would not write it at all.

Let's see whether the change alone helps. Do you agree ?




Re: dogfooding over in clusteradm land

2012-01-03 Thread Don Lewis
On  3 Jan, Kostik Belousov wrote:
 On Tue, Jan 03, 2012 at 01:45:26AM -0800, Don Lewis wrote:
 On  3 Jan, Kostik Belousov wrote:
 
  This sounds very plausible. I think that there is no sense in restarting
  the scan if it is requested in async mode at all. See below.
  
  Would be thrilled if this finally solves the cvs2svn issues.
  
  commit 41aaafe5e3be5387949f303b8766da64ee4a521f
  Author: Kostik Belousov <kostik@sirion>
  Date:   Tue Jan 3 11:16:30 2012 +0200
  
  Do not restart the scan in vm_object_page_clean() if requested
  mode is async.
  
  Proposed by:   truckman
  
  diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
  index 716916f..52fc08b 100644
  --- a/sys/vm/vm_object.c
  +++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
                 if (p->valid == 0)
                         continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-                        if (object->generation != curgeneration)
+                        if ((flags & OBJPC_SYNC) != 0 &&
+                            object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;
 
 I wonder if it would make more sense to just skip the busy pages in
 async mode instead of sleeping ...
 
 It would weaken the vfs_msync(MNT_NOWAIT) guarantee too much to not
 write such pages, IMO. A busy state indeed means that the page is most
 likely undergoing I/O, but if it is not, we would not write it at all.

If the original code detects a busy page, it sleeps and then continues
with the next page if generation hasn't changed.  If generation has
changed, then it restarts the scan.

With your change above, the code will skip the busy page after sleeping
if it is running in async mode.  It won't make another attempt to write
this page because it no longer attempts to rescan.

My suggestion just omits the sleep in this particular case.

The syncer should write the page the next time it runs, unless we're
particularly unlucky ...

 Let's see whether the change alone helps. Do you agree ?

Your patch is definitely worth trying as-is.  My latest suggestion is
probably a minor additional optimization.



Re: dogfooding over in clusteradm land

2012-01-03 Thread Kostik Belousov
On Tue, Jan 03, 2012 at 02:57:17AM -0800, Don Lewis wrote:
 On  3 Jan, Kostik Belousov wrote:
  On Tue, Jan 03, 2012 at 01:45:26AM -0800, Don Lewis wrote:
  On  3 Jan, Kostik Belousov wrote:
  
   This sounds very plausible. I think that there is no sense in restarting
   the scan if it is requested in async mode at all. See below.
   
   Would be thrilled if this finally solves the cvs2svn issues.
   
   commit 41aaafe5e3be5387949f303b8766da64ee4a521f
   Author: Kostik Belousov <kostik@sirion>
   Date:   Tue Jan 3 11:16:30 2012 +0200
   
   Do not restart the scan in vm_object_page_clean() if requested
   mode is async.
   
   Proposed by: truckman
   
   diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
   index 716916f..52fc08b 100644
   --- a/sys/vm/vm_object.c
   +++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
                 if (p->valid == 0)
                         continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-                        if (object->generation != curgeneration)
+                        if ((flags & OBJPC_SYNC) != 0 &&
+                            object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;
  
  I wonder if it would make more sense to just skip the busy pages in
  async mode instead of sleeping ...
  
  It would weaken the vfs_msync(MNT_NOWAIT) guarantee too much to not
  write such pages, IMO. A busy state indeed means that the page is most
  likely undergoing I/O, but if it is not, we would not write it at all.
 
 If the original code detects a busy page, it sleeps and then continues
 with the next page if generation hasn't changed.  If generation has
 changed, then it restarts the scan.
 
 With your change above, the code will skip the busy page after sleeping
 if it is running in async mode.  It won't make another attempt to write
 this page because it no longer attempts to rescan.
Why would it skip it ? Please note the call to vm_page_find_least()
with the pindex of the busy page right after the check for
generation. If a page with the pindex is still present in the object,
vm_page_find_least() should return it, and vm_object_page_clean() should
make another attempt at processing it.

Am I missing something ?
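
(For reference, the lines in question again, with the effect of
vm_page_find_least() spelled out in a comment added here; the comment is
not in the source:)

                if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
                        ...
                        /*
                         * vm_page_find_least(object, pi) returns the page
                         * with the least pindex >= pi, so if the formerly
                         * busy page is still in the object it is exactly
                         * what the next iteration picks up.
                         */
                        np = vm_page_find_least(object, pi);
                        continue;
                }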
 
 My suggestion just omits the sleep in this particular case.
 
 The syncer should write the page the next time it runs, unless we're
 particularly unlucky ...
 
  Let's see whether the change alone helps. Do you agree ?
 
 Your patch is definitely worth trying as-is.  My latest suggestion is
 probably a minor additional optimization.




Re: dogfooding over in clusteradm land

2012-01-03 Thread Don Lewis
On  3 Jan, Kostik Belousov wrote:

 With your change above, the code will skip the busy page after sleeping
 if it is running in async mode.  It won't make another attempt to write
 this page because it no longer attempts to rescan.
 Why would it skip it ? Please note the call to vm_page_find_least()
 with the pindex of the busy page right after the check for
 generation. If a page with the pindex is still present in the object,
 vm_page_find_least() should return it, and vm_object_page_clean() should
 make another attempt at processing it.
 
 Am I missing something ?

Nope, I was missing something ...

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: dogfooding over in clusteradm land

2012-01-03 Thread Florian Smeets

On 03.01.2012 10:18, Kostik Belousov wrote:
 On Tue, Jan 03, 2012 at 12:02:22AM -0800, Don Lewis wrote:
 On  2 Jan, Don Lewis wrote:
 On  2 Jan, Don Lewis wrote:
 On  2 Jan, Florian Smeets wrote:
 
 This does not make a difference. I tried on 32K/4K 
 with/without journal and on 16K/2K all exhibit the same 
 problem. At some point during the cvs2svn conversion the 
 syncer starts to use 100% CPU. The whole process hangs at 
 that point sometimes for hours, from time to time it does 
 continue doing some work, but really really slow. It's 
 usually between revision 21 and 22, when the 
 resulting svn file gets bigger than about 11-12Gb. At that 
 point an ls in the target dir hangs in state ufs.
 
 I broke into ddb and ran all commands which I thought
 could be useful. The output is at 
 http://tb.smeets.im/~flo/giant-ape_syncer.txt
 
 Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
 cpustop_handler() at cpustop_handler+0x2b ipi_nmi_handler()
 at ipi_nmi_handler+0x50 trap() at trap+0x1a8 nmi_calltrap()
 at nmi_calltrap+0x8 --- trap 0x13, rip = 0x8082ba43,
 rsp = 0xff8000270fe0, rbp = 0xff88c97829a0 ---
 _mtx_assert() at _mtx_assert+0x13 pmap_remove_write() at
 pmap_remove_write+0x38 vm_object_page_remove_write() at 
 vm_object_page_remove_write+0x1f vm_object_page_clean() at 
 vm_object_page_clean+0x14d vfs_msync() at vfs_msync+0xf1 
 sync_fsync() at sync_fsync+0x12a sync_vnode() at 
 sync_vnode+0x157 sched_sync() at sched_sync+0x1d1
 fork_exit() at fork_exit+0x135 fork_trampoline() at
 fork_trampoline+0xe --- trap 0, rip = 0, rsp =
 0xff88c9782d00, rbp = 0 ---
 
 I think this explains why the r228838 patch seems to help 
 the problem. Instead of an application call to msync(), 
 you're getting bitten by the syncer doing the equivalent.  I 
 don't know why the syncer is CPU bound, though.  From my 
 understanding of the patch it only optimizes the I/O.
 Without the patch, I would expect that the syncer would just
 spend a lot of time waiting on I/O.  My guess is that this
 is actually a vm problem. There are nested loops in 
 vm_object_page_clean() and vm_object_page_remove_write(), so 
 you could be doing something that's causing lots of looping 
 in that code.
 
 Does the machine recover if you suspend cvs2svn?  I think what 
 is happening is that cvs2svn is continuing to dirty pages
 while the syncer is trying to sync the file.  From my limited 
 understanding of this code, it looks to me like every time 
 cvs2svn dirties a page, it will trigger a call to 
 vm_object_set_writeable_dirty(), which will increment 
 object->generation.  Whenever vm_object_page_clean() detects a 
 change in the generation count, it restarts its scan of the 
 pages associated with the object.  This is probably not
 optimal ...
 
 Since the syncer is only trying to flush out pages that have
 been dirty for the last 30 seconds, I think that 
 vm_object_page_clean() should just make one pass
 through the object, ignoring generation, and then return when it
 is called from the syncer.  That should keep 
 vm_object_page_clean() from looping over the object 
 again and again if another process is actively dirtying the 
 object.
 
 This sounds very plausible. I think that there is no sense in 
 restarting the scan if it is requested in async mode at all. See 
 below.
 
 Would be thrilled if this finally solves the cvs2svn issues.
 
 commit 41aaafe5e3be5387949f303b8766da64ee4a521f
 Author: Kostik Belousov <kostik@sirion>
 Date:   Tue Jan 3 11:16:30 2012 +0200
 
 Do not restart the scan in vm_object_page_clean() if requested
 mode is async.
 
 Proposed by:  truckman
 
 diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
 index 716916f..52fc08b 100644
 --- a/sys/vm/vm_object.c
 +++ b/sys/vm/vm_object.c
 @@ -841,7 +841,8 @@ rescan:
                  if (p->valid == 0)
                          continue;
                  if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
 -                        if (object->generation != curgeneration)
 +                        if ((flags & OBJPC_SYNC) != 0 &&
 +                            object->generation != curgeneration)
                                  goto rescan;
                          np = vm_page_find_least(object, pi);
                          continue;
 @@ -851,7 +852,8 @@ rescan:
 
                  n = vm_object_page_collect_flush(object, p, pagerflags,
                      flags, clearobjflags);
 -                if (object->generation != curgeneration)
 +                if ((flags & OBJPC_SYNC) != 0 &&
 +                    object->generation != curgeneration)
                          goto rescan;
 
                  /*

Yes, the patch fixes the problem. The cvs2svn run completed this time.

 9132.25 real  8387.05 user   403.86 sys

I did not see any significant syncer activity in top -S anymore.

Thanks a lot.
Florian

Re: dogfooding over in clusteradm land

2012-01-03 Thread Sean Bruno


On Tue, 2012-01-03 at 04:46 -0800, Florian Smeets wrote:
 Yes, the patch fixes the problem. The cvs2svn run completed this time.
 
  9132.25 real  8387.05 user   403.86 sys
 
 I did not see any significant syncer activity in top -S anymore.
 
 Thanks a lot.
 Florian 

Currently running stable-9 + this patch on crush.freebsd.org.  First run
was successful and took about 4 hours start to finish.  Nicely done
folks.

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 716916f..52fc08b 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -841,7 +841,8 @@ rescan:
                 if (p->valid == 0)
                         continue;
                 if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
-                        if (object->generation != curgeneration)
+                        if ((flags & OBJPC_SYNC) != 0 &&
+                            object->generation != curgeneration)
                                 goto rescan;
                         np = vm_page_find_least(object, pi);
                         continue;
@@ -851,7 +852,8 @@ rescan:
 
                 n = vm_object_page_collect_flush(object, p, pagerflags,
                     flags, clearobjflags);
-                if (object->generation != curgeneration)
+                if ((flags & OBJPC_SYNC) != 0 &&
+                    object->generation != curgeneration)
                         goto rescan;
 
                 /*




Re: dogfooding over in clusteradm land

2012-01-02 Thread Florian Smeets
On 29.12.11 01:04, Kirk McKusick wrote:
 Rather than changing BKVASIZE, I would try running the cvs2svn
 conversion on a 16K/2K filesystem and see if that sorts out the
 problem. If it does, it tells us that doubling the main block
 size and reducing the number of buffers by half is the problem.
 If that is the problem, then we will have to increase the KVM
 allocated to the buffer cache.
 

This does not make a difference. I tried on 32K/4K with/without journal
and on 16K/2K all exhibit the same problem. At some point during the
cvs2svn conversion the syncer starts to use 100% CPU. The whole process
hangs at that point sometimes for hours, from time to time it does
continue doing some work, but really really slow. It's usually between
revision 21 and 22, when the resulting svn file gets bigger than
about 11-12Gb. At that point an ls in the target dir hangs in state ufs.

I broke into ddb and ran all commands which I thought could be useful.
The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt

The machine is still in ddb and I could run any additional commands, the
kernel is from Attilio's vmcontention branch, which was MFCed yesterday,
and updated after the MFC. The same problem happens on 9.0-RC3.

If i run the same test on a zfs filesystem i don't see any problems.

Florian





Re: dogfooding over in clusteradm land

2012-01-02 Thread Don Lewis
On  2 Jan, Florian Smeets wrote:
 On 29.12.11 01:04, Kirk McKusick wrote:
 Rather than changing BKVASIZE, I would try running the cvs2svn
 conversion on a 16K/2K filesystem and see if that sorts out the
 problem. If it does, it tells us that doubling the main block
 size and reducing the number of buffers by half is the problem.
 If that is the problem, then we will have to increase the KVM
 allocated to the buffer cache.
 
 
 This does not make a difference. I tried on 32K/4K with/without journal
 and on 16K/2K all exhibit the same problem. At some point during the
  cvs2svn conversion the syncer starts to use 100% CPU. The whole process
 hangs at that point sometimes for hours, from time to time it does
 continue doing some work, but really really slow. It's usually between
 revision 21 and 22, when the resulting svn file gets bigger than
 about 11-12Gb. At that point an ls in the target dir hangs in state ufs.
 
  I broke into ddb and ran all commands which I thought could be useful.
 The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt

Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
cpustop_handler() at cpustop_handler+0x2b
ipi_nmi_handler() at ipi_nmi_handler+0x50
trap() at trap+0x1a8
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0x8082ba43, rsp = 0xff8000270fe0, rbp = 
0xff88c97829a0 ---
_mtx_assert() at _mtx_assert+0x13
pmap_remove_write() at pmap_remove_write+0x38
vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
vm_object_page_clean() at vm_object_page_clean+0x14d
vfs_msync() at vfs_msync+0xf1
sync_fsync() at sync_fsync+0x12a
sync_vnode() at sync_vnode+0x157
sched_sync() at sched_sync+0x1d1
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff88c9782d00, rbp = 0 ---

I think this explains why the r228838 patch seems to help the problem.
Instead of an application call to msync(), you're getting bitten by the
syncer doing the equivalent.  I don't know why the syncer is CPU bound,
though.  From my understanding of the patch it only optimizes the I/O.
Without the patch, I would expect that the syncer would just spend a lot
of time waiting on I/O.  My guess is that this is actually a vm problem.
There are nested loops in vm_object_page_clean() and
vm_object_page_remove_write(), so you could be doing something that's
causing lots of looping in that code.

I think that ls is hanging because it's stumbling across the vnode that
the syncer has locked.



Re: dogfooding over in clusteradm land

2012-01-02 Thread Kostik Belousov
On Mon, Jan 02, 2012 at 12:47:03PM -0800, Don Lewis wrote:
 On  2 Jan, Florian Smeets wrote:
  On 29.12.11 01:04, Kirk McKusick wrote:
  Rather than changing BKVASIZE, I would try running the cvs2svn
  conversion on a 16K/2K filesystem and see if that sorts out the
  problem. If it does, it tells us that doubling the main block
  size and reducing the number of buffers by half is the problem.
  If that is the problem, then we will have to increase the KVM
  allocated to the buffer cache.
  
  
  This does not make a difference. I tried on 32K/4K with/without journal
  and on 16K/2K all exhibit the same problem. At some point during the
   cvs2svn conversion the syncer starts to use 100% CPU. The whole process
  hangs at that point sometimes for hours, from time to time it does
  continue doing some work, but really really slow. It's usually between
  revision 21 and 22, when the resulting svn file gets bigger than
  about 11-12Gb. At that point an ls in the target dir hangs in state ufs.
  
   I broke into ddb and ran all commands which I thought could be useful.
  The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
 
 Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
 cpustop_handler() at cpustop_handler+0x2b
 ipi_nmi_handler() at ipi_nmi_handler+0x50
 trap() at trap+0x1a8
 nmi_calltrap() at nmi_calltrap+0x8
 --- trap 0x13, rip = 0x8082ba43, rsp = 0xff8000270fe0, rbp = 
 0xff88c97829a0 ---
 _mtx_assert() at _mtx_assert+0x13
 pmap_remove_write() at pmap_remove_write+0x38
 vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
 vm_object_page_clean() at vm_object_page_clean+0x14d
 vfs_msync() at vfs_msync+0xf1
 sync_fsync() at sync_fsync+0x12a
 sync_vnode() at sync_vnode+0x157
 sched_sync() at sched_sync+0x1d1
 fork_exit() at fork_exit+0x135
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xff88c9782d00, rbp = 0 ---
 
 I think this explains why the r228838 patch seems to help the problem.
 Instead of an application call to msync(), you're getting bitten by the
 syncer doing the equivalent.  I don't know why the syncer is CPU bound,
 though.  From my understanding of the patch it only optimizes the I/O.
 Without the patch, I would expect that the syncer would just spend a lot
 of time waiting on I/O.  My guess is that this is actually a vm problem.
 There are nested loops in vm_object_page_clean() and
 vm_object_page_remove_write(), so you could be doing something that's
 causing lots of looping in that code.
r228838 allows the system to skip 50-70% of the code when initiating a
write of a UFS file page, due to async clustering. The system also has
to maintain 75% fewer writes in progress.

 I think that ls is hanging because it's stumbling across the vnode that
 the syncer has locked.
This is the only reasonable explanation.

A low-tech profile is to periodically break into ddb and do a backtrace
of the syncer thread. A more advanced technique is to use dtrace or
normal profiling.




Re: dogfooding over in clusteradm land

2012-01-02 Thread Don Lewis
On  2 Jan, Don Lewis wrote:
 On  2 Jan, Florian Smeets wrote:

 This does not make a difference. I tried on 32K/4K with/without journal
 and on 16K/2K all exhibit the same problem. At some point during the
  cvs2svn conversion the syncer starts to use 100% CPU. The whole process
 hangs at that point sometimes for hours, from time to time it does
 continue doing some work, but really really slow. It's usually between
 revision 21 and 22, when the resulting svn file gets bigger than
 about 11-12Gb. At that point an ls in the target dir hangs in state ufs.
 
  I broke into ddb and ran all commands which I thought could be useful.
 The output is at http://tb.smeets.im/~flo/giant-ape_syncer.txt
 
 Tracing command syncer pid 9 tid 100183 td 0xfe00120e9000
 cpustop_handler() at cpustop_handler+0x2b
 ipi_nmi_handler() at ipi_nmi_handler+0x50
 trap() at trap+0x1a8
 nmi_calltrap() at nmi_calltrap+0x8
 --- trap 0x13, rip = 0x8082ba43, rsp = 0xff8000270fe0, rbp = 
 0xff88c97829a0 ---
 _mtx_assert() at _mtx_assert+0x13
 pmap_remove_write() at pmap_remove_write+0x38
 vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
 vm_object_page_clean() at vm_object_page_clean+0x14d
 vfs_msync() at vfs_msync+0xf1
 sync_fsync() at sync_fsync+0x12a
 sync_vnode() at sync_vnode+0x157
 sched_sync() at sched_sync+0x1d1
 fork_exit() at fork_exit+0x135
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xff88c9782d00, rbp = 0 ---
 
 I think this explains why the r228838 patch seems to help the problem.
 Instead of an application call to msync(), you're getting bitten by the
 syncer doing the equivalent.  I don't know why the syncer is CPU bound,
 though.  From my understanding of the patch it only optimizes the I/O.
 Without the patch, I would expect that the syncer would just spend a lot
 of time waiting on I/O.  My guess is that this is actually a vm problem.
 There are nested loops in vm_object_page_clean() and
 vm_object_page_remove_write(), so you could be doing something that's
 causing lots of looping in that code.

Does the machine recover if you suspend cvs2svn?  I think what is
happening is that cvs2svn is continuing to dirty pages while the syncer
is trying to sync the file.  From my limited understanding of this code,
it looks to me like every time cvs2svn dirties a page, it will trigger a
call to vm_object_set_writeable_dirty(), which will increment
object->generation.  Whenever vm_object_page_clean() detects a change in
the generation count, it restarts its scan of the pages associated with
the object.  This is probably not optimal ...



Re: dogfooding over in clusteradm land

2011-12-28 Thread Kirk McKusick
Rather than changing BKVASIZE, I would try running the cvs2svn
conversion on a 16K/2K filesystem and see if that sorts out the
problem. If it does, it tells us that doubling the main block
size and reducing the number of buffers by half is the problem.
If that is the problem, then we will have to increase the KVM
allocated to the buffer cache.

Kirk McKusick


Re: dogfooding over in clusteradm land

2011-12-27 Thread Florian Smeets
On 14.12.11 14:20, Sean Bruno wrote:
 We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
 the cvs2svn ports conversion box.  I'm not sure what resource is tapped
 out.  Effectively, I cannot access the directory under use and the
 converter application stalls out waiting for some resource that isn't
 clear. (Peter had posited kmem of some kind).
 
 I've upped maxvnodes a bit on the host, turned off SUJ and mounted the
 f/s in question with async and noatime for performance reasons.
 
 Can someone hit me up with the cluebat?  I can give you direct access to
 the box for debuginationing.
 

Just for the archives. This is fixed or at least considerably improved
by r228838.

The ports cvs2svn run went down from panicking after about ~22h to being
finished after ~10h.

Thanks to Sean and Attilio for giving me access to test boxes.

Florian





Re: dogfooding over in clusteradm land

2011-12-16 Thread Ulrich Spörlein
On Thu, 2011-12-15 at 18:39:59 -0800, Doug Barton wrote:
 On 12/14/2011 05:20, Sean Bruno wrote:
  We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
  the cvs2svn ports conversion box.
 
 ... sounds like a good reason not to migrate the history to me. :)

Sounds more like a new regression test that we could use :)

Uli


Re: dogfooding over in clusteradm land

2011-12-15 Thread Don Lewis
On 14 Dec, Poul-Henning Kamp wrote:
 In message 1323868832.5283.9.ca...@hitfishpass-lx.corp.yahoo.com, Sean Bruno writes:
 
We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
the cvs2svn ports conversion box.  I'm not sure what resource is tapped
out.
 
 Search the mail archive for lemming-syncer

That should only produce a slowdown every 30 seconds but not cause a
deadlock.

I'd be more suspicious of a memory allocation deadlock.  This can happen
if the system runs short of free memory because there are a large number
of dirty buffers, but it needs to allocate some memory to flush the
buffers to disk.

This could be more likely to happen if you are using a software raid
layer, but I suspect that the recent change to the default UFS block
size from 16K to 32K is the culprit.  In another thread bde pointed out
that the BKVASIZE definition in sys/param.h hadn't been updated to match
the new default UFS block size.

 * BKVASIZE -   Nominal buffer space per buffer, in bytes.  BKVASIZE is the
 *  minimum KVM memory reservation the kernel is willing to make.
 *  Filesystems can of course request smaller chunks.  Actual 
 *  backing memory uses a chunk size of a page (PAGE_SIZE).
 *
 *  If you make BKVASIZE too small you risk seriously fragmenting
 *  the buffer KVM map which may slow things down a bit.  If you
 *  make it too big the kernel will not be able to optimally use 
 *  the KVM memory reserved for the buffer cache and will wind 
 *  up with too-few buffers.
 *
 *  The default is 16384, roughly 2x the block size used by a
 *  normal UFS filesystem.
 */
#define MAXBSIZE        65536   /* must be power of 2 */
#define BKVASIZE        16384   /* must be power of 2 */

The problem is that BKVASIZE is used in a number of the tuning
calculations in vfs_bio.c:

/*
 * The nominal buffer size (and minimum KVA allocation) is BKVASIZE.
 * For the first 64MB of ram nominally allocate sufficient buffers to
 * cover 1/4 of our ram.  Beyond the first 64MB allocate additional
 * buffers to cover 1/10 of our ram over 64MB.  When auto-sizing
 * the buffer cache we limit the eventual kva reservation to
 * maxbcache bytes.
 *
 * factor represents the 1/4 x ram conversion.
 */
if (nbuf == 0) {
        int factor = 4 * BKVASIZE / 1024;

        nbuf = 50;
        if (physmem_est > 4096)
                nbuf += min((physmem_est - 4096) / factor,
                    65536 / factor);
        if (physmem_est > 65536)
                nbuf += (physmem_est - 65536) * 2 / (factor * 5);

        if (maxbcache && nbuf > maxbcache / BKVASIZE)
                nbuf = maxbcache / BKVASIZE;
        tuned_nbuf = 1;
} else
        tuned_nbuf = 0;

/* XXX Avoid unsigned long overflows later on with maxbufspace. */
maxbuf = (LONG_MAX / 3) / BKVASIZE;


/*
 * maxbufspace is the absolute maximum amount of buffer space we are 
 * allowed to reserve in KVM and in real terms.  The absolute maximum
 * is nominally used by buf_daemon.  hibufspace is the nominal maximum
 * used by most other processes.  The differential is required to 
 * ensure that buf_daemon is able to run when other processes might 
 * be blocked waiting for buffer space.
 *
 * maxbufspace is based on BKVASIZE.  Allocating buffers larger then
 * this may result in KVM fragmentation which is not handled optimally
 * by the system.
 */
maxbufspace = (long)nbuf * BKVASIZE;
hibufspace = lmax(3 * maxbufspace / 4, maxbufspace - MAXBSIZE * 10);
lobufspace = hibufspace - MAXBSIZE;


If you are using the new 32K default filesystem block size, then you may
be consuming twice as much memory for buffers as the tuning
calculations think you are using.  Increasing maxvnodes is probably the
wrong way to go, since it will increase memory pressure.

As a quick and dirty test, try cutting kern.nbuf in half.  The correct
fix is probably to rebuild the kernel with BKVASIZE doubled.
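
To put rough numbers on that, here is a small standalone program that
just replays the autotuning arithmetic quoted above.  The ~4 GB
physmem_est (the tuning code works in kilobytes) and the unset maxbcache
are made-up example inputs, not values read from a real system:

/*
 * Sketch: replay the nbuf autotuning formula from vfs_bio.c quoted
 * above for an example machine.  The inputs are illustrative only.
 */
#include <stdio.h>

#define BKVASIZE        16384           /* sys/param.h default */

static long
lmin(long a, long b)
{
        return (a < b ? a : b);
}

int
main(void)
{
        long physmem_est = 4L * 1024 * 1024;    /* ~4 GB of RAM, in KB */
        long maxbcache = 0;                     /* assume no explicit cap */
        long nbuf = 50;
        int factor = 4 * BKVASIZE / 1024;

        if (physmem_est > 4096)
                nbuf += lmin((physmem_est - 4096) / factor, 65536 / factor);
        if (physmem_est > 65536)
                nbuf += (physmem_est - 65536) * 2 / (factor * 5);
        if (maxbcache && nbuf > maxbcache / BKVASIZE)
                nbuf = maxbcache / BKVASIZE;

        printf("nbuf = %ld\n", nbuf);
        printf("maxbufspace assumed by the tuning (nbuf * BKVASIZE): %ld MB\n",
            nbuf * (long)BKVASIZE >> 20);
        printf("same nbuf with 32K buffers (nbuf * 32768): %ld MB\n",
            nbuf * 32768L >> 20);
        return (0);
}

The gap between the last two figures is the factor-of-two mismatch
described above.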




Re: dogfooding over in clusteradm land

2011-12-15 Thread Doug Barton
On 12/14/2011 05:20, Sean Bruno wrote:
 We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
 the cvs2svn ports conversion box.

... sounds like a good reason not to migrate the history to me. :)


-- 

[^L]

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



dogfooding over in clusteradm land

2011-12-14 Thread Sean Bruno
We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
the cvs2svn ports conversion box.  I'm not sure what resource is tapped
out.  Effectively, I cannot access the directory under use and the
converter application stalls out waiting for some resource that isn't
clear. (Peter had posited kmem of some kind).

I've upped maxvnodes a bit on the host, turned off SUJ and mounted the
f/s in question with async and noatime for performance reasons.

Can someone hit me up with the cluebat?  I can give you direct access to
the box for debuginationing.

Sean



Re: dogfooding over in clusteradm land [cvs2svn for ports]

2011-12-14 Thread Sean Bruno
On Wed, 2011-12-14 at 05:20 -0800, Sean Bruno wrote:
 We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
 the cvs2svn ports conversion box.  I'm not sure what resource is tapped
 out.  Effectively, I cannot access the directory under use and the
 converter application stalls out waiting for some resource that isn't
 clear. (Peter had posited kmem of some kind).
 
 I've upped maxvnodes a bit on the host, turned off SUJ and mounted the
 f/s in question with async and noatime for performance reasons.
 
 Can someone hit me up with the cluebat?  I can give you direct access to
 the box for debuginationing.
 
 Sean

BTW, this project is sort of stalled out by this problem.

Sean




Re: dogfooding over in clusteradm land [cvs2svn for ports]

2011-12-14 Thread Garrett Cooper
On Wed, Dec 14, 2011 at 10:39 AM, Sean Bruno <sean...@yahoo-inc.com> wrote:
 On Wed, 2011-12-14 at 05:20 -0800, Sean Bruno wrote:
 We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
 the cvs2svn ports conversion box.  I'm not sure what resource is tapped
 out.  Effectively, I cannot access the directory under use and the
 converter application stalls out waiting for some resource that isn't
 clear. (Peter had posited kmem of some kind).

 I've upped maxvnodes a bit on the host, turned off SUJ and mounted the
 f/s in question with async and noatime for performance reasons.

 Can someone hit me up with the cluebat?  I can give you direct access to
 the box for debuginationing.

 Sean

 BTW, this project is sort of stalled out by this problem.

A few things come to mind (in no particular order):

1. What does svn say before it dies?
2. What does df for the affected partition output?
3. Do you have syslog output that indicates where the starvation is occurring?
4. What do the following sysctls print out?

kern.maxvnodes kern.minvnodes vfs.freevnodes vfs.wantfreevnodes vfs.numvnodes

5. What does top / vmstat -z say for memory right before svn goes south?
6. Are you running the import as an unprivileged user, or root?
7. Has the login.conf been changed on the box?

Thanks,
-Garrett


Re: dogfooding over in clusteradm land

2011-12-14 Thread Poul-Henning Kamp
In message 1323868832.5283.9.ca...@hitfishpass-lx.corp.yahoo.com, Sean Bruno 
writes:

We're seeing what looks like a syncer/ufs resource starvation on 9.0 on
the cvs2svn ports conversion box.  I'm not sure what resource is tapped
out.

Search the mail archive for lemming-syncer

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.