Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-12 Thread Christian Brunner
2012/1/7 Christian Brunner c...@muc.de:
 2012/1/5 Chris Mason chris.ma...@oracle.com:
 On Fri, Jan 06, 2012 at 07:12:16AM +1100, Dave Chinner wrote:
 On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
  On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
  
   Unfortunately, this one works for me.  I'll try it again and see if I
   can push harder.  If not, I'll see if I can trade beer for some
   diagnostic runs.
 
  Aha, if I try it just on the ssd instead of on my full array it triggers
  at 88M files.  Great.

 Good to know.  The error that is generating the BUG on my machine is
 -28 (ENOSPC).  Given there's 17TB free on my filesystem

 Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
 not allocating more metadata chunks from the drive because of where the
 allocation is happening, so it is just a check for "do we need a new
 chunk" in the right place.

 I'll make sure it can fill my ssd and then send to you.

 Could you send the patch to the list (or to me), please? Judging from
 what you mentioned on IRC, this sounds quite interesting, and I would
 like to see if this solves my performance problems with Ceph, too...

I apologize for bothering you again, but I would really like to give it a spin.

Thanks,
Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-07 Thread Christian Brunner
2012/1/5 Chris Mason chris.ma...@oracle.com:
 On Fri, Jan 06, 2012 at 07:12:16AM +1100, Dave Chinner wrote:
 On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
  On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
   On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
 On 05/01/12 09:11, Dave Chinner wrote:

  Looks to be reproducable.

 Does this happen with rc6 ?
   
I haven't tried. All I'm doing is running some benchmarks to get
numbers for a talk I'm giving about improvements in XFS metadata
scalability, so I wanted to update my last set of numbers from
2.6.39.
   
As it was, these benchmarks also failed on btrfs with oopsen and
corruptions back in 2.6.39 time frame.  e.g. same VM, same
test, different crashes, similar slowdowns as reported here:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
   
Given that there is now a history of this simple test uncovering
problems, perhaps this is a test that should be run more regularly
by btrfs developers?
  
   Unfortunately, this one works for me.  I'll try it again and see if I
   can push harder.  If not, I'll see if I can trade beer for some
   diagnostic runs.
 
  Aha, if I try it just on the ssd instead of on my full array it triggers
  at 88M files.  Great.

 Good to know.  The error that is generating the BUG on my machine is
 -28 (ENOSPC).  Given there's 17TB free on my filesystem

 Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
 not allocating more metadata chunks from the drive because of where the
 allocation is happening, so it is just a check for "do we need a new
 chunk" in the right place.

 I'll make sure it can fill my ssd and then send to you.

Could you send the patch to the list (or to me), please? Judging from
what you mentioned on IRC, this sounds quite interesting, and I would
like to see if this solves my performance problems with Ceph, too...

Thanks,
Christian


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Dave Chinner
On Thu, Jan 05, 2012 at 02:11:31PM -0500, Liu Bo wrote:
 On 01/04/2012 09:26 PM, Dave Chinner wrote:
  On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
  On 01/04/2012 06:01 PM, Dave Chinner wrote:
  On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
  On 05/01/12 09:11, Dave Chinner wrote:
 
  Looks to be reproducable.
  Does this happen with rc6 ?
  I haven't tried. All I'm doing is running some benchmarks to get
  numbers for a talk I'm giving about improvements in XFS metadata
  scalability, so I wanted to update my last set of numbers from
  2.6.39.
 
  As it was, these benchmarks also failed on btrfs with oopsen and
  corruptions back in 2.6.39 time frame.  e.g. same VM, same
  test, different crashes, similar slowdowns as reported here:
  http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
 
  Given that there is now a history of this simple test uncovering
  problems, perhaps this is a test that should be run more regularly
  by btrfs developers?
 
  If not then it might be easy to track down as there are only
  2 modifications between rc6 and rc7..
  They don't look like they'd be responsible for fixing an extent tree
  corruption, and I don't really have the time to do an open-ended
  bisect to find where the problem fix arose.
 
  As it is, 3rd attempt failed at 22m inodes, without the warning this
  time:
  
  [...]
  
  It's hard to tell exactly what path gets to that BUG_ON(), so much
  code is inlined by the compiler into run_clustered_refs() that I
  can't tell exactly how it got to the BUG_ON() triggered in
  alloc_reserved_tree_block().
 
  This seems to be an oops caused by ENOSPC.
  
  At the time of the oops, this is the space used on the filesystem:
  
  $ df -h /mnt/scratch
  Filesystem  Size  Used Avail Use% Mounted on
  /dev/vdc 17T   31G   17T   1% /mnt/scratch
  
  It's less than 0.2% full, so I think ENOSPC can be ruled out here.
  
 
 This bug has to do with our block reservation allocator, not the
 real disk space.
 
 Can you try the below one and see what happens?

Still crashes, still has severe slowdowns.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Chris Mason
On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
 On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
  On 05/01/12 09:11, Dave Chinner wrote:
  
   Looks to be reproducable.
  
  Does this happen with rc6 ?
 
 I haven't tried. All I'm doing is running some benchmarks to get
 numbers for a talk I'm giving about improvements in XFS metadata
 scalability, so I wanted to update my last set of numbers from
 2.6.39.
 
 As it was, these benchmarks also failed on btrfs with oopsen and
 corruptions back in 2.6.39 time frame.  e.g. same VM, same
 test, different crashes, similar slowdowns as reported here:
 http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
 
 Given that there is now a history of this simple test uncovering
 problems, perhaps this is a test that should be run more regularly
 by btrfs developers?

Unfortunately, this one works for me.  I'll try it again and see if I
can push harder.  If not, I'll see if I can trade beer for some
diagnostic runs.

Thanks Dave

-chris


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Chris Mason
On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
 On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
  On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
   On 05/01/12 09:11, Dave Chinner wrote:
   
Looks to be reproducable.
   
   Does this happen with rc6 ?
  
  I haven't tried. All I'm doing is running some benchmarks to get
  numbers for a talk I'm giving about improvements in XFS metadata
  scalability, so I wanted to update my last set of numbers from
  2.6.39.
  
  As it was, these benchmarks also failed on btrfs with oopsen and
  corruptions back in 2.6.39 time frame.  e.g. same VM, same
  test, different crashes, similar slowdowns as reported here:
  http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
  
  Given that there is now a history of this simple test uncovering
  problems, perhaps this is a test that should be run more regularly
  by btrfs developers?
 
 Unfortunately, this one works for me.  I'll try it again and see if I
 can push harder.  If not, I'll see if I can trade beer for some
 diagnostic runs.

Aha, if I try it just on the ssd instead of on my full array it triggers
at 88M files.  Great.

-chris



Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Dave Chinner
On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
 On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
  On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
   On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
On 05/01/12 09:11, Dave Chinner wrote:

 Looks to be reproducable.

Does this happen with rc6 ?
   
   I haven't tried. All I'm doing is running some benchmarks to get
   numbers for a talk I'm giving about improvements in XFS metadata
   scalability, so I wanted to update my last set of numbers from
   2.6.39.
   
   As it was, these benchmarks also failed on btrfs with oopsen and
   corruptions back in 2.6.39 time frame.  e.g. same VM, same
   test, different crashes, similar slowdowns as reported here:
   http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
   
   Given that there is now a history of this simple test uncovering
   problems, perhaps this is a test that should be run more regularly
   by btrfs developers?
  
  Unfortunately, this one works for me.  I'll try it again and see if I
  can push harder.  If not, I'll see if I can trade beer for some
  diagnostic runs.
 
 Aha, if I try it just on the ssd instead of on my full array it triggers
 at 88M files.  Great.

Good to know.  The error that is generating the BUG on my machine is
-28 (ENOSPC).  Given there's 17TB free on my filesystem

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Chris Mason
On Fri, Jan 06, 2012 at 07:12:16AM +1100, Dave Chinner wrote:
 On Thu, Jan 05, 2012 at 02:45:00PM -0500, Chris Mason wrote:
  On Thu, Jan 05, 2012 at 01:46:57PM -0500, Chris Mason wrote:
   On Thu, Jan 05, 2012 at 10:01:22AM +1100, Dave Chinner wrote:
On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
 On 05/01/12 09:11, Dave Chinner wrote:
 
  Looks to be reproducable.
 
 Does this happen with rc6 ?

I haven't tried. All I'm doing is running some benchmarks to get
numbers for a talk I'm giving about improvements in XFS metadata
scalability, so I wanted to update my last set of numbers from
2.6.39.

As it was, these benchmarks also failed on btrfs with oopsen and
corruptions back in 2.6.39 time frame.  e.g. same VM, same
test, different crashes, similar slowdowns as reported here:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062

Given that there is now a history of this simple test uncovering
problems, perhaps this is a test that should be run more regularly
by btrfs developers?
   
   Unfortunately, this one works for me.  I'll try it again and see if I
   can push harder.  If not, I'll see if I can trade beer for some
   diagnostic runs.
  
  Aha, if I try it just on the ssd instead of on my full array it triggers
  at 88M files.  Great.
 
 Good to know.  The error that is generating the BUG on my machine is
 -28 (ENOSPC).  Given there's 17TB free on my filesystem

Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
not allocating more metadata chunks from the drive because of where the
allocation is happening, so it is just a check for "do we need a new
chunk" in the right place.

I'll make sure it can fill my ssd and then send to you.

-chris



Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Chris Samuel
On Fri, 6 Jan 2012, 08:02:55 EST, Chris Mason chris.ma...@oracle.com wrote:

 Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
 not allocating more metadata chunks from the drive because of where the
 allocation is happening, so it is just a check for "do we need a new
 chunk" in the right place.
 
 I'll make sure it can fill my ssd and then send to you.

*If* the fix works would this be a candidate for a
stable 3.2.1 release rather than having to wait for
3.3?

cheers,
Chris
-- 
Chris Samuel - http://www.csamuel.org/ - on mobile


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-05 Thread Chris Mason
On Fri, Jan 06, 2012 at 08:24:32AM +1100, Chris Samuel wrote:
 On Fri,   6 Jan 2012, 08:02:55 EST, Chris Mason chris.ma...@oracle.com 
 wrote:
 
  Yeah, same thing here.  I'm testing a fix now, it's pretty dumb.  We're
  not allocating more metadata chunks from the drive because of where the
  allocation is happening, so it is just a check for "do we need a new
  chunk" in the right place.
  
  I'll make sure it can fill my ssd and then send to you.
 
 *If* the fix works would this be a candidate for a
 stable 3.2.1 release rather than having to wait for
 3.3?

Yes, it's pretty safe.

-chris



Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-04 Thread Dave Chinner
On Thu, Jan 05, 2012 at 08:44:45AM +1100, Dave Chinner wrote:
 Hi there buttery folks,
 
 I just hit this warning and oops running a parallel fs_mark create
 workload on a test VM using a 17TB btrfs filesystem (12 disk dm
 RAID0) using default mkfs and mount parameters, mounted on
 /mnt/scratch. The VM has 8p and 4GB RAM, and the fs_mark command
 line was:
 
 $ ./fs_mark  -D  1  -S0  -n  10 -s  0  -L  250 \
   -d /mnt/scratch/0  -d /mnt/scratch/1 \
   -d /mnt/scratch/2  -d /mnt/scratch/3 \
   -d /mnt/scratch/4  -d /mnt/scratch/5 \
   -d /mnt/scratch/6  -d /mnt/scratch/7
 
 The attached image should give you a better idea of the performance
 drop-off that was well under way when the crash occurred at about 96
 million files created.
 
 I'm rerunning the test on a fresh filesystem, so I guess I'll see if
 this a one-off in the next couple of hours

Looks to be reproducable. With a fresh filesystem performance was all
over the place from the start, and the warning/oops occurred at
about 43M files created. The failure stacks this time are:

[ 1490.841957] device fsid 4b7ec51b-9747-4244-a568-fbecdb157383 devid 1 transid 
4 /dev/vdc
[ 1490.847408] btrfs: disk space caching is enabled
[ 3027.690722] [ cut here ]
[ 3027.692612] WARNING: at fs/btrfs/extent-tree.c:4771 
__btrfs_free_extent+0x630/0x6d0()
[ 3027.695607] Hardware name: Bochs
[ 3027.696968] Modules linked in:
[ 3027.697894] Pid: 3460, comm: fs_mark Not tainted 3.2.0-rc7-dgc+ #167
[ 3027.699581] Call Trace:
[ 3027.700452]  [8108a69f] warn_slowpath_common+0x7f/0xc0
[ 3027.701973]  [8108a6fa] warn_slowpath_null+0x1a/0x20
[ 3027.703480]  [815b0680] __btrfs_free_extent+0x630/0x6d0
[ 3027.704981]  [815ac110] ? block_group_cache_tree_search+0x90/0xc0
[ 3027.706368]  [815b42f1] run_clustered_refs+0x381/0x800
[ 3027.707897]  [815b483a] btrfs_run_delayed_refs+0xca/0x220
[ 3027.709347]  [815b8f1c] ? btrfs_update_root+0x9c/0xe0
[ 3027.710909]  [815c3c33] commit_cowonly_roots+0x33/0x1e0
[ 3027.712370]  [81ab168e] ? _raw_spin_lock+0xe/0x20
[ 3027.713220]  [815c54cf] btrfs_commit_transaction+0x3bf/0x840
[ 3027.714412]  [810ac420] ? add_wait_queue+0x60/0x60
[ 3027.715460]  [815c5da4] ? start_transaction+0x94/0x2b0
[ 3027.716790]  [815ac80c] may_commit_transaction+0x6c/0x100
[ 3027.717843]  [815b2b47] reserve_metadata_bytes.isra.71+0x5a7/0x660
[ 3027.719223]  [81073c23] ? __wake_up+0x53/0x70
[ 3027.720328]  [815a43ba] ? btrfs_free_path+0x2a/0x40
[ 3027.721511]  [815b2f9e] btrfs_block_rsv_add+0x3e/0x70
[ 3027.722610]  [81666dfb] ? security_d_instantiate+0x1b/0x30
[ 3027.723765]  [815c5f65] start_transaction+0x255/0x2b0
[ 3027.725204]  [815c6283] btrfs_start_transaction+0x13/0x20
[ 3027.726273]  [815d2236] btrfs_create+0x46/0x220
[ 3027.727275]  [8116c204] vfs_create+0xb4/0xf0
[ 3027.728344]  [8116e1d7] do_last.isra.45+0x547/0x7c0
[ 3027.729400]  [8116f7ab] path_openat+0xcb/0x3d0
[ 3027.730363]  [81ab168e] ? _raw_spin_lock+0xe/0x20
[ 3027.731394]  [8117cc1e] ? vfsmount_lock_local_unlock+0x1e/0x30
[ 3027.733077]  [8116fbd2] do_filp_open+0x42/0xa0
[ 3027.733949]  [8117c487] ? alloc_fd+0xf7/0x150
[ 3027.734911]  [8115f8e7] do_sys_open+0xf7/0x1d0
[ 3027.735894]  [810b572a] ? do_gettimeofday+0x1a/0x50
[ 3027.737304]  [8115f9e0] sys_open+0x20/0x30
[ 3027.738099]  [81ab9502] system_call_fastpath+0x16/0x1b
[ 3027.739199] ---[ end trace df586861a93ef3bf ]---
[ 3027.740348] btrfs unable to find ref byte nr 19982405632 parent 0 root 2  
owner 0 offset 0
[ 3027.742001] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[ 3027.743502] IP: [815e60f2] map_private_extent_buffer+0x12/0x150
[ 3027.744982] PGD 109d8e067 PUD 1050a9067 PMD 0
[ 3027.745968] Oops:  [#1] SMP
[ 3027.745968] CPU 7
[ 3027.745968] Modules linked in:
[ 3027.745968]
[ 3027.745968] Pid: 3460, comm: fs_mark Tainted: GW3.2.0-rc7-dgc+ 
#167 Bochs Bochs
[ 3027.745968] RIP: 0010:[815e60f2]  [815e60f2] 
map_private_extent_buffer+0x12/0x150
[ 3027.745968] RSP: 0018:8800d2ac36d8  EFLAGS: 00010296
[ 3027.745968] RAX:  RBX: 0065 RCX: 8800d2ac3708
[ 3027.745968] RDX: 0004 RSI: 007a RDI: 
[ 3027.745968] RBP: 8800d2ac36f8 R08: 8800d2ac3710 R09: 8800d2ac3718
[ 3027.745968] R10:  R11: 0001 R12: 007a
[ 3027.745968] R13:  R14: ffe4 R15: 1000
[ 3027.745968] FS:  7f3bf8ab5700() GS:88011fdc() 
knlGS:
[ 3027.745968] CS:  0010 DS:  ES:  CR0: 8005003b
[ 3027.745968] CR2: 7fe424b0e000 CR3: 000106c33000 CR4: 06e0
[ 3027.745968] DR0: 

Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-04 Thread Chris Samuel
On 05/01/12 09:11, Dave Chinner wrote:

 Looks to be reproducable.

Does this happen with rc6 ?

If not then it might be easy to track down as there are only
2 modifications between rc6 and rc7..

commit 08c422c27f855d27b0b3d9fa30ebd938d4ae6f1f
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Fri Dec 23 07:58:13 2011 -0500

Btrfs: call d_instantiate after all ops are setup

This closes races where btrfs is calling d_instantiate too soon during
inode creation.  All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully setup in memory.

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
Signed-off-by: Chris Mason chris.ma...@oracle.com

commit 8d532b2afb2eacc84588db709ec280a3d1219be3
Author: Chris Mason chris.ma...@oracle.com
Date:   Fri Dec 23 07:53:00 2011 -0500

Btrfs: fix worker lock misuse in find_worker

Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.

Signed-off-by: Chris Mason chris.ma...@oracle.com
Reported-by: Dan Carpenter dan.carpen...@oracle.com


-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-04 Thread Dave Chinner
On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
 On 05/01/12 09:11, Dave Chinner wrote:
 
  Looks to be reproducable.
 
 Does this happen with rc6 ?

I haven't tried. All I'm doing is running some benchmarks to get
numbers for a talk I'm giving about improvements in XFS metadata
scalability, so I wanted to update my last set of numbers from
2.6.39.

As it was, these benchmarks also failed on btrfs with oopsen and
corruptions back in 2.6.39 time frame.  e.g. same VM, same
test, different crashes, similar slowdowns as reported here:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062

Given that there is now a history of this simple test uncovering
problems, perhaps this is a test that should be run more regularly
by btrfs developers?

 If not then it might be easy to track down as there are only
 2 modifications between rc6 and rc7..

They don't look like they'd be responsible for fixing an extent tree
corruption, and I don't really have the time to do an open-ended
bisect to find where the problem fix arose.

As it is, 3rd attempt failed at 22m inodes, without the warning this
time:

[   59.433452] device fsid 4d27dc14-562d-4722-9591-723bd2bbe94c devid 1 transid 
4 /dev/vdc
[   59.437050] btrfs: disk space caching is enabled
[  753.258465] [ cut here ]
[  753.259806] kernel BUG at fs/btrfs/extent-tree.c:5797!
[  753.260014] invalid opcode:  [#1] SMP
[  753.260014] CPU 7
[  753.260014] Modules linked in:
[  753.260014]
[  753.260014] Pid: 2874, comm: fs_mark Not tainted 3.2.0-rc7-dgc+ #167 Bochs 
Bochs
[  753.260014] RIP: 0010:[815b475b]  [815b475b] 
run_clustered_refs+0x7eb/0x800
[  753.260014] RSP: 0018:8800430258a8  EFLAGS: 00010286
[  753.260014] RAX: ffe4 RBX: 88009c8ab1c0 RCX: 
[  753.260014] RDX: 0008 RSI: 0282 RDI: 
[  753.260014] RBP: 880043025988 R08:  R09: 0002
[  753.260014] R10: 8801188f6000 R11: 880101b50d20 R12: 88008fc1ad40
[  753.260014] R13: 88003940a6c0 R14: 880118a49000 R15: 88010fc77e80
[  753.260014] FS:  7f416ce90700() GS:88011fdc() 
knlGS:
[  753.260014] CS:  0010 DS:  ES:  CR0: 8005003b
[  753.260014] CR2: 7f416c2f6000 CR3: 3aaea000 CR4: 06e0
[  753.260014] DR0:  DR1:  DR2: 
[  753.260014] DR3:  DR6: 0ff0 DR7: 0400
[  753.260014] Process fs_mark (pid: 2874, threadinfo 880043024000, task 
8800090e6180)
[  753.260014] Stack:
[  753.260014]    8801 

[  753.260014]  88010fc77f38 0e92  
0002
[  753.260014]  0e03 0e68  
8800430259d8
[  753.260014] Call Trace:
[  753.260014]  [815b483a] btrfs_run_delayed_refs+0xca/0x220
[  753.260014]  [815c5469] btrfs_commit_transaction+0x359/0x840
[  753.260014]  [810ac420] ? add_wait_queue+0x60/0x60
[  753.260014]  [815c5da4] ? start_transaction+0x94/0x2b0
[  753.260014]  [815ac80c] may_commit_transaction+0x6c/0x100
[  753.260014]  [815b2b47] reserve_metadata_bytes.isra.71+0x5a7/0x660
[  753.260014]  [81073c23] ? __wake_up+0x53/0x70
[  753.260014]  [815a43ba] ? btrfs_free_path+0x2a/0x40
[  753.260014]  [815b2f9e] btrfs_block_rsv_add+0x3e/0x70
[  753.260014]  [81666dfb] ? security_d_instantiate+0x1b/0x30
[  753.260014]  [815c5f65] start_transaction+0x255/0x2b0
[  753.260014]  [815c6283] btrfs_start_transaction+0x13/0x20
[  753.260014]  [815d2236] btrfs_create+0x46/0x220
[  753.260014]  [8116c204] vfs_create+0xb4/0xf0
[  753.260014]  [8116e1d7] do_last.isra.45+0x547/0x7c0
[  753.260014]  [8116f7ab] path_openat+0xcb/0x3d0
[  753.260014]  [81ab168e] ? _raw_spin_lock+0xe/0x20
[  753.260014]  [8117cc1e] ? vfsmount_lock_local_unlock+0x1e/0x30
[  753.260014]  [8116fbd2] do_filp_open+0x42/0xa0
[  753.260014]  [8117c487] ? alloc_fd+0xf7/0x150
[  753.260014]  [8115f8e7] do_sys_open+0xf7/0x1d0
[  753.260014]  [810b572a] ? do_gettimeofday+0x1a/0x50
[  753.260014]  [8115f9e0] sys_open+0x20/0x30
[  753.260014]  [81ab9502] system_call_fastpath+0x16/0x1b
[  753.260014] Code: ff e9 37 f9 ff ff be 95 00 00 00 48 c7 c7 43 6f df 81 e8 
99 5f ad ff e9 36 f9 ff ff 80 fa b2 0f 84 d0 f9 ff ff 0f 0b 0f 0b 0f 0b 0f 0b 
0f 0b 0f
[  753.260014] RIP  [815b475b] run_clustered_refs+0x7eb/0x800
[  753.260014]  RSP 8800430258a8
[  753.330089] ---[ end trace f3d0e286a928c349 ]---

It's hard to tell exactly what path gets to that BUG_ON(), so much
code is inlined by the compiler into run_clustered_refs() that I
can't tell exactly how it got to the BUG_ON() triggered in

Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-04 Thread Liu Bo
On 01/04/2012 06:01 PM, Dave Chinner wrote:
 On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
 On 05/01/12 09:11, Dave Chinner wrote:

 Looks to be reproducable.
 Does this happen with rc6 ?
 
 I haven't tried. All I'm doing is running some benchmarks to get
 numbers for a talk I'm giving about improvements in XFS metadata
 scalability, so I wanted to update my last set of numbers from
 2.6.39.
 
 As it was, these benchmarks also failed on btrfs with oopsen and
 corruptions back in 2.6.39 time frame.  e.g. same VM, same
 test, different crashes, similar slowdowns as reported here:
 http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
 
 Given that there is now a history of this simple test uncovering
 problems, perhaps this is a test that should be run more regularly
 by btrfs developers?
 
 If not then it might be easy to track down as there are only
 2 modifications between rc6 and rc7..
 
 They don't look like they'd be responsible for fixing an extent tree
 corruption, and I don't really have the time to do an open-ended
 bisect to find where the problem fix arose.
 
 As it is, 3rd attempt failed at 22m inodes, without the warning this
 time:
 
 [   59.433452] device fsid 4d27dc14-562d-4722-9591-723bd2bbe94c devid 1 
 transid 4 /dev/vdc
 [   59.437050] btrfs: disk space caching is enabled
 [  753.258465] [ cut here ]
 [  753.259806] kernel BUG at fs/btrfs/extent-tree.c:5797!
 [  753.260014] invalid opcode:  [#1] SMP
 [  753.260014] CPU 7
 [  753.260014] Modules linked in:
 [  753.260014]
 [  753.260014] Pid: 2874, comm: fs_mark Not tainted 3.2.0-rc7-dgc+ #167 Bochs 
 Bochs
 [  753.260014] RIP: 0010:[815b475b]  [815b475b] 
 run_clustered_refs+0x7eb/0x800
 [  753.260014] RSP: 0018:8800430258a8  EFLAGS: 00010286
 [  753.260014] RAX: ffe4 RBX: 88009c8ab1c0 RCX: 
 
 [  753.260014] RDX: 0008 RSI: 0282 RDI: 
 
 [  753.260014] RBP: 880043025988 R08:  R09: 
 0002
 [  753.260014] R10: 8801188f6000 R11: 880101b50d20 R12: 
 88008fc1ad40
 [  753.260014] R13: 88003940a6c0 R14: 880118a49000 R15: 
 88010fc77e80
 [  753.260014] FS:  7f416ce90700() GS:88011fdc() 
 knlGS:
 [  753.260014] CS:  0010 DS:  ES:  CR0: 8005003b
 [  753.260014] CR2: 7f416c2f6000 CR3: 3aaea000 CR4: 
 06e0
 [  753.260014] DR0:  DR1:  DR2: 
 
 [  753.260014] DR3:  DR6: 0ff0 DR7: 
 0400
 [  753.260014] Process fs_mark (pid: 2874, threadinfo 880043024000, task 
 8800090e6180)
 [  753.260014] Stack:
 [  753.260014]    8801 
 
 [  753.260014]  88010fc77f38 0e92  
 0002
 [  753.260014]  0e03 0e68  
 8800430259d8
 [  753.260014] Call Trace:
 [  753.260014]  [815b483a] btrfs_run_delayed_refs+0xca/0x220
 [  753.260014]  [815c5469] btrfs_commit_transaction+0x359/0x840
 [  753.260014]  [810ac420] ? add_wait_queue+0x60/0x60
 [  753.260014]  [815c5da4] ? start_transaction+0x94/0x2b0
 [  753.260014]  [815ac80c] may_commit_transaction+0x6c/0x100
 [  753.260014]  [815b2b47] 
 reserve_metadata_bytes.isra.71+0x5a7/0x660
 [  753.260014]  [81073c23] ? __wake_up+0x53/0x70
 [  753.260014]  [815a43ba] ? btrfs_free_path+0x2a/0x40
 [  753.260014]  [815b2f9e] btrfs_block_rsv_add+0x3e/0x70
 [  753.260014]  [81666dfb] ? security_d_instantiate+0x1b/0x30
 [  753.260014]  [815c5f65] start_transaction+0x255/0x2b0
 [  753.260014]  [815c6283] btrfs_start_transaction+0x13/0x20
 [  753.260014]  [815d2236] btrfs_create+0x46/0x220
 [  753.260014]  [8116c204] vfs_create+0xb4/0xf0
 [  753.260014]  [8116e1d7] do_last.isra.45+0x547/0x7c0
 [  753.260014]  [8116f7ab] path_openat+0xcb/0x3d0
 [  753.260014]  [81ab168e] ? _raw_spin_lock+0xe/0x20
 [  753.260014]  [8117cc1e] ? vfsmount_lock_local_unlock+0x1e/0x30
 [  753.260014]  [8116fbd2] do_filp_open+0x42/0xa0
 [  753.260014]  [8117c487] ? alloc_fd+0xf7/0x150
 [  753.260014]  [8115f8e7] do_sys_open+0xf7/0x1d0
 [  753.260014]  [810b572a] ? do_gettimeofday+0x1a/0x50
 [  753.260014]  [8115f9e0] sys_open+0x20/0x30
 [  753.260014]  [81ab9502] system_call_fastpath+0x16/0x1b
 [  753.260014] Code: ff e9 37 f9 ff ff be 95 00 00 00 48 c7 c7 43 6f df 81 e8 
 99 5f ad ff e9 36 f9 ff ff 80 fa b2 0f 84 d0 f9 ff ff 0f 0b 0f 0b 0f 0b 0f 
 0b 0f 0b 0f
 [  753.260014] RIP  [815b475b] run_clustered_refs+0x7eb/0x800
 [  753.260014]  RSP 8800430258a8
 [  753.330089] ---[ end trace f3d0e286a928c349 ]---
 
 It's hard to tell exactly what path gets to that 

Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-04 Thread Dave Chinner
On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
 On 01/04/2012 06:01 PM, Dave Chinner wrote:
  On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
  On 05/01/12 09:11, Dave Chinner wrote:
 
  Looks to be reproducable.
  Does this happen with rc6 ?
  
  I haven't tried. All I'm doing is running some benchmarks to get
  numbers for a talk I'm giving about improvements in XFS metadata
  scalability, so I wanted to update my last set of numbers from
  2.6.39.
  
  As it was, these benchmarks also failed on btrfs with oopsen and
  corruptions back in 2.6.39 time frame.  e.g. same VM, same
  test, different crashes, similar slowdowns as reported here:
  http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062
  
  Given that there is now a history of this simple test uncovering
  problems, perhaps this is a test that should be run more regularly
  by btrfs developers?
  
  If not then it might be easy to track down as there are only
  2 modifications between rc6 and rc7..
  
  They don't look like they'd be responsible for fixing an extent tree
  corruption, and I don't really have the time to do an open-ended
  bisect to find where the problem fix arose.
  
  As it is, 3rd attempt failed at 22m inodes, without the warning this
  time:

 [...]

  It's hard to tell exactly what path gets to that BUG_ON(), so much
  code is inlined by the compiler into run_clustered_refs() that I
  can't tell exactly how it got to the BUG_ON() triggered in
  alloc_reserved_tree_block().
  
 
 This seems to be an oops caused by ENOSPC.

At the time of the oops, this is the space used on the filesystem:

$ df -h /mnt/scratch
Filesystem  Size  Used Avail Use% Mounted on
/dev/vdc 17T   31G   17T   1% /mnt/scratch

It's less than 0.2% full, so I think ENOSPC can be ruled out here.

I have noticed one thing, however: there are significant
numbers of reads coming from disk when the slowdowns and oops occur.
When everything runs fast, there are virtually no reads occurring at
all.  It looks to me like the working set of metadata is being
kicked out of memory, only to be read back in again a short while
later. Maybe that is a contributing factor.

BTW, there is a lot of CPU time being spent on the tree locks. perf
shows this as the top 2 CPU consumers:

-   9.49%  [kernel]  [k] __write_lock_failed
   - __write_lock_failed
  - 99.80% _raw_write_lock
 - 79.35% btrfs_try_tree_write_lock
  99.99% btrfs_search_slot
 - 20.63% btrfs_tree_lock
  89.19% btrfs_search_slot
  10.54% btrfs_lock_root_node
 btrfs_search_slot
-   9.25%  [kernel]  [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
  - 55.87% __wake_up
 + 93.89% btrfs_clear_lock_blocking_rw
 + 3.46% btrfs_tree_read_unlock_blocking
 + 2.35% btrfs_tree_unlock

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.2-rc7] slowdown, warning + oops creating lots of files

2012-01-04 Thread Liu Bo
On 01/04/2012 09:26 PM, Dave Chinner wrote:
 On Wed, Jan 04, 2012 at 09:23:18PM -0500, Liu Bo wrote:
 On 01/04/2012 06:01 PM, Dave Chinner wrote:
 On Thu, Jan 05, 2012 at 09:23:52AM +1100, Chris Samuel wrote:
 On 05/01/12 09:11, Dave Chinner wrote:

 Looks to be reproducable.
 Does this happen with rc6 ?
 I haven't tried. All I'm doing is running some benchmarks to get
 numbers for a talk I'm giving about improvements in XFS metadata
 scalability, so I wanted to update my last set of numbers from
 2.6.39.

 As it was, these benchmarks also failed on btrfs with oopsen and
 corruptions back in 2.6.39 time frame.  e.g. same VM, same
 test, different crashes, similar slowdowns as reported here:
 http://comments.gmane.org/gmane.comp.file-systems.btrfs/11062

 Given that there is now a history of this simple test uncovering
 problems, perhaps this is a test that should be run more regularly
 by btrfs developers?

 If not then it might be easy to track down as there are only
 2 modifications between rc6 and rc7..
 They don't look like they'd be responsible for fixing an extent tree
 corruption, and I don't really have the time to do an open-ended
 bisect to find where the problem fix arose.

 As it is, 3rd attempt failed at 22m inodes, without the warning this
 time:
 
 .
 
 It's hard to tell exactly what path gets to that BUG_ON(), so much
 code is inlined by the compiler into run_clustered_refs() that I
 can't tell exactly how it got to the BUG_ON() triggered in
 alloc_reserved_tree_block().

 This seems to be an oops caused by ENOSPC.
 
 At the time of the oops, this is the space used on the filesystem:
 
 $ df -h /mnt/scratch
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/vdc 17T   31G   17T   1% /mnt/scratch
 
 It's less than 0.2% full, so I think ENOSPC can be ruled out here.
 

This bug is about our block reservation allocator, not the real
on-disk space.

Can you try the below one and see what happens?

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b1c8732..5a7f918 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3978,8 +3978,8 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
 		    csum_size * 2;
 	num_bytes += div64_u64(data_used + meta_used, 50);
 
-	if (num_bytes * 3 > meta_used)
-		num_bytes = div64_u64(meta_used, 3);
+	if (num_bytes * 2 > meta_used)
+		num_bytes = div64_u64(meta_used, 2);
 
 	return ALIGN(num_bytes, fs_info->extent_root->leafsize << 10);
 }
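For context, the function this patch touches sizes the global metadata reserve from current data and metadata usage. A standalone model of that arithmetic with the patch applied (plain C stands in for div64_u64() and ALIGN(); the blocksize, csum_size, and leafsize arguments are illustrative assumptions, not values read from a real filesystem):

```c
#include <stdint.h>

/* Standalone model of calc_global_metadata_size() with the patch
 * applied.  div64_u64() and ALIGN() are replaced by plain C; the
 * sizes passed in (blocksize, csum_size, leafsize) are illustrative
 * assumptions, not values read from a real filesystem. */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

static uint64_t model_global_metadata_size(uint64_t data_used,
					   uint64_t meta_used,
					   uint64_t blocksize,
					   uint64_t csum_size,
					   uint64_t leafsize)
{
	uint64_t num_bytes;

	/* room for two copies of every data block's checksum ... */
	num_bytes = (data_used / blocksize) * csum_size * 2;
	/* ... plus 2% of all space currently in use */
	num_bytes += (data_used + meta_used) / 50;

	/* patched cap: at most half (previously a third) of meta_used */
	if (num_bytes * 2 > meta_used)
		num_bytes = meta_used / 2;

	return ALIGN_UP(num_bytes, leafsize << 10);
}
```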

 I have noticed one thing, however, in that there are significant
 numbers of reads coming from disk when the slowdowns and oops occur.
 When everything runs fast, there are virtually no reads occurring at
 all.  It looks to me that maybe the working set of metadata is being
 kicked out of memory, only to be read back in again a short while
 later. Maybe that is a contributing factor.
 
 BTW, there is a lot of CPU time being spent on the tree locks. perf
 shows this as the top 2 CPU consumers:
 
 -   9.49%  [kernel]  [k] __write_lock_failed
- __write_lock_failed
   - 99.80% _raw_write_lock
  - 79.35% btrfs_try_tree_write_lock
   99.99% btrfs_search_slot
  - 20.63% btrfs_tree_lock
   89.19% btrfs_search_slot
   10.54% btrfs_lock_root_node
  btrfs_search_slot
 -   9.25%  [kernel]  [k] _raw_spin_unlock_irqrestore
- _raw_spin_unlock_irqrestore
   - 55.87% __wake_up
  + 93.89% btrfs_clear_lock_blocking_rw
  + 3.46% btrfs_tree_read_unlock_blocking
  + 2.35% btrfs_tree_unlock
 

hmm, the new extent_buffer lock scheme written by Chris is aimed at
avoiding such cases; maybe he can provide some advice.

thanks,
liubo

 Cheers,
 
 Dave.
