BUG at fs/btrfs/inode.c:1587

2011-11-15 Thread Christian Brunner
Hi,

this time I've hit a new bug. This happened while ceph was rebuilding
its filestore (heavy I/O).

The btrfs version is from 3.2-rc1, applied to a 3.0 kernel.

Regards,
Christian

[28981.550478] ------------[ cut here ]------------
[28981.555625] kernel BUG at fs/btrfs/inode.c:1587!
[28981.560773] invalid opcode: 0000 [#1] SMP
[28981.565361] CPU 2
[28981.567407] Modules linked in: btrfs zlib_deflate libcrc32c sunrpc bonding ipv6 sg serio_raw pcspkr ghes hed iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(P) hpsa squashfs [last unloaded: scsi_wait_scan]
[28981.591184]
[28981.592842] Pid: 1814, comm: btrfs-fixup-0 Tainted: P 3.0.8-1.fits.4.el6.x86_64 #1 HP ProLiant DL180 G6
[28981.604589] RIP: 0010:[a0292f3c]  [a0292f3c] btrfs_writepage_fixup_worker+0x14c/0x160 [btrfs]
[28981.616049] RSP: 0018:8805ee735dd0  EFLAGS: 00010246
[28981.621967] RAX:  RBX: ea00132c2520 RCX: 8805ef32ec58
[28981.629918] RDX:  RSI: 003b5000 RDI: 8805ef32ea38
[28981.637870] RBP: 8805ee735e20 R08: 88063f25add0 R09: 8805ee735d88
[28981.645822] R10:  R11: 0001 R12: 003b5000
[28981.653774] R13: 8805ef32eb08 R14:  R15: 003b5fff
[28981.661727] FS:  () GS:88063f24() knlGS:
[28981.670744] CS:  0010 DS:  ES:  CR0: 8005003b
[28981.677146] CR2: 07737000 CR3: 01a03000 CR4: 06e0
[28981.685098] DR0:  DR1:  DR2:
[28981.693050] DR3:  DR6: 0ff0 DR7: 0400
[28981.701010] Process btrfs-fixup-0 (pid: 1814, threadinfo 8805ee734000, task 8805f3f54bc0)
[28981.710901] Stack:
[28981.713146]  88045dbf4d20 8805ef32e9a8 00012bc0 88027dcdbd20
[28981.721434]   8805ef99ede0 8805ef99ee30 8805ef99edf8
[28981.729723]  88045dbf4d50 8805ee735e80 8805ee735ee0 a02b39ce
[28981.738013] Call Trace:
[28981.740763]  [a02b39ce] worker_loop+0x13e/0x540 [btrfs]
[28981.747577]  [a02b3890] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
[28981.755263]  [a02b3890] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
[28981.762931]  [81085c96] kthread+0x96/0xa0
[28981.768373]  [81556844] kernel_thread_helper+0x4/0x10
[28981.774976]  [81085c00] ? kthread_worker_fn+0x1a0/0x1a0
[28981.781772]  [81556840] ? gs_change+0x13/0x13
[28981.787593] Code: e0 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 7d b8 48 8d 4d c8 41 b8 50 00 00 00 4c 89 fa 4c 89 e6 e8 96 38 01 00 eb bd 0f 0b eb fe 48 89 df e8 c8 0e e7 e0 eb 9d 66 0f 1f 44 00 00 55
[28981.809294] RIP  [a0292f3c] btrfs_writepage_fixup_worker+0x14c/0x160 [btrfs]
[28981.818150]  RSP 8805ee735dd0
[28981.822721] ---[ end trace 0236051622523829 ]---
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/21] [RFC] Btrfs: restriper

2011-11-15 Thread Ilya Dryomov
On Mon, Nov 14, 2011 at 06:59:14PM -0500, Phillip Susi wrote:
 I have a fs that started with the default policy of metadata=dup.  I
 added a second device and rebalanced, and so the metadata chunks were
 converted to raid1.  Now I can not remove the second device because
 raid1 requires at least two devices.
 
 If I understand this patch series correctly, I can use it to manually
 convert those raid1 chunks back to dup, and then remove the second
 device.  It occurs to me though, that in the restripe process, the
 newly created dup chunks can be allocated from either disk still, and
 any that are allocated on the second disk will then need to be
 relocated in order to remove that disk.  This seems inefficient, so I
 was wondering if there is a way to make sure that during the restripe,
 only the disk I intend to keep is allocated from to create the dup
 chunks, and thus avoid the need to relocate when I remove the second disk?

Restriper won't let you do raid1 -> dup transition because dup is only
allowed for a single-spindle FS, so you'll end up with error btrfs:
unable to start restripe 

There is no way to prioritize disks during restripe.  To get dup back
you'll have to convert everything to single, remove the second drive and
then convert metadata from single to dup.
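
For reference, the three steps above could be sketched as the dry run below. It only prints the commands and executes nothing. The `-mconvert=` form is the one quoted for restriper in this thread; the `-dconvert=` option, the helper name `plan_dup_restore`, and the example device and mount point are assumptions and may not match the final restriper interface.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for the raid1 -> single -> dup
# sequence instead of running them.  plan_dup_restore is a hypothetical
# helper; -dconvert= is assumed by analogy with -mconvert=.
plan_dup_restore() {
    mnt=$1   # mount point of the filesystem
    dev=$2   # device that is going to be removed

    # 1. Convert everything (data and metadata) to single.
    echo "btrfs fi restripe start -dconvert=single -mconvert=single $mnt"
    # 2. Remove the second drive.
    echo "btrfs device delete $dev $mnt"
    # 3. Convert metadata from single back to dup.
    echo "btrfs fi restripe start -mconvert=dup $mnt"
}

# Example paths only; substitute the real mount point and device.
plan_dup_restore /mnt/btrfs /dev/sdb
```

Printing the plan first makes it easy to review the order of operations before touching a filesystem that temporarily holds only a single copy of its metadata.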

Thanks,

Ilya


Re: Don't prevent removal of devices that break raid reqs

2011-11-15 Thread Ilya Dryomov
On Thu, Nov 10, 2011 at 09:21:00PM -0500, Chris Mason wrote:
 On Thu, Nov 10, 2011 at 05:32:48PM -0200, Alexandre Oliva wrote:
  Instead of preventing the removal of devices that would render existing
  raid10 or raid1 impossible, warn but go ahead with it; the rebalancing
  code is smart enough to use different block group types.
  
  Should the refusal remain, so that we'd only proceed with a
  newly-introduced --force option or so?
 
 Hmm, going to three devices on raid10 doesn't turn it into
 raid1.  It turns it into a degraded raid10.
 
 We'll need a --force of some kind.  There are definitely cases users
 have wanted to do this but it is rarely a good idea ;)

I'm not sure about use cases Chris talks about, but sans those I think
we should prevent breaking raids.  If user wants to downgrade his FS he
can do that explicitly with restriper.  As for the relocation code
'smartness', we already have a confusing case where balancing silently
upgrades single to raid0.

Chris, can you describe those cases in detail so I can integrate and
align this whole thing with restriper before it's merged ?  (I added a
--force option for some of the transitions, probably best not to add
another closely related one)

Thanks,

Ilya


Re: Introduce option to rebalance only metadata

2011-11-15 Thread Ilya Dryomov
On Thu, Nov 10, 2011 at 07:43:07PM +0000, Hugo Mills wrote:
 On Thu, Nov 10, 2011 at 05:40:56PM -0200, Alexandre Oliva wrote:
  Experimental patch to be able to compact only the metadata after
  clustered allocation allocated lots of unnecessary metadata block
  groups.  It's also useful to measure performance differences between
  -o cluster and -o nocluster.
  
  I guess it should be implemented as a balance option rather than a
  separate ioctl, but this was good enough for me to try it.
 
This should be covered by the restriper work. (And was also covered
 by my balance-management patches, which were superseded by restriper).

Hugo is right, this is covered by restriper (both kernel and userspace
sides).  The exact command would be

btrfs fi restripe start -mconvert=PROFILE <mount point>

Thanks,

Ilya


Re: Introduce option to rebalance only metadata

2011-11-15 Thread Ilya Dryomov
On Tue, Nov 15, 2011 at 11:40:04AM +0200, Ilya Dryomov wrote:
 On Thu, Nov 10, 2011 at 07:43:07PM +0000, Hugo Mills wrote:
  On Thu, Nov 10, 2011 at 05:40:56PM -0200, Alexandre Oliva wrote:
   Experimental patch to be able to compact only the metadata after
   clustered allocation allocated lots of unnecessary metadata block
   groups.  It's also useful to measure performance differences between
   -o cluster and -o nocluster.
   
   I guess it should be implemented as a balance option rather than a
   separate ioctl, but this was good enough for me to try it.
  
 This should be covered by the restriper work. (And was also covered
  by my balance-management patches, which were superseded by restriper).
 
 Hugo is right, this is covered by restriper (both kernel and userspace
 sides).  The exact command would be
 
 btrfs fi restripe start -mconvert=PROFILE <mount point>

And the exact command to mimic your patch is

btrfs fi restripe start -m <mount point>

It simply balances metadata, whereas the one in the previous mail would
convert it to a specified profile.

Thanks,

Ilya


Re: Reliability questions / me too for bugs

2011-11-15 Thread Josef Bacik
On Mon, Nov 14, 2011 at 05:12:13PM -0500, Jérôme Carretero wrote:
 I have a couple of questions concerning btrfs reliability.
 
 I'm currently using btrfs in my internal drives (strong advantages) and have 
 used it on external drives, but I've recently migrated the external ones to 
 ext4, for reliability reasons.
 The kernel seems to be able to handle ext4 partition disconnections (drive 
 error, cable gets eaten by rodent, or most commonly, unplugged too early on 
 removable drives...) quite gracefully.
 This is not yet the case for btrfs partitions (deadlocks, various oopses, 
 need to reboot).
 Any idea when this will be available ?
 

At some point in the future.  We know it's a problem, but as you can see from
your nice list at the end, we have lots of problems working properly when the
drives are plugged in and behaving fine ;).

 How to handle bad blocks (sometimes, they are very localized on HDDs, and 
 they will happen on old SSDs) ?
 
 Imagine the following use case:
  - get untrusted drive from dumpster
  - check that it runs, and has an acceptable amount of bad block clusters
  - add the drive to a btrfs pool, which guarantees that its data will be 
 duplicated somewhere else
  - enjoy the drive while it lasts
  - ability to retrieve bad blocks map later on
  - ability to cleanly remove the drive from the pool if it becomes useless 
 (found a better one) or dies (see first question)
when that happens, data gets replicated to other locations...
data replication could be done automatically by background scrubbing with 
 some mount flag or ioctl
 How far are we from that ? Will we get there some day ?
 
 
 Since I'm here, a few random and useless notes, as I'm currently testing 
 v3.2-rc1-284-g52e4c2a and I see a few bugs, deadlocks and weirdnesses.
 I don't know if it's normal for -rc1, maybe.
 My current workload is rsync 1.5TB from SATA to USB2+3 (500+1000GB in raid0) 
 and vice versa.
 The load average can grow to 15.
 I've ran into BUG at fs/btrfs/inode.c:1795 
 (http://comments.gmane.org/gmane.comp.file-systems.btrfs/14128).
 I've ran into WARNING: at fs/btrfs/free-space-cache.c:1847 
 btrfs_remove_free_space+0x1a3/0x287() [1]
 I've also ran into INFO: task btrfs-transacti:1465 blocked for more than 120 
 seconds. [2]
 Sometimes linux is writing I don't know what for a looong time on drives, 
 and there's nothing in cache.
 Sometimes rsync stops, doing nothing. It will somehow restart after I do a
 echo 3 > /proc/sys/vm/drop_caches...
 I see that a lot of features will be added for 3.2 but I hope they will be 
 well tested !

So the inode one was fixed recently; Chris sent the patch to Linus this weekend,
so upgrade to that and you should be good.  The free_space_cache one is a new
one on me, how often do you hit it?  And as for the transaction thing, if it
happens again can you do sysrq+w?  Sometimes the guy hanging everybody up
doesn't get printed out so we only get part of the picture; sysrq+w will give us
all waiters so we can figure out what's going on.  Thanks,

Josef
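
The sysrq+w request above can also be issued without a console keyboard by writing `w` to `/proc/sysrq-trigger` as root, which asks the kernel to dump blocked (uninterruptible) tasks to the kernel log. A minimal sketch follows; the function name and its optional path argument are illustrative only and are not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Sketch: request a dump of blocked tasks (the sysrq "w" action).
# dump_blocked_tasks is a hypothetical helper; the path argument exists
# only so the function can be tried against a scratch file.
dump_blocked_tasks() {
    trigger=${1:-/proc/sysrq-trigger}
    printf 'w\n' > "$trigger"
}

# Typical use, as root, while the hang is in progress:
#   dump_blocked_tasks && dmesg | tail -n 200
```

Capturing the dmesg output right after the trigger keeps the list of waiters together with the hung-task message it belongs to.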


[PATCH] Btrfs: clear pages dirty for io and set them extent mapped

2011-11-15 Thread Josef Bacik
When doing the io_ctl helpers to clean up the free space cache stuff I stopped
using our normal prepare_pages stuff, which means I of course forgot to do
things like set the pages extent mapped, which will cause us all sorts of
wonderful problems.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/free-space-cache.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 96cb54d..30e045d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -351,6 +351,11 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, struct inode *inode,
 		}
 	}
 
+	for (i = 0; i < io_ctl->num_pages; i++) {
+		clear_page_dirty_for_io(io_ctl->pages[i]);
+		set_page_extent_mapped(io_ctl->pages[i]);
+	}
+
 	return 0;
 }
 
-- 
1.7.5.2



Re: [PATCH 00/21] [RFC] Btrfs: restriper

2011-11-15 Thread Phillip Susi

On 11/15/2011 4:22 AM, Ilya Dryomov wrote:

Restriper won't let you do raid1 -> dup transition because dup is only
allowed for a single-spindle FS, so you'll end up with error btrfs:
unable to start restripe 

There is no way to prioritize disks during restripe.  To get dup back
you'll have to convert everything to single, remove the second drive and
then convert metadata from single to dup.


So there is no way to put a disk into read only mode and prevent 
allocations of new chunks there?


It seems like both of these limitations are highly undesirable when 
trying to recover from a failing disk.  You don't want any more data 
being written to the failing disk while you are trying to remove it, and 
you certainly don't want to drop back to a single copy of data that is 
then written to the failing disk.



Re: [PATCH 00/21] [RFC] Btrfs: restriper

2011-11-15 Thread Ilya Dryomov
On Tue, Nov 15, 2011 at 09:33:14AM -0500, Phillip Susi wrote:
 On 11/15/2011 4:22 AM, Ilya Dryomov wrote:
 Restriper won't let you do raid1 -> dup transition because dup is only
 allowed for a single-spindle FS, so you'll end up with error btrfs:
 unable to start restripe 
 
 There is no way to prioritize disks during restripe.  To get dup back
 you'll have to convert everything to single, remove the second drive and
 then convert metadata from single to dup.
 
 So there is no way to put a disk into read only mode and prevent
 allocations of new chunks there?
 
 It seems like both of these limitations are highly undesirable when
 trying to recover from a failing disk.  You don't want any more data
 being written to the failing disk while you are trying to remove it,
 and you certainly don't want to drop back to a single copy of data
 that is then written to the failing disk.

If you have a failing disk in a raid setup, you don't need to downgrade
your raid, you can add a third drive and remove the failing one.  But
that's inconvenient and most of the time you'll have to do a full
balance.
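
As a rough illustration of the add-then-remove approach just described, the dry run below prints the two device commands in order. Nothing is executed; the helper name `plan_disk_swap` and the example device paths are assumptions made for this sketch.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for swapping out a failing disk by
# adding a new one first, so the raid profile never has to be downgraded.
# plan_disk_swap is a hypothetical helper.
plan_disk_swap() {
    failing=$1   # device that is going bad
    new=$2       # fresh device to take its place
    mnt=$3       # mount point of the filesystem

    # 1. Add the third drive so the raid requirement stays satisfied.
    echo "btrfs device add $new $mnt"
    # 2. Remove the failing drive.
    echo "btrfs device delete $failing $mnt"
    # As noted above, a full balance is often needed afterwards.
}

# Example paths only.
plan_disk_swap /dev/sdb /dev/sdc /mnt/btrfs
```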

So another thing I'm working on is drive swap; when it's done it will
take care of the failing disk scenario.  If you have a raid setup and
one of the disks has gone bad you'll be able to say

btrfs device replace FAILED NEW mountpoint

and it will put a valid copy onto the fresh drive, basically doing a raid
rebuild.

Thanks,

Ilya


Re: Btrfs progs git repo on kernel.org

2011-11-15 Thread Phillip Susi

On 10/27/2011 11:27 AM, Chris Mason wrote:

Hi everyone,

I've pulled in Hugo's integration tree, minus the features that were not
yet in the kernel.  This also has a few small commits that I had queued
up outside of the fsck work.

Hugo, many thanks for keeping up the integration tree!  Taking out the
features not in the kernel meant I had to rebase the commits, I'm
sorry about that.

The code from the integration tree is here:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git


I notice that there are no tags in the repo.  Did you just forget to 
push them, or have they been lost?  Also the repository description 
still needs filled out.




Re: [PATCH 2/3] xfstests 265: add a prealloc and reserve test

2011-11-15 Thread Ben Myers
Hi Wu Bo,

On Thu, Nov 03, 2011 at 11:09:00AM +0800, WuBo wrote:
 This test is for preallocation test. If the disk is full, just with a prealloc
 file has some free space that prealloc early. We need to check whether the 
 write
 to the free space is success or not.
 
 Signed-off-by: Wu Bo wu...@cn.fujitsu.com

This test is failing for me because I don't have fallocate installed.  I
suggest the test could be changed to check for binaries it uses,
possibly the version of those binaries, and then not run unless the
right ones are installed.  But the best I can do right now is make a
note of it.
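
A check of that kind might look roughly like the sketch below. The helper names `have_cmd` and `check_prereqs` are made up for illustration and are not existing xfstests functions; version checks are left out.

```shell
#!/bin/sh
# Sketch: skip a test when a required binary is missing.
# have_cmd and check_prereqs are hypothetical helpers, not xfstests API.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

check_prereqs() {
    for bin in "$@"; do
        if ! have_cmd "$bin"; then
            echo "not run: $bin not installed"
            return 1
        fi
    done
    return 0
}

# A test script could bail out early instead of failing:
#   check_prereqs fallocate || exit 0
```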

Just a heads up.  ;)

Regards,
Ben



Re: WARNING: at fs/btrfs/inode.c:2198 btrfs_orphan_commit_root+0xa8/0xc0

2011-11-15 Thread Josef Bacik
On Tue, Nov 15, 2011 at 08:13:43PM +0100, Stefan Kleijkers wrote:
 Hello Josef,
 
 We have patched the 3.1.1 kernel with your patch and after a short
 time one of the ceph osds crashed (core dumped) and I found this in
 the dmesg, please let me know if that's enough information or if you
 need more.
 

Yeah I was hoping for a WARN_ON() that should have shown up before that, do you
have the entire dmesg?  Thanks,

Josef


Re: WARNING: at fs/btrfs/inode.c:2198 btrfs_orphan_commit_root+0xa8/0xc0

2011-11-15 Thread Stefan Kleijkers

Hello Josef,

We have patched the 3.1.1 kernel with your patch and after a short time 
one of the ceph osds crashed (core dumped) and I found this in the 
dmesg, please let me know if that's enough information or if you need more.


Stefan

[11226.207447] ------------[ cut here ]------------
[11226.212107] kernel BUG at fs/btrfs/extent-tree.c:3592!
[11226.217283] invalid opcode: 0000 [#1] SMP
[11226.221442] CPU 2
[11226.223288] Modules linked in: btrfs zlib_deflate lzo_compress md_mod target_core_mod configfs ahci libahci e1000e mptsas i7core_edac mptscsih mptbase scsi_transport_sas bnx2 i5000_edac edac_core ipmi_devintf ipmi_msghandler
[11226.243940]
[11226.245458] Pid: 6845, comm: ceph-osd Not tainted 3.1.1-un13.1-64-nohz #1 Supermicro X8ST3/X8ST3
[11226.254349] RIP: 0010:[a011719a]  [a011719a] block_rsv_release_bytes+0x11a/0x120 [btrfs]
[11226.264316] RSP: 0018:8805fb4e1cd8  EFLAGS: 00010206
[11226.269663] RAX:  RBX: 8802c4303d20 RCX: 0113be7a
[11226.276831] RDX: 00018000 RSI:  RDI: 8802c4303d58
[11226.284015] RBP: 8805fb4e1cf8 R08: a0114475 R09: 8805fb4e1b98
[11226.291173] R10:  R11:  R12:
[11226.298396] R13: 00018000 R14: 8805fb280800 R15: 0001
[11226.305579] FS:  7f34edeae700() GS:88061fc4() knlGS:
[11226.313709] CS:  0010 DS:  ES:  CR0: 80050033
[11226.319480] CR2: 7f34e414e000 CR3: 0005fc8f9000 CR4: 06e0
[11226.326637] DR0:  DR1:  DR2:
[11226.333797] DR3:  DR6: 0ff0 DR7: 0400
[11226.340956] Process ceph-osd (pid: 6845, threadinfo 8805fb4e, task 880604292f00)
[11226.349578] Stack:
[11226.351614]  8805e08d5de0 0001 880602fd8400 8805e092fdc8
[11226.359164]  8805fb4e1d08 a01171cd 8805fb4e1d18 a011721f
[11226.366687]  8805fb4e1d58 a0139565 8805fb4e1d58 8805e092fdc8
[11226.374270] Call Trace:
[11226.376776]  [a01171cd] btrfs_block_rsv_release+0x2d/0x50 [btrfs]
[11226.383852]  [a011721f] btrfs_orphan_release_metadata+0x2f/0x40 [btrfs]
[11226.391430]  [a0139565] btrfs_orphan_del+0xe5/0x150 [btrfs]
[11226.398044]  [a013ab27] btrfs_truncate+0x587/0x600 [btrfs]
[11226.404510]  [a013abe7] btrfs_setsize+0x47/0xc0 [btrfs]
[11226.410646]  [a013ad05] btrfs_setattr+0xa5/0xd0 [btrfs]
[11226.416933]  [810e552a] notify_change+0x10a/0x2b0
[11226.422548]  [810cb75f] do_truncate+0x5f/0x90
[11226.427840]  [810cb8d4] sys_truncate+0x144/0x1a0
[11226.433363]  [813fd97b] system_call_fastpath+0x16/0x1b
[11226.439573] Code: 00 49 8d be b8 00 00 00 e8 64 5d 2e e1 4d 29 6e 20 49 ff 46 48 41 fe 86 b8 00 00 00 eb a1 0f 1f 00 83 ca 04 41 88 54 24 41 eb 87 0f 0b eb fe 66 90 55 48 89 f8 48 89 e5 48 89 f7 48 8b 80 20 01
[11226.460355] RIP  [a011719a] block_rsv_release_bytes+0x11a/0x120 [btrfs]
[11226.468018]  RSP 8805fb4e1cd8
[11226.472008] ---[ end trace 7a5f53562ba538a2 ]---


On 11/14/2011 05:03 PM, Josef Bacik wrote:

On Thu, Nov 10, 2011 at 01:13:46PM +0100, Stefan Kleijkers wrote:

Hello Josef,

I have a workload running at the moment and I'm seeing a lot of
these (paste 1) messages in dmesg, this is the 3.1 kernel with your
patch applied.

At the end I see a couple of the old warnings (paste 2).

Furthermore it looks like after a while the speed of the filesystem
decreases. I have a workload with 20 rsyncs and a total of about
1.5T data. I can't get through a full run.


Hmm well that's interesting, and you'd think that would tell me what was wrong
but I'm still confused :).  Give this debug patch a whirl (unapply the last one
I gave you and apply this one instead) and send me your dmesg if you get any of
the new warnings.  Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 634608d..395a746 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -74,6 +74,7 @@ struct btrfs_inode {
 
 	/* the space_info for where this inode's data allocations are done */
 	struct btrfs_space_info *space_info;
+	struct btrfs_block_rsv *rsv;
 
 	/* full 64 bit generation number, struct vfs_inode doesn't have a big
 	 * enough field for this.
@@ -140,7 +141,7 @@ struct btrfs_inode {
 	 */
 	unsigned outstanding_extents;
 	unsigned reserved_extents;
-
+	unsigned orphan_count;
 	/*
 	 * ordered_data_close is set by truncate when a file that used
 	 * to have good data has been truncated to zero.  When it is set
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa4f602..c73f4b1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3721,6 +3721,7 @@ static void 

[GIT PULL] various updates for -rc3

2011-11-15 Thread Josef Bacik
Hey Chris,

Here are the cluster rework patches from Alexandre along with my tracepoints
patch and a couple of bugfixes.  This should fix the panics we've been seeing
when running xfstests 13 in a loop.  The cluster fixes I've been testing for a
while, and the tracepoints patch I used to profile the new clustering stuff to
make sure it was giving us a good behavior.  I have a repo with some tools to
use the allocator tracepoints

git://github.com/josefbacik/btrfs-tracing.git

In all respects Alexandre's patches work wonders.  You can pull from

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git for-chris

which is based off of your for-linus branch.  The shortlog and diffstat are
attached below.  Thanks,

Josef

Alexandre Oliva (3):
  Revamp btrfs cluster creation logic.
  Drop gap detection from btrfs.
  Require at least one extent of the requested size, but accept other 
smaller ones except when SSD_SPREAD is enabled.

Josef Bacik (3):
  Btrfs: add allocator tracepoints
  Btrfs: wait on caching if we're loading the free space cache
  Btrfs: clear pages dirty for io and set them extent mapped

 fs/btrfs/ctree.h             |    3 +-
 fs/btrfs/extent-tree.c       |  130 ++--
 fs/btrfs/free-space-cache.c  |  130
 include/trace/events/btrfs.h |  173 ++
 4 files changed, 329 insertions(+), 107 deletions(-)


Re: Don't prevent removal of devices that break raid reqs

2011-11-15 Thread Chris Mason
On Tue, Nov 15, 2011 at 11:37:13AM +0200, Ilya Dryomov wrote:
 On Thu, Nov 10, 2011 at 09:21:00PM -0500, Chris Mason wrote:
  On Thu, Nov 10, 2011 at 05:32:48PM -0200, Alexandre Oliva wrote:
   Instead of preventing the removal of devices that would render existing
   raid10 or raid1 impossible, warn but go ahead with it; the rebalancing
   code is smart enough to use different block group types.
   
   Should the refusal remain, so that we'd only proceed with a
   newly-introduced --force option or so?
  
  Hmm, going to three devices on raid10 doesn't turn it into
  raid1.  It turns it into a degraded raid10.
  
  We'll need a --force of some kind.  There are definitely cases users
  have wanted to do this but it is rarely a good idea ;)
 
 I'm not sure about use cases Chris talks about, but sans those I think
 we should prevent breaking raids.  If user wants to downgrade his FS he
 can do that explicitly with restriper.  As for the relocation code
 'smartness', we already have a confusing case where balancing silently
 upgrades single to raid0.
 
 Chris, can you describe those cases in detail so I can integrate and
 align this whole thing with restriper before it's merged ?  (I added a
 --force option for some of the transitions, probably best not to add
 another closely related one)

There are a few valid use cases where people want to be able to break
a raid1.  I'd put it at the very bottom of the list of interesting
things, just because I see a long list of bug reports that start with:

my FS was broken, so I told it to remove device xxyzzz, and that didn't
work so I ran --force, and then (sad story follows).

-chris



Re: [PATCH 2/3] xfstests 265: add a prealloc and reserve test

2011-11-15 Thread WuBo
On 11/16/2011 03:08 AM, Christoph Hellwig wrote:
 On Tue, Nov 15, 2011 at 12:21:13PM -0600, Ben Myers wrote:
 Hi Wu Bo,

 On Thu, Nov 03, 2011 at 11:09:00AM +0800, WuBo wrote:
 This test is for preallocation test. If the disk is full, just with a 
 prealloc
 file has some free space that prealloc early. We need to check whether the 
 write
 to the free space is success or not.

 Signed-off-by: Wu Bo wu...@cn.fujitsu.com

 This test is failing for me because I don't have fallocate installed.  I
 suggest the test could be changed to check for binaries it uses,
 possibly the version of those binaries, and then not run unless the
 right ones are installed.  But the best I can do right now is make a
 note of it.
 
 It might be even better to just use the xfs_io falloc command as we
 generally expect an uptodate xfs_io for use with xfstests.

Got it.

thanks,
wubo

 


Re: Introduce option to rebalance only metadata

2011-11-15 Thread Alexandre Oliva
On Nov 15, 2011, Ilya Dryomov idryo...@gmail.com wrote:

 And the exact command to mimic your patch is

 btrfs fi restripe start -m <mount point>

Thanks.  I wasn't aware of the restripe patch when I wrote this Quick
Hack (TM).

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


[PATCH 2/2] Btrfs: avoid unnecessary bitmap search for cluster setup

2011-11-15 Thread Li Zefan
setup_cluster_no_bitmap() searches all the extents and bitmaps starting
from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
from offset are in the bitmaps list, so it's sufficient to search from
this list in setup_cluster_bitmap().

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |   42 ++++--------------------------------------
 1 files changed, 4 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index df3bc22..584ef14 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2451,7 +2451,6 @@ setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	struct btrfs_free_space *entry;
-	struct rb_node *node;
 	int ret = -ENOSPC;
 	u64 bitmap_offset = offset_to_bitmap(ctl, offset);
 
@@ -2469,10 +2468,6 @@ setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 			list_add(&entry->list, bitmaps);
 	}
 
-	/*
-	 * First check our cached list of bitmaps and see if there is an entry
-	 * here that will work.
-	 */
 	list_for_each_entry(entry, bitmaps, list) {
 		if (entry->bytes < min_bytes)
 			continue;
@@ -2483,38 +2478,10 @@ setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 	}
 
 	/*
-	 * If we do have entries on our list and we are here then we didn't find
-	 * anything, so go ahead and get the next entry after the last entry in
-	 * this list and start the search from there.
+	 * The bitmaps list has all the bitmaps that record free space
+	 * starting after offset, so no more search is required.
 	 */
-	if (!list_empty(bitmaps)) {
-		entry = list_entry(bitmaps->prev, struct btrfs_free_space,
-				   list);
-		node = rb_next(&entry->offset_index);
-		if (!node)
-			return -ENOSPC;
-		entry = rb_entry(node, struct btrfs_free_space, offset_index);
-		goto search;
-	}
-
-	entry = tree_search_offset(ctl, offset_to_bitmap(ctl, offset), 0, 1);
-	if (!entry)
-		return -ENOSPC;
-
-search:
-	node = &entry->offset_index;
-	do {
-		entry = rb_entry(node, struct btrfs_free_space, offset_index);
-		node = rb_next(&entry->offset_index);
-		if (!entry->bitmap)
-			continue;
-		if (entry->bytes < min_bytes)
-			continue;
-		ret = btrfs_bitmap_cluster(block_group, entry, cluster, offset,
-					   bytes, min_bytes);
-	} while (ret && node);
-
-	return ret;
+	return -ENOSPC;
 }
 
 /*
@@ -2532,8 +2499,8 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle *trans,
 			     u64 offset, u64 bytes, u64 empty_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-	struct list_head bitmaps;
 	struct btrfs_free_space *entry, *tmp;
+	LIST_HEAD(bitmaps);
 	u64 min_bytes;
 	int ret;
 
@@ -2572,7 +2539,6 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&bitmaps);
 	ret = setup_cluster_no_bitmap(block_group, cluster, &bitmaps, offset,
 				      bytes, min_bytes);
 	if (ret)
-- 
1.7.3.1