[PATCH] xfstests: fix 251's cp -axT problem
When I ran xfstests, test 251 failed because cp -axT did not work as expected:

  cp: cannot overwrite directory `/mnt/scratch/1' with non-directory

With this patch, 251 passes.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 251 |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/251 b/251
index fa3d74a..b54e4c3 100755
--- a/251
+++ b/251
@@ -130,7 +130,7 @@ function run_process() {
 	# Copy content -> partition.
 	mkdir $SCRATCH_MNT/$p
-	cp -axT $content $SCRATCH_MNT/$p
+	cp -axT $content/ $SCRATCH_MNT/$p/
 	export chpid=$!
 	wait $chpid &> /dev/null
 	check_sums
--
1.6.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Broken btrfs filesystem
Hi,

I have problems with a btrfs filesystem, and am holding on to it for a few more days before I reformat. I am interested in two things:

1. Is there any way to restore more from the filesystem than I have already fetched? (It would help to get the system up faster, but there is nothing of real worth on that computer that is not already backed up.)
2. Is there anything here that resembles a bug that should be fixed somewhere, and do you need more information to fix it?

Please CC me as I am not subscribed.

So here come the gory details:

I have a laptop on which I have stock Fedora 16 installed with an ext4 boot partition, a LUKS-encrypted swap partition and a LUKS-encrypted root partition holding a btrfs volume.

Yesterday I hibernated my laptop. When I resumed it, it seemed to resume normally and let me unlock the screensaver, but it did not allow any filesystem access and oopsed within seconds. Afterwards the system failed to mount the root partition.

So I hooked the hard drive up to my desktop running Gentoo with a 3.2.0 kernel and the latest btrfs-progs from git.
Trying to mount the filesystem does not work:

[11353.370007] device fsid 4c86ad4c-0d71-48a3-8cd2-058cccda2a07 devid 1 transid 83234 /dev/mapper/udisks-luks-uuid-d7efe74d-ed8f-425a-942c-c6bbc44483a3-uid1000
[11353.391953] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391958] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391961] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391964] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391966] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391968] Failed to read block groups: -5
[11353.404931] btrfs: open_ctree failed

Trying with -o recovery:

[11353.370007] device fsid 4c86ad4c-0d71-48a3-8cd2-058cccda2a07 devid 1 transid 83234 /dev/mapper/udisks-luks-uuid-d7efe74d-ed8f-425a-942c-c6bbc44483a3-uid1000
[11353.391953] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391958] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391961] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391964] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391966] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391968] Failed to read block groups: -5
[11353.404931] btrfs: open_ctree failed

So mounting it does not seem to be an option.

So I tried restore. On the first run it restored one file, then it stopped.
Upon retrying it restored a lot more files (mostly the /var/lib/yum directory, and a couple of empty directories), but now it never gets past one particular file, and it always fails after that with the following:

# ./restore /dev/dm-1 /home/xake/Skrivbord/ferra-rescue
parent transid verify failed on 869829160960 wanted 82376 found 83320
parent transid verify failed on 869829160960 wanted 82376 found 83320
parent transid verify failed on 869829160960 wanted 82376 found 83320
parent transid verify failed on 869829160960 wanted 82376 found 83320
Ignoring transid failure
parent transid verify failed on 869828055040 wanted 82376 found 83315
parent transid verify failed on 869828055040 wanted 82376 found 83315
parent transid verify failed on 869828055040 wanted 82376 found 83315
parent transid verify failed on 869828055040 wanted 82376 found 83315
Ignoring transid failure
parent transid verify failed on 823939305472 wanted 83180 found 83847
parent transid verify failed on 823939305472 wanted 83180 found 83847
parent transid verify failed on 823939305472 wanted 83180 found 83847
parent transid verify failed on 823939305472 wanted 83180 found 83847
Ignoring transid failure
Root objectid is 5
Skipping existing file /home/xake/Skrivbord/ferra-rescue/var/lib/rpm/.rpm.lock
If you wish to overwrite use the -o option to overwrite
parent transid verify failed on 823805370368 wanted 83121 found 83393
parent transid verify failed on 823805370368 wanted 83121 found 83393
parent transid verify failed on 823805370368 wanted 83121 found 83393
parent transid verify failed on 823805370368 wanted 83121 found 83393
Ignoring transid failure
parent transid verify failed on 823789125632 wanted 83120 found 83356
parent transid verify failed on 823789125632 wanted 83120 found 83356
parent transid verify failed on 823789125632 wanted 83120 found 83356
parent transid verify failed on 823789125632 wanted 83120 found 83356
Ignoring transid failure
parent transid verify failed on 823789125632 wanted 83120 found 83356
Ignoring transid failure
parent transid verify failed on 823784792064 wanted 81189 found 83707
parent transid verify failed on 823784792064 wanted 81189 found 83707
parent transid verify failed on 823784792064 wanted 81189 found 83707
parent transid verify failed on 823784792064 wanted 81189 found 83707
Ignoring transid failure
parent
[PATCH V2] Btrfs: cleanup: move node-,leaf-,sectorsize to fs_info
moved the node-,leaf-,sectorsize from btrfs_root to btrfs_fs_info
since we don't intend to allow different sizes between trees

also removed sectorsize from btrfs_block_group_cache
because it now can use the one in fs_info

updated all uses accordingly

please note in disk-io.c:
-static int __setup_root(nodesize, leafsize, sectorsize, stripesize,
-			*root, *fs_info, objectid)
+static int __setup_root(stripesize, *root, *fs_info, objectid)

Signed-off-by: Simon Peeters <peeters.si...@gmail.com>
---
 fs/btrfs/backref.c          |    2 +-
 fs/btrfs/compression.c      |    8 +++---
 fs/btrfs/ctree.c            |   12 +-
 fs/btrfs/ctree.h            |   31 +-
 fs/btrfs/disk-io.c          |   50 ++
 fs/btrfs/extent-tree.c      |   22 --
 fs/btrfs/extent_io.c        |    6 ++--
 fs/btrfs/file-item.c        |   20
 fs/btrfs/file.c             |   28
 fs/btrfs/free-space-cache.c |   22 +-
 fs/btrfs/inode.c            |   42 ++--
 fs/btrfs/ioctl.c            |    8 +++---
 fs/btrfs/ordered-data.c     |    4 +-
 fs/btrfs/ordered-data.h     |    4 +-
 fs/btrfs/relocation.c       |   16 +++---
 fs/btrfs/scrub.c            |    8 +++---
 fs/btrfs/super.c            |    2 +-
 fs/btrfs/tree-log.c         |    2 +-
 fs/btrfs/volumes.c          |   10
 19 files changed, 139 insertions(+), 158 deletions(-)

---
Simon Peeters

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 22c64ff..45d9cf8 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -420,7 +420,7 @@ static int __iter_shared_inline_ref(struct btrfs_fs_info *fs_info,
 	int found = 0;
 
 	eb = read_tree_block(fs_info->tree_root, logical,
-				fs_info->tree_root->leafsize, 0);
+				fs_info->leafsize, 0);
 	if (!eb)
 		return -EIO;
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 14f1c5a..535ff98 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -88,8 +88,8 @@ static inline int compressed_bio_size(struct btrfs_root *root,
 	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 
 	return sizeof(struct compressed_bio) +
-		((disk_size + root->sectorsize - 1) / root->sectorsize) *
-		csum_size;
+		((disk_size + root->fs_info->sectorsize - 1) /
+		root->fs_info->sectorsize) * csum_size;
 }
 
 static struct bio *compressed_bio_alloc(struct block_device *bdev,
@@ -675,8 +675,8 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 							comp_bio, sums);
 			BUG_ON(ret);
 		}
-		sums += (comp_bio->bi_size + root->sectorsize - 1) /
-			root->sectorsize;
+		sums += (comp_bio->bi_size + root->fs_info->sectorsize - 1) /
+			root->fs_info->sectorsize;
 
 		ret = btrfs_map_bio(root, READ, comp_bio,
 				    mirror_num, 0);
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index dede441..b72272f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2087,13 +2087,13 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
 	else
 		btrfs_node_key(lower, &lower_key, 0);
 
-	c = btrfs_alloc_free_block(trans, root, root->nodesize, 0,
+	c = btrfs_alloc_free_block(trans, root, root->fs_info->nodesize, 0,
 				   root->root_key.objectid, &lower_key,
 				   level, root->node->start, 0);
 	if (IS_ERR(c))
 		return PTR_ERR(c);
 
-	root_add_used(root, root->nodesize);
+	root_add_used(root, root->fs_info->nodesize);
 
 	memset_extent_buffer(c, 0, 0, sizeof(struct btrfs_header));
 	btrfs_set_header_nritems(c, 1);
@@ -2214,13 +2214,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	mid = (c_nritems + 1) / 2;
 	btrfs_node_key(c, &disk_key, mid);
 
-	split = btrfs_alloc_free_block(trans, root, root->nodesize, 0,
+	split = btrfs_alloc_free_block(trans, root, root->fs_info->nodesize, 0,
 					root->root_key.objectid,
 					&disk_key, level, c->start, 0);
 	if (IS_ERR(split))
 		return PTR_ERR(split);
 
-	root_add_used(root, root->nodesize);
+	root_add_used(root, root->fs_info->nodesize);
 
 	memset_extent_buffer(split, 0, 0, sizeof(struct btrfs_header));
 	btrfs_set_header_level(split, btrfs_header_level(c));
@@ -2968,13 +2968,13 @@ again:
 	else
 		btrfs_item_key(l, &disk_key, mid);
 
-	right = btrfs_alloc_free_block(trans, root, root->leafsize, 0,
+	right = btrfs_alloc_free_block(trans, root, root->fs_info->leafsize, 0,
 					root->root_key.objectid,
 					&disk_key, 0, l->start, 0);
 	if (IS_ERR(right))
 		return PTR_ERR(right);
 
-	root_add_used(root, root->leafsize);
+	root_add_used(root, root->fs_info->leafsize);
 
 	memset_extent_buffer(right, 0, 0, sizeof(struct btrfs_header));
 	btrfs_set_header_bytenr(right, right->start);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6738503..d5ca265 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -340,11 +340,11 @@ struct btrfs_header {
 	u8 level;
 } __attribute__ ((__packed__));
 
-#define BTRFS_NODEPTRS_PER_BLOCK(r) (((r)->nodesize -
real free space on btrfs volume (performance impact)
Hello,

we are currently investigating a performance issue on a system running on top of a btrfs filesystem. Is it possible that performance is impacted by a lack of free space?

Also, how do I get information about the real free space on a btrfs volume?

# btrfs-show /dev/sdb1
Label: opt  uuid: 28a55827-e677-47a9-98d5-d31eb3d71436
	Total devices 1 FS bytes used 167.83GB
	devid    1 size 240.00GB used *229.25GB* path /dev/sdb1

Btrfs Btrfs v0.19

# btrfs filesystem df /opt
Data: total=213.23GB, used=165.26GB
System, DUP: total=8.00MB, used=40.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=8.00GB, used=2.57GB

# df -h /opt
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb1   240G  171G   59G  75% /opt

How come there is a difference between btrfs-show and df .. 40GB?
Is the space really used, or can I claim it back? (There are no snapshots.)

# btrfs subvolume list /opt
#

Thanks
michal
[v3.2-4874-ge4e1118 OOPS] btrfs-related kernel oops due to media error
[Note: this is a resend of a mail I sent to linux-btrfs earlier, this time tested with the latest git kernel]

Hi,

One of my disks, partitioned into a single btrfs partition, is showing media errors. The problem is that these errors lead to a kernel panic from btrfs, which makes the filesystem unusable until reboot, and therefore it is very hard for me to do a full backup of the data prior to changing the disk.

My current kernel is a vanilla kernel at current tip (output from git describe is v3.2-4874-ge4e1118).

I assume that the filesystem should not panic even in case of a media error... Is there any procedure I can follow, or patch I could apply, to salvage my data while ignoring media errors?

Logs and the OOPS are at the end of this mail; please let me know if more information is needed.

Best regards,

Vincent

---

[ 3210.717304] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 3210.717309] ata6.00: BMDMA stat 0x24
[ 3210.717312] ata6.00: failed command: READ DMA EXT
[ 3210.717318] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 dma 4096 in
[ 3210.717320]          res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 0x9 (media error)
[ 3210.717323] ata6.00: status: { DRDY ERR }
[ 3210.717325] ata6.00: error: { UNC }
[ 3210.732234] ata6.00: configured for UDMA/133
[ 3210.732248] sd 5:0:0:0: [sdd] Unhandled sense code
[ 3210.732250] sd 5:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3210.732254] sd 5:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
[ 3210.732259] Descriptor sense data with sense descriptors (in hex):
[ 3210.732261]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3210.732270]         70 2f dc 61
[ 3210.732274] sd 5:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocate failed
[ 3210.732278] sd 5:0:0:0: [sdd] CDB: Read(10): 28 00 70 2f dc 5f 00 00 08 00
[ 3210.732287] end_request: I/O error, dev sdd, sector 1882184801
[ 3210.732305] ata6: EH complete
[ 3210.732322] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3210.732373] IP: [a017f129] extent_range_uptodate+0x59/0xe0 [btrfs]
[ 3210.732426] PGD 21e9b7067 PUD 21e9b6067 PMD 0
[ 3210.732455] Oops: [#1] SMP
[ 3210.732475] CPU 3
[ 3210.732486] Modules linked in: ip6table_filter ip6_tables ipt_MASQUERADE bnep iptable_nat nf_nat rfcomm bluetooth nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm parport_pc ppdev nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc dm_crypt snd_usb_audio snd_usbmidi_lib joydev snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore snd_page_alloc psmouse serio_raw cdc_acm lp parport btrfs zlib_deflate libcrc32c hid_logitech ff_memless usbhid hid i915 drm_kms_helper drm r8169 i2c_algo_bit video pata_jmicron
[ 3210.732870]
[ 3210.732880] Pid: 3856, comm: btrfs-endio-met Not tainted 3.2.0-custom #2 Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R
[ 3210.732933] RIP: 0010:[a017f129] [a017f129] extent_range_uptodate+0x59/0xe0 [btrfs]
[ 3210.732989] RSP: 0018:880006f3fde0 EFLAGS: 00010246
[ 3210.733014] RAX: RBX: 00df57385000 RCX:
[ 3210.733047] RDX: 0001 RSI: 0df57385 RDI:
[ 3210.733079] RBP: 880006f3fe00 R08: R09: 88008bce5200
[ 3210.733111] R10: 8800299f9010 R11: 1000 R12: 8802190f4030
[ 3210.733143] R13: 00df573853ff R14: 880006f3fe98 R15: 880143263d88
[ 3210.733175] FS: () GS:88022fd8() knlGS:
[ 3210.733212] CS: 0010 DS: ES: CR0: 8005003b
[ 3210.733238] CR2: CR3: 00021f35a000 CR4: 000406e0
[ 3210.733270] DR0: DR1: DR2:
[ 3210.733302] DR3: DR6: 0ff0 DR7: 0400
[ 3210.74] Process btrfs-endio-met (pid: 3856, threadinfo 880006f3e000, task 8801fa8d8000)
[ 3210.733374] Stack:
[ 3210.733385] 8800298dd838 8801f9cc9840 88021ee05000
[ 3210.733423] 880006f3fe30 a01581f9 880143263d80 8800298dd860
[ 3210.733461] 880143263d80 880143263d98 880006f3fee0 a0187fef
[ 3210.733499] Call Trace:
[ 3210.733524] [a01581f9] end_workqueue_fn+0x119/0x140 [btrfs]
[ 3210.733567] [a0187fef] worker_loop+0x16f/0x5d0 [btrfs]
[ 3210.733608] [a0187e80] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[ 3210.733643] [8106fa93] kthread+0x93/0xa0
[ 3210.733668] [8162caa4] kernel_thread_helper+0x4/0x10
[ 3210.733697] [8106fa00] ?
Re: [PATCH 00/21] Btrfs: restriper
On Mon, Jan 09, 2012 at 03:44:18PM +0200, Ilya Dryomov wrote:
> On Mon, Jan 09, 2012 at 01:50:34AM -0500, Marios Titas wrote:
>> I tried this for many different scenarios and it seems to work pretty
>> well. I only ran into one problematic case: If you remove a device
>> from a multidevice filesystem it crashes. Here's how to reproduce it:
>>
>> truncate -s1g /tmp/test1
>> truncate -s1g /tmp/test2
>> losetup /dev/loop1 /tmp/test1
>> losetup /dev/loop2 /tmp/test2
>> mkdir /tmp/test
>> ./mkfs.btrfs -L test -d single -m single /dev/loop1 /dev/loop2
>> mount -o noatime /dev/loop1 /tmp/test
>> ./btrfs dev del /dev/loop1 /tmp/test
>> ./btrfs fi bal start /tmp/test
>>
>> There is no actual restriping involved but the above example does work
>> correctly under 3.1+for-linus whereas it fails with your patches.
>
> Thanks for your testing. The good news is that I put that BUG() there
> simply for debugging so it's nothing major:
>
>	if (ret)
>		BUG(); /* FIXME break ? */
>
> It used to be just a break out of the loop there, so that's the reason
> it doesn't panic with 3.1+for-linus. I'll investigate further and fix
> this.

I force-rebased my tree and removed two other BUG_ONs along with this one.

Thanks,

		Ilya

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d7c5c7d..9b3d03d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2312,7 +2312,8 @@ static int chunk_drange_filter(struct extent_buffer *leaf,
 	int factor;
 	int i;
 
-	BUG_ON(!(bargs->flags & BTRFS_BALANCE_ARGS_DEVID));
+	if (!(bargs->flags & BTRFS_BALANCE_ARGS_DEVID))
+		return 0;
 
 	if (btrfs_chunk_type(leaf, chunk) & (BTRFS_BLOCK_GROUP_DUP |
 	     BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10))
@@ -2355,7 +2356,8 @@ static int chunk_vrange_filter(struct extent_buffer *leaf,
 static int chunk_soft_convert_filter(u64 chunk_profile,
 				     struct btrfs_balance_args *bargs)
 {
-	BUG_ON(!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT));
+	if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT))
+		return 0;
 
 	chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
 
@@ -2518,7 +2520,7 @@ again:
 		ret = btrfs_previous_item(chunk_root, path, 0,
 					  BTRFS_CHUNK_ITEM_KEY);
 		if (ret)
-			BUG(); /* FIXME break ? */
+			break;
 
 		leaf = path->nodes[0];
 		slot = path->slots[0];
Re: real free space on btrfs volume (performance impact)
2012/1/10 Michal Suba <michal.s...@pantheon.sk>:
> Hello,
>
> we are currently investigating a performance issue on a system running
> on top of a btrfs filesystem. Is it possible that performance is
> impacted by a lack of free space?
>
> Also, how do I get information about the real free space on a btrfs volume?
>
> # btrfs-show /dev/sdb1
> Label: opt  uuid: 28a55827-e677-47a9-98d5-d31eb3d71436
>	Total devices 1 FS bytes used 167.83GB
>	devid    1 size 240.00GB used *229.25GB* path /dev/sdb1
>
> Btrfs Btrfs v0.19
>
> # btrfs filesystem df /opt
> Data: total=213.23GB, used=165.26GB
> System, DUP: total=8.00MB, used=40.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=8.00GB, used=2.57GB
>
> # df -h /opt
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sdb1   240G  171G   59G  75% /opt
>
> How come there is a difference between btrfs-show and df .. 40GB?
> Is the space really used, or can I claim it back? (There are no snapshots.)
>
> # btrfs subvolume list /opt
> #

The btrfs-show command is being deprecated. Its output can be easy to misunderstand, but it probably won't be corrected since it's going away at some point.

Basically, what this is telling you is that 229.25GB is committed (213.23GB Data + 2 x 8.00GB Metadata (because it's duplicated) + 2 x 8.00MB System).

However, not all of the committed space is being used (which is clearer in the output of the 'btrfs filesystem df' command).
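The sum in the reply above can be checked with a short C sketch. It is only an illustration under stated assumptions: the struct, the table `opt_df` and the helper names are hypothetical, the figures are copied from the `btrfs filesystem df` output in the report, and the 4MB non-DUP System line is included even though the reply leaves it out (it does not change the rounded result).

```c
#include <stddef.h>

/* One line of `btrfs filesystem df`, converted to GB (hypothetical type). */
struct chunk_class {
	const char *name;
	double total_gb;	/* space allocated to chunks of this type */
	double used_gb;		/* space occupied inside those chunks */
	int copies;		/* 2 for DUP, 1 otherwise */
};

/* Figures from the report; MB and KB values are converted to GB. */
static const struct chunk_class opt_df[] = {
	{ "Data",          213.23,     165.26,               1 },
	{ "System, DUP",   8.0 / 1024, 40.0 / (1024 * 1024), 2 },
	{ "System",        4.0 / 1024, 0.0,                  1 },
	{ "Metadata, DUP", 8.00,       2.57,                 2 },
};
#define OPT_DF_COUNT (sizeof(opt_df) / sizeof(opt_df[0]))

/* Raw device space committed to chunks: each class counts once per copy. */
static double committed_gb(const struct chunk_class *c, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += c[i].total_gb * c[i].copies;
	return sum;
}

/* Space allocated to one class but not yet occupied by data. */
static double slack_gb(const struct chunk_class *c)
{
	return c->total_gb - c->used_gb;
}
```

With these inputs `committed_gb(opt_df, OPT_DF_COUNT)` comes to roughly 229.25, which matches the "used" figure printed by btrfs-show, while `slack_gb(&opt_df[0])` shows about 47.97GB of Data chunk space that is allocated but still empty.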
Re: real free space on btrfs volume (performance impact)
On Tue, Jan 10, 2012 at 12:40 PM, Mitch Harder <mitch.har...@sabayonlinux.org> wrote:
> 2012/1/10 Michal Suba <michal.s...@pantheon.sk>:
>> Hello,
>>
>> we are currently investigating a performance issue on a system running
>> on top of a btrfs filesystem. Is it possible that performance is
>> impacted by a lack of free space?
>>
>> Also, how do I get information about the real free space on a btrfs volume?
>>
>> # btrfs-show /dev/sdb1
>> Label: opt  uuid: 28a55827-e677-47a9-98d5-d31eb3d71436
>>	Total devices 1 FS bytes used 167.83GB
>>	devid    1 size 240.00GB used *229.25GB* path /dev/sdb1
>>
>> Btrfs Btrfs v0.19
>>
>> # btrfs filesystem df /opt
>> Data: total=213.23GB, used=165.26GB
>> System, DUP: total=8.00MB, used=40.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=8.00GB, used=2.57GB
>>
>> # df -h /opt
>> Filesystem  Size  Used Avail Use% Mounted on
>> /dev/sdb1   240G  171G   59G  75% /opt
>>
>> How come there is a difference between btrfs-show and df .. 40GB?
>> Is the space really used, or can I claim it back? (There are no snapshots.)
>>
>> # btrfs subvolume list /opt
>> #
>
> The btrfs-show command is being deprecated. Its output can be easy
> to misunderstand, but it probably won't be corrected since it's going
> away at some point.

The output of btrfs fi show /dev/whatever is identical, and isn't going away afaik. That said, it is easy to misinterpret, although that's probably unavoidable while still actually presenting that information.
[PATCH] Btrfs: release space on error in page_mkwrite
If updating the inode gave us an ENOSPC we were just returning in page_mkwrite, which is a problem since we make our reservation right before trying to update the inode, so fix the out label so that we actually free our reservation. Thanks,

Signed-off-by: Josef Bacik <jo...@redhat.com>
---
 fs/btrfs/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b0d..90a32f1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6509,8 +6509,8 @@ out_unlock:
 	if (!ret)
 		return VM_FAULT_LOCKED;
 	unlock_page(page);
-	btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
 out:
+	btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
 	return ret;
 }
 
--
1.7.5.2
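The effect of moving the release below the out label can be sketched in userspace. This is a simplified model, not the kernel function: `reserved_bytes`, `reserve_space`, `release_space` and `mkwrite_model` are hypothetical stand-ins, the reservation is assumed to always succeed, and the `patched` flag only selects where the release sits.

```c
#include <errno.h>

#define PAGE_SZ 4096L

/* Stand-in for the delalloc reservation counter (hypothetical). */
static long reserved_bytes;

/* Assumed to always succeed in this model. */
static void reserve_space(long n)
{
	reserved_bytes += n;
}

static void release_space(long n)
{
	reserved_bytes -= n;
}

/*
 * Model of the error paths. patched == 0 keeps the release above the
 * out label (old code), patched == 1 puts it below (new code).
 * fail_update models the inode update returning ENOSPC, fail_late
 * models a later failure that reaches the out_unlock block.
 */
static int mkwrite_model(int patched, int fail_update, int fail_late)
{
	int ret;

	reserve_space(PAGE_SZ);

	ret = fail_update ? -ENOSPC : 0;
	if (ret)
		goto out;		/* jumps past the out_unlock block */

	ret = fail_late ? -EIO : 0;

	/* out_unlock: */
	if (!ret)
		return 0;		/* success: the reservation stays */
	if (!patched)
		release_space(PAGE_SZ);	/* old position of the release */
out:
	if (patched)
		release_space(PAGE_SZ);	/* new position of the release */
	return ret;
}
```

In this model, calling `mkwrite_model(0, 1, 0)` leaves `reserved_bytes` at `PAGE_SZ` after the ENOSPC return, which is the leak the commit message describes, while `mkwrite_model(1, 1, 0)` brings it back to zero. A late failure releases the space exactly once in both variants.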
Re: revert to static snapshot on reboot
Hello!

bt...@spiritvideo.com <bt...@spiritvideo.com> wrote:

> The plan that occurs to me is to make a snapshot of the system in the
> state that I want to always boot. Then, I would rewrite the init script
> in the initrd to (a) delete any old tmp copy of the snapshot; (b) copy
> the static snapshot to a tmp copy; (c) mount the tmp copy.

I'd suggest creating a snapshot during the initrd phase, then switching to that snapshot as the root. Before creating the new snapshot, first delete all old snapshots still there... Something like:

# sda1 = btrfs
mkdir -p /btrfs-prepare
mount /dev/sda1 /btrfs-prepare -o $REAL_ROOT_FLAGS,...
for snapshot in /btrfs-prepare/snapshots/*; do
	btrfs sub del $snapshot
done
snapshot=snapshots/root-$(date +%s)
# original-root has to be a subvolume
btrfs sub snap /btrfs-prepare/original-root /btrfs-prepare/$snapshot
REAL_ROOT=$snapshot
sync
umount /btrfs-prepare
# now let the rest of the initrd switch to the real root
# depending on your initrd system REAL_ROOT needs to be named
# differently: it should result in mount options like
# -o subvol=snapshots/root-123456789,...

This should be much faster than copying stuff around.

I'm not sure how btrfs behaves when unmounting while the btrfs-cleaner is deleting snapshots. It may become unstable over time. I'm sure the btrfs gurus here can comment on this.

I used a timestamp in the snapshot names so no naming conflicts occur during snapshot deletion and creation. I figured that deleting and recreating the same snapshot name might confuse btrfs after unexpected reboots while the btrfs-cleaner was still running.

The above script expects your btrfs layout to be something like this:

$ ls -al /
./
original-root/   # system installation goes here (subvolume)
snapshots/       # normal empty directory
# nothing more

This way you can also use an alternate initrd which does no snapshotting to upgrade or reconfigure the system. Or you just chroot into the original root and update that.

HTH
Kai
Re: [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist
On Tue, Jan 10, 2012 at 03:31:34PM +0800, Liu Bo wrote:
> +static inline int generate_node_level(u64 randseed)
> +{
> +	int level = 0;
> +
> +	while (randseed && !(randseed & 3)) {
> +		randseed >>= 2;
> +		level++;
> +	}
> +
> +	return (level > MAXLEVEL ? MAXLEVEL : level);
> +}

This is counting the number of trailing zeros / 2 (except when randseed == 0). There's a gcc builtin for it, __builtin_ctzll, and you can turn it into a loopless inlinable function:

static inline int generate_node_level(u64 randseed)
{
	return randseed == 0 ? 0 : __builtin_ctzll(randseed) >> 1;
}

The builtin should be safe on all arches without the need for libgcc support; there seem to be handcoded asm statements for each arch.

Microbenchmarking of builtin vs while-counter showed a 2.3x speedup:

builtin: 1.866529 ns/loop
while:   4.265664 ns/loop

(132 loops, on a generic intel x86_64 box)

And if MAXLEVEL is <= 16, then you can generate just 4 random bytes and compute the level in the same way without any loss.

david
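The equivalence claimed in this reply can be checked with a small userspace sketch. The names `level_loop`, `level_ctz` and `forms_agree` are hypothetical. The loop is written with the operators `&&`, `&`, `>>=` and `>` that its logic implies, `uint64_t` stands in for the kernel's `u64`, and the clamp to `MAXLEVEL` in the builtin version is an addition made here so that both forms return the same value for every input. `__builtin_ctzll` is a GCC/Clang builtin, so this assumes one of those compilers.

```c
#include <stdint.h>

#define MAXLEVEL 16

/* Loop form: count how many times the two lowest bits are both zero. */
static int level_loop(uint64_t randseed)
{
	int level = 0;

	while (randseed && !(randseed & 3)) {
		randseed >>= 2;
		level++;
	}
	return level > MAXLEVEL ? MAXLEVEL : level;
}

/* Loopless form: half the number of trailing zero bits, then clamp. */
static int level_ctz(uint64_t randseed)
{
	int level;

	if (randseed == 0)
		return 0;	/* __builtin_ctzll(0) is undefined */
	level = __builtin_ctzll(randseed) >> 1;
	return level > MAXLEVEL ? MAXLEVEL : level;
}

/* Returns 1 when both forms agree for 0 and for every lowest-set-bit position. */
static int forms_agree(void)
{
	int bit;

	if (level_loop(0) != level_ctz(0))
		return 0;
	for (bit = 0; bit < 64; bit++) {
		uint64_t seed = (uint64_t)1 << bit;
		/* Same lowest set bit, plus the top bit as noise. */
		uint64_t noisy = seed | ((uint64_t)1 << 63);

		if (level_loop(seed) != level_ctz(seed))
			return 0;
		if (level_loop(noisy) != level_ctz(noisy))
			return 0;
	}
	return 1;
}
```

Only the position of the lowest set bit matters, which is why checking one seed per bit position (with and without extra high bits) covers every distinct outcome. For a uniformly random seed, each additional level requires two more zero bits, so each level is reached with a quarter of the probability of the one below it.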
Re: [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist
On 01/11/2012 08:37 AM, David Sterba wrote:
> Hi, a few thoughts and comments below:
>
> On Tue, Jan 10, 2012 at 03:31:34PM +0800, Liu Bo wrote:
>> c) The level limit may need to be adjusted.
>>    I know it is a magic number, but now for simplicity we just keep it at 16,
>>    and then each skiplist is able to contain (2^32-1)/3 nodes at most.
>
> (2^32-1)/3 = 1,431,655,765
>
> that's a lot, I wonder what an average member count of a skiplist would be
> and whether eg. maxlevel = 12 is not enough (5,592,405 members).

Hmm, sorry, I found I've made a mistake here, let me correct it (the changelog will also be updated later):

As I set the probability to 1/4, the members linked on the level N+1 list will be 1/4 of those linked on the level N list. What's more, in a skiplist a node can be linked on multiple levels, e.g. a node with level N+1 will also be linked on the level N list.

So before the node count reaches 4^(maxlevel - 1), the skiplist can maintain O(lg n), and after that it will no longer be O(lg n), although we can still insert nodes into the skiplist. That's the difference.

> or you can set the maxlevel during skiplist creation, or predefine a small
> skiplist with compile-time-set level to whatever 16. this can be tuned
> later of course.

Yes, I do set the maxlevel to 16 at the creation of a skiplist.

Here 4^(16 - 1) is 2^30. I don't think this is enough for some severe workloads which build a large amount of fragments. Maybe we should make the maxlevel self-update.

>> --- /dev/null
>> +++ b/fs/btrfs/skiplist.c
>> @@ -0,0 +1,98 @@
>> +inline int sl_fill_node(struct sl_node *node, int level, gfp_t mask)
>
> I suggest to pick the full prefix skiplist_ instead of just sl_, it'll be
> IMHO more readable and googlable. (Out of curiosity I grepped for the sl_
> prefix and it's used by drivers/net/slip/slip.c).

I did hesitate for a while between skiplist_ and sl_... and I just wanted to make it similar to rb_. Anyway, I'm ok with skiplist_.

>> +{
>> +	struct sl_node **p;
>> +	struct sl_node **q;
>> +	int num;
>> +
>> +	BUG_ON(level > MAXLEVEL);
>> +
>> +	num = level + 1;
>> +	p = kmalloc(sizeof(*p) * num, mask);
>> +	BUG_ON(!p);
>
> you can drop the BUG_ON
>
>> +	if (!p)
>         ^^^
>> +		return -ENOMEM;
>> +	q = kmalloc(sizeof(*q) * num, mask);
>> +	BUG_ON(!q);
>         ^^

ok, just in case.

>> +	if (!q) {
>> +		kfree(p);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	node->next = p;
>> +	node->prev = q;
>> +	node->level = level;
>> +	return 0;
>> +}
>> +
>> diff --git a/fs/btrfs/skiplist.h b/fs/btrfs/skiplist.h
>> new file mode 100644
>> index 000..3e414b5
>> --- /dev/null
>> +++ b/fs/btrfs/skiplist.h
>> +
>> +#define MAXLEVEL 16
>> +/* double p = 0.25; */
>> +
>> +struct sl_node {
>> +	struct sl_node **next;
>> +	struct sl_node **prev;
>> +	unsigned int level;
>> +	unsigned int head:1;
>
> the bitfield will use another sizeof(int) bytes, but the level is at most
> 16, you can reduce its size eg to unsigned short. on the other hand, the
> structure has to start at address aligned to sizeof(void*) and the bytes
> after 'head' up to next sizeof(void*) boundary will be left unusable
> anyway. then, 'head' could be a full int or bool so the compiler is not
> restricted and forced to keep state of the single bit. if access to these
> items is expected to be frequent, the difference could be measurable.

I see. Thanks a lot for your advice!

thanks,
liubo

>> +};
>
> david
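The capacity figures quoted in this exchange can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes the promotion probability of 1/4 stated in the thread, so level N is expected to hold about 4 times as many nodes as level N+1.

```c
#include <stdint.h>

/*
 * Sum of 4^0 + 4^1 + ... + 4^(maxlevel - 1), i.e. (4^maxlevel - 1) / 3.
 * Valid for 1 <= maxlevel <= 31 so that the shift stays below 64 bits.
 */
static uint64_t skiplist_geometric_total(unsigned int maxlevel)
{
	return ((UINT64_C(1) << (2 * maxlevel)) - 1) / 3;
}

/*
 * 4^(maxlevel - 1): the node count up to which lookups are expected to
 * stay logarithmic, according to the corrected explanation above.
 */
static uint64_t skiplist_log_bound(unsigned int maxlevel)
{
	return UINT64_C(1) << (2 * (maxlevel - 1));
}
```

With maxlevel = 16 the first function gives 1,431,655,765 and with maxlevel = 12 it gives 5,592,405, matching the numbers in the review; the second gives 2^30 for maxlevel = 16, the corrected bound.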
Re: revert to static snapshot on reboot
The upcoming btrfs autosnap feature might help with your problem. But the main part in your case, which is to replace the root with its snapshot, is beyond the scope of the autosnap project. What is being developed is a set of btrfs-progs sub-commands to create and manage snapshots with a rule-set. The code is under development; if you would like to test and provide feedback, I can send you a copy this week. Or, if you just want to know the new features relevant to you, they are as below (not a complete feature list though).

- Create snapshots automatically based on
  - ad-hoc events (package installation, boot, etc.)

  cli eg:
  # btrfs autosnap enable tag retain-policy subvol

  and the cli that an init or package script should call is
  # btrfs autosnap now -t tag /btrfs

  which will create a snapshot and review its retention policy. The retention policy can be based on count, based on FS % full, or on manually maintained snapshots.

If you have any feedback, please let me know.

thanks, Anand

On Monday 09,January,2012 02:43 PM, bt...@spiritvideo.com wrote:
> Hi all --
>
> I just installed my first btrfs-based linux tonight, and I must say it
> gives me a very warm feeling! Congratulations on all your hard work and
> your fine product.
>
> I administer laptops for a small school, and we want to implement what
> Deep Freeze (http://www.faronics.com/enterprise/deep-freeze) does for
> Windows -- no matter what a student does after they log in, when they
> reboot it is all forgotten and the computer has returned to a standard
> state.
>
> I would think this would be a FAQ, but I have searched the web and
> mailing list for the past couple of hours. Of course it's easy to mount
> a snapshot, but then if students make changes the snapshot changes.
>
> The plan that occurs to me is to make a snapshot of the system in the
> state that I want to always boot. Then, I would rewrite the init script
> in the initrd to
> (a) delete any old tmp copy of the snapshot;
> (b) copy the static snapshot to a tmp copy;
> (c) mount the tmp copy.
>
> That's a little harder than I was hoping to work -- is there an easier
> way to get this functionality?
>
> I have a small ext4 boot partition containing grub, vmlinuz and
> initramfs. Everything else is in a big btrfs root partition.
>
> I am running Fedora 14, with Fedora-patched linux 2.6.35. I could
> upgrade if necessary.
>
> Thanks,
> Bob
>
> --
> I blog about my work at the school at SmallSchoolIT.wordpress.com
[PATCH 08/11] Btrfs: simplfy calculation of stripe length for discard operation
For btrfs raid, while discarding a range of space, we'll need to know the start offset and length to discard for each device, and it's done in btrfs_map_block().

However the calculation is a bit complex for raid0 and raid10, so I reimplement it based on a fact that:

	dev1          dev2           dev3           (raid0)
	s0 s3 s6      s1 s4 s7       s2 s5

Each device has (total_stripes / nr_dev) stripes, or plus one.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c |   95 +---
 1 files changed, 31 insertions(+), 64 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 540fdd2..563ef65 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3024,80 +3024,47 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
 	atomic_set(&bbio->error, 0);
 
 	if (rw & REQ_DISCARD) {
+		int factor = 0;
+		int sub_stripes = 0;
+		u64 stripes_per_dev = 0;
+		u32 remaining_stripes = 0;
+
+		if (map->type &
+		    (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) {
+			if (map->type & BTRFS_BLOCK_GROUP_RAID0)
+				sub_stripes = 1;
+			else
+				sub_stripes = map->sub_stripes;
+
+			factor = map->num_stripes / sub_stripes;
+			stripes_per_dev = div_u64_rem(stripe_nr_end -
+						      stripe_nr_orig,
+						      factor,
+						      &remaining_stripes);
+		}
+
 		for (i = 0; i < num_stripes; i++) {
 			bbio->stripes[i].physical =
 				map->stripes[stripe_index].physical +
 				stripe_offset + stripe_nr * map->stripe_len;
 			bbio->stripes[i].dev = map->stripes[stripe_index].dev;
 
-			if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
-				u64 stripes;
-				u32 last_stripe = 0;
-				int j;
-
-				div_u64_rem(stripe_nr_end - 1,
-					    map->num_stripes,
-					    &last_stripe);
-
-				for (j = 0; j < map->num_stripes; j++) {
-					u32 test;
-
-					div_u64_rem(stripe_nr_end - 1 - j,
-						    map->num_stripes, &test);
-					if (test == stripe_index)
-						break;
-				}
-				stripes = stripe_nr_end - 1 - j;
-				do_div(stripes, map->num_stripes);
-				bbio->stripes[i].length = map->stripe_len *
-					(stripes - stripe_nr + 1);
-
-				if (i == 0) {
+			if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
+					 BTRFS_BLOCK_GROUP_RAID10)) {
+				bbio->stripes[i].length = stripes_per_dev *
+							  map->stripe_len;
+				if (i / sub_stripes < remaining_stripes)
+					bbio->stripes[i].length +=
+						map->stripe_len;
+				if (i < sub_stripes)
 					bbio->stripes[i].length -=
 						stripe_offset;
-					stripe_offset = 0;
-				}
-				if (stripe_index == last_stripe)
-					bbio->stripes[i].length -=
-						stripe_end_offset;
-			} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
-				u64 stripes;
-				int j;
-				int factor = map->num_stripes /
-					     map->sub_stripes;
-				u32 last_stripe = 0;
-
-				div_u64_rem(stripe_nr_end - 1,
-					    factor, &last_stripe);
-				last_stripe *= map->sub_stripes;
-
-				for (j = 0; j < factor; j++) {
-					u32 test;
-
-					div_u64_rem(stripe_nr_end - 1 - j,
-						    factor, &test);
-
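The "total_stripes / nr_dev, or plus one" observation from the changelog can be shown in isolation. This is a minimal userspace sketch under stated assumptions: stripes_on_dev is a hypothetical helper, device slots are counted from the device that holds the first stripe of the range, and the partial-stripe adjustments at the start and end of the range (stripe_offset and friends in the patch) are left out.

```c
#include <stdint.h>

/*
 * Number of whole stripes that land on device slot i when `total`
 * consecutive stripes are laid out round-robin over `nr_dev` devices.
 * Slot 0 is the device holding the first stripe of the range.
 */
static uint64_t stripes_on_dev(uint64_t total, uint32_t nr_dev, uint32_t i)
{
	uint64_t per_dev = total / nr_dev;
	uint32_t remaining = (uint32_t)(total % nr_dev);

	/* The first `remaining` slots each carry one extra stripe. */
	return per_dev + (i < remaining ? 1 : 0);
}
```

For the layout drawn in the changelog (s0..s7 over three devices) this yields 3, 3 and 2 stripes, which is what the diagram shows for dev1, dev2 and dev3.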
[PATCH 09/11][RESEND] Btrfs: rewrite btrfs_trim_block_group()
There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even none will be trimmed). Even for those trimmed, not all the free space in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions. One is to trim extents only, and the other is to trim bitmaps only.

Before patching:

	# fstrim -v /mnt/
	/mnt/: 1496465408 bytes were trimmed

After patching:

	# fstrim -v /mnt/
	/mnt/: 2193768448 bytes were trimmed

And this matches the total free space:

	# btrfs fi df /mnt
	Data: total=3.58GB, used=1.79GB
	System, DUP: total=8.00MB, used=4.00KB
	System: total=4.00MB, used=0.00
	Metadata, DUP: total=205.12MB, used=97.14MB
	Metadata: total=8.00MB, used=0.00

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |  235 ++-
 1 files changed, 164 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e4eb222..b3cbb89 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2594,17 +2594,57 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster *cluster)
 	cluster->block_group = NULL;
 }
 
-int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
-			   u64 *trimmed, u64 start, u64 end, u64 minlen)
+static int do_trimming(struct btrfs_block_group_cache *block_group,
+		       u64 *total_trimmed, u64 start, u64 bytes,
+		       u64 reserved_start, u64 reserved_bytes)
 {
-	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-	struct btrfs_free_space *entry = NULL;
+	struct btrfs_space_info *space_info = block_group->space_info;
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
-	u64 bytes = 0;
-	u64 actually_trimmed;
-	int ret = 0;
+	int ret;
+	int update = 0;
+	u64 trimmed = 0;
 
-	*trimmed = 0;
+	spin_lock(&space_info->lock);
+	spin_lock(&block_group->lock);
+	if (!block_group->ro) {
+		block_group->reserved += reserved_bytes;
+		space_info->bytes_reserved += reserved_bytes;
+		update = 1;
+	}
+	spin_unlock(&block_group->lock);
+	spin_unlock(&space_info->lock);
+
+	ret = btrfs_error_discard_extent(fs_info->extent_root,
+					 start, bytes, &trimmed);
+	if (!ret)
+		*total_trimmed += trimmed;
+
+	btrfs_add_free_space(block_group, reserved_start, reserved_bytes);
+
+	if (update) {
+		spin_lock(&space_info->lock);
+		spin_lock(&block_group->lock);
+		if (block_group->ro)
+			space_info->bytes_readonly += reserved_bytes;
+		block_group->reserved -= reserved_bytes;
+		space_info->bytes_reserved -= reserved_bytes;
+		spin_unlock(&space_info->lock);
+		spin_unlock(&block_group->lock);
+	}
+
+	return ret;
+}
+
+static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
+			  u64 *total_trimmed, u64 start, u64 end, u64 minlen)
+{
+	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+	struct btrfs_free_space *entry;
+	struct rb_node *node;
+	int ret = 0;
+	u64 extent_start;
+	u64 extent_bytes;
+	u64 bytes;
 
 	while (start < end) {
 		spin_lock(&ctl->tree_lock);
@@ -2615,81 +2655,118 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
 		}
 
 		entry = tree_search_offset(ctl, start, 0, 1);
-		if (!entry)
-			entry = tree_search_offset(ctl,
-						   offset_to_bitmap(ctl, start),
-						   1, 1);
-
-		if (!entry || entry->offset >= end) {
+		if (!entry) {
 			spin_unlock(&ctl->tree_lock);
 			break;
 		}
 
-		if (entry->bitmap) {
-			ret = search_bitmap(ctl, entry, &start, &bytes);
-			if (!ret) {
-				if (start >= end) {
-					spin_unlock(&ctl->tree_lock);
-					break;
-				}
-				bytes = min(bytes, end - start);
-				bitmap_clear_bits(ctl, entry, start, bytes);
-
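The first three bugs in the list above come down to how a free extent is intersected with the user-supplied range. The following userspace sketch shows that intersection on its own; it is not the code from the patch (which is cut off in this archive). The struct trim_range type and clamp_trim_range() are hypothetical names, and the range end is assumed to be exclusive, as suggested by the `while (start < end)` loop.

```c
#include <stdbool.h>
#include <stdint.h>

struct trim_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Intersect the free extent [ext_start, ext_start + ext_len) with the
 * requested range [start, end).  Returns false when there is nothing
 * to trim: no overlap, or the overlap is shorter than minlen.
 */
static bool clamp_trim_range(uint64_t ext_start, uint64_t ext_len,
			     uint64_t start, uint64_t end, uint64_t minlen,
			     struct trim_range *out)
{
	uint64_t ext_end = ext_start + ext_len;
	uint64_t lo = ext_start > start ? ext_start : start;
	uint64_t hi = ext_end < end ? ext_end : end;

	if (lo >= hi || hi - lo < minlen)
		return false;

	out->start = lo;	/* never below the requested offset */
	out->len = hi - lo;	/* never past the requested end */
	return true;
}
```

An extent that is skipped here must still remain in the free space cache, which is the "leak" the changelog refers to; that bookkeeping is outside this sketch.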
[PATCH 10/11] Btrfs: update global block_rsv when creating a new block group
A bug was triggered while using seed device:

	# mkfs.btrfs /dev/loop1
	# btrfstune -S 1 /dev/loop1
	# mount /dev/loop1 /mnt
	# btrfs dev add /dev/loop2 /mnt

	btrfs: block rsv returned -28
	------------[ cut here ]------------
	WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
	...
	Call Trace:
	...
	[<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs]
	[<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs]
	[<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs]
	[<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs]
	[<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs]
	[<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
	[<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs]
	[<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs]
	[<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs]
	[<c04f55f7>] do_vfs_ioctl+0x496/0x4cc
	[<c04f5660>] sys_ioctl+0x33/0x4f
	[<c07b9edf>] sysenter_do_call+0x12/0x38
	---[ end trace 906adac595facc7d ]---

Since the seed device is readonly, there's no usable space in the filesystem. Afterwards we add a sprout device to it, and the kernel creates a METADATA block group and a SYSTEM block group, which provide free space we can reserve, but we still get a reservation failure because the global block_rsv hasn't been updated accordingly.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5b53479..bf30f67 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7446,6 +7446,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 	ret = update_space_info(root->fs_info, cache->flags, size, bytes_used,
 				&cache->space_info);
 	BUG_ON(ret);
+	update_global_block_rsv(root->fs_info);
 
 	spin_lock(&cache->space_info->lock);
 	cache->space_info->bytes_readonly += cache->bytes_super;
-- 
1.7.3.1
[PATCH 11/11] Btrfs: fix possible deadlock when opening a seed device
The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex, but when we mount a filesystem which has backing seed devices, we have this lock chain:

	open_ctree()
	    lock(chunk_mutex);
	    read_chunk_tree();
	        read_one_dev();
	            open_seed_devices();
	                lock(uuid_mutex);

and then we hit a lockdep splat.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |    2 --
 fs/btrfs/volumes.c |    9 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f9d555..858ab34 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2270,9 +2270,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	   (unsigned long)btrfs_header_chunk_tree_uuid(chunk_root->node),
 	   BTRFS_UUID_SIZE);
 
-	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_chunk_tree(chunk_root);
-	mutex_unlock(&fs_info->chunk_mutex);
 	if (ret) {
 		printk(KERN_WARNING "btrfs: failed to read chunk tree on %s\n",
 		       sb->s_id);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 563ef65..fbb493b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3506,7 +3506,7 @@ static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
 	struct btrfs_fs_devices *fs_devices;
 	int ret;
 
-	mutex_lock(&uuid_mutex);
+	BUG_ON(!mutex_is_locked(&uuid_mutex));
 
 	fs_devices = root->fs_info->fs_devices->seed;
 	while (fs_devices) {
@@ -3544,7 +3544,6 @@ static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
 	fs_devices->seed = root->fs_info->fs_devices->seed;
 	root->fs_info->fs_devices->seed = fs_devices;
 out:
-	mutex_unlock(&uuid_mutex);
 	return ret;
 }
 
@@ -3687,6 +3686,9 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
 	if (!path)
 		return -ENOMEM;
 
+	mutex_lock(&uuid_mutex);
+	lock_chunks(root);
+
 	/* first we search for all of the device items, and then we
 	 * read in all of the chunk items.  This way we can create chunk
 	 * mappings that reference all of the devices that are afound
@@ -3737,6 +3739,9 @@ again:
 	}
 	ret = 0;
 error:
+	unlock_chunks(root);
+	mutex_unlock(&uuid_mutex);
+
 	btrfs_free_path(path);
 	return ret;
 }
-- 
1.7.3.1
[PATCH 05/11] Btrfs: reserve metadata space in btrfs_ioctl_setflags()
Check and reserve space for btrfs_update_inode().

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9619fb0..fe8a60c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -254,7 +254,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 		ip->flags &= ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
 	}
 
-	trans = btrfs_join_transaction(root);
+	trans = btrfs_start_transaction(root, 1);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		goto out_drop;
-- 
1.7.3.1
[PATCH 03/11] Btrfs: check the return value of io_ctl_init()
It can return -ENOMEM.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |    9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4e55af3..e4eb222 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -637,7 +637,10 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 	if (!num_entries)
 		return 0;
 
-	io_ctl_init(&io_ctl, inode, root);
+	ret = io_ctl_init(&io_ctl, inode, root);
+	if (ret)
+		return ret;
+
 	ret = readahead_cache(inode);
 	if (ret)
 		goto out;
@@ -851,7 +854,9 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	if (!i_size_read(inode))
 		return -1;
 
-	io_ctl_init(&io_ctl, inode, root);
+	ret = io_ctl_init(&io_ctl, inode, root);
+	if (ret)
+		return -1;
 
 	/* Get the cluster for this block_group if it exists */
 	if (block_group && !list_empty(&block_group->cluster_list))
-- 
1.7.3.1
[PATCH 00/11] Btrfs: some patches for 3.3
The biggest one is a fix for fstrim, and there's a fix for the on-disk free space cache. Others are small fixes and cleanups. The last three were sent weeks ago.

The patchset is also available in this repo:

	git://repo.or.cz/linux-btrfs-devel.git for-chris

Note there's a small conflict with Al Viro's vfs changes.

Li Zefan (11):
  Btrfs: add pinned extents to on-disk free space cache correctly
  Btrfs: avoid possible NULL deref in io_ctl_drop_pages()
  Btrfs: check the return value of io_ctl_init()
  Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()
  Btrfs: reserve metadata space in btrfs_ioctl_setflags()
  Btrfs: don't pass a trans handle unnecessarily in volumes.c
  Btrfs: don't pre-allocate btrfs bio
  Btrfs: simplfy calculation of stripe length for discard operation
  Btrfs: rewrite btrfs_trim_block_group()
  Btrfs: update global block_rsv when creating a new block group
  Btrfs: fix possible deadlock when opening a seed device

 fs/btrfs/disk-io.c          |    2 -
 fs/btrfs/extent-tree.c      |    3 +-
 fs/btrfs/free-space-cache.c |  293 +--
 fs/btrfs/ioctl.c            |   20 +++-
 fs/btrfs/volumes.c          |  189 ++--
 fs/btrfs/volumes.h          |    3 +-
 6 files changed, 280 insertions(+), 230 deletions(-)
[PATCH 02/11] Btrfs: avoid possible NULL deref in io_ctl_drop_pages()
If we run into some failure path in io_ctl_prepare_pages(), the io_ctl->pages[] array may have some NULL pointers.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |    8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 01840ef..4e55af3 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
 	io_ctl_unmap_page(io_ctl);
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
-		ClearPageChecked(io_ctl->pages[i]);
-		unlock_page(io_ctl->pages[i]);
-		page_cache_release(io_ctl->pages[i]);
+		if (io_ctl->pages[i]) {
+			ClearPageChecked(io_ctl->pages[i]);
+			unlock_page(io_ctl->pages[i]);
+			page_cache_release(io_ctl->pages[i]);
+		}
 	}
 }
 
-- 
1.7.3.1
[PATCH 06/11] Btrfs: don't pass a trans handle unnecessarily in volumes.c
Some functions never use the transaction handle passed to them.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |    2 +-
 fs/btrfs/volumes.c     |   18 +++---
 fs/btrfs/volumes.h     |    3 +--
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8603ee4..5b53479 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7084,7 +7084,7 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 		 * space to fit our block group in.
 		 */
 		if (device->total_bytes > device->bytes_used + min_free) {
-			ret = find_free_dev_extent(NULL, device, min_free,
+			ret = find_free_dev_extent(device, min_free,
 						   &dev_offset, NULL);
 			if (!ret)
 				dev_nr++;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4b839f..73f673c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -829,7 +829,6 @@ out:
 
 /*
  * find_free_dev_extent - find free space in the specified device
- * @trans:	transaction handler
  * @device:	the device which we search the free space in
  * @num_bytes:	the size of the free space that we need
  * @start:	store the start of the free space.
@@ -848,8 +847,7 @@ out:
  * But if we don't find suitable free space, it is used to store the size of
  * the max free space.
  */
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *len)
 {
 	struct btrfs_key key;
@@ -893,7 +891,7 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
 	key.offset = search_start;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 
-	ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 	if (ret < 0)
 		goto out;
 	if (ret > 0) {
@@ -1469,8 +1467,7 @@ error_undo:
 /*
  * does all the dirty work required for changing file system's UUID.
  */
-static int btrfs_prepare_sprout(struct btrfs_trans_handle *trans,
-				struct btrfs_root *root)
+static int btrfs_prepare_sprout(struct btrfs_root *root)
 {
 	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
 	struct btrfs_fs_devices *old_devices;
@@ -1695,7 +1692,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 
 	if (seeding_dev) {
 		sb->s_flags &= ~MS_RDONLY;
-		ret = btrfs_prepare_sprout(trans, root);
+		ret = btrfs_prepare_sprout(root);
 		BUG_ON(ret);
 	}
 
@@ -2323,8 +2320,7 @@ done:
 	return ret;
 }
 
-static int btrfs_add_system_chunk(struct btrfs_trans_handle *trans,
-			   struct btrfs_root *root,
+static int btrfs_add_system_chunk(struct btrfs_root *root,
 			   struct btrfs_key *key,
 			   struct btrfs_chunk *chunk, int item_size)
 {
@@ -2496,7 +2492,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (total_avail == 0)
 			continue;
 
-		ret = find_free_dev_extent(trans, device,
+		ret = find_free_dev_extent(device,
 					   max_stripe_size * dev_stripes,
 					   &dev_offset, &max_avail);
 		if (ret && ret != -ENOSPC)
@@ -2687,7 +2683,7 @@ static int __finish_chunk_alloc(struct btrfs_trans_handle *trans,
 	BUG_ON(ret);
 
 	if (map->type & BTRFS_BLOCK_GROUP_SYSTEM) {
-		ret = btrfs_add_system_chunk(trans, chunk_root, &key, chunk,
+		ret = btrfs_add_system_chunk(chunk_root, &key, chunk,
 					     item_size);
 		BUG_ON(ret);
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 78f2d4d..c1701ec 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -230,7 +230,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_root *dev_root);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *max_avail);
 #endif
-- 
1.7.3.1
[PATCH 01/11] Btrfs: add pinned extents to on-disk free space cache correctly
I got this while running xfstests:

[24256.836098] block group 317849600 has an wrong amount of free space
[24256.836100] btrfs: failed to load free space cache for block group 317849600

We should clamp the extent returned by find_first_extent_bit(), so the start of the extent won't be smaller than the start of the block group.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |   41 -
 1 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..01840ef 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -838,7 +838,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	struct io_ctl io_ctl;
 	struct list_head bitmap_list;
 	struct btrfs_key key;
-	u64 start, end, len;
+	u64 start, extent_start, extent_end, len;
 	int entries = 0;
 	int bitmaps = 0;
 	int ret;
@@ -857,25 +857,12 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 					   struct btrfs_free_cluster,
 					   block_group_list);
 
-	/*
-	 * We shouldn't have switched the pinned extents yet so this is the
-	 * right one
-	 */
-	unpin = root->fs_info->pinned_extents;
-
 	/* Lock all pages first so we can lock the extent safely. */
 	io_ctl_prepare_pages(&io_ctl, inode, 0);
 
 	lock_extent_bits(&BTRFS_I(inode)->io_tree, 0, i_size_read(inode) - 1,
 			 0, &cached_state, GFP_NOFS);
 
-	/*
-	 * When searching for pinned extents, we need to start at our start
-	 * offset.
-	 */
-	if (block_group)
-		start = block_group->key.objectid;
-
 	node = rb_first(&ctl->free_space_offset);
 	if (!node && cluster) {
 		node = rb_first(&cluster->root);
@@ -918,9 +905,20 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 	 * We want to add any pinned extents to our free space cache
 	 * so we don't leak the space
 	 */
+
+	/*
+	 * We shouldn't have switched the pinned extents yet so this is the
+	 * right one
+	 */
+	unpin = root->fs_info->pinned_extents;
+
+	if (block_group)
+		start = block_group->key.objectid;
+
 	while (block_group && (start < block_group->key.objectid +
 			       block_group->key.offset)) {
-		ret = find_first_extent_bit(unpin, start, &start, &end,
+		ret = find_first_extent_bit(unpin, start,
+					    &extent_start, &extent_end,
 					    EXTENT_DIRTY);
 		if (ret) {
 			ret = 0;
@@ -928,20 +926,21 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 		}
 
 		/* This pinned extent is out of our range */
-		if (start >= block_group->key.objectid +
+		if (extent_start >= block_group->key.objectid +
 		    block_group->key.offset)
 			break;
 
-		len = block_group->key.objectid +
-			block_group->key.offset - start;
-		len = min(len, end + 1 - start);
+		extent_start = max(extent_start, start);
+		extent_end = min(block_group->key.objectid +
+				 block_group->key.offset, extent_end + 1);
+		len = extent_end - extent_start;
 
 		entries++;
-		ret = io_ctl_add_entry(&io_ctl, start, len, NULL);
+		ret = io_ctl_add_entry(&io_ctl, extent_start, len, NULL);
 		if (ret)
 			goto out_nospc;
 
-		start = end + 1;
+		start = extent_end;
 	}
 
 	/* Write out the bitmaps */
-- 
1.7.3.1
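The clamping step described in the changelog can be tried out in userspace. The sketch below mirrors the three assignments in the last hunk; clamp_pinned_extent() and its parameter names are hypothetical, the extent end is taken as inclusive (as find_first_extent_bit() is used in the diff), and the block group is the half-open range [bg_start, bg_start + bg_len).

```c
#include <stdint.h>

/*
 * Clamp the pinned extent [*extent_start, extent_end_inclusive] so that
 * it starts no earlier than `start` and ends no later than the end of
 * the block group.  Updates *extent_start and returns the clamped
 * length, or 0 if nothing of the extent lies inside the block group.
 */
static uint64_t clamp_pinned_extent(uint64_t start,
				    uint64_t bg_start, uint64_t bg_len,
				    uint64_t *extent_start,
				    uint64_t extent_end_inclusive)
{
	uint64_t bg_end = bg_start + bg_len;
	uint64_t end = extent_end_inclusive + 1;	/* make it exclusive */

	if (*extent_start < start)
		*extent_start = start;
	if (end > bg_end)
		end = bg_end;

	return end > *extent_start ? end - *extent_start : 0;
}
```

For a block group covering [1000, 2000), a pinned extent [900, 1099] is recorded as start 1000 with length 100, so the cached free space no longer extends outside the block group.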
[PATCH 07/11] Btrfs: don't pre-allocate btrfs bio
We pre-allocate a btrfs bio with fixed size, and then may re-allocate memory if we find stripes are bigger than the fixed size. But this pre-allocation is not necessary.

Also we don't have to calculate the stripe number twice.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c |   67 ---
 1 files changed, 21 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73f673c..540fdd2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2897,26 +2897,13 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
 	u64 stripe_nr;
 	u64 stripe_nr_orig;
 	u64 stripe_nr_end;
-	int stripes_allocated = 8;
-	int stripes_required = 1;
 	int stripe_index;
 	int i;
+	int ret = 0;
 	int num_stripes;
 	int max_errors = 0;
 	struct btrfs_bio *bbio = NULL;
 
-	if (bbio_ret && !(rw & (REQ_WRITE | REQ_DISCARD)))
-		stripes_allocated = 1;
-again:
-	if (bbio_ret) {
-		bbio = kzalloc(btrfs_bio_size(stripes_allocated),
-				GFP_NOFS);
-		if (!bbio)
-			return -ENOMEM;
-
-		atomic_set(&bbio->error, 0);
-	}
-
 	read_lock(&em_tree->lock);
 	em = lookup_extent_mapping(em_tree, logical, *length);
 	read_unlock(&em_tree->lock);
@@ -2935,32 +2922,6 @@ again:
 	if (mirror_num > map->num_stripes)
 		mirror_num = 0;
 
-	/* if our btrfs_bio struct is too small, back off and try again */
-	if (rw & REQ_WRITE) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_DUP)) {
-			stripes_required = map->num_stripes;
-			max_errors = 1;
-		} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
-			stripes_required = map->sub_stripes;
-			max_errors = 1;
-		}
-	}
-	if (rw & REQ_DISCARD) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
-				 BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_DUP |
-				 BTRFS_BLOCK_GROUP_RAID10)) {
-			stripes_required = map->num_stripes;
-		}
-	}
-	if (bbio_ret && (rw & (REQ_WRITE | REQ_DISCARD)) &&
-	    stripes_allocated < stripes_required) {
-		stripes_allocated = map->num_stripes;
-		free_extent_map(em);
-		kfree(bbio);
-		goto again;
-	}
 	stripe_nr = offset;
 	/*
 	 * stripe_nr counts the total number of stripes we have to stride
@@ -3055,6 +3016,13 @@ again:
 	}
 	BUG_ON(stripe_index >= map->num_stripes);
 
+	bbio = kzalloc(btrfs_bio_size(num_stripes), GFP_NOFS);
+	if (!bbio) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	atomic_set(&bbio->error, 0);
+
 	if (rw & REQ_DISCARD) {
 		for (i = 0; i < num_stripes; i++) {
 			bbio->stripes[i].physical =
@@ -3151,15 +3119,22 @@ again:
 			stripe_index++;
 		}
 	}
-	if (bbio_ret) {
-		*bbio_ret = bbio;
-		bbio->num_stripes = num_stripes;
-		bbio->max_errors = max_errors;
-		bbio->mirror_num = mirror_num;
+
+	if (rw & REQ_WRITE) {
+		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+				 BTRFS_BLOCK_GROUP_RAID10 |
+				 BTRFS_BLOCK_GROUP_DUP)) {
+			max_errors = 1;
+		}
 	}
+
+	*bbio_ret = bbio;
+	bbio->num_stripes = num_stripes;
+	bbio->max_errors = max_errors;
+	bbio->mirror_num = mirror_num;
 out:
 	free_extent_map(em);
-	return 0;
+	return ret;
 }
 
 int btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
-- 
1.7.3.1
[PATCH 04/11] Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()
We can recover from errors and return -errno to user space.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c04f02c..9619fb0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -176,6 +176,8 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	struct btrfs_trans_handle *trans;
 	unsigned int flags, oldflags;
 	int ret;
+	u64 ip_oldflags;
+	unsigned int i_oldflags;
 
 	if (btrfs_root_readonly(root))
 		return -EROFS;
@@ -192,6 +194,9 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 
 	mutex_lock(&inode->i_mutex);
 
+	ip_oldflags = ip->flags;
+	i_oldflags = inode->i_flags;
+
 	flags = btrfs_mask_flags(inode->i_mode, flags);
 	oldflags = btrfs_flags_to_ioctl(ip->flags);
 	if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
@@ -250,18 +255,23 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	}
 
 	trans = btrfs_join_transaction(root);
-	BUG_ON(IS_ERR(trans));
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto out_drop;
+	}
 
 	btrfs_update_iflags(inode);
 	inode->i_ctime = CURRENT_TIME;
 
 	ret = btrfs_update_inode(trans, root, inode);
-	BUG_ON(ret);
 
 	btrfs_end_transaction(trans, root);
+ out_drop:
+	if (ret) {
+		ip->flags = ip_oldflags;
+		inode->i_flags = i_oldflags;
+	}
 
 	mnt_drop_write(file->f_path.mnt);
-
-	ret = 0;
  out_unlock:
 	mutex_unlock(&inode->i_mutex);
 	return ret;
-- 
1.7.3.1
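The recovery pattern this patch introduces (remember the old flag values, restore them if a later step fails, and hand the error back) can be sketched outside the kernel. Everything in the block below is a stand-in for illustration: struct fake_inode, commit_change() and set_flags() are hypothetical, and commit_change() merely simulates a step such as joining a transaction that may fail.

```c
#include <errno.h>

struct fake_inode {
	unsigned long long flags;	/* stands in for ip->flags */
	unsigned int i_flags;		/* stands in for inode->i_flags */
};

/* Hypothetical step that can fail, e.g. with -ENOMEM. */
static int commit_change(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

static int set_flags(struct fake_inode *ip, unsigned long long new_flags,
		     unsigned int new_i_flags, int should_fail)
{
	/* Save the old values before touching anything. */
	unsigned long long old_flags = ip->flags;
	unsigned int old_i_flags = ip->i_flags;
	int ret;

	ip->flags = new_flags;
	ip->i_flags = new_i_flags;

	ret = commit_change(should_fail);
	if (ret) {
		/* Roll back instead of crashing, then report the error. */
		ip->flags = old_flags;
		ip->i_flags = old_i_flags;
	}
	return ret;
}
```

On failure the caller gets a negative errno and the in-memory state is unchanged, which is the behaviour the changelog describes in place of the BUG_ON()s.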