Re: Performance Issues
On Friday, 19 September 2014 13:51:22, Holger Hoffstätte wrote:
> On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
>> I have a particularly uncomplicated setup (a desktop PC with a hard
>> disk) and I'm seeing particularly slow performance from btrfs. A `git
>> status` in the linux source tree takes about 46 seconds after dropping
>> caches, whereas on other machines using ext4 this takes about 13s. My
>> mail client (evolution) also seems to perform particularly poorly on
>> this setup, and my hunch is that it's spending a lot of time waiting
>> on the filesystem.
>
> This is - unfortunately - a particular btrfs oddity/characteristic/flaw,
> whatever you want to call it. git relies a lot on fast stat() calls, and
> those seem to be particularly slow with btrfs, especially on rotational
> media. I have the same problem with rsync on a freshly mounted volume;
> it gets fast (quite so!) after the first run.

My favorite benchmark is `ls -l /usr/bin`:

ext4:  0.934s
btrfs: 21.814s

Also, mounting large partitions (several hundred GB) takes a long time on btrfs. Better to defer it during boot using e.g. noauto,x-systemd.automount.

Marc
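The cold-versus-warm-cache effect Holger and Marc describe is easy to reproduce. A minimal sketch of the benchmark (dropping caches needs root, and the directory is just an example argument):

```shell
#!/bin/sh
# Time a stat()-heavy workload cold (caches dropped) and then warm.
DIR=${1:-/usr/bin}

sync
# Drop page cache, dentries and inodes; silently skipped without root.
echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true

echo "cold cache:"
time ls -l "$DIR" > /dev/null

echo "warm cache:"
time ls -l "$DIR" > /dev/null
```

The second `ls -l` hits the dentry/inode caches and should be far faster on rotational btrfs; the gap between the two runs is what the thread is arguing about.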
Re: general thoughts and questions + general and RAID5/6 stability?
William Hanson posted on Fri, 19 Sep 2014 16:50:05 -0400 as excerpted:

> Hey guys... I was just crawling through the wiki and this list's
> archive to find answers to some questions. Actually, many of them match
> those which Christoph asked here some time ago, though it seems no
> answers came up at all.

Seems his post slipped through the cracks, perhaps because it was too much at once for people to try to chew on. Let's see if the second time around works better...

> On Sun, 2014-08-31 at 06:02 +0200, Christoph Anton Mitterer wrote:
>> For some time now I have been considering using btrfs at a larger
>> scale, basically in two scenarios:
>>
>> a) As the backend for data pools handled by dcache (dcache.org), where
>> we run a Tier-2 in the higher PiB range for the LHC Computing Grid...
>> For now that would be a rather boring use of btrfs (i.e. not really
>> using any of its advanced features), and RAID functionality would
>> still be provided by hardware (at least with the current hardware
>> generations we have in use).

While that scale is simply out of my league, here's what I'd say if I were asked my own opinion.

I'd say btrfs isn't ready for that, basically for one reason. Btrfs has stabilized quite a bit in the last year, and the scary warnings have now come off, but it's still not fully stable, and keeping backups of any data you value is still very strongly recommended. The scenario above is talking high-PiB scale. Simply put, that's a **LOT** of data to keep backups of, or to lose all at once if you don't and something happens!

At that scale I'd look at something more mature, with a reputation for working well at that scale. Xfs is what I'd be looking at. That, or possibly zfs. People who value their data highly tend, for good reason, to be rather conservative when it comes to filesystems. At that level, and at the conservatism I'd guess it calls for, I'd say another two years, perhaps longer, given btrfs history and how much longer than expected every step has seemed to take.
> b) Personally, for my NAS. Here the main goal is less performance but
> rather data safety (i.e. I want something like RAID6 or better) and
> security (i.e. it will be on top of dm-crypt/LUKS) and integrity.
> Hardware-wise I'll use a UPS as well as enterprise SATA disks, from
> different vendors and different production lots. (Of course I'm aware
> that btrfs is experimental, and I would have regular backups.)
> [...]
> [1] So one issue I have is to determine the general stability of the
> different parts.

Raid5/6 is still out of the question at this point. The operational code is there, but the recovery code is incomplete. In effect, btrfs raid5/6 must be treated as if it were slow raid0 in terms of dependability, but with a free upgrade to raid5/6 when the code is complete (assuming the array survives that long in its raid0 stage): the operational code has been there all along and has been creating and writing the parity, it just can't yet reliably restore from it if called upon to do so. So if you wouldn't be comfortable with the data on raid0, that is, with the idea of losing it all if you lose any of it, don't put it on btrfs raid5/6 at this point. The situation is actually /somewhat/ better than that, but that's the reliability bottom line you should be planning for, and if raid0 reliability isn't appropriate for your data, neither is btrfs raid5/6 at this point.

Btrfs raid1 and raid10 modes, OTOH, are reasonably mature and ready for use, basically at the same level as single-device btrfs. Which is to say, there's still active development and you should keep your backups ready as it's not /entirely/ stable yet, but a lot of people are using it without undue issues -- just keep those backups current and tested, and be prepared to use them if you need to.

For btrfs raid1 mode, it's worth pointing out that raid1 means two copies on different devices, no matter how many devices are in the array. It's always two copies; more devices simply add more total capacity.
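The always-two-copies rule has a direct consequence for usable capacity. A rough sketch of the arithmetic (an approximation assuming ideal chunk allocation; `raid1_usable` is an illustrative helper, not a btrfs tool):

```python
def raid1_usable(sizes):
    """Approximate usable bytes of a btrfs raid1 (2-copy) array.

    Every extent is stored exactly twice, on two different devices, so
    capacity is capped both by half the total space and by what the
    other devices can pair with the largest one.
    """
    total = sum(sizes)
    largest = max(sizes)
    # If one device is bigger than all the others combined, the excess
    # cannot be mirrored anywhere and goes unused.
    return min(total // 2, total - largest)

# Three 1 TB devices: ~1.5 TB usable, not 2 TB -- still only two copies.
print(raid1_usable([1000, 1000, 1000]))  # -> 1500
# A 3 TB device paired with a single 1 TB device: only ~1 TB usable.
print(raid1_usable([3000, 1000]))        # -> 1000
```

The second case shows why adding one big drive to a small array buys less than its raw size: the extra space has nothing to mirror against.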
Similarly with btrfs raid10, the 1/mirror side of that 10 is always paired. Stripes can be two or three or whatever width, but there are always only the two mirrors.

N-way-mirroring is on the roadmap, scheduled for introduction after raid5/6 is complete. So it's coming, but given the time it has taken for raid5/6 and the fact that it's still not complete, reasonably reliable n-way-mirroring could easily still be a year away or more.

Features: Most of the core btrfs features are reasonably stable, but some don't work so well together; see my just-previous post on a different thread about nocow and snapshots, for instance. (Basically, setting nocow ends up being nearly useless in the face of frequent snapshots of an actively rewritten file.)

Qgroups/quotas are an exception. They've recently been rewritten, as the old approach simply wasn't working, and while the feature /should/ be more stable now, it's still very new.
Re: Performance Issues
On 20/09/14 09:23, Marc Dietrich wrote:
> On Friday, 19 September 2014 13:51:22, Holger Hoffstätte wrote:
>> On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
>>> I have a particularly uncomplicated setup (a desktop PC with a hard
>>> disk) and I'm seeing particularly slow performance from btrfs. A
>>> `git status` in the linux source tree takes about 46 seconds after
>>> dropping caches, whereas on other machines using ext4 this takes
>>> about 13s. My mail client (evolution) also seems to perform
>>> particularly poorly on this setup, and my hunch is that it's
>>> spending a lot of time waiting on the filesystem.
>>
>> This is - unfortunately - a particular btrfs
>> oddity/characteristic/flaw, whatever you want to call it. git relies
>> a lot on fast stat() calls, and those seem to be particularly slow
>> with btrfs, especially on rotational media. I have the same problem
>> with rsync on a freshly mounted volume; it gets fast (quite so!)
>> after the first run.
>
> my favorite benchmark is `ls -l /usr/bin`:
>
> ext4:  0.934s
> btrfs: 21.814s

So... On my old low-power slow Atom SSD ext4 system:

time ls -l /usr/bin
real    0m0.369s
user    0m0.048s
sys     0m0.128s

Repeated:
real    0m0.107s
user    0m0.040s
sys     0m0.044s

and that is for:
# ls -l /usr/bin | wc
   1384   13135   88972

On a comparatively super dual-core Athlon64 SSD three-disk btrfs raid1 system:

real    0m0.103s
user    0m0.004s
sys     0m0.040s

Repeated:
real    0m0.027s
user    0m0.008s
sys     0m0.012s

For:
# ls -l /usr/bin | wc
   1449   13534   89024

And on an identical comparatively super dual-core Athlon64 HDD 'spinning rust' two-disk btrfs raid1 system:

real    0m0.101s
user    0m0.008s
sys     0m0.020s

Repeated:
real    0m0.020s
user    0m0.004s
sys     0m0.012s

For:
# ls -l /usr/bin | wc
   1161   10994   79350

So, no untoward concerns there. Marc: are you on something really ancient and hopelessly fragmented into oblivion?

> also mounting large partitions (several hundred GB) takes a lot of
> time on btrfs.
I've noticed that also for some 16TB btrfs raid1 mounts: btrfs is not as fast as mounting ext4, but then again all very much faster than mounting ext4 when an fsck count is tripped!...

So, nothing untoward there.

For my usage, controlling fragmentation and having some automatic mechanism to deal with pathological fragmentation with such things as sqlite files are greater concerns! (Yes, there is the manual fix of NOCOW... I also put such horrors into tmpfs and snapshot that... All well and good, but all unnecessary admin tasks!)

Regards,
Martin
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
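For the manual NOCOW fix Martin alludes to, the flag has to go on before the data does: +C only takes effect for files created after it is set on the directory. A sketch (the directory path is a made-up example):

```shell
#!/bin/sh
# Pre-create an application's data directory with NOCOW so database
# files (e.g. sqlite) created in it later skip copy-on-write.
# Existing file contents are NOT converted by setting the flag.
DBDIR=${1:-$HOME/.config/someapp}    # hypothetical path -- adjust

mkdir -p "$DBDIR"
if chattr +C "$DBDIR" 2>/dev/null; then
    lsattr -d "$DBDIR"               # a 'C' should appear in the flags
else
    echo "chattr +C unsupported on this filesystem (needs e.g. btrfs)"
fi
```

Note the trade-off from elsewhere in this digest: NOCOW files are rewritten in place, so they lose btrfs checksumming, and frequent snapshots largely defeat the setting anyway.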
Using two mirrored drives separately
Hi,

I am wondering: if I set up btrfs on two identical drives, with data and metadata mirroring, will it be possible to use these drives separately later on? Will just one of these drives work as a regular btrfs-formatted single drive if connected to a different machine?

Thanks ahead,
Leonid.
Re: Using two mirrored drives separately
On Sat, Sep 20, 2014 at 07:11:52PM +0300, Leonid Bloch wrote:
> I am wondering: if I set up btrfs on two identical drives, with data
> and metadata mirroring, will it be possible to use these drives
> separately later on? Will just one of these drives work as a regular
> btrfs-formatted single drive if connected to a different machine?

If by this you mean btrfs raid1, then yes, they will work this way. Just note that each of them will have a missing device and, afaik, won't mount without -o degraded. But then you can rebalance to non-raid1, or add another disk and replace the missing device, on the new machines.

Piotr Szymaniak.
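The recovery paths Piotr outlines might look like this. A sketch that only echoes the commands so the sequence can be reviewed first; device names and the mountpoint are placeholders:

```shell
#!/bin/sh
# Sketch: bring up one half of a former btrfs raid1 pair on a new machine.
# /dev/sdb1, /dev/sdc1 and /mnt are placeholders -- adjust before running.
run() { echo "+ $*"; }   # swap 'echo' for real execution when ready

# 1. The lone drive reports a missing device, so mount degraded.
run mount -o degraded /dev/sdb1 /mnt

# 2a. Either convert back to single-device profiles and drop the
#     missing device...
run btrfs balance start -dconvert=single -mconvert=dup /mnt
run btrfs device delete missing /mnt

# 2b. ...or add a fresh disk and restore raid1 redundancy instead.
run btrfs device add /dev/sdc1 /mnt
run btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
```

In practice you would do either 2a or 2b, not both; the echo wrapper is there precisely so the sequence can be pruned before anything touches a disk.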
Re: Performance Issues
Hi,

On Saturday, 20 September 2014 22:04:16, Wang Shilong wrote:
> Hi, just my two cents here. ^_^
>
>> [...]
>> my favorite benchmark is `ls -l /usr/bin`:
>> ext4:  0.934s
>> btrfs: 21.814s
>
> I did a quick benchmark for this. The testing tool is something like
> the following; it creates 500,000 files and 500,000 directories on a
> freshly mkfs'd filesystem. btrfs is just a little slower than ext4:
>
> For ext4:
> real 0m9.295s
> user 0m2.252s
> sys  0m7.010s
>
> For btrfs:
> real 0m10.207s
> user 0m1.347s
> sys  0m8.353s
>
> And the test was done with a 20G VM disk (backed by a hard disk), with
> the latest kernel, compiled under the VM.

Thanks for testing! However, I think a doubly cached VM disk may not be a good test candidate.
> #!/bin/bash
> umount /dev/sdc
> #~/source/e2fsprogs/misc/mke2fs -F -O inline_data /dev/sdc > /dev/null
> mkfs.ext4 -F /dev/sdc > /dev/null
> mount /dev/sdc /mnt
> ./mdtest -d /mnt/ext4 -n 50 -C > /dev/null
> echo 3 > /proc/sys/vm/drop_caches
> time ls -l /mnt/ext4/\#test-dir.0/mdtest_tree.0 > /dev/null
> umount /dev/sdc
> mkfs.btrfs -f /dev/sdc > /dev/null
> mount /dev/sdc /mnt
> ./mdtest -d /mnt/btrfs -n 50 -C > /dev/null
> echo 3 > /proc/sys/vm/drop_caches
> time ls -l /mnt/btrfs/\#test-dir.0/mdtest_tree.0 > /dev/null

ok, 500,000 is much more than my 5000 files in /usr/bin, so ext4 needs a bit more time. Also, a freshly made btrfs may not reflect the same state as an ageing one. Unfortunately, I haven't found a method yet to determine the fragmentation of a directory, so the question is why btrfs is that fast in your case.

I did a small experiment:

mkdir /usr/bin2; cd /usr/bin2
for i in ../bin/*; do ln $i; done
echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null

real    0m5.935s
user    0m0.063s
sys     0m0.344s

Better, so I think this is partly due to heavy fragmentation of the original directory, where even defrag does not help:

btrfs fi defrag bin2
echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null

real    0m8.059s
user    0m0.080s
sys     0m0.381s

and

btrfs fi defrag -clzo bin2
echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null

real    0m12.524s
user    0m0.072s
sys     0m0.461s

Times are +/- 1s in repeated tests, so defragging seems to hurt in this case.

> So here I think btrfs is not, as you think, much slower than ext4, at
> least for 'ls', which measures directory reading performance...
>
> Here I think you could do several things to improve or avoid such
> problems:
>
> 1. Create a separate subvolume or separate partition for /usr and use
>    the noatime mount option if possible. (I remember Marc MERLIN gave
>    some good examples of this.)

I have relatime, nodiratime and also compress=lzo.

> 2. Run the defrag command to reduce fragmentation.
I do this once per week on all directories via a cron job:

find / -xdev -type d -print -exec btrfs filesystem defragment -c '{}' \;

> The reason for doing these is that a directory like /usr/bin is
> regularly accessed, which in btrfs also triggers widespread COW; that
> may cause serious fragmentation and thus bad performance... and
> compression may also not benefit it. Another factor: in btrfs, by
> default, all files are mixed together in one fs B-tree, which means
> all read/write locks walk through the same tree, which may cause lock
> contention. So using a separate subvolume tree could improve the
> locking a bit, IMO.

Well, I have /, /var, and /usr on the same partition, so not so much additional data (/home, /usr/src, /opt have their own partitions).

> BTW, next time someone reports a problem, it would be nice to give
> detailed information, for example kernel version, how many
> subvolumes/snapshots, btrfs filesystem configuration, and usage
> (running btrfs fi show, btrfs fi df, etc.). This information is useful
> for others to reproduce and analyse...

ok, for the sake of completeness: no subvolumes/snapshots (but merged two partitions), nodiratime,relatime,compress=lzo,space_cache,autodefrag, kernel 3.17-rc5 (+btrfs-progs 3.17.x), rotating media.

# btrfs fi show /
Label: 'root' uuid:
[PATCH 1/4] Properly cast to avoid compiler warnings, fixes FTBFS on alpha and ia64.
Bug-Debian: http://bugs.debian.org/539433
Bug-Debian: http://bugs.debian.org/583768
Authors: Luca Bruno lu...@debian.org
         Alexander Kurtz kurtz.a...@googlemail.com
         Daniel Baumann daniel.baum...@progress-technologies.net
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 btrfs-convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 71b7bd6..3673050 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2441,7 +2441,7 @@ static int do_rollback(const char *devname)
 	ext2_root = btrfs_read_fs_root(root->fs_info, &key);
 	if (!ext2_root || IS_ERR(ext2_root)) {
 		fprintf(stderr, "unable to open subvol %llu\n",
-			key.objectid);
+			(unsigned long long) key.objectid);
 		goto fail;
 	}
-- 
2.1.0.rc1
[PATCH 3/4] Fixing unaligned memory accesses.
From: Shawn Landen shawnland...@gmail.com

Bug-Debian: http://bugs.debian.org/656955
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 ctree.h   | 18 ++++++++++++------
 volumes.c |  5 +++--
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/ctree.h b/ctree.h
index fa73c4a..92c6ad3 100644
--- a/ctree.h
+++ b/ctree.h
@@ -19,6 +19,8 @@
 #ifndef __BTRFS__
 #define __BTRFS__
 
+#include <stdint.h>
+
 #if BTRFS_FLAT_INCLUDES
 #include "list.h"
 #include "kerncompat.h"
@@ -1191,13 +1193,17 @@ struct btrfs_root {
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
 	const struct btrfs_header *h = (struct btrfs_header *)eb->data;	\
-	return le##bits##_to_cpu(h->member);				\
+	uint##bits##_t t;						\
+	memcpy(&t, &h->member, sizeof(h->member));			\
+	return le##bits##_to_cpu(t);					\
 }									\
 static inline void btrfs_set_##name(struct extent_buffer *eb,		\
 				    u##bits val)			\
 {									\
 	struct btrfs_header *h = (struct btrfs_header *)eb->data;	\
-	h->member = cpu_to_le##bits(val);				\
+	uint##bits##_t t;						\
+	t = cpu_to_le##bits(val);					\
+	memcpy(&h->member, &t, sizeof(h->member));			\
 }

 #define BTRFS_SETGET_FUNCS(name, type, member, bits)			\
@@ -1219,11 +1225,15 @@ static inline void btrfs_set_##name(struct extent_buffer *eb,	\
 #define BTRFS_SETGET_STACK_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const type *s)			\
 {									\
-	return le##bits##_to_cpu(s->member);				\
+	uint##bits##_t t;						\
+	memcpy(&t, &s->member, sizeof(s->member));			\
+	return le##bits##_to_cpu(t);					\
 }									\
 static inline void btrfs_set_##name(type *s, u##bits val)		\
 {									\
-	s->member = cpu_to_le##bits(val);				\
+	uint##bits##_t t;						\
+	t = cpu_to_le##bits(val);					\
+	memcpy(&s->member, &t, sizeof(s->member));			\
 }

 BTRFS_SETGET_FUNCS(device_type, struct btrfs_dev_item, type, 64);

diff --git a/volumes.c b/volumes.c
index 388c94e..102380b 100644
--- a/volumes.c
+++ b/volumes.c
@@ -472,10 +472,11 @@ static int find_next_chunk(struct btrfs_root *root, u64 objectid, u64 *offset)
 		if (found_key.objectid != objectid)
 			*offset = 0;
 		else {
+			u64 t;
 			chunk = btrfs_item_ptr(path->nodes[0], path->slots[0],
 					       struct btrfs_chunk);
-			*offset = found_key.offset +
-				btrfs_chunk_length(path->nodes[0], chunk);
+			t = found_key.offset + btrfs_chunk_length(path->nodes[0], chunk);
+			memcpy(offset, &t, sizeof(found_key.offset));
 		}
 	}
 	ret = 0;
-- 
2.1.0.rc1
[PATCH 2/4] Fixes FTBFS with --no-add-needed.
From: Luk Claes l...@debian.org

Bug-Debian: http://bugs.debian.org/554059
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e721e99..441e925 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ TESTS = fsck-tests.sh convert-tests.sh
 INSTALL = install
 prefix ?= /usr/local
 bindir = $(prefix)/bin
-lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L.
+lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -lcom_err -L.
 libdir ?= $(prefix)/lib
 incdir = $(prefix)/include/btrfs
 LIBS = $(lib_LIBS) $(libs_static)
-- 
2.1.0.rc1
[PATCH 4/4] Default to acting like fsck.
Inspect arguments: if we are not called as btrfs, then assume we are called to act like fsck.

Bug-Debian: http://bugs.debian.org/712078
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 btrfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/btrfs.c b/btrfs.c
index e83349c..e8a87ac 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -222,7 +222,7 @@ int main(int argc, char **argv)
 	else
 		bname = argv[0];
 
-	if (!strcmp(bname, "btrfsck")) {
+	if (strcmp(bname, "btrfs") != 0) {
 		argv[0] = "check";
 	} else {
 		argc--;
-- 
2.1.0.rc1
deleting a dead device
On a system running the Debian 3.14.15-2 kernel I added a new drive to a RAID-1 array. My aim was to add a device and then remove one of the old devices.

Sep 21 11:26:51 server kernel: [2070145.375221] BTRFS: lost page write due to I/O error on /dev/sdc3
Sep 21 11:26:51 server kernel: [2070145.375225] BTRFS: bdev /dev/sdc3 errs: wr 269, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:21 server kernel: [2070175.517691] BTRFS: lost page write due to I/O error on /dev/sdc3
Sep 21 11:27:21 server kernel: [2070175.517699] BTRFS: bdev /dev/sdc3 errs: wr 270, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:21 server kernel: [2070175.517712] BTRFS: lost page write due to I/O error on /dev/sdc3
Sep 21 11:27:21 server kernel: [2070175.517715] BTRFS: bdev /dev/sdc3 errs: wr 271, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:51 server kernel: [2070205.665947] BTRFS: lost page write due to I/O error on /dev/sdc3
Sep 21 11:27:51 server kernel: [2070205.665955] BTRFS: bdev /dev/sdc3 errs: wr 272, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:51 server kernel: [2070205.665967] BTRFS: lost page write due to I/O error on /dev/sdc3
Sep 21 11:27:51 server kernel: [2070205.665971] BTRFS: bdev /dev/sdc3 errs: wr 273, rd 0, flush 0, corrupt 0, gen 0

Anyway, the new drive turned out to have some errors; writes failed and I've got a heap of errors such as the above. The errors started immediately after adding the drive, and the system wasn't actively writing to the filesystem, so very few (if any) writes made it to the device.

# btrfs device delete /dev/sdc3 /
ERROR: error removing the device '/dev/sdc3' - Invalid argument

It seems that I can't remove the device because removing requires writing.
# btrfs device delete /dev/sdc3 /
ERROR: error removing the device '/dev/sdc3' - No such file or directory

# btrfs device stats /
[/dev/sda3].write_io_errs    0
[/dev/sda3].read_io_errs     0
[/dev/sda3].flush_io_errs    0
[/dev/sda3].corruption_errs  57
[/dev/sda3].generation_errs  0
[/dev/sdb3].write_io_errs    0
[/dev/sdb3].read_io_errs     0
[/dev/sdb3].flush_io_errs    0
[/dev/sdb3].corruption_errs  0
[/dev/sdb3].generation_errs  0
[/dev/sdc3].write_io_errs    267
[/dev/sdc3].read_io_errs     0
[/dev/sdc3].flush_io_errs    0
[/dev/sdc3].corruption_errs  0
[/dev/sdc3].generation_errs  0

The drive is attached by USB, so I turned off the USB device and then got the above result. It still seems impossible to remove the device even though it's physically not present. I've connected a new USB disk, which is now /dev/sdd, so it seems that BTRFS is keeping the name /dev/sdc locked. Should there be a way to fix this without rebooting?

Also, as an aside: while the stats about write errors are useful, in this case it would be really good if there were a count of successful writes; it would be useful to know whether the successful write count was close to 0. My understanding of the BTRFS design is that there would be no performance penalty for adding counts of the number of successful reads and writes to the superblock. Could this be done?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
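One conventional way out of this situation, sketched with echoed commands only (mountpoint and device names are placeholders, and this assumes the filesystem can be unmounted, which a root filesystem can't without a reboot):

```shell
#!/bin/sh
# Sketch: drop a dead device from a btrfs raid1 when deleting by /dev
# name fails. Deleting by name needs to write to that device, which is
# exactly what a dead drive can't do.
run() { echo "+ $*"; }   # echo only; replace with real execution later

# Inspect the per-device error counters first.
run btrfs device stats /data

# Remount degraded without the dead drive, then delete it by the
# 'missing' keyword instead of by path.
run umount /data
run mount -o degraded /dev/sda3 /data
run btrfs device delete missing /data
```

The 'missing' keyword tells btrfs to drop whichever device it can no longer find, sidestepping the need to write to the failed drive itself.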
device delete progress
We need a way to determine the progress of a device delete operation. Also, for a balance of a RAID-1 that has more than 2 devices, it would be good to know how much space is used on each device. Could btrfs fi df be extended to show information separately for each device?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
Bug on Kernel Bugzilla I Found Today
Greetings to the Btrfs developers and community,

Today I found a new bug on the kernel bugzilla related to btrfs, and am wondering if this bug has been fixed yet. Due to my limited knowledge of the btrfs code base, I lack the knowledge to fix bugs like this myself, so I will paste below a link to the reported bug page on the kernel bugzilla. It would be very helpful for my continued learning about the btrfs filesystem if someone could CC me on the patches and/or discussion of this bug; this would be greatly appreciated and extremely helpful for my learning. Hopefully I can eventually help fix bugs like this too and aid in future btrfs development.

Cheers, and thanks for the help with xfstests,
Nick

Link to the bug: https://bugzilla.kernel.org/show_bug.cgi?id=84631
[PATCH] btrfs: fix ABBA deadlock in btrfs_dev_replace_finishing()
btrfs_map_bio() first calls btrfs_bio_counter_inc_blocked(), which checks fs state and increases bio_counter, and then calls __btrfs_map_block(), which takes the dev_replace lock. On the other hand, btrfs_dev_replace_finishing() takes the dev_replace lock first, then sets fs state to BTRFS_FS_STATE_DEV_REPLACING and waits for bio_counter to reach zero.

The deadlock can be reproduced easily by running replace and fsstress at the same time, e.g.

mkfs -t btrfs -f /dev/sdb1 /dev/sdb2
mount /dev/sdb1 /mnt/btrfs
fsstress -d /mnt/btrfs -n 100 -p 2 -l 0 &   # fsstress from ltp supports the -l option
i=0
while btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs && \
      btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs; do
	echo "=== loop $i ==="
	let i=$i+1
done

This was introduced by c404e0d ("Btrfs: fix use-after-free in the finishing procedure of the device replace").

Signed-off-by: Eryu Guan guane...@gmail.com
---
Tested with the reproducer and xfstests; no new failure found.

But I found a kmem_cache leak if I remove the btrfs module after my new test case[1], which runs fsstress, replace, and subvolume create/mount/umount/delete at the same time:

BUG btrfs_extent_state (Tainted: G B): Objects remaining in btrfs_extent_state on kmem_cache_close()
..
kmem_cache_destroy btrfs_extent_state: Slab cache still has objects
CPU: 3 PID: 9503 Comm: modprobe Tainted: G B 3.17.0-rc5+ #12
Hardware name: Hewlett-Packard ProLiant DL388eGen8, BIOS P73 06/01/2012
 8dd09c52 880411c37eb0 81642f7a 8800b9a19300
 880411c37ed0 8118ce89 a05dcd20 880411c37ee0
 a056a80f 880411c37ef0
Call Trace:
 [81642f7a] dump_stack+0x45/0x56
 [8118ce89] kmem_cache_destroy+0xf9/0x100
 [a056a80f] extent_io_exit+0x1f/0x50 [btrfs]
 [a05c3ae3] exit_btrfs_fs+0x2c/0x549 [btrfs]
 [810efda2] SyS_delete_module+0x162/0x200
 [81013bb7] ? do_notify_resume+0x97/0xb0
 [8164af69] system_call_fastpath+0x16/0x1b

The test would hang before the fix. I'm not sure whether the leak is related to the fix (it seems not); please help review.
Thanks,
Eryu Guan

[1] http://www.spinics.net/lists/linux-btrfs/msg37625.html

 fs/btrfs/dev-replace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..5dfd292 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -510,6 +510,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	/* keep away write_all_supers() during the finishing procedure */
 	mutex_lock(&root->fs_info->chunk_mutex);
 	mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
+	btrfs_rm_dev_replace_blocked(fs_info);
 	btrfs_dev_replace_lock(dev_replace);
 	dev_replace->replace_state =
 		scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
@@ -567,12 +568,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	btrfs_kobj_rm_device(fs_info, src_device);
 	btrfs_kobj_add_device(fs_info, tgt_device);
 
-	btrfs_rm_dev_replace_blocked(fs_info);
-
 	btrfs_rm_dev_replace_srcdev(fs_info, src_device);
 
-	btrfs_rm_dev_replace_unblocked(fs_info);
-
 	/*
 	 * this is again a consistent state where no dev_replace procedure
 	 * is running, the target device is part of the filesystem, the
@@ -581,6 +578,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	 * belong to this filesystem.
 	 */
 	btrfs_dev_replace_unlock(dev_replace);
+	btrfs_rm_dev_replace_unblocked(fs_info);
 	mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 	mutex_unlock(&root->fs_info->chunk_mutex);
-- 
1.8.3.1