Re: btrfs check --repair is clean, but mount fails
On Fri, Feb 26, 2016 at 06:45:34PM -0800, Liu Bo wrote:
> On Fri, Feb 26, 2016 at 06:39:38PM -0800, Marc MERLIN wrote:
> > btrfs-tools 4.4-1
> > gargamel:~# uname -r
> > 4.4.2-amd64-i915-volpreempt-20160214bc2
> >
> > 2 drive array stopped working after a crash/reboot. Check --repair finds nothing wrong with it:
> >
> > gargamel:~# btrfs check --repair /dev/mapper/raid0d1
> > enabling repair mode
> > Checking filesystem on /dev/mapper/raid0d1
> > UUID: 01334b81-c0db-4e80-92e4-cac4da867651
> > checking extents
> > Fixed 0 roots.
> > checking free space cache
> > cache and super generation don't match, space cache will be invalidated
> > checking fs roots
> > checking csums
> > checking root refs
> > found 1201345524312 bytes used err is 0
> > total csum bytes: 1165124220
> > total tree bytes: 8258322432
> > total fs tree bytes: 5574197248
> > total extent tree bytes: 1020428288
> > btree space waste bytes: 1902023247
> > file data blocks allocated: 1193398628352
> > referenced 1209324777472
> > gargamel:~# mount /var/local/space
> > mount: wrong fs type, bad option, bad superblock on /dev/mapper/raid0d1,
> >        missing codepage or helper program, or other error
> >        In some cases useful info is found in syslog - try
> >        dmesg | tail or so
> > [ 8200.511021] BTRFS info (device dm-6): disk space caching is enabled
> > [ 8200.533030] BTRFS: failed to read the system array on dm-6
> > [ 8200.582097] BTRFS: open_ctree failed
>
> Does 'btrfs dev scan' help?

Oh my, it does...

gargamel:~# btrfs dev scan
Scanning for Btrfs filesystems
gargamel:~# mount /var/local/space
[14477.028083] BTRFS: device label btrfs_space devid 2 transid 776784 /dev/mapper/raid0d2
[14500.262307] BTRFS info (device dm-7): disk space caching is enabled
[14504.042485] BTRFS: checking UUID tree

Err, I'm very perplexed now. I already have a scan in my boot process after device decrypts. Somehow it saw one of my 2 devices, but not the other one?
I'm looking at my boot:

[  112.063677] BTRFS: device label btrfs_space devid 1 transid 776782 /dev/mapper/raid0d1
[  112.090192] BTRFS info (device dm-6): disk space caching is enabled
[  112.111740] BTRFS: failed to read the system array on dm-6
[  112.160047] BTRFS: open_ctree failed
[  112.269710] BTRFS info (device dm-6): disk space caching is enabled
[  112.291430] BTRFS: failed to read the system array on dm-6
[  112.320104] BTRFS: open_ctree failed

So dm-6 is:
raid0d1 -> ../dm-6

So, raid0d1 had an issue, btrfs check didn't really report any, for some unknown reason this caused raid0d2 not to be scanned, and in turn this caused mounting that filesystem to fail? Or did btrfs check actually fix something that I missed? And why would scan have missed raid0d2 the first time around? Is it complaining that it can't read btrfs structures from raid0d1 because raid0d2 wasn't known yet? I'm not too sure what open_ctree means in that context.

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
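For readers following the device-registration question: a multi-device btrfs can only be assembled once the kernel has been told about every member device, and "btrfs device scan" does that through an ioctl on /dev/btrfs-control. The boot log above is consistent with only devid 1 having been registered at boot, although the thread does not establish why. Below is a minimal C sketch of registering one device with BTRFS_IOC_SCAN_DEV from <linux/btrfs.h>; the helper names fill_scan_args and register_btrfs_device are hypothetical, and this is a simplification of what btrfs-progs does, not its code.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/* Hypothetical helper: copy a device path into the ioctl argument,
 * refusing paths that would not fit (name[] holds MAX + 1 bytes). */
int fill_scan_args(struct btrfs_ioctl_vol_args *args, const char *dev_path)
{
    size_t len = strlen(dev_path);

    memset(args, 0, sizeof(*args));
    if (len > BTRFS_PATH_NAME_MAX)
        return -ENAMETOOLONG;
    memcpy(args->name, dev_path, len); /* stays NUL-terminated: zeroed above */
    return 0;
}

/* Hypothetical helper: ask the kernel to register one btrfs member
 * device. Needs root. Returns 0 or a negative errno value. */
int register_btrfs_device(const char *dev_path)
{
    struct btrfs_ioctl_vol_args args;
    int fd, ret, saved;

    ret = fill_scan_args(&args, dev_path);
    if (ret < 0)
        return ret;

    fd = open("/dev/btrfs-control", O_RDWR);
    if (fd < 0)
        return -errno;

    ret = ioctl(fd, BTRFS_IOC_SCAN_DEV, &args);
    saved = errno;
    close(fd);
    return ret < 0 ? -saved : 0;
}
```

Calling register_btrfs_device("/dev/mapper/raid0d2") as root before mounting is roughly the single-device form of "btrfs device scan /dev/mapper/raid0d2". The btrfs "device=" mount option is another way to name members at mount time.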
Re: btrfs check --repair is clean, but mount fails
On Fri, Feb 26, 2016 at 06:39:38PM -0800, Marc MERLIN wrote:
> btrfs-tools 4.4-1
> gargamel:~# uname -r
> 4.4.2-amd64-i915-volpreempt-20160214bc2
>
> 2 drive array stopped working after a crash/reboot. Check --repair finds nothing wrong with it:
>
> gargamel:~# btrfs check --repair /dev/mapper/raid0d1
> enabling repair mode
> Checking filesystem on /dev/mapper/raid0d1
> UUID: 01334b81-c0db-4e80-92e4-cac4da867651
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> found 1201345524312 bytes used err is 0
> total csum bytes: 1165124220
> total tree bytes: 8258322432
> total fs tree bytes: 5574197248
> total extent tree bytes: 1020428288
> btree space waste bytes: 1902023247
> file data blocks allocated: 1193398628352
> referenced 1209324777472
> gargamel:~# mount /var/local/space
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/raid0d1,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so
> [ 8200.511021] BTRFS info (device dm-6): disk space caching is enabled
> [ 8200.533030] BTRFS: failed to read the system array on dm-6
> [ 8200.582097] BTRFS: open_ctree failed

Does 'btrfs dev scan' help?

Thanks,

-liubo

>
>
> gargamel:~# btrfs check --repair /dev/mapper/raid0d2
> enabling repair mode
> Checking filesystem on /dev/mapper/raid0d2
> UUID: 01334b81-c0db-4e80-92e4-cac4da867651
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> found 1201345540696 bytes used err is 0
> total csum bytes: 1165124220
> total tree bytes: 8258338816
> total fs tree bytes: 5574197248
> total extent tree bytes: 1020444672
> btree space waste bytes: 1902039421
> file data blocks allocated: 1193398628352
> referenced 1209324777472
> gargamel:~# mount /var/local/space
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/raid0d1,
> [13373.606737] BTRFS info (device dm-6): disk space caching is enabled
> [13373.628682] BTRFS: failed to read the system array on dm-6
> [13373.672607] BTRFS: open_ctree failed
btrfs check --repair is clean, but mount fails
btrfs-tools 4.4-1
gargamel:~# uname -r
4.4.2-amd64-i915-volpreempt-20160214bc2

2 drive array stopped working after a crash/reboot. Check --repair finds nothing wrong with it:

gargamel:~# btrfs check --repair /dev/mapper/raid0d1
enabling repair mode
Checking filesystem on /dev/mapper/raid0d1
UUID: 01334b81-c0db-4e80-92e4-cac4da867651
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 1201345524312 bytes used err is 0
total csum bytes: 1165124220
total tree bytes: 8258322432
total fs tree bytes: 5574197248
total extent tree bytes: 1020428288
btree space waste bytes: 1902023247
file data blocks allocated: 1193398628352
referenced 1209324777472
gargamel:~# mount /var/local/space
mount: wrong fs type, bad option, bad superblock on /dev/mapper/raid0d1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
[ 8200.511021] BTRFS info (device dm-6): disk space caching is enabled
[ 8200.533030] BTRFS: failed to read the system array on dm-6
[ 8200.582097] BTRFS: open_ctree failed


gargamel:~# btrfs check --repair /dev/mapper/raid0d2
enabling repair mode
Checking filesystem on /dev/mapper/raid0d2
UUID: 01334b81-c0db-4e80-92e4-cac4da867651
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 1201345540696 bytes used err is 0
total csum bytes: 1165124220
total tree bytes: 8258338816
total fs tree bytes: 5574197248
total extent tree bytes: 1020444672
btree space waste bytes: 1902039421
file data blocks allocated: 1193398628352
referenced 1209324777472
gargamel:~# mount /var/local/space
mount: wrong fs type, bad option, bad superblock on /dev/mapper/raid0d1,
[13373.606737] BTRFS info (device dm-6): disk space caching is enabled
[13373.628682] BTRFS: failed to read the system array on dm-6
[13373.672607] BTRFS: open_ctree failed
Re: btrfs/nfsd kernel panic
Hi Daryl,

On Thu, Feb 25, 2016 at 10:25:24PM -0600, Daryl Styrk wrote:
> I attempted to deploy a nfs/btrfs server Monday. However, it KP’s
> every 1-3 hours with this:

This is very similar to an old btrfs_next_leaf bug that I fixed, but I'm pretty sure that bug has already been fixed.

So this stack info shows a key disorder or u64 overflow.

In order to get more information, can you give this debug patch a shot (it builds on my box)?

Thanks,

-liubo

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 769e0ff..c05da72 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3150,6 +3150,7 @@ static void fixup_low_keys(struct btrfs_fs_info *fs_info,
 	}
 }
 
+
 /*
  * update item key.
  *
@@ -3182,6 +3183,63 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
 	fixup_low_keys(fs_info, path, &disk_key, 1);
 }
 
+noinline void btrfs_set_item_key_debug(struct btrfs_fs_info *fs_info,
+				       struct btrfs_path *path,
+				       struct btrfs_key *new_key)
+{
+	struct btrfs_disk_key disk_key;
+	struct btrfs_key key;
+	struct extent_buffer *eb;
+	u64 extent_end;
+	u64 extent_offset = 0;
+	u64 num_bytes;
+	u64 disk_bytenr;
+	struct btrfs_file_extent_item *fi;
+	int extent_type;
+	int slot;
+
+	eb = path->nodes[0];
+	slot = path->slots[0];
+
+	btrfs_item_key_to_cpu(eb, &key, path->slots[0]);
+
+	fi = btrfs_item_ptr(eb, path->slots[0],
+			    struct btrfs_file_extent_item);
+	extent_type = btrfs_file_extent_type(eb, fi);
+
+	BUG_ON(extent_type == BTRFS_FILE_EXTENT_INLINE);
+
+	disk_bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
+	num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
+	extent_offset = btrfs_file_extent_offset(eb, fi);
+	extent_end = key.offset + btrfs_file_extent_num_bytes(eb, fi);
+
+	if (slot > 0) {
+		btrfs_item_key(eb, &disk_key, slot - 1);
+		if (comp_keys(&disk_key, new_key) >= 0) {
+			struct btrfs_key tmp;
+			btrfs_item_key_to_cpu(eb, &tmp, slot - 1);
+			pr_info("(slot - 1) is bigger: key.offset=%llu new_key.offset=%llu extent_end=%llu num_bytes=%llu disk_bytenr=%llu extent_offset=%llu\n",
+				tmp.offset, new_key->offset, extent_end, num_bytes, disk_bytenr, extent_offset);
+			BUG_ON(1);
+		}
+	}
+	if (slot < btrfs_header_nritems(eb) - 1) {
+		btrfs_item_key(eb, &disk_key, slot + 1);
+
+		if (comp_keys(&disk_key, new_key) <= 0) {
+			struct btrfs_key tmp;
+			btrfs_item_key_to_cpu(eb, &tmp, slot + 1);
+			pr_info("(slot + 1) is smaller: key.offset=%llu new_key.offset=%llu extent_end=%llu num_bytes=%llu disk_bytenr=%llu extent_offset=%llu\n",
+				tmp.offset, new_key->offset, extent_end, num_bytes, disk_bytenr, extent_offset);
+			BUG_ON(1);
+		}
+	}
+
+	btrfs_set_item_key_safe(fs_info, path, new_key);
+}
+
+
 /*
  * try to push data from one node into the next node left in the
  * tree.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bfe4a33..84e68c5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3660,6 +3660,9 @@ int btrfs_previous_extent_item(struct btrfs_root *root,
 void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
 			     struct btrfs_path *path,
 			     struct btrfs_key *new_key);
+void btrfs_set_item_key_debug(struct btrfs_fs_info *fs_info,
+			      struct btrfs_path *path,
+			      struct btrfs_key *new_key);
 struct extent_buffer *btrfs_root_node(struct btrfs_root *root);
 struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root);
 int btrfs_find_next_key(struct btrfs_root *root, struct btrfs_path *path,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 098bb8f..6f2c00b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -678,6 +678,7 @@ next:
 		free_extent_map(split2);
 }
 
+
 /*
  * this is very complex, but the basic idea is to drop all extents
  * in the range start - end.  hint_block is filled in with a block number
@@ -871,7 +872,7 @@ next_slot:
 
 			memcpy(&new_key, &key, sizeof(new_key));
 			new_key.offset = end;
-			btrfs_set_item_key_safe(root->fs_info, path, &new_key);
+			btrfs_set_item_key_debug(root->fs_info, path, &new_key);
 
 			extent_offset += end - key.offset;
 			btrfs_set_file_extent_offset(leaf, fi, extent_offset);

> >
> > 4.3.0 kernel:
> >
> > [6569688.065748] CPU: 18 PID: 62680 Comm: nfsd Not tainted 4.3.0 #1
> > [6569688.067559] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
> >
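The core of the debug patch is a leaf-ordering check: items in a btrfs leaf are sorted by (objectid, type, offset), so a key written into a slot must sort strictly after its left neighbour and strictly before its right neighbour, or the leaf is out of order. Here is that invariant as a standalone C sketch. struct demo_key and the demo_* functions are hypothetical stand-ins for struct btrfs_key and comp_keys(), which in the kernel operate on extent buffers and little-endian on-disk keys.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified CPU-order key, compared by objectid, then type, then offset. */
struct demo_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Returns <0, 0 or >0, like a three-way comparison. */
int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * True if new_key may replace keys[slot] without breaking leaf order:
 * it must sort strictly after keys[slot - 1] and strictly before
 * keys[slot + 1] (when those neighbours exist).
 */
bool demo_key_fits_slot(const struct demo_key *keys, size_t nritems,
                        size_t slot, const struct demo_key *new_key)
{
    if (slot > 0 && demo_key_cmp(&keys[slot - 1], new_key) >= 0)
        return false; /* mirrors the "(slot - 1) is bigger" branch */
    if (slot + 1 < nritems && demo_key_cmp(&keys[slot + 1], new_key) <= 0)
        return false; /* mirrors the "(slot + 1) is smaller" branch */
    return true;
}
```

A false return from demo_key_fits_slot() corresponds to one of the two BUG_ON(1) paths in btrfs_set_item_key_debug(), which is where the patch prints the offsets needed to tell key disorder apart from an overflowed end offset.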
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Fri, 26 Feb 2016 22:00:44 +0100, Stanislav Brabec said:
> Well, it seems to be safe, even if the loop device was not allocated by
> mount(8) itself, as
> ioctl(fd, LOOP_CLR_FD)
> never returns EBUSY:

The fact you don't get an EBUSY doesn't mean it's actually safe.
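Valdis's point can be made concrete. If I read drivers/block/loop.c of this era correctly, LOOP_CLR_FD on a loop device that still has other openers (for example, a mounted filesystem) does not fail with EBUSY; it sets LO_FLAGS_AUTOCLEAR and returns 0, so the detach is merely postponed until the last reference goes away. That reading should be checked against the kernel source before relying on it. The C sketch below tells the two outcomes apart by issuing LOOP_GET_STATUS64 right after LOOP_CLR_FD; detach_loop and classify_detach are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

enum detach_result { DETACH_ERROR = -1, DETACH_DONE = 0, DETACH_DEFERRED = 1 };

/* Interpret the result of LOOP_GET_STATUS64 issued after a successful
 * LOOP_CLR_FD (assumption: an unbound device answers with ENXIO). */
enum detach_result classify_detach(int status_rc, int status_errno, __u32 lo_flags)
{
    if (status_rc < 0)
        return status_errno == ENXIO ? DETACH_DONE : DETACH_ERROR;
    /* Still bound: the driver only marked it for auto-clear on last close. */
    return (lo_flags & LO_FLAGS_AUTOCLEAR) ? DETACH_DEFERRED : DETACH_ERROR;
}

/* Hypothetical helper: detach a loop device and report what really happened. */
enum detach_result detach_loop(const char *loopdev)
{
    struct loop_info64 info;
    int fd, rc, err;

    fd = open(loopdev, O_RDONLY);
    if (fd < 0)
        return DETACH_ERROR;
    if (ioctl(fd, LOOP_CLR_FD) < 0) {
        close(fd);
        return DETACH_ERROR;
    }
    rc = ioctl(fd, LOOP_GET_STATUS64, &info);
    err = errno;
    close(fd);
    return classify_detach(rc, err, rc < 0 ? 0 : info.lo_flags);
}
```

In Stanislav's strace test the device was mounted at the time, so under this reading the "= 0" from LOOP_CLR_FD would correspond to DETACH_DEFERRED rather than a completed detach.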
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Fri, Feb 26, 2016 at 10:36:50PM +0100, Stanislav Brabec wrote:
> It should definitely report error whenever trying -oloop on top of
> anything else than a file. Or at least a warning.
>
> Well, even losetup should report a warning.

Keep in mind that with crypto in the game it just might be useful to have loop-over-loop - it might be _not_ a no-op (hell, you might have two layers of encryption - not the smartest thing to do, but if that's what got dumped on your lap, you deal with what you've got). So such warnings shouldn't be hard errors.
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Feb 26, 2016 at 22:03 Al Viro wrote:
> And I'm not sure how to deal with -o loop in a sane way, TBH - automagical
> losetup is bloody hard to get right.

See another reply in this thread for the idea: Fri, 26 Feb 2016 22:00:44 +0100

> Keep in mind that loop-over-loop is also possible...

Indeed! Let's remember that mount(8) should never do it.

# losetup /dev/loop0 /dev/sda2
# losetup /dev/loop1 /dev/loop0
# losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0         0      0         0  0 /dev/sda2
/dev/loop1         0      0         0  0 /dev/loop0

But it actually does, if the command line is "overlooped":

oct:~ # mount -oloop /dev/loop1 /mnt

as it does exactly that:

oct:~ # losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0         0      0         0  0 /dev/sda2
/dev/loop1         0      0         0  0 /dev/loop0
/dev/loop2         0      0         1  0 /dev/loop1

It should definitely report an error whenever -oloop is used on top of anything other than a file. Or at least a warning.

Well, even losetup should report a warning.

--
Best Regards / S pozdravem,

Stanislav Brabec
software developer
SUSE LINUX, s. r. o.          e-mail: sbra...@suse.com
Lihovarská 1060/12            tel: +49 911 7405384547
190 00 Praha 9                fax: +420 284 084 001
Czech Republic                http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76
Re: upgrading kernel 3.13 to 3.16
Vytautas D posted on Fri, 26 Feb 2016 10:50:12 + as excerpted:

> Hi all,
>
> Are there any known issues upgrading btrfs running ubuntu kernel 3.13 to
> 3.16 ? System was once converted from ext4 using btrfs-convert (
> btrfs-progs 3.17 ).
>
> The commit that worries me is following:
> * Btrfs: incompatible format change to remove hole extents (+373/-56)
> (
> http://linux-btrfs.vger.kernel.narkive.com/syNRZbHS/patch-btrfs-incompatible-format-change-to-remove-hole-extents-v3#post1
> )
>
> would this block me from reverting system with a snapshot back to kernel
> 3.13 ?
> After upgrade would the system continue writing more metadata ?

As Austin H says about that commit, but in the broader sense as well, btrfs policy since inclusion in the mainline kernel (with one exception in the first kernel series after that, which Linus made *very* clear he didn't appreciate as he was actually running btrfs on something and it made switching kernels back and forth across that exception, for testing, nearly impossible), has been that on existing btrfs, new features that affect the on-device format must be specifically enabled.

IOW, no worries about incompatible upgrades on existing filesystems -- if such bugs happen at all they're treated exactly as that, regression bugs, and are fixed at high priority, with enough people running btrfs now that such bugs are likely to be found rather fast and a *BIG* stink raised about them not being found and fixed before they even reached mainline.

/New/ btrfs, or fresh conversions from ext* using the converter when the creation/conversion is done from a newer kernel with intent to use an older kernel, are different. In that case, the creation/conversion options will often enable new features and old kernels won't be able to mount the filesystem. However, there are options available to create older on-device formats as well, when the filesystems are intended to be mounted on older kernels.
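The mechanism Duncan describes, where an existing filesystem only becomes unmountable on older kernels once a new format feature is explicitly enabled, works through the superblock's incompat feature flags: a kernel refuses to mount if any incompat bit is set that it does not support. The sketch below models that check in C. The bit values mirror the kernel's BTRFS_FEATURE_INCOMPAT_* definitions as I recall them (NO_HOLES, the feature from the commit in question, is bit 9), but the DEMO_ names, demo_can_mount() and the "supported" masks are illustrative assumptions; the real check in open_ctree() also handles compat_ro flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of some btrfs incompat feature bits. */
#define DEMO_INCOMPAT_MIXED_BACKREF   (1ULL << 0)
#define DEMO_INCOMPAT_EXTENDED_IREF   (1ULL << 6)
#define DEMO_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define DEMO_INCOMPAT_NO_HOLES        (1ULL << 9)

/* Mountable only if every incompat bit in the superblock is known to the kernel. */
bool demo_can_mount(uint64_t sb_incompat, uint64_t kernel_supported)
{
    return (sb_incompat & ~kernel_supported) == 0;
}
```

So an existing filesystem that never had NO_HOLES turned on stays mountable on a kernel that predates the bit, even after being mounted by a newer kernel; only enabling the feature (for example when creating the filesystem with it) changes that.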
All that said, you are **WAY** behind list-recommended kernels, even with kernel 3.16. Btrfs is considered "stabilizing, but not yet entirely stable and mature." As such, the strong on-list recommendation is to choose either the current or mainline LTS kernel series, and to run no further back than the next-to-last kernel series in either. With the current 4.4 kernel also being LTS, that would be 4.4 or 4.3 if you choose the current kernels, and the LTS 4.4 or 4.1 series if you're doing LTS. With 4.4 reasonably new, it's understood if you're still on the previous LTS before that, 3.18, but if you're on 3.18, you'd be strongly encouraged to upgrade to at least 4.1 and preferably 4.4 ASAP.

Older kernels, at least back to 3.12 where the "experimental" label was officially peeled off and btrfs (semi-)officially reached its current status of "stabilizing, but not entirely stable or mature", are "best effort" support. We do still try to help as best we can, but the first recommendation you'll get upon posting to the list is "please upgrade to a kernel more in line with btrfs' 'stabilizing but not fully stable' status."

Yes, there are reasons people may wish to run really old kernels. However, such reasons really aren't compatible with running a still-stabilizing filesystem like btrfs in the first place, and so many bugs have been fixed and development focus has simply moved on since then, that supporting btrfs on such old kernels really isn't practical, as for us it really is ancient, and buggy, history.

So the recommendation is, if you /do/ have a reason to run such old kernels, generally, a wish for stability and lack of change, then you really should consider running something other than btrfs, because the fact of the matter is, it's still changing fast, and simply doesn't yet reach the level of stability that running such old kernels indicates you want/need.
So choose one or the other, btrfs on reasonably current kernels if you want it, or stability on older kernels, without btrfs, if you want/need that.

All /that/ said, yes, some distros do claim support on older kernels, and indeed, they may well be backporting bug patches as appropriate to properly support that claim. But that's their claim and their support. On the list we're focused on newer kernels and features, and while we try not to break older and doing so is a bug we'll patch if we find it, as a rule we don't track those distros and what patches they may or may not have backported, and thus have no way of properly supporting them. So if you're relying on distro support for btrfs on such old kernels, you really should be looking to them for that support, not to this list, as we'll still do our best effort, but the fact is, it's not going to be to the level of support we'd be able to give if you were running kernels within our recommended kernel support time frame, the last two of either current or LTS kernel series, and often, the best we'll be able to do with
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Fri, Feb 26, 2016 at 09:37:22PM +0100, Stanislav Brabec wrote:
> Do I understand, that you are saying:
>
> Yes, mounting multiple loop devices associated with one file is a
> legitimate use, but mount(8) should never do it, because it has other
> ugly side effects?

It's on the same level as "hey, let's have an nbd daemon run in qemu guest, exporting a host file over nbd, import it to host as /dev/nbd69, set a loopback device over the underlying file as /dev/loop42 and ask e.g. xfs to recognize that it's dealing with the same underlying array of bytes in both cases - wouldn't it be neat if it could do that?"

There's no magic. Really. Unexpected sharing of backing store between apparently unrelated devices can cause trouble.

And I'm not sure how to deal with -o loop in a sane way, TBH - automagical losetup is bloody hard to get right. Keep in mind that loop-over-loop is also possible...
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Feb 26, 2016 at 21:30 Al Viro wrote:
> IMO on-demand losetup a-la -o loop is simply a bad idea...

So the correct behavior of -o loop should be:

Check whether another mount command already did losetup.
  If not, allocate a new loop device.
  If yes, reuse the existing loop device.

Well, it seems to be safe, even if the loop device was not allocated by mount(8) itself, as
ioctl(fd, LOOP_CLR_FD)
never returns EBUSY:

# losetup /dev/loop2 /ext4.img
# mount /dev/loop2 /mnt
# strace losetup -d /dev/loop2 2>&1 | tail -n7 | head -n3
open("/dev/loop2", O_RDONLY|O_CLOEXEC) = 3
ioctl(3, LOOP_CLR_FD)                   = 0
close(3)                                = 0

If recycling "alien" loop devices is not considered a good idea, then (if possible):

If the loop device was allocated by mount(8) itself, recycle it.
If the loop device was not allocated by mount(8) itself, return an error.
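The "reuse if another mount already did losetup" step needs a way to find a loop device that is already bound to a given file. One approach, which I believe is close to what util-linux does internally, is to compare the backing device and inode numbers reported by LOOP_GET_STATUS64 with stat(2) of the file. The C sketch below assumes devices named /dev/loopN and ignores offset and sizelimit, which a real implementation would also have to match; find_loop_for_file and loop_backs_file are hypothetical names.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/loop.h>

/* Same backing file: the driver reports the backing inode's device and inode numbers. */
bool loop_backs_file(const struct loop_info64 *info, const struct stat *st)
{
    return info->lo_device == (__u64)st->st_dev &&
           info->lo_inode == (__u64)st->st_ino;
}

/*
 * Look for an already-bound /dev/loopN whose backing file is `path`.
 * Writes the device name into `out` and returns 0, or returns -1.
 * Scans loop0..loop(max_devs - 1) only; needs permission to open them.
 */
int find_loop_for_file(const char *path, int max_devs, char *out, size_t outlen)
{
    struct stat st;
    int i;

    if (stat(path, &st) < 0)
        return -1;
    for (i = 0; i < max_devs; i++) {
        char dev[32];
        struct loop_info64 info;
        int fd;

        snprintf(dev, sizeof(dev), "/dev/loop%d", i);
        fd = open(dev, O_RDONLY);
        if (fd < 0)
            continue;
        /* Unbound devices make the ioctl fail (ENXIO); they are skipped. */
        if (ioctl(fd, LOOP_GET_STATUS64, &info) == 0 && loop_backs_file(&info, &st)) {
            close(fd);
            snprintf(out, outlen, "%s", dev);
            return 0;
        }
        close(fd);
    }
    return -1;
}
```

Note that this only finds a candidate; it does not by itself settle Valdis's concern about tearing down a device that some other user set up.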
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Feb 26, 2016 at 21:05 Austin S. Hemmelgarn wrote:
> It's kind of interesting, but I can't reproduce _any_ of this behavior
> with either ext4 or BTRFS when I manually set up the loop devices and
> point mount(8) at those instead of using -o loop on a file. That really
> seems to indicate that this is caused by something mount(8) is doing
> when it's calling losetup.

Behavior of "-oloop" is closer to "losetup -f /fs.img" than to "losetup /dev/loop0 /fs.img".

Anyway, I can reproduce without -oloop:

# losetup /dev/loop0 /btrfs.img
# mount /dev/loop0 /mnt/1
# grep /mnt /proc/self/mountinfo
107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
# losetup /dev/loop1 /btrfs.img
# mount -osubvol=/ /dev/loop1 /mnt/2
# grep /mnt /proc/self/mountinfo
107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
108 59 0:59 / /mnt/2 rw,relatime shared:48 - btrfs /dev/loop1 rw,space_cache,subvolid=5,subvol=/
# uname -a
Linux oct 4.4.1-1-default #1 SMP PREEMPT Mon Feb 15 11:03:27 UTC 2016 (6398c2d) x86_64 x86_64 x86_64 GNU/Linux

(Note that the system was freshly rebooted. After other experiments, the second line of mountinfo can be missing completely.)

>> 2) mount(2) called after the reproducer returns OK but does nothing.
>>
> OK, we've determined that mount(2) is misbehaving. That doesn't change
> the fact that mount(8) is triggering this, and therefore should itself
> be corrected.
> Assume that mount(2) gets fixed so it doesn't lose its mind and
> /proc/self/mountinfo doesn't change. There will still be issues
> resulting from mount(8)'s behavior:
> 1. BTRFS will lose its mind and corrupt data when using a multi-device
> filesystem (due to the problems with duplicate FS UUIDs).
> 2. XFS might have similar issues to 1 when using metadata checksumming,
> although it's more likely that it won't allow the second mount to succeed.
> 3. Most other filesystems will likely end up corrupting data.

Do I understand, that you are saying:

Yes, mounting multiple loop devices associated with one file is a legitimate use, but mount(8) should never do it, because it has other ugly side effects?

OK, it looks like the next task for mount(8) to fix.
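Since these reproducers hinge on reading /proc/self/mountinfo, here is a small C sketch for parsing one line of it. The field layout follows the proc(5) description (mount ID, parent ID, major:minor, root, mount point, mount options, optional fields, a "-" separator, then filesystem type, mount source and super options). parse_mountinfo, mountpoint_present and struct mi_entry are hypothetical names; octal escapes such as \040 are not decoded, and paths longer than 255 bytes are not handled.

```c
#include <stdio.h>
#include <string.h>

struct mi_entry {
    unsigned major, minor;
    char root[256];
    char mountpoint[256];
    char fstype[64];
    char source[256];
};

/* Parse one mountinfo line into e. Returns 0 on success, -1 on malformed input. */
int parse_mountinfo(const char *line, struct mi_entry *e)
{
    const char *sep;
    int id, parent;

    if (sscanf(line, "%d %d %u:%u %255s %255s", &id, &parent,
               &e->major, &e->minor, e->root, e->mountpoint) != 6)
        return -1;
    /* Optional fields vary in number; " - " marks where they end. */
    sep = strstr(line, " - ");
    if (!sep)
        return -1;
    if (sscanf(sep + 3, "%63s %255s", e->fstype, e->source) != 2)
        return -1;
    return 0;
}

/* Returns 1 if mp is listed as a mount point, 0 if not, -1 if the file can't be read. */
int mountpoint_present(const char *mp)
{
    char buf[4096];
    struct mi_entry e;
    int found = 0;
    FILE *f = fopen("/proc/self/mountinfo", "r");

    if (!f)
        return -1;
    while (!found && fgets(buf, sizeof(buf), f))
        if (parse_mountinfo(buf, &e) == 0 && strcmp(e.mountpoint, mp) == 0)
            found = 1;
    fclose(f);
    return found;
}
```

mountpoint_present("/mnt/2") returning 0 right after mount(2) returned 0 is exactly the symptom being reported in this thread.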
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On 2016-02-26 15:30, Al Viro wrote:
> On Fri, Feb 26, 2016 at 03:05:27PM -0500, Austin S. Hemmelgarn wrote:
>>> Where is /mnt/2?
>> It's kind of interesting, but I can't reproduce _any_ of this behavior with either ext4 or BTRFS when I manually set up the loop devices and point mount(8) at those instead of using -o loop on a file. That really seems to indicate that this is caused by something mount(8) is doing when it's calling losetup. I'm running a mostly unmodified version of 4.4.2 (the only modification that would come even remotely close to this is that I changed the default mount options for everything from relatime to noatime), and util-linux 2.27.1 from Gentoo.
>
> Sigh... sys_mount() (mount_bdev(), actually) has no way to tell if two loop devices refer to the same underlying object. As far as it's concerned, you are asking to mount a completely unrelated block device. Which just happens to see the data (living in separate pagecache, even) modified behind its back (with some delay) after it gets written to another device. Filesystem drivers generally don't like when something is screwing the underlying data, to put it mildly...
>
> When you ask to mount the _same_ device, mount_bdev(), as well as btrfs counterpart, makes sure that you get a reference to the same struct super_block, which avoids all coherency problems - all mounted instances refer to the same in-core objects (dentries, inodes, page cache, etc.). They get separate struct vfsmount instances, but that only matters for mountpoint crossing.
>
> As soon as you've set the second /dev/loop alias for the same underlying file, you are asking for all kinds of trouble. If you use the same one consistently, you are OK. BTW, even
>
> losetup /dev/loop0 /dev/sda1
> mount -t ext2 /dev/sda1 /mnt/1
> mount -t ext2 /dev/loop0 /mnt/2
>
> is enough for trouble - you get (as far as ext2 knows) unrelated devices screwing each other, with no good way to predict that. And you need to check propagation through more than one layer - loop over loop over block is also possible. IMO on-demand losetup a-la -o loop is simply a bad idea...

I agree wholeheartedly and wasn't disputing any of this; I meant that I'm not seeing any of the odd mount(2) or /proc/self/mountinfo behavior that Stanislav started the thread about. It was entirely trivial to get the filesystem images I used into a state where they couldn't be mounted again afterwards.
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Fri, Feb 26, 2016 at 03:05:27PM -0500, Austin S. Hemmelgarn wrote:
> >Where is /mnt/2?
> It's kind of interesting, but I can't reproduce _any_ of this
> behavior with either ext4 or BTRFS when I manually set up the loop
> devices and point mount(8) at those instead of using -o loop on a
> file. That really seems to indicate that this is caused by something
> mount(8) is doing when it's calling losetup. I'm running a mostly
> unmodified version of 4.4.2 (the only modification that would come
> even remotely close to this is that I changed the default mount
> options for everything from relatime to noatime), and util-linux
> 2.27.1 from Gentoo.

Sigh... sys_mount() (mount_bdev(), actually) has no way to tell if two loop devices refer to the same underlying object. As far as it's concerned, you are asking to mount a completely unrelated block device. Which just happens to see the data (living in separate pagecache, even) modified behind its back (with some delay) after it gets written to another device. Filesystem drivers generally don't like when something is screwing the underlying data, to put it mildly...

When you ask to mount the _same_ device, mount_bdev(), as well as btrfs counterpart, makes sure that you get a reference to the same struct super_block, which avoids all coherency problems - all mounted instances refer to the same in-core objects (dentries, inodes, page cache, etc.). They get separate struct vfsmount instances, but that only matters for mountpoint crossing.

As soon as you've set the second /dev/loop alias for the same underlying file, you are asking for all kinds of trouble. If you use the same one consistently, you are OK. BTW, even

losetup /dev/loop0 /dev/sda1
mount -t ext2 /dev/sda1 /mnt/1
mount -t ext2 /dev/loop0 /mnt/2

is enough for trouble - you get (as far as ext2 knows) unrelated devices screwing each other, with no good way to predict that. And you need to check propagation through more than one layer - loop over loop over block is also possible. IMO on-demand losetup a-la -o loop is simply a bad idea...
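Al's point that a check has to follow more than one layer (loop over loop over block) amounts to resolving every mount source down to its base object before comparing. The C sketch below does that on a hypothetical in-memory model of the device stack; a real tool would have to build the model from the running system, for example from /sys/block/loopN/loop/backing_file, and struct demo_dev, demo_resolve and demo_shares_backing are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical model: each node is a base object (backing == -1)
 * or a loop device layered on another node. */
struct demo_dev {
    const char *name;
    int backing; /* index of the node this one is layered on, or -1 */
};

/* Follow loop layers down to the base object; -1 if deeper than max_depth. */
int demo_resolve(const struct demo_dev *devs, int idx, int max_depth)
{
    while (devs[idx].backing >= 0) {
        if (max_depth-- == 0)
            return -1;
        idx = devs[idx].backing;
    }
    return idx;
}

/* Two mount sources alias the same bytes if they resolve to the same base. */
bool demo_shares_backing(const struct demo_dev *devs, int a, int b)
{
    int ra = demo_resolve(devs, a, 16);
    int rb = demo_resolve(devs, b, 16);

    return ra >= 0 && ra == rb;
}
```

With Al's example (loop0 set up over sda1), /dev/sda1 and /dev/loop0 resolve to the same base, so mounting both would be flagged; a loop1 stacked on loop0 is caught as well, while an unrelated image file is not.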
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On 2016-02-26 14:12, Stanislav Brabec wrote: Al Viro wrote: On Fri, Feb 26, 2016 at 11:39:11AM -0500, Austin S. Hemmelgarn wrote: That's just it though, from what I can tell based on what I've seen and what you said above, mount(8) isn't doing things correctly in this case. If we were to do this with something like XFS or ext4, the filesystem would probably end up completely messed up just because of the log replay code (assuming they actually mount the second time, I'm not sure what XFS would do in this case, but I believe that ext4 would allow the mount as long as the mmp feature is off). It would make sense that this behavior wouldn't have been noticed before (and probably wouldn't have mattered even if it had been), because most filesystems don't allow multiple mounts even if they're all RO, and most people don't try to mount other filesystems multiple times as a result of this. Well, in such case kernel should return an error when mount(8) is trying to use multiple mount devices for a single file for mount(2). As I said in my other e-mail, there are perfectly legitimate reasons to be doing this. And I should also point out that anybody who has one of those reasons for doing this should be setting up the loop devices themselves, so mount(8) behaving this way is still wrong. But kernel does not return error, it starts to do strange things. They most certainly do. The problem is mount(8) treatment of -o loop - you can mount e.g. ext4 many times, it'll just get you extra references to the same struct super_block from those new vfsmounts. IOW, that'll behave the same way as if you were doing mount --bind on subsequent ones. I just tested the same with ext4. The rewriting of mountinfo happens only with btrfs. But after that mount(2) stops to work. See the last mount(2). It returns 0, but nothing is mounted! Kernel mount(2) refuses to work! 
# mount -oloop /ext4.img /mnt/1
# cat /proc/self/mountinfo | grep /mnt
238 59 7:0 / /mnt/1 rw,relatime shared:153 - ext4 /dev/loop0 rw,data=ordered
# mount -oloop /ext4.img /mnt/2
# cat /proc/self/mountinfo | grep /mnt
238 59 7:0 / /mnt/1 rw,relatime shared:153 - ext4 /dev/loop0 rw,data=ordered
243 59 7:1 / /mnt/2 rw,relatime shared:156 - ext4 /dev/loop1 rw,data=ordered
# umount /mnt/*
# mount -oloop /btrfs.img /mnt/1
# cat /proc/self/mountinfo | grep /mnt
238 59 0:94 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:153 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
# mount -oloop,subvol=/ /btrfs.img /mnt/2
# cat /proc/self/mountinfo | grep /mnt
238 59 0:94 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:153 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
It is really strange! Mount was called, but nothing appeared in the mountinfo, just a rewritten /dev/loop0 -> /dev/loop1 in the existing mount. To be sure that it is a mount(2) issue and not a mount(8) one, let's try it again with strace.
# strace mount -oloop,subvol=/ /btrfs.img /mnt/2 2>&1 | tail -n 7
mount("/dev/loop1", "/mnt/2", "btrfs", MS_MGC_VAL, "subvol=/") = 0
access("/mnt/2", W_OK) = 0
close(4) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
# cat /proc/self/mountinfo | grep /mnt
238 59 0:94 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:153 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
Where is /mnt/2? It's kind of interesting, but I can't reproduce _any_ of this behavior with either ext4 or BTRFS when I manually set up the loop devices and point mount(8) at those instead of using -o loop on a file. That really seems to indicate that this is caused by something mount(8) is doing when it's calling losetup.
I'm running a mostly unmodified version of 4.4.2 (the only modification that would come even remotely close to this is that I changed the default mount options for everything from relatime to noatime), and util-linux 2.27.1 from Gentoo. And as far as kernel is concerned, /dev/loop* isn't special in any respects; if you do explicit losetup and mount the resulting /dev/loop as many times as you wish, it'll work just fine. mount(8) just calls losetup internally for every -o loop, once per "loop" option. Probably nobody tried to loop-mount the same ext4 volume multiple times, so no problems appeared. But with btrfs, people would. And mounting two btrfs subvolumes with two "-oloop" options calls losetup twice for the same file. And from the kernel POV it's not different from what it sees with -o loop; setting the loop device up is done first by separate syscall, then mount(2) for that device is issued. Yes, it is different:
- You have one file.
- You have two loop devices pointing to the same file.
- btrfs subvolumes are internally handled similarly to bind mounts. It means that all
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Feb 26, 2016 at 19:22 Austin S. Hemmelgarn wrote: The first commit is just test cases, and the others are specific issues that only affected BTRFS which have nothing to do with this thread at all other than involving mount(8) and BTRFS. Yes, it is a bit off topic. It just demonstrates how complex mount(8) handling of btrfs is. The test case fails completely in util-linux 2.27.1. mount(8) behavior has the potential to cause either data corruption or similar behavior in the future (I would expect that XFS with metadata checksumming enabled would cause a similar interaction, although they probably would handle it better). "mount -a" in particular has a hard time recognizing what was already mounted and what still needs to be mounted. The only information mount(8) has comes from mountinfo, and interpreting the mountinfo contents to reconstruct the mount options that were used is far from trivial. Some cases are even impossible to distinguish. Suppose you have:
mount -osubvol=/ /dev/sda2 /mnt/1
mount -osubvol=/sbv /dev/sda2 /mnt/2
Case 1: mount -obind /mnt/1/sbv/bnd /mnt/3
Case 2: mount -obind /mnt/2/bnd /mnt/3
Case 1 and case 2 produce exactly the same mountinfo, but different reference counts for /mnt/1 and /mnt/2.
-- Best Regards / S pozdravem,
Stanislav Brabec
software developer - SUSE LINUX, s. r. o.
e-mail: sbra...@suse.com
Lihovarská 1060/12, 190 00 Praha 9, Czech Republic
tel: +49 911 7405384547
fax: +420 284 084 001
http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76
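To make the mountinfo ambiguity concrete, here is a minimal Python sketch (not util-linux's libmount code) that splits one /proc/self/mountinfo line into the fields documented in proc(5): mount ID, parent ID, major:minor, root, mount point, mount options, optional fields up to a lone "-", then filesystem type, source and super options. The function names are hypothetical, and the super-option split assumes no option value contains a comma. Nothing in these fields records which existing mount a bind mount was cloned from, which is consistent with the point that Case 1 and Case 2 cannot be told apart.

```python
import re

# mountinfo escapes space, tab, newline and backslash in paths as \ooo.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def unescape(field):
    """Undo the octal escapes (e.g. \\040 for a space) used in mountinfo paths."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into named fields (layout per proc(5))."""
    fields = line.split()
    sep = fields.index("-", 6)  # the variable-length optional fields end at "-"
    super_opts = {}
    for opt in fields[sep + 3].split(","):
        key, _, value = opt.partition("=")
        super_opts[key] = value  # flags such as "rw" map to ""
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "major_minor": fields[2],
        "root": unescape(fields[3]),  # path inside the filesystem
        "mount_point": unescape(fields[4]),
        "mount_options": fields[5].split(","),
        "optional": fields[6:sep],  # e.g. ["shared:153"]
        "fstype": fields[sep + 1],
        "source": unescape(fields[sep + 2]),
        "super_options": super_opts,
    }
```

Fed the btrfs line from the reproducer in this thread, it yields root /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2, source /dev/loop0 and a super_options entry subvolid=257.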
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
Al Viro wrote:
> On Fri, Feb 26, 2016 at 11:39:11AM -0500, Austin S. Hemmelgarn wrote:
>
>> That's just it though, from what I can tell based on what I've seen
>> and what you said above, mount(8) isn't doing things correctly in
>> this case. If we were to do this with something like XFS or ext4,
>> the filesystem would probably end up completely messed up just
>> because of the log replay code (assuming they actually mount the
>> second time, I'm not sure what XFS would do in this case, but I
>> believe that ext4 would allow the mount as long as the mmp feature
>> is off). It would make sense that this behavior wouldn't have been
>> noticed before (and probably wouldn't have mattered even if it had
>> been), because most filesystems don't allow multiple mounts even if
>> they're all RO, and most people don't try to mount other filesystems
>> multiple times as a result of this.
Well, in such a case the kernel should return an error when mount(8) tries to use multiple loop devices for a single file in mount(2). But the kernel does not return an error; it starts to do strange things.
> They most certainly do. The problem is mount(8) treatment of -o loop -
> you can mount e.g. ext4 many times, it'll just get you extra references
> to the same struct super_block from those new vfsmounts. IOW, that'll
> behave the same way as if you were doing mount --bind on subsequent ones.
I just tested the same with ext4. The rewriting of mountinfo happens only with btrfs. But after that, mount(2) stops working. See the last mount(2). It returns 0, but nothing is mounted! Kernel mount(2) refuses to work!
# mount -oloop /ext4.img /mnt/1
# cat /proc/self/mountinfo | grep /mnt
238 59 7:0 / /mnt/1 rw,relatime shared:153 - ext4 /dev/loop0 rw,data=ordered
# mount -oloop /ext4.img /mnt/2
# cat /proc/self/mountinfo | grep /mnt
238 59 7:0 / /mnt/1 rw,relatime shared:153 - ext4 /dev/loop0 rw,data=ordered
243 59 7:1 / /mnt/2 rw,relatime shared:156 - ext4 /dev/loop1 rw,data=ordered
# umount /mnt/*
# mount -oloop /btrfs.img /mnt/1
# cat /proc/self/mountinfo | grep /mnt
238 59 0:94 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:153 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
# mount -oloop,subvol=/ /btrfs.img /mnt/2
# cat /proc/self/mountinfo | grep /mnt
238 59 0:94 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:153 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
It is really strange! Mount was called, but nothing appeared in the mountinfo, just a rewritten /dev/loop0 -> /dev/loop1 in the existing mount. To be sure that it is a mount(2) issue and not a mount(8) one, let's try it again with strace.
# strace mount -oloop,subvol=/ /btrfs.img /mnt/2 2>&1 | tail -n 7
mount("/dev/loop1", "/mnt/2", "btrfs", MS_MGC_VAL, "subvol=/") = 0
access("/mnt/2", W_OK) = 0
close(4) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
# cat /proc/self/mountinfo | grep /mnt
238 59 0:94 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:153 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
Where is /mnt/2?
> And as far as kernel is concerned, /dev/loop* isn't special in any respects;
> if you do explicit losetup and mount the resulting /dev/loop as many
> times as you wish, it'll work just fine.
mount(8) just calls losetup internally for every -o loop, once per "loop" option. Probably nobody tried to loop-mount the same ext4 volume multiple times, so no problems appeared. But with btrfs, people would.
And mounting two btrfs subvolumes with two "-oloop" options calls losetup twice for the same file.
> And from the kernel POV it's not
> different from what it sees with -o loop; setting the loop device up is
> done first by separate syscall, then mount(2) for that device is issued.
Yes, it is different:
- You have one file.
- You have two loop devices pointing to the same file.
- btrfs subvolumes are internally handled similarly to bind mounts. It means that all subvolumes should have the same mount source. But these two mounts don't.
> It's mount(8) that screws up here.
Yes, mount(8) screws up mount(2). And it corrupts the kernel state:
1) /proc/self/mountinfo changes its contents.
2) mount(2) called after the reproducer returns OK but does nothing.
-- Best Regards / S pozdravem, Stanislav Brabec
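The two symptoms listed above can be checked mechanically by diffing mountinfo snapshots taken before and after a mount(2) call. The sketch below, in Python with hypothetical function names, keys entries on the mount ID column and reports new mount points, vanished ones, and entries whose mount source changed in place. It assumes the proc(5) line layout and ignores octal escapes in paths.

```python
def parse_entries(mountinfo_text):
    """Map mount ID -> (mount point, mount source) for each mountinfo line."""
    entries = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Optional fields (e.g. "shared:153") end at a lone "-"; the source
        # is the second field after it (the first is the filesystem type).
        sep = fields.index("-", 6)
        entries[int(fields[0])] = (fields[4], fields[sep + 2])
    return entries


def diff_mountinfo(before, after):
    """Compare two mountinfo snapshots taken around a mount(2) call."""
    old, new = parse_entries(before), parse_entries(after)
    added = sorted(new[mid][0] for mid in new.keys() - old.keys())
    removed = sorted(old[mid][0] for mid in old.keys() - new.keys())
    # Same mount ID, different source: the in-place rewrite seen with btrfs.
    rewritten = {
        mid: (old[mid][1], new[mid][1])
        for mid in sorted(old.keys() & new.keys())
        if old[mid][1] != new[mid][1]
    }
    return added, removed, rewritten
```

For the btrfs reproducer, diffing the /mnt/1 line before and after the second -o loop mount gives no added mount point and {238: ("/dev/loop0", "/dev/loop1")} as the rewritten source, while the ext4 run gives ["/mnt/2"] as added and nothing rewritten.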
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On 2016-02-26 12:07, Stanislav Brabec wrote: Austin S. Hemmelgarn wrote: > On 2016-02-26 10:50, Stanislav Brabec wrote: That's just it though, from what I can tell based on what I've seen and what you said above, mount(8) isn't doing things correctly in this case. If we were to do this with something like XFS or ext4, the filesystem would probably end up completely messed up just because of the log replay code (assuming they actually mount the second time, I'm not sure what XFS would do in this case, but I believe that ext4 would allow the mount as long as the mmp feature is off). It would make sense that this behavior wouldn't have been noticed before (and probably wouldn't have mattered even if it had been), because most filesystems don't allow multiple mounts even if they're all RO, and most people don't try to mount other filesystems multiple times as a result of this. If this behavior of allocating a new loop device for each call on a given file is in fact not BTRFS specific (as implied by your statement about a possible workaround in mount(8)), then mount(8) really should be fixed to not do that before we even consider looking at the issues in BTRFS, as that is behavior that has serious potential to result in data corruption for any filesystem, not just BTRFS. Well, the kernel could "fix" it in a simple way:
- don't allow two loop devices pointing to the same file, or
- don't allow two loop devices pointing to the same file to be used by mount(2).
This has legitimate uses in testing multipath configuration and operation, and in testing that filesystems handle this correctly. On top of that, it becomes decidedly non-trivial to handle when you consider that loop devices can map a fixed range of a file independent of the rest of the file (this used to be the way to pull partitions out of raw disk images before the device mapper became as commonplace as it is now). Then util-linux would need a behavior change for sure.
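For the condition under discussion (two loop devices pointing to the same file), a detector can read the backing_file attribute that the kernel exposes under /sys/block/loopN/loop/ for loop devices attached to a file. This Python sketch groups devices by that path; the function names and the sysfs_block parameter (added so the scan can be pointed at a test directory) are assumptions for illustration. It compares path strings only, so the same file reached through a hard link would not be grouped, and it ignores offsets, which matters given that loop devices can map a fixed range of a file.

```python
import os
from collections import defaultdict


def _devnum(name):
    """Natural sort key so that loop10 sorts after loop9."""
    suffix = name[len("loop"):]
    return int(suffix) if suffix.isdigit() else -1


def loop_backing_files(sysfs_block="/sys/block"):
    """Map backing file path -> names of the loop devices attached to it."""
    backing = defaultdict(list)
    for name in os.listdir(sysfs_block):
        attr = os.path.join(sysfs_block, name, "loop", "backing_file")
        # Skip other block devices and loop devices that expose no
        # backing_file (i.e. are not attached to a file).
        if not name.startswith("loop") or not os.path.isfile(attr):
            continue
        with open(attr) as f:
            backing[f.read().rstrip("\n")].append(name)
    return {path: sorted(devs, key=_devnum) for path, devs in backing.items()}


def shared_backing_files(sysfs_block="/sys/block"):
    """Return only the files that more than one loop device points at."""
    return {
        path: devs
        for path, devs in loop_backing_files(sysfs_block).items()
        if len(devs) > 1
    }
```

After the btrfs reproducer above, one would expect /btrfs.img to be reported with two devices, for example loop0 and loop1.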
I already found another inconsistency caused by this implementation: /proc/self/mountinfo reports the subvolid of the nearest upper sub-volume root for the bind mount, not that of the sub-volume that was used to create this bind mount, together with a subvol path that potentially does not correspond to any subvolume root. This could cause problems when evaluating the order of umount(2) calls needed to prevent EBUSY. I talked about it with David Sterba, and he said that the current implementation is not optimal: the btrfs driver does not have sufficient information to determine the true root of the bind mount. I've noticed this before myself, but I've never seen any issues resulting from it; however, I've also not tried calling BTRFS related ioctls on or from such a mount, so I may just have been lucky. I can imagine two side effects deep inside mount(8):
- "mount -a" uses subvol internally for a path lookup of the default volume or the volume corresponding to subvolid. (Only the Git version, not yet in 2.27.1.) I could imagine that the lookup is confused by a bind mount reporting the searched subvolid and a "random" subvol. But I don't have a reproducer yet, and I am not sure whether it is really possible.
- "umount -a" could have a problem finding a proper order for umount(2) without EBUSY. I did not check the algorithm, so I am not sure whether it is a real issue.
If BTRFS can't get the correct ref on the FS root internally, then there are all kinds of things that could go wrong when you try to do any of the typical maintenance stuff on it (like balancing, scrub, defrag, snapshot/subvolume creation/deletion, etc). In essence, if you try to do almost anything using the btrfs command line tools on that mount point, it might fail in new and interesting ways.
P.S.: There were many problems with btrfs in mount(8):
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=c4af75a84ef3430003c77be2469869aaf3a63e2a
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=618a88140e26a134727a39c906c9cdf6d0c04513
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=d2f8267847ecbe763a3b63af1289bf1179cd8c45
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=2cd28fc82d0c947472a4700d5e764265916fba1e
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=352740e88e2c9cb180fe845ce210b1c7b5ad88c7
The first commit is just test cases, and the others are specific issues that only affected BTRFS which have nothing to do with this thread at all other than involving mount(8) and BTRFS. The originally stated issue that this thread is about is specific to loop mounting a BTRFS filesystem stored in a file multiple times. The issue can be empirically demonstrated to be a result of an interaction between BTRFS behavior regarding duplicate filesystems and an implementation detail of mount(8). The BTRFS behavior WRT duplicate FS UUIDs is not going away any time soon (believe me, it's
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On Fri, Feb 26, 2016 at 11:39:11AM -0500, Austin S. Hemmelgarn wrote:
> That's just it though, from what I can tell based on what I've seen
> and what you said above, mount(8) isn't doing things correctly in
> this case. If we were to do this with something like XFS or ext4,
> the filesystem would probably end up completely messed up just
> because of the log replay code (assuming they actually mount the
> second time, I'm not sure what XFS would do in this case, but I
> believe that ext4 would allow the mount as long as the mmp feature
> is off). It would make sense that this behavior wouldn't have been
> noticed before (and probably wouldn't have mattered even if it had
> been), because most filesystems don't allow multiple mounts even if
> they're all RO, and most people don't try to mount other filesystems
> multiple times as a result of this.
They most certainly do. The problem is mount(8) treatment of -o loop - you can mount e.g. ext4 many times, it'll just get you extra references to the same struct super_block from those new vfsmounts. IOW, that'll behave the same way as if you were doing mount --bind on subsequent ones. And as far as kernel is concerned, /dev/loop* isn't special in any respects; if you do explicit losetup and mount the resulting /dev/loop as many times as you wish, it'll work just fine. And from the kernel POV it's not different from what it sees with -o loop; setting the loop device up is done first by separate syscall, then mount(2) for that device is issued. It's mount(8) that screws up here.
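The extra references to one struct super_block described above can be observed from userspace: the kernel prints the superblock's device number in the major:minor column of /proc/self/mountinfo, so vfsmounts sharing a superblock share that value. A minimal Python sketch (function name hypothetical):

```python
from collections import defaultdict


def mounts_by_superblock(mountinfo_text):
    """Group mount points by the major:minor field of /proc/self/mountinfo.

    That field carries the superblock's device number, so vfsmounts that
    are extra references to one struct super_block end up under one key.
    """
    groups = defaultdict(list)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            groups[fields[2]].append(fields[4])  # major:minor -> mount point
    return dict(groups)
```

In the ext4 transcript elsewhere in this thread, the two -o loop mounts of /ext4.img appear as 7:0 and 7:1, i.e. two separate superblocks over one image file. Per the description above, mounting an explicitly set up /dev/loop0 twice would instead put a second mount point under the same 7:0 key.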
Btrfs progs release 4.4.1
Hi,
btrfs-progs 4.4.1 has been released, with minor bugfixes.
Changes:
* find-root: don't skip the first chunk
* free-space-tree compat bits fix
* build: target symlinks
* documentation updates
* test updates
Not many updates for the pending patchsets. I've tried to merge the mkfs feature guessing patches but had to write them from scratch. The other pending patchsets are in the integration branch. As my kernel work for the next merge window is almost done, I hope to spend more time on progs during the next weeks.
Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
Shortlog:
David Sterba (4):
      btrfs-progs: fix compat_ro mask for free space tree
      btrfs-progs: tests: store checksums in /tmp
      btrfs-progs: tests: use common variables and helpers
      Btrfs progs v4.4.1
Hongxu Jia (1):
      btrfs-progs: fix symlink creation multiple times
Lakshmipathi.G (1):
      btrfs-progs: tests: do checksum verification with convert-tests
Mark Fasheh (1):
      btrfs-progs: Import interval tree implemenation from Linux v4.0-rc7.
Mike Gilbert (1):
      btrfs-progs: Makefile.in: Simplify/correct install-static
Qu Wenruo (5):
      btrfs-progs: volume: Fix a bug causing btrfs-find-root to skip first chunk
      btrfs-progs: Allow open_ctree to return fs_info even chunk tree is corrupted
      btrfs-progs: Add support for tree block operations on fs_info without roots
      btrfs-progs: find-root: Allow btrfs-find-root to search chunk root even chunk root is corrupted
      btrfs-progs: misc-test: Add regression test for find-root gives empty result
Satoru Takeuchi (3):
      btrfs-progs: Fix self-reference of man btrfs-subvolume
      btrfs-progs: describe btrfs-send requires read-only subvolume
      btrfs-progs: write down the meaning of BTRFS_ARG_BLKDEV
Tsutomu Itoh (2):
      btrfs-progs: doc: fix typo of some documents
      btrfs-progs: doc: fix size suffix in mkfs.btrfs
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
Austin S. Hemmelgarn wrote: > On 2016-02-26 10:50, Stanislav Brabec wrote: That's just it though, from what I can tell based on what I've seen and what you said above, mount(8) isn't doing things correctly in this case. If we were to do this with something like XFS or ext4, the filesystem would probably end up completely messed up just because of the log replay code (assuming they actually mount the second time, I'm not sure what XFS would do in this case, but I believe that ext4 would allow the mount as long as the mmp feature is off). It would make sense that this behavior wouldn't have been noticed before (and probably wouldn't have mattered even if it had been), because most filesystems don't allow multiple mounts even if they're all RO, and most people don't try to mount other filesystems multiple times as a result of this. If this behavior of allocating a new loop device for each call on a given file is in fact not BTRFS specific (as implied by your statement about a possible workaround in mount(8)), then mount(8) really should be fixed to not do that before we even consider looking at the issues in BTRFS, as that is behavior that has serious potential to result in data corruption for any filesystem, not just BTRFS. Well, the kernel could "fix" it in a simple way:
- don't allow two loop devices pointing to the same file, or
- don't allow two loop devices pointing to the same file to be used by mount(2).
Then util-linux would need a behavior change for sure. I already found another inconsistency caused by this implementation: /proc/self/mountinfo reports the subvolid of the nearest upper sub-volume root for the bind mount, not that of the sub-volume that was used to create this bind mount, together with a subvol path that potentially does not correspond to any subvolume root. This could cause problems when evaluating the order of umount(2) calls needed to prevent EBUSY. I talked about it with David Sterba, and he said that the current implementation is not optimal.
The btrfs driver does not have sufficient information to determine the true root of the bind mount. I've noticed this before myself, but I've never seen any issues resulting from it; however, I've also not tried calling BTRFS related ioctls on or from such a mount, so I may just have been lucky. I can imagine two side effects deep inside mount(8):
- "mount -a" uses subvol internally for a path lookup of the default volume or the volume corresponding to subvolid. (Only the Git version, not yet in 2.27.1.) I could imagine that the lookup is confused by a bind mount reporting the searched subvolid and a "random" subvol. But I don't have a reproducer yet, and I am not sure whether it is really possible.
- "umount -a" could have a problem finding a proper order for umount(2) without EBUSY. I did not check the algorithm, so I am not sure whether it is a real issue.
P.S.: There were many problems with btrfs in mount(8):
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=c4af75a84ef3430003c77be2469869aaf3a63e2a
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=618a88140e26a134727a39c906c9cdf6d0c04513
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=d2f8267847ecbe763a3b63af1289bf1179cd8c45
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=2cd28fc82d0c947472a4700d5e764265916fba1e
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/commit/?id=352740e88e2c9cb180fe845ce210b1c7b5ad88c7
-- Best Regards / S pozdravem, Stanislav Brabec
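For the "umount -a" ordering concern, one approach that avoids guessing from path strings is to use the mount ID and parent ID columns of mountinfo and unmount every child before its parent, i.e. a post-order walk of the mount tree. This Python sketch illustrates the idea only: it is not the algorithm util-linux uses, the function name is hypothetical, and it ignores other causes of EBUSY such as open files.

```python
def umount_order(mountinfo_text):
    """Return mount points ordered so that every mount precedes its parent.

    Uses the mount ID and parent ID columns of /proc/self/mountinfo rather
    than comparing path strings: a post-order walk of the mount tree.
    """
    parent, point = {}, {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        mid = int(fields[0])
        parent[mid], point[mid] = int(fields[1]), fields[4]

    children = {mid: [] for mid in parent}
    roots = []
    for mid, pid in parent.items():
        if pid in parent and pid != mid:
            children[pid].append(mid)
        else:
            roots.append(mid)  # parent lies outside this snapshot

    order = []

    def visit(mid):
        # Siblings do not block each other, so any fixed order among them
        # works; descending mount ID is used here just to be deterministic.
        for child in sorted(children[mid], reverse=True):
            visit(child)
        order.append(point[mid])

    for mid in sorted(roots):
        visit(mid)
    return order
```

A mount stacked on top of an existing mount point has that mount as its parent, so it is also unmounted first.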
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
On 2016-02-26 10:50, Stanislav Brabec wrote: Austin S. Hemmelgarn wrote: > Added linux-btrfs as this should be documented there as a known issue > until it gets fixed (although I have no idea which side is the issue). This is very bad behavior, as it makes it impossible to safely use btrfs loop bind mounts in fstab. (Well, it is possible to write a work-around in util-linux: remember the source file, and if -oloop is specified next time and the source file is already assigned to a loop device, use the existing loop device.) I'm not 100% certain, but I think this is an interaction between how BTRFS handles multiple mounts of the same filesystem on a given system and how mount handles loop mounts. AFAIUI, all instances of a given BTRFS filesystem being mounted on a given system are internally identical to bind mounts of a hidden mount of that filesystem. This is what allows both manual mounting of sub-volumes, and multiple mounting of the FS in general. Yes, the internal implementation is the same. But here it causes real trouble: although both mounts point to the same file, the first and second mounts use different loop devices. To create a bind mount, something ugly needs to be done. And it is done in an incorrect way. That's just it though, from what I can tell based on what I've seen and what you said above, mount(8) isn't doing things correctly in this case. If we were to do this with something like XFS or ext4, the filesystem would probably end up completely messed up just because of the log replay code (assuming they actually mount the second time, I'm not sure what XFS would do in this case, but I believe that ext4 would allow the mount as long as the mmp feature is off). It would make sense that this behavior wouldn't have been noticed before (and probably wouldn't have mattered even if it had been), because most filesystems don't allow multiple mounts even if they're all RO, and most people don't try to mount other filesystems multiple times as a result of this.
If this behavior of allocating a new loop device for each call on a given file is in fact not BTRFS specific (as implied by your statement about a possible workaround in mount(8)), then mount(8) really should be fixed to not do that before we even consider looking at the issues in BTRFS, as that is behavior that has serious potential to result in data corruption for any filesystem, not just BTRFS. Now, if this does get fixed, mount(8) doesn't necessarily need to maintain its own copy of the state of /dev/loop mappings; it could simply check the currently allocated loop devices. You would of course need some form of locking relative to other mount -o loop instances and losetup, and it would be slow, but if you're using enough loop devices that this causes noticeable delays, then you really shouldn't be complaining all that much about performance. I already found another inconsistency caused by this implementation: /proc/self/mountinfo reports the subvolid of the nearest upper sub-volume root for the bind mount, not that of the sub-volume that was used to create this bind mount, together with a subvol path that potentially does not correspond to any subvolume root. This could cause problems when evaluating the order of umount(2) calls needed to prevent EBUSY. I talked about it with David Sterba, and he said that the current implementation is not optimal: the btrfs driver does not have sufficient information to determine the true root of the bind mount. I've noticed this before myself, but I've never seen any issues resulting from it; however, I've also not tried calling BTRFS related ioctls on or from such a mount, so I may just have been lucky. Maybe the same is valid for the reported loop issue, and this is just an ugly side effect.
I'd be more than willing to bet that that isn't the case; loop mounts and bind mounts are entirely different inside the kernel, and I think the loop mount issue on the BTRFS side is a result of the issues it has when dealing with filesystems with the same UUID (if this is in fact the case, similar behavior should be seen when trying to either mount multiple lower level components of a multi-path device, or by manually creating multiple /dev/loop associations for the same file and mounting them all at once using the /dev/loop names instead of the file). P.S.: There are some differences in use between bind mounts and btrfs sub-volumes:
- Bind mounts can be created for any file or directory.
- Sub-volume mounts can be created only for inodes marked as sub-volume roots.
- Bind mounts can be mounted only if one of the upper sub-volume roots is mounted.
- Sub-volumes can be mounted even if the volume root is not mounted.
FWIW, it's actually possible to simulate this behavior with bind mounts by mounting the root at the eventual mount point, then bind mounting the desired directory from that root over top of it. Of course, there is almost zero practical purpose to anyone doing this on most traditional
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
Austin S. Hemmelgarn wrote: > Added linux-btrfs as this should be documented there as a known issue > until it gets fixed (although I have no idea which side is the issue). This is very bad behavior, as it makes it impossible to safely use btrfs loop bind mounts in fstab. (Well, it is possible to write a work-around in util-linux: remember the source file, and if -oloop is specified next time and the source file is already assigned to a loop device, use the existing loop device.) I'm not 100% certain, but I think this is an interaction between how BTRFS handles multiple mounts of the same filesystem on a given system and how mount handles loop mounts. AFAIUI, all instances of a given BTRFS filesystem being mounted on a given system are internally identical to bind mounts of a hidden mount of that filesystem. This is what allows both manual mounting of sub-volumes, and multiple mounting of the FS in general. Yes, the internal implementation is the same. But here it causes real trouble: although both mounts point to the same file, the first and second mounts use different loop devices. To create a bind mount, something ugly needs to be done. And it is done in an incorrect way. I already found another inconsistency caused by this implementation: /proc/self/mountinfo reports the subvolid of the nearest upper sub-volume root for the bind mount, not that of the sub-volume that was used to create this bind mount, together with a subvol path that potentially does not correspond to any subvolume root. This could cause problems when evaluating the order of umount(2) calls needed to prevent EBUSY. I talked about it with David Sterba, and he said that the current implementation is not optimal: the btrfs driver does not have sufficient information to determine the true root of the bind mount. Maybe the same is valid for the reported loop issue, and this is just an ugly side effect. P.S.: There are some differences in use between bind mounts and btrfs sub-volumes:
- Bind mounts can be created for any file or directory.
- Sub-volume mounts can be created only for inodes marked as sub-volume roots.
- Bind mounts can be mounted only if one of the upper sub-volume roots is mounted.
- Sub-volumes can be mounted even if the volume root is not mounted.
-- Best Regards / S pozdravem, Stanislav Brabec
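The work-around proposed above (reuse an existing loop device when -oloop names a file that is already attached), combined with the later suggestion in this thread to check the currently allocated loop devices instead of keeping separate state, might look roughly like the Python sketch below. It assumes the backing_file, offset and sizelimit attributes under /sys/block/loopN/loop/; the function name and the sysfs_block parameter are hypothetical. It matches on the resolved path string and the mapped byte range, so the same file reached through a hard link or another mount would be missed (comparing device and inode numbers would be more robust), and it omits the locking against concurrent mount -o loop and losetup callers that would be needed. From the shell, losetup -j FILE lists the loop devices associated with a file.

```python
import os


def _read_attr(attr_dir, attr):
    with open(os.path.join(attr_dir, attr)) as f:
        return f.read().strip()


def find_reusable_loop(path, offset=0, sizelimit=0, sysfs_block="/sys/block"):
    """Return a loop device that already maps exactly `path`, or None.

    offset and sizelimit must match too (sizelimit 0 means "up to the end
    of the file"), because a loop device may expose only a slice of a file.
    """
    wanted = os.path.realpath(path)
    for name in sorted(os.listdir(sysfs_block)):
        attr_dir = os.path.join(sysfs_block, name, "loop")
        # Only loop devices attached to a file expose these attributes.
        if not name.startswith("loop") or not os.path.isdir(attr_dir):
            continue
        if (
            _read_attr(attr_dir, "backing_file") == wanted
            and int(_read_attr(attr_dir, "offset")) == offset
            and int(_read_attr(attr_dir, "sizelimit")) == sizelimit
        ):
            return name
    return None
```

A mount helper would call this before allocating a new device and fall back to allocation only when it returns None.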
[PULL] Btrfs for 4.6
Hi,

this is my final pull request for 4.6 branches that I've been tracking. All of them were in for-next for some time.

Summary:

* Chandan's preparatory work for subpage-blocksize
* Qu's updates to mount options (usebackuproot, nologreplay, norecovery)
* Zhao's readahead bugfixes
* Josef's updates to space handling
* from me, GFP flag updates, b-tree key space renamings
* collection of misc patches (sent out of series)
* misc cleanups

The patchset to allow device deletion by id is not part of this pull.

The following changes since commit 18558cae0272f8fd9647e69d3fec1565a7949865:

  Linux 4.5-rc4 (2016-02-14 13:05:20 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris

for you to fetch changes up to f5bc27c71a1b0741cb93dbec0f216b012b21d93f:

  Merge branch 'dev/control-ioctl' into for-chris-4.6 (2016-02-26 15:38:34 +0100)

Arnd Bergmann (1):
      btrfs: avoid uninitialized variable warning

Byongho Lee (2):
      btrfs: simplify expression in btrfs_calc_trans_metadata_size()
      btrfs: remove redundant error check

Chandan Rajendra (12):
      Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size
      Btrfs: Compute and look up csums based on sectorsized blocks
      Btrfs: Direct I/O read: Work on sectorsized blocks
      Btrfs: fallocate: Work with sectorsized blocks
      Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
      Btrfs: Search for all ordered extents that could span across a page
      Btrfs: Use (eb->start, seq) as search key for tree modification log
      Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
      Btrfs: Limit inline extents to root->sectorsize
      Btrfs: Fix block size returned to user space
      Btrfs: Clean pte corresponding to page straddling i_size
      Btrfs: btrfs_ioctl_clone: Truncate complete page after performing clone operation

Dave Jones (1):
      btrfs: remove open-coded swap() in backref.c:__merge_refs

David Sterba (30):
      btrfs: send: use GFP_KERNEL everywhere
      btrfs: reada: use GFP_KERNEL everywhere
      btrfs: scrub: use GFP_KERNEL on the submission path
      btrfs: let callers of btrfs_alloc_root pass gfp flags
      btrfs: fallocate: use GFP_KERNEL
      btrfs: readdir: use GFP_KERNEL
      btrfs: device add and remove: use GFP_KERNEL
      btrfs: extent same: use GFP_KERNEL for page array allocations
      btrfs: switch to kcalloc in btrfs_cmp_data_prepare
      btrfs: introduce key type for persistent temporary items
      btrfs: switch balance item to the temporary item key
      btrfs: introduce key type for persistent permanent items
      btrfs: switch dev stats item to the permanent item key
      btrfs: teach print_leaf about permanent item subtypes
      btrfs: teach print_leaf about temporary item subtypes
      btrfs: use proper type for failrec in extent_state
      btrfs: remove error message from search ioctl for nonexistent tree
      btrfs: change max_inline default to 2048
      btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls
      btrfs: drop unused argument in btrfs_ioctl_get_supported_features
      Merge branch 'chandan/prep-subpage-blocksize' into for-chris-4.6
      Merge branch 'dev/gfp-flags' into for-chris-4.6
      Merge branch 'dev/rename-keys' into for-chris-4.6
      Merge branch 'foreign/qu/norecovery-v7' into for-chris-4.6
      Merge branch 'foreign/zhaolei/reada' into for-chris-4.6
      Merge branch 'foreign/josef/space-updates' into for-chris-4.6
      Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6
      Merge branch 'cleanups-4.6' into for-chris-4.6
      Merge branch 'misc-4.6' into for-chris-4.6
      Merge branch 'dev/control-ioctl' into for-chris-4.6

Deepa Dinamani (1):
      btrfs: Replace CURRENT_TIME by current_fs_time()

Josef Bacik (4):
      Btrfs: change how we update the global block rsv
      Btrfs: fix truncate_space_check
      Btrfs: add transaction space reservation tracepoints
      Btrfs: check reserved when deciding to background flush

Kinglong Mee (2):
      btrfs: drop null testing before destroy functions
      btrfs: fix memory leak of fs_info in block group cache

Liu Bo (1):
      Btrfs: fix lockdep deadlock warning due to dev_replace

Qu Wenruo (3):
      btrfs: Introduce new mount option usebackuproot to replace recovery
      btrfs: Introduce new mount option to disable tree log replay
      btrfs: Introduce new mount option alias for nologreplay

Sudip Mukherjee (1):
      btrfs: fix build warning

Zhao Lei (18):
      btrfs: reada: Fix in-segment calculation for reada
      btrfs: reada: reduce additional fs_info->reada_lock in reada_find_zone
      btrfs: reada: Add missed segment checking in reada_find_zone
      btrfs: reada: Avoid many times of empty loop
      btrfs:
Re: [PATCH 1/3] btrfs: Continue write in case of can_not_nocow
On Fri, Feb 26, 2016 at 10:41:31AM +0500, Roman Mamedov wrote:
> On Wed, 6 Jan 2016 19:00:17 +0800
> Zhao Lei wrote:
>
> > btrfs failed in xfstests btrfs/080 with -o nodatacow.
> >
> > Can be reproduced by following script:
> > DEV=/dev/vdg
> > MNT=/mnt/tmp
> >
> > umount $DEV &>/dev/null
> > mkfs.btrfs -f $DEV
> > mount -o nodatacow $DEV $MNT
> >
> > dd if=/dev/zero of=$MNT/test bs=1 count=2048 &
> > btrfs subvolume snapshot -r $MNT $MNT/test_snap &
> > wait
> > --
> > We can see dd failed on NO_SPACE.
> >
> > Reason:
> > __btrfs_buffered_write should run cow write when no_cow impossible,
> > and current code is designed with above logic.
> > But check_can_nocow() have 2 type of return value(0 and <0) on
> > can_not_no_cow, and current code only continue write on first case,
> > the second case happened in doing subvolume.
> >
> > Fix:
> > Continue write when check_can_nocow() return 0 and <0.
> >
> > Signed-off-by: Zhao Lei
>
> Guys please don't forget about this patch. It solves real problem for people,
> http://www.spinics.net/lists/linux-btrfs/msg51276.html
> http://www.spinics.net/lists/linux-btrfs/msg51819.html
> but it's not in 4.4.1, not in 4.4.2... and now not in 4.4.3

Dave already has it queued for the next merge window.

Thanks!

-chris
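A simplified model of the decision described in the quoted patch may make the difference easier to see. This is a sketch, not the kernel code: `nocow_ret` stands in for the return value of check_can_nocow() (> 0 meaning the range can be overwritten in place, 0 meaning it cannot, < 0 an error such as the one hit while a snapshot is being taken), and `reserve_ret` is a hypothetical stand-in for the data-space reservation the COW path needs. The enum and function names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Which path a buffered write takes for one chunk (illustrative model only). */
enum write_path {
	PATH_NOCOW,	/* overwrite in place, no data space reserved */
	PATH_COW,	/* regular copy-on-write, needs a data reservation */
	PATH_FAIL	/* give up and return an error to the writer */
};

/*
 * Behaviour described in the report: only a 0 return from the NOCOW check
 * falls back to COW; a negative return aborts the write even though COW
 * might have succeeded.
 */
static enum write_path pick_path_before(int nocow_ret, int reserve_ret)
{
	if (nocow_ret > 0)
		return PATH_NOCOW;
	if (nocow_ret < 0)
		return PATH_FAIL;
	return reserve_ret == 0 ? PATH_COW : PATH_FAIL;
}

/*
 * Behaviour the fix asks for: 0 and < 0 both mean "NOCOW is not possible
 * right now", so both fall through to the COW path.
 */
static enum write_path pick_path_after(int nocow_ret, int reserve_ret)
{
	if (nocow_ret > 0)
		return PATH_NOCOW;
	return reserve_ret == 0 ? PATH_COW : PATH_FAIL;
}
```

Using -ENOSPC purely as an example error value, pick_path_before(-ENOSPC, 0) yields PATH_FAIL while pick_path_after(-ENOSPC, 0) yields PATH_COW. The model says nothing about which error code the kernel actually propagates.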
[GIT PULL] Btrfs fixes for 4.6
From: Filipe Manana

Hi Chris,

Please consider the following changes for the 4.6 kernel merge window. Nothing particularly outstanding, just the usual sort of bug fixes.

These have all been sent to the mailing list before (I just changed in my repo the changelog for the deadlock fix patch to fix a typo pointed out by Liu Bo; other than that it's exactly the same as the version sent to the mailing list). Some xfstests for these were already merged upstream and one more was sent earlier this week (for the listxattrs issue) that is not yet merged.

Thanks.

The following changes since commit 0fcb760afa6103419800674e22fb7f4de1f9670b:

  Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.6 (2016-02-24 10:21:44 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git integration-4.6

for you to fetch changes up to 97c86c11a5cb9839609a9df195e998c3312e68b0:

  Btrfs: do not collect ordered extents when logging that inode exists (2016-02-26 04:28:15 +)

Filipe Manana (7):
      Btrfs: fix unreplayable log after snapshot delete + parent dir fsync
      Btrfs: fix file loss on log replay after renaming a file and fsync
      Btrfs: fix extent_same allowing destination offset beyond i_size
      Btrfs: fix deadlock between direct IO reads and buffered writes
      Btrfs: fix listxattrs not listing all xattrs packed in the same item
      Btrfs: fix race when checking if we can skip fsync'ing an inode
      Btrfs: do not collect ordered extents when logging that inode exists

 fs/btrfs/file.c     |  9 +
 fs/btrfs/inode.c    | 25 +++--
 fs/btrfs/ioctl.c    |  6 ++
 fs/btrfs/tree-log.c | 99 ---
 fs/btrfs/tree-log.h |  2 ++
 fs/btrfs/xattr.c    | 65 +
 6 files changed, 165 insertions(+), 41 deletions(-)

-- 
2.7.0.rc3
[PATCH] Btrfs: do not collect ordered extents when logging that inode exists
From: Filipe Manana

When logging that an inode exists, for example as part of a directory fsync operation, we were collecting any ordered extents for the inode but we ended up doing nothing with them except tagging them as processed, by setting the flag BTRFS_ORDERED_LOGGED on them, which prevented a subsequent fsync of that inode (using the LOG_INODE_ALL mode) from collecting and processing them. This created a time window where a second fsync against the inode, using the fast path, ended up not logging the checksums for the new extents but it logged the extents since they were part of the list of modified extents. This happened because the ordered extents were not collected and checksums were not yet added to the csum tree - the ordered extents have not gone through btrfs_finish_ordered_io() yet (which is where we add them to the csum tree by calling inode.c:add_pending_csums()).

So fix this by not collecting an inode's ordered extents if we are logging it with the LOG_INODE_EXISTS mode.

Signed-off-by: Filipe Manana
---
 fs/btrfs/tree-log.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9f6372d..9d2e8ec 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4500,7 +4500,22 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 
 	mutex_lock(&BTRFS_I(inode)->log_mutex);
 
-	btrfs_get_logged_extents(inode, &logged_list, start, end);
+	/*
+	 * Collect ordered extents only if we are logging data. This is to
+	 * ensure a subsequent request to log this inode in LOG_INODE_ALL mode
+	 * will process the ordered extents if they still exists at the time,
+	 * because when we collect them we test and set for the flag
+	 * BTRFS_ORDERED_LOGGED to prevent multiple log requests to process the
+	 * same ordered extents. The consequence for the LOG_INODE_ALL log mode
+	 * not processing the ordered extents is that we end up logging the
+	 * corresponding file extent items, based on the extent maps in the
+	 * inode's extent_map_tree's modified_list, without logging the
+	 * respective checksums (since the may still be only attached to the
+	 * ordered extents and have not been inserted in the csum tree by
+	 * btrfs_finish_ordered_io() yet).
+	 */
+	if (inode_only == LOG_INODE_ALL)
+		btrfs_get_logged_extents(inode, &logged_list, start, end);
 
 	/*
 	 * a brute force approach to making sure we get the most uptodate
-- 
2.7.0.rc3
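The race the patch closes comes down to a test-and-set on a per-extent flag. The toy model below is plain C, not kernel code, and its struct, enum and function names are hypothetical: each ordered extent is reduced to a single "already logged" bit playing the role of BTRFS_ORDERED_LOGGED, and it compares collecting on every log request with collecting only for LOG_INODE_ALL requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy ordered extent: only the "claimed by a log request" bit is modelled. */
struct toy_ordered_extent {
	bool logged;	/* plays the role of BTRFS_ORDERED_LOGGED */
};

/* Stand-ins for the two log modes discussed in the patch. */
enum toy_log_mode { TOY_LOG_EXISTS, TOY_LOG_ALL };

/*
 * Collect every extent not yet claimed, setting its flag (test-and-set),
 * and return how many this request claimed. When only_for_all is true,
 * requests in TOY_LOG_EXISTS mode collect nothing, as in the patched code.
 */
static size_t toy_collect(struct toy_ordered_extent *oe, size_t n,
			  enum toy_log_mode mode, bool only_for_all)
{
	size_t claimed = 0;

	if (only_for_all && mode != TOY_LOG_ALL)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (!oe[i].logged) {
			oe[i].logged = true;
			claimed++;
		}
	}
	return claimed;
}
```

Without the mode check, a directory fsync (TOY_LOG_EXISTS) claims both extents of a two-element array and a following full fsync (TOY_LOG_ALL) claims none, which corresponds to the window in which file extent items were logged without their checksums. With the check, the full fsync claims both.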
Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes
Added linux-btrfs as this should be documented there as a known issue until it gets fixed (although I have no idea which side is the issue).

On 2016-02-25 14:22, Stanislav Brabec wrote:
> While writing a test suite for util-linux[1], I experienced a strange behavior of loop device:
>
> When two loop devices refer to the same file, and two btrfs mounts are called on them, the second mount changes loop device of the first, already mounted sub-volume.
>
> (Note that the current implementation of util-linux mount -oloop works exactly in this way, and it allocates new loop device for each mount command, so this bug can be easily reproduced without losetup, just using "mount -oloop" or fstab.)

I'm not 100% certain, but I think this is an interaction between how BTRFS handles multiple mounts of the same filesystem on a given system and how mount handles loop mounts. AFAIUI, all instances of a given BTRFS filesystem being mounted on a given system are internally identical to bind mounts of a hidden mount of that filesystem. This is what allows both manual mounting of sub-volumes, and multiple mounting of the FS in general.

> /proc/self/mountinfo after first btrfs loop mount:
>
> 107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
>
> This line changes after the second btrfs loop mount to:
>
> 107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
>
> See the change of /dev/loop0 to /dev/loop1!
>
> It is apparently not only proc file change, but it also causes a corruption of loop device subsystem, as I observed severe problems on the affected system later:
> - mount(2) returning 0 but doing nothing.
> - mount(8) entering an infinite loop while searching for free loop device.

It seems odd that this would cause such a degree of inconsistency in the kernel itself. My guess though is that mount(8) sees that you're trying to mount a file and unconditionally tries to bind it to a loop device without checking any in-use loop devices to see if it's already bound to them, and then when it calls mount(2), this ends up somehow confusing the BTRFS driver (probably because you've now mounted two filesystems with effectively identical super-blocks; BTRFS already has issues if multiple filesystems have the same UUID, and I have no idea how it might react to filesystems that appear identical but are on separate devices).

> Here is a main reproducer:
> =
> #!/bin/sh
> # Prepare the environment:
> /btrfs.sh
> mkdir -p /mnt/1 /mnt/2
> losetup /dev/loop0 /btrfs.img
> # Verify that nothing is mounted:
> cat /proc/self/mountinfo | grep /mnt
> mount /dev/loop0 /mnt/1
> echo "One file system should be mounted now."
> cat /proc/self/mountinfo | grep /mnt
> # Create another loop.
> losetup /dev/loop1 /btrfs.img
> echo "Going to mount second one."
> mount -osubvol=/ /dev/loop1 /mnt/2 2>&1
> echo "Two file system should be mounted now."
> cat /proc/self/mountinfo | grep /mnt
> echo "Strange. First mount changed its loop device!"
> umount /mnt/2
> echo "And now check, whether it remains changed after umount."
> cat /proc/self/mountinfo | grep /mnt
> umount /mnt/1
> losetup -d /dev/loop1
> losetup -d /dev/loop0
> rmdir /mnt/1 /mnt/2
> =
>
> And here is its output:
>
> One file system should be mounted now.
> 107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
> Going to mount second one.
> Two file system should be mounted now.
> 107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
> 108 59 0:59 / /mnt/2 rw,relatime shared:47 - btrfs /dev/loop1 rw,space_cache,subvolid=5,subvol=/
> Strange. First mount changed its loop device!
> And now check, whether it remains changed after umount.
> 107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
>
> It was actually reproduced on linux-4.4.1 on openSUSE Tumbleweed.
>
> Test image creator:
> = /btrfs.sh =
> #!/bin/bash
> truncate -s 42M /btrfs.img
> mkfs.btrfs -f -d single -m single /btrfs.img >/dev/null
> mount -o loop /btrfs.img /mnt
> pushd . >/dev/null
> cd /mnt
> mkdir -p d0/dd0/ddd0
> cd ./d0/dd0/ddd0
> touch file{1..5}
> btrfs subvol create s1 >/dev/null
> cd ./s1
> touch file{1..5}
> mkdir bind-point
> mkdir -p d1/dd1/ddd1
> cd ./d1/dd1/ddd1
> btrfs subvol create s2 >/dev/null
> DEFAULT_SUBVOLID=$(btrfs inspect rootid s2)
> btrfs subvol set-default $DEFAULT_SUBVOLID . >/dev/null
> NON_DEFAULT_SUBVOLID=$(btrfs subvol list /mnt | while read dummy id rest ; do if test $id = $DEFAULT_SUBVOLID ; then continue ; fi ; echo $id ; done)
> cd ../../../..
> mkdir -p d2/dd2/ddd2
> cd ./d2/dd2/ddd2
> btrfs subvol create s3 >/dev/null
> mkdir -p s3/bind-mnt
> popd >/dev/null
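In the reproducer the symptom is read off by eye from the mount-source field of the first mount's line. A minimal sketch for checking it programmatically, assuming the /proc/self/mountinfo layout documented in proc(5): field 5 is the mount point, a variable number of optional fields (such as shared:45) follows the mount options, and the filesystem type, mount source and super options come after a lone "-" separator. The function name is hypothetical, and octal escapes such as \040 are not decoded; because spaces inside paths are escaped that way, the first " - " in a line should always be the separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/self/mountinfo line and copy out the mount point
 * (field 5) and the mount source (second field after the " - " separator).
 * Returns 0 on success, -1 if the line does not have the expected layout
 * or an output buffer is too small.
 */
static int mountinfo_point_and_source(const char *line,
				      char *point, size_t point_len,
				      char *source, size_t source_len)
{
	char mnt[4096], fstype[256], src[4096];
	const char *sep;

	/* Skip mount ID, parent ID, major:minor and root; keep mount point. */
	if (sscanf(line, "%*d %*d %*s %*s %4095s", mnt) != 1)
		return -1;

	/* Optional fields vary in number, so jump to the separator. */
	sep = strstr(line, " - ");
	if (sep == NULL || sscanf(sep + 3, "%255s %4095s", fstype, src) != 2)
		return -1;

	/* Copy out, rejecting truncation (snprintf errors also land here). */
	if ((size_t)snprintf(point, point_len, "%s", mnt) >= point_len ||
	    (size_t)snprintf(source, source_len, "%s", src) >= source_len)
		return -1;
	return 0;
}
```

Applied to the lines from the output above, it reports /dev/loop0 as the source of /mnt/1 before the second mount and /dev/loop1 afterwards.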
Re: upgrading kernel 3.13 to 3.16
On 2016-02-26 05:50, Vytautas D wrote:
> Hi all,
>
> Are there any known issues upgrading btrfs running Ubuntu kernel 3.13 to 3.16? System was once converted from ext4 using btrfs-convert (btrfs-progs 3.17).
>
> The commit that worries me is the following:
>
> * Btrfs: incompatible format change to remove hole extents (+373/-56)
> (http://linux-btrfs.vger.kernel.narkive.com/syNRZbHS/patch-btrfs-incompatible-format-change-to-remove-hole-extents-v3#post1)
>
> Would this block me from reverting the system with a snapshot back to kernel 3.13? After upgrade would the system continue writing more metadata?

That particular commit also added a flag to control whether or not newly written metadata uses that feature. As long as you don't manually enable this flag, you should be perfectly fine WRT that change.
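Assuming the feature in question is the NO_HOLES incompat bit (bit 9, BTRFS_FEATURE_INCOMPAT_NO_HOLES in the kernel headers), one way to confirm before rolling back that it has not been switched on is to read the primary superblock directly, since a kernel refuses to mount a filesystem carrying incompat bits it does not know. The sketch assumes the on-disk layout: primary superblock at byte 65536, magic "_BHRfS_M" at offset 0x40, little-endian incompat_flags at offset 0xbc. The helper names are hypothetical and only the first superblock copy is read.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BTRFS_SUPER_OFFSET	65536ULL	/* primary superblock copy */
#define BTRFS_SUPER_SIZE	4096
#define SB_MAGIC_OFFSET		0x40
#define SB_INCOMPAT_OFFSET	0xbc
#define INCOMPAT_NO_HOLES	(1ULL << 9)

/* Decode a little-endian 64-bit value independent of host byte order. */
static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* 1 if the NO_HOLES bit is set, 0 if not, -1 if not a btrfs superblock. */
static int sb_has_no_holes(const unsigned char *sb, size_t len)
{
	if (len < SB_INCOMPAT_OFFSET + 8 ||
	    memcmp(sb + SB_MAGIC_OFFSET, "_BHRfS_M", 8) != 0)
		return -1;
	return (get_le64(sb + SB_INCOMPAT_OFFSET) & INCOMPAT_NO_HOLES) != 0;
}

/* Read the primary superblock of a device or image file and check the bit. */
static int device_has_no_holes(const char *path)
{
	unsigned char sb[BTRFS_SUPER_SIZE];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, sb, sizeof(sb), (off_t)BTRFS_SUPER_OFFSET);
	close(fd);
	if (n != (ssize_t)sizeof(sb))
		return -1;
	return sb_has_no_holes(sb, sizeof(sb));
}
```

A return of 0 is consistent with the advice above; a return of 1 would mean the bit has been set, and a kernel that predates the feature, such as the 3.13 one in question, would then be expected to refuse the mount.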
upgrading kernel 3.13 to 3.16
Hi all,

Are there any known issues upgrading btrfs running Ubuntu kernel 3.13 to 3.16? System was once converted from ext4 using btrfs-convert (btrfs-progs 3.17).

The commit that worries me is the following:

* Btrfs: incompatible format change to remove hole extents (+373/-56)
(http://linux-btrfs.vger.kernel.narkive.com/syNRZbHS/patch-btrfs-incompatible-format-change-to-remove-hole-extents-v3#post1)

Would this block me from reverting the system with a snapshot back to kernel 3.13? After upgrade would the system continue writing more metadata?

Thanks,
Vytas