Re: Two uncorrectable errors across RAID1 at same logical block?
On Wed, Oct 08, 2014 at 09:13:58AM -0700, Rich Rauenzahn wrote:
> On 10/8/2014 7:20 AM, Liu Bo wrote:
>> On Mon, Oct 06, 2014 at 07:18:06PM -0700, Rich Rauenzahn wrote:
>>> On 10/6/2014 7:05 PM, Liu Bo wrote:
>>>> btrfs inspect-internal logical-resolve 58464632832
>>>
>>> $ sudo btrfs inspect-internal logical-resolve 58464632832 /
>>>
>>> ...no output?
>>
>> Hmm... have you tried the latest btrfs-progs? You can pull it or get a
>> tar ball from
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
>>
>> thanks,
>> -liubo
>
> Still no output:
>
> $ sudo ./btrfs inspect-internal logical-resolve 58464632832 /
>
> Could it be a deleted file?

No idea. Would you please try it with the verbose option?

sudo ./btrfs inspect-internal logical-resolve -v 58464632832 /

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs: test mount btrfs subvolume with selinux context
If one subvolume was mounted with selinux context, other subvolumes
should be able to be mounted with the same selinux context too.

Cc: Qu Wenruo <quwen...@cn.fujitsu.com>
Signed-off-by: Eryu Guan <eg...@redhat.com>
---
v2:
- redirect _scratch_mkfs output to $seqres.full to avoid trim disk
  message if the disk supports trim

 tests/btrfs/075     | 70 +
 tests/btrfs/075.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 73 insertions(+)
 create mode 100755 tests/btrfs/075
 create mode 100644 tests/btrfs/075.out

diff --git a/tests/btrfs/075 b/tests/btrfs/075
new file mode 100755
index 000..16ed854
--- /dev/null
+++ b/tests/btrfs/075
@@ -0,0 +1,70 @@
+#! /bin/bash
+# FSQA Test No. btrfs/075
+#
+# If one subvolume was mounted with selinux context, other subvolumes
+# should be able to be mounted with the same selinux context too.
+#
+#---
+# Copyright (C) 2014 Red Hat Inc.  All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	$UMOUNT_PROG $subvol_mnt > /dev/null 2>&1
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+# SELINUX_MOUNT_OPTIONS will be set in common/config if selinux is enabled
+if [ "$SELINUX_MOUNT_OPTIONS" == "" ]; then
+	_notrun "Require selinux to be enabled"
+fi
+
+rm -f $seqres.full
+echo "Silence is golden"
+
+# first mount default subvolume with selinux context set
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount
+
+# create a new subvolume and mount it with the same selinux context
+subvol_mnt=$TEST_DIR/$seq.mnt
+mkdir -p $subvol_mnt
+$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol >> $seqres.full 2>&1
+$MOUNT_PROG -o subvol=subvol $SELINUX_MOUNT_OPTIONS $SCRATCH_DEV $subvol_mnt
+status=$?
+
+exit
diff --git a/tests/btrfs/075.out b/tests/btrfs/075.out
new file mode 100644
index 000..ded801b
--- /dev/null
+++ b/tests/btrfs/075.out
@@ -0,0 +1,2 @@
+QA output created by 075
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index c8ac72c..00ae8ce 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -77,3 +77,4 @@
 072 auto scrub defrag compress
 073 auto scrub remount compress
 074 auto defrag remount compress
+075 auto quick subvol
--
1.8.3.1
Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
On Thu, Oct 9, 2014 at 1:28 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to
> insert best fitted extent map
> From: Filipe David Manana <fdman...@gmail.com>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> Date: 2014-10-08 20:08
>
>> On Fri, Sep 19, 2014 at 1:31 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to
>>> insert best fitted extent map
>>> From: Filipe David Manana <fdman...@gmail.com>
>>> To: Qu Wenruo <quwen...@cn.fujitsu.com>
>>> Date: 2014-09-18 21:16
>>>
>>>> On Wed, Sep 17, 2014 at 4:53 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>>>>> The following commit enhanced merge_extent_mapping() to reduce
>>>>> fragmentation in the extent map tree, but it can't handle the case
>>>>> where the existing extent lies before map_start:
>>>>> 51f39 btrfs: Use right extent length when inserting overlap extent map.
>>>>>
>>>>> [BUG]
>>>>> When the existing extent map's start is before map_start, em->len
>>>>> will go negative, which will corrupt the extent map and fail to
>>>>> insert the new extent map.
>>>>> This will happen when someone gets a large extent map, but by the time
>>>>> it is going to be inserted into the extent map tree, someone else has
>>>>> already committed some writes and split the huge extent into small parts.
>>>>
>>>> This sounds very deterministic to me. Any reason to not add tests to
>>>> the sanity tests that exercise this/these case/cases?
>>>
>>> Yes, thanks for the informing.
>>> Will add the test case for it soon.
>>
>> Hi Qu,
>>
>> Any progress on the test? This is a very important one IMHO, not only
>> because of the bad consequences of the bug (extent map corruption,
>> leading to all sorts of chaos), but also because this problem was not
>> found by the full xfstests suite on several developer machines.
>>
>> thanks
>
> Still trying to reproduce it under the xfstests framework.

That's the problem, wasn't apparently reproducible (or detectable at
least) by anyone with xfstests.

> But even following the FileBench randomrw behavior (1 thread random read +
> 1 thread random write on preallocated space), I still failed to reproduce it.
> Still investigating how to reproduce it.
> Worst case may be adding a new C program into src of xfstests?

How about the sanity tests (fs/btrfs/tests/*.c)? Create an empty map
tree, add some extent maps, then try to merge some new extent maps that
used to fail before this fix. Seems simple, no?

thanks

> Thanks,
> Qu

>>>>> [REPRODUCER]
>>>>> It is very easy to trigger using filebench with the randomrw
>>>>> personality. It is about 100% reproducible when using an 8G
>>>>> preallocated file in a 60s randomrw test.
>>>>>
>>>>> [FIX]
>>>>> This patch can now handle any existing extent position.
>>>>> Since it does not directly use existing->start, now it will find the
>>>>> previous and next extents around map_start.
>>>>> So the old existing->start < map_start bug will never happen again.
>>>>>
>>>>> [ENHANCE]
>>>>> This patch will insert the best fitted extent map into the extent map
>>>>> tree, other than the oldest [map_start, map_start + sectorsize) or
>>>>> the relatively newer but not perfect [map_start, existing->start).
>>>>>
>>>>> The patch will first search for an existing extent that does not
>>>>> intersect with the desired map range [map_start, map_start + len).
>>>>> The existing extent will be either before or behind map_start, and
>>>>> based on the existing extent, we can find out the previous and next
>>>>> extents around map_start.
>>>>> So the best fitted extent would be [prev->end, next->start).
>>>>> Where prev or next is not found, the corresponding boundary falls
>>>>> back to the requested map range.
>>>>>
>>>>> With this patch, fragmentation in the extent map tree should be
>>>>> reduced much more than with the 51f39 commit, and an unneeded extent
>>>>> map tree search is avoided.
>>>>>
>>>>> Reported-by: Tsutomu Itoh <t-i...@jp.fujitsu.com>
>>>>> Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
>>>>> ---
>>>>>  fs/btrfs/inode.c | 79
>>>>>  1 file changed, 57 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>> index 016c403..8039021 100644
>>>>> --- a/fs/btrfs/inode.c
>>>>> +++ b/fs/btrfs/inode.c
>>>>> @@ -6191,21 +6191,60 @@ out_fail_inode:
>>>>>  	goto out_fail;
>>>>>  }
>>>>>
>>>>> +/* Find next extent map of a given extent map, caller needs to ensure locks */
>>>>> +static struct extent_map *next_extent_map(struct extent_map *em)
>>>>> +{
>>>>> +	struct rb_node *next;
>>>>> +
>>>>> +	next = rb_next(&em->rb_node);
>>>>> +	if (!next)
>>>>> +		return NULL;
>>>>> +	return container_of(next, struct extent_map, rb_node);
>>>>> +}
>>>>> +
>>>>> +static struct extent_map *prev_extent_map(struct extent_map *em)
>>>>> +{
>>>>> +	struct rb_node *prev;
>>>>> +
>>>>> +	prev = rb_prev(&em->rb_node);
>>>>> +	if (!prev)
>>>>> +		return NULL;
>>>>> +	return container_of(prev, struct extent_map, rb_node);
>>>>> +}
>>>>> +
>>>>>  /* helper for btfs_get_extent.  Given an existing extent in the tree,
>>>>> + * the existing extent is the nearest extent to map_start,
>>>>>   * and an extent that you want to insert, deal with overlap and insert
>>>>> - * the new extent into
Re: What is the vision for btrfs fs repair?
On 2014-10-08 15:11, Eric Sandeen wrote:
> I was looking at Marc's post:
>
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
>
> and it feels like there isn't exactly a cohesive, overarching vision for
> repair of a corrupted btrfs filesystem.
>
> In other words - I'm an admin cruising along, when the kernel throws some
> fs corruption error, or for whatever reason btrfs fails to mount. What
> should I do?
>
> Marc lays out several steps, but to me this highlights that there seem to
> be a lot of disjoint mechanisms out there to deal with these problems;
> mostly from Marc's blog, with some bits of my own:
>
> * btrfs scrub
>   Errors are corrected along the way if possible (what *is* possible?)
>
> * mount -o recovery
>   Enable autorecovery attempts if a bad tree root is found at mount time.
>
> * mount -o degraded
>   Allow mounts to continue with missing devices.
>   (This isn't really a way to recover from corruption, right?)
>
> * btrfs-zero-log
>   remove the log tree if log tree is corrupt
>
> * btrfs rescue
>   Recover a damaged btrfs filesystem
>   chunk-recover
>   super-recover
>   How does this relate to btrfs check?
>
> * btrfs check
>   repair a btrfs filesystem
>   --repair
>   --init-csum-tree
>   --init-extent-tree
>   How does this relate to btrfs rescue?
>
> * btrfs restore
>   try to salvage files from a damaged filesystem
>   (not really repair, it's disk-scraping)
>
> What's the vision for, say, scrub vs. check vs. rescue? Should they
> repair the same errors, only online vs. offline? If not, what class of
> errors does one fix vs. the other? How would an admin know?
>
> Can btrfs check recover a bad tree root in the same way that mount -o
> recovery does? How would I know if I should use --init-*-tree, or
> chunk-recover, and what are the ramifications of using these options?
>
> It feels like recovery tools have been badly splintered, and if there's
> an overarching design or vision for btrfs fs repair, I can't tell what
> it is. Can anyone help me?

Well, based on my understanding:

* btrfs scrub is intended to be almost exactly equivalent to scrubbing a
  RAID volume; that is, it fixes disparity between multiple copies of the
  same block. IOW, it isn't really repair per se, but more preventative
  maintenance. Currently, it only works for cases where you have multiple
  copies of a block (dup, raid1, and raid10 profiles), but support is
  planned for error correction of raid5 and raid6 profiles.
* mount -o recovery I don't know much about, but AFAICT, it's more for
  dealing with metadata-related FS corruption.
* mount -o degraded is used to mount a fs configured for a raid storage
  profile with fewer devices than the profile minimum. It's primarily so
  that you can get the fs into a state where you can run 'btrfs device
  replace'.
* btrfs-zero-log only deals with log tree corruption. This would be
  roughly equivalent to zeroing out the journal on an XFS or ext4
  filesystem, and should almost never be needed.
* btrfs rescue is intended for low-level recovery of corruption on an
  offline fs.
* chunk-recover I'm not entirely sure about, but I believe it's like
  scrub for a single chunk on an offline fs.
* super-recover is for dealing with corrupted superblocks, and tries to
  replace it with one of the other copies (which hopefully isn't
  corrupted).
* btrfs check is intended to (eventually) be equivalent to the fsck
  utility for most other filesystems. Currently, it's relatively good at
  identifying corruption, but less so at actually fixing it. There are,
  however, some things that it won't catch, like a superblock pointing to
  a corrupted root tree.
* btrfs restore is essentially disk scraping, but with built-in knowledge
  of the filesystem's on-disk structure, which makes it more reliable
  than more generic tools like scalpel for files that are too big to fit
  in the metadata blocks, and it is pretty much essential for dealing
  with transparently compressed files.

In general, my personal procedure for handling a misbehaving BTRFS
filesystem is:

* Run btrfs check on it WITHOUT ANY OTHER OPTIONS to try to identify
  what's wrong
* Try mounting it using -o recovery
* Try mounting it using -o ro,recovery
* Use -o degraded only if it's a BTRFS raid set that lost a disk
* If btrfs check AND dmesg both seem to indicate that the log tree is
  corrupt, try btrfs-zero-log
* If btrfs check indicated a corrupt superblock, try btrfs rescue
  super-recover
* If all of the above fails, ask for advice on the mailing list or IRC

Also, you should be running btrfs scrub regularly to correct bit-rot and
force remapping of blocks with read errors. While BTRFS technically
handles both transparently on reads, it only corrects things on disk when
you do a scrub.
Re: What is the vision for btrfs fs repair?
Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
excerpted:

> Also, you should be running btrfs scrub regularly to correct bit-rot and
> force remapping of blocks with read errors. While BTRFS technically
> handles both transparently on reads, it only corrects things on disk
> when you do a scrub.

AFAIK that isn't quite correct.

Currently, the number of copies is limited to two, meaning if one of the
two is bad, there's a 50% chance of btrfs reading the good one on first
try.

If btrfs reads the good copy, it simply uses it. If btrfs reads the bad
one, it checks the other one and, assuming it's good, replaces the bad one
with the good one both for the read (which otherwise errors out) and by
overwriting the bad one.

But here's the rub. The chances of detecting that bad block are relatively
low in most cases. First, the system must try reading it for some reason,
but even then, chances are 50% it'll pick the good one and won't even
notice the bad one.

Thus, while btrfs may randomly bump into a bad block and rewrite it with
the good copy, scrub is the only way to systematically detect and (if
there's a good copy) fix these checksum errors. It's not that btrfs
doesn't do it if it finds them, it's that the chances of finding them are
relatively low, unless you do a scrub, which systematically checks the
entire filesystem (well, other than files marked nocsum, or nocow, which
implies nocsum, or files written when mounted with nodatacow or
nodatasum).

At least that's the way it /should/ work. I guess it's possible that btrfs
isn't doing those routine bump-into-it-and-fix-it fixes yet, but if so,
that's the first /I/ remember reading of it.

Other than that detail, what you posted matches my knowledge and
experience, such as it may be as a non-dev list regular, as well.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
Re: What is the vision for btrfs fs repair?
On Thu, Oct 09, 2014 at 11:53:23AM +0000, Duncan wrote:
> Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
> excerpted:
>
>> Also, you should be running btrfs scrub regularly to correct bit-rot and
>> force remapping of blocks with read errors. While BTRFS technically
>> handles both transparently on reads, it only corrects things on disk
>> when you do a scrub.
>
> AFAIK that isn't quite correct.
>
> Currently, the number of copies is limited to two, meaning if one of the
> two is bad, there's a 50% chance of btrfs reading the good one on first
> try.

   Scrub checks both copies, though. It's ordinary reads that don't.

   Hugo.

> If btrfs reads the good copy, it simply uses it. If btrfs reads the bad
> one, it checks the other one and assuming it's good, replaces the bad one
> with the good one both for the read (which otherwise errors out), and by
> overwriting the bad one.
>
> But here's the rub. The chances of detecting that bad block are
> relatively low in most cases. First, the system must try reading it for
> some reason, but even then, chances are 50% it'll pick the good one and
> won't even notice the bad one.
>
> Thus, while btrfs may randomly bump into a bad block and rewrite it with
> the good copy, scrub is the only way to systematically detect and (if
> there's a good copy) fix these checksum errors. It's not that btrfs
> doesn't do it if it finds them, it's that the chances of finding them are
> relatively low, unless you do a scrub, which systematically checks the
> entire filesystem (well, other than files marked nocsum, or nocow, which
> implies nocsum, or files written when mounted with nodatacow or
> nodatasum).
>
> At least that's the way it /should/ work. I guess it's possible that
> btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but if
> so, that's the first /I/ remember reading of it.
>
> Other than that detail, what you posted matches my knowledge and
> experience, such as it may be as a non-dev list regular, as well.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no. 7: The Simple Truth ---
Re: What is the vision for btrfs fs repair?
On 2014-10-09 07:53, Duncan wrote:
> Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
> excerpted:
>
>> Also, you should be running btrfs scrub regularly to correct bit-rot and
>> force remapping of blocks with read errors. While BTRFS technically
>> handles both transparently on reads, it only corrects things on disk
>> when you do a scrub.
>
> AFAIK that isn't quite correct.
>
> Currently, the number of copies is limited to two, meaning if one of the
> two is bad, there's a 50% chance of btrfs reading the good one on first
> try.
>
> If btrfs reads the good copy, it simply uses it. If btrfs reads the bad
> one, it checks the other one and assuming it's good, replaces the bad one
> with the good one both for the read (which otherwise errors out), and by
> overwriting the bad one.
>
> But here's the rub. The chances of detecting that bad block are
> relatively low in most cases. First, the system must try reading it for
> some reason, but even then, chances are 50% it'll pick the good one and
> won't even notice the bad one.
>
> Thus, while btrfs may randomly bump into a bad block and rewrite it with
> the good copy, scrub is the only way to systematically detect and (if
> there's a good copy) fix these checksum errors. It's not that btrfs
> doesn't do it if it finds them, it's that the chances of finding them are
> relatively low, unless you do a scrub, which systematically checks the
> entire filesystem (well, other than files marked nocsum, or nocow, which
> implies nocsum, or files written when mounted with nodatacow or
> nodatasum).
>
> At least that's the way it /should/ work. I guess it's possible that
> btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but if
> so, that's the first /I/ remember reading of it.

I'm not 100% certain, but I believe it doesn't actually fix things on disk
when it detects an error during a read. I know it doesn't if the fs is
mounted ro (even if the media is writable), because I did some testing to
see how 'read-only' mounting a btrfs filesystem really is.

Also, that's a much better description of how multiple copies work than I
could probably have ever given.
Re: What is the vision for btrfs fs repair?
On Thu, Oct 09, 2014 at 08:07:51AM -0400, Austin S Hemmelgarn wrote:
> On 2014-10-09 07:53, Duncan wrote:
>> Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
>> excerpted:
>>
>>> Also, you should be running btrfs scrub regularly to correct bit-rot
>>> and force remapping of blocks with read errors. While BTRFS
>>> technically handles both transparently on reads, it only corrects
>>> things on disk when you do a scrub.
>>
>> AFAIK that isn't quite correct.
>>
>> Currently, the number of copies is limited to two, meaning if one of the
>> two is bad, there's a 50% chance of btrfs reading the good one on first
>> try.
>>
>> If btrfs reads the good copy, it simply uses it. If btrfs reads the bad
>> one, it checks the other one and assuming it's good, replaces the bad
>> one with the good one both for the read (which otherwise errors out),
>> and by overwriting the bad one.
>>
>> But here's the rub. The chances of detecting that bad block are
>> relatively low in most cases. First, the system must try reading it for
>> some reason, but even then, chances are 50% it'll pick the good one and
>> won't even notice the bad one.
>>
>> Thus, while btrfs may randomly bump into a bad block and rewrite it with
>> the good copy, scrub is the only way to systematically detect and (if
>> there's a good copy) fix these checksum errors. It's not that btrfs
>> doesn't do it if it finds them, it's that the chances of finding them
>> are relatively low, unless you do a scrub, which systematically checks
>> the entire filesystem (well, other than files marked nocsum, or nocow,
>> which implies nocsum, or files written when mounted with nodatacow or
>> nodatasum).
>>
>> At least that's the way it /should/ work. I guess it's possible that
>> btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but
>> if so, that's the first /I/ remember reading of it.
>
> I'm not 100% certain, but I believe it doesn't actually fix things on
> disk when it detects an error during a read,

   I'm fairly sure it does, as I've had it happen to me. :)

> I know it doesn't if the fs is mounted ro (even if the media is
> writable), because I did some testing to see how 'read-only' mounting a
> btrfs filesystem really is.

   If the FS is RO, then yes, it won't fix things.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great films about cricket: Interview with the Umpire ---
Re: What is the vision for btrfs fs repair?
On 2014-10-09 08:12, Hugo Mills wrote:
> On Thu, Oct 09, 2014 at 08:07:51AM -0400, Austin S Hemmelgarn wrote:
>> On 2014-10-09 07:53, Duncan wrote:
>>> Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
>>> excerpted:
>>>
>>>> Also, you should be running btrfs scrub regularly to correct bit-rot
>>>> and force remapping of blocks with read errors. While BTRFS
>>>> technically handles both transparently on reads, it only corrects
>>>> things on disk when you do a scrub.
>>>
>>> AFAIK that isn't quite correct.
>>>
>>> Currently, the number of copies is limited to two, meaning if one of
>>> the two is bad, there's a 50% chance of btrfs reading the good one on
>>> first try.
>>>
>>> If btrfs reads the good copy, it simply uses it. If btrfs reads the
>>> bad one, it checks the other one and assuming it's good, replaces the
>>> bad one with the good one both for the read (which otherwise errors
>>> out), and by overwriting the bad one.
>>>
>>> But here's the rub. The chances of detecting that bad block are
>>> relatively low in most cases. First, the system must try reading it
>>> for some reason, but even then, chances are 50% it'll pick the good
>>> one and won't even notice the bad one.
>>>
>>> Thus, while btrfs may randomly bump into a bad block and rewrite it
>>> with the good copy, scrub is the only way to systematically detect and
>>> (if there's a good copy) fix these checksum errors. It's not that
>>> btrfs doesn't do it if it finds them, it's that the chances of finding
>>> them are relatively low, unless you do a scrub, which systematically
>>> checks the entire filesystem (well, other than files marked nocsum, or
>>> nocow, which implies nocsum, or files written when mounted with
>>> nodatacow or nodatasum).
>>>
>>> At least that's the way it /should/ work. I guess it's possible that
>>> btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but
>>> if so, that's the first /I/ remember reading of it.
>>
>> I'm not 100% certain, but I believe it doesn't actually fix things on
>> disk when it detects an error during a read,
>
>    I'm fairly sure it does, as I've had it happen to me. :)

I probably just misinterpreted the source code; while I know enough C to
generally understand things, I'm by far no expert.

>> I know it doesn't if the fs is mounted ro (even if the media is
>> writable), because I did some testing to see how 'read-only' mounting a
>> btrfs filesystem really is.
>
>    If the FS is RO, then yes, it won't fix things.
>
>    Hugo.
Re: What is the vision for btrfs fs repair?
On Thu, 09 Oct 2014 08:07:51 -0400
Austin S Hemmelgarn <ahferro...@gmail.com> wrote:

> On 2014-10-09 07:53, Duncan wrote:
>> Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
>> excerpted:
>>
>>> Also, you should be running btrfs scrub regularly to correct bit-rot
>>> and force remapping of blocks with read errors. While BTRFS
>>> technically handles both transparently on reads, it only corrects
>>> things on disk when you do a scrub.
>>
>> AFAIK that isn't quite correct.
>>
>> Currently, the number of copies is limited to two, meaning if one of the
>> two is bad, there's a 50% chance of btrfs reading the good one on first
>> try.
>>
>> If btrfs reads the good copy, it simply uses it. If btrfs reads the bad
>> one, it checks the other one and assuming it's good, replaces the bad
>> one with the good one both for the read (which otherwise errors out),
>> and by overwriting the bad one.
>>
>> But here's the rub. The chances of detecting that bad block are
>> relatively low in most cases. First, the system must try reading it for
>> some reason, but even then, chances are 50% it'll pick the good one and
>> won't even notice the bad one.
>>
>> Thus, while btrfs may randomly bump into a bad block and rewrite it with
>> the good copy, scrub is the only way to systematically detect and (if
>> there's a good copy) fix these checksum errors. It's not that btrfs
>> doesn't do it if it finds them, it's that the chances of finding them
>> are relatively low, unless you do a scrub, which systematically checks
>> the entire filesystem (well, other than files marked nocsum, or nocow,
>> which implies nocsum, or files written when mounted with nodatacow or
>> nodatasum).
>>
>> At least that's the way it /should/ work. I guess it's possible that
>> btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but
>> if so, that's the first /I/ remember reading of it.
>
> I'm not 100% certain, but I believe it doesn't actually fix things on
> disk when it detects an error during a read. I know it doesn't if the fs
> is mounted ro (even if the media is writable), because I did some
> testing to see how 'read-only' mounting a btrfs filesystem really is.

Definitely it won't with a read-only mount. But then scrub shouldn't be
able to write to a read-only mount either. The only way a read-only mount
should be writable is if it's mounted (bind-mounted or
btrfs-subvolume-mounted) read-write elsewhere, and the write occurs to
that mount, not the read-only mounted location.

There's even debate about replaying the journal or doing orphan-delete on
read-only mounts (at least on-media; the change could, and arguably
should, occur in RAM and be cached, marking the cache dirty at the same
time so it's appropriately flushed if/when the filesystem goes writable),
with some arguing read-only means just that, don't write /anything/ to it
until it's read-write mounted.

But writable-mounted, detected checksum errors (with a good copy
available) should be rewritten as far as I know. If not, I'd call it a
bug. The problem is in the detection, not in the rewriting. Scrub's the
only way to reliably detect these errors since it's the only thing that
systematically checks /everything/.

> Also, that's a much better description of how multiple copies work than
> I could probably have ever given.

Thanks. =:^)

-- 
Duncan - No HTML messages please, as they are filtered as spam.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
Re: What is the vision for btrfs fs repair?
On Thu, 9 Oct 2014 12:55:50 +0100
Hugo Mills <h...@carfax.org.uk> wrote:

> On Thu, Oct 09, 2014 at 11:53:23AM +0000, Duncan wrote:
>> Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
>> excerpted:
>>
>>> Also, you should be running btrfs scrub regularly to correct bit-rot
>>> and force remapping of blocks with read errors. While BTRFS
>>> technically handles both transparently on reads, it only corrects
>>> things on disk when you do a scrub.
>>
>> AFAIK that isn't quite correct.
>>
>> Currently, the number of copies is limited to two, meaning if one of
>> the two is bad, there's a 50% chance of btrfs reading the good one on
>> first try.
>
>    Scrub checks both copies, though. It's ordinary reads that don't.

While I believe I was clear in full context (see below), agreed. I was
talking about normal reads in the above, not scrub, as the full quote
should make clear. I guess I could have made it clearer in the immediate
context, however. Thanks.

>> Thus, while btrfs may randomly bump into a bad block and rewrite it
>> with the good copy, scrub is the only way to systematically detect and
>> (if there's a good copy) fix these checksum errors.

-- 
Duncan - No HTML messages please, as they are filtered as spam.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
Re: What is the vision for btrfs fs repair?
On 2014-10-09 08:34, Duncan wrote: On Thu, 09 Oct 2014 08:07:51 -0400 Austin S Hemmelgarn ahferro...@gmail.com wrote: On 2014-10-09 07:53, Duncan wrote: Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as excerpted: Also, you should be running btrfs scrub regularly to correct bit-rot and force remapping of blocks with read errors. While BTRFS technically handles both transparently on reads, it only corrects thing on disk when you do a scrub. AFAIK that isn't quite correct. Currently, the number of copies is limited to two, meaning if one of the two is bad, there's a 50% chance of btrfs reading the good one on first try. If btrfs reads the good copy, it simply uses it. If btrfs reads the bad one, it checks the other one and assuming it's good, replaces the bad one with the good one both for the read (which otherwise errors out), and by overwriting the bad one. But here's the rub. The chances of detecting that bad block are relatively low in most cases. First, the system must try reading it for some reason, but even then, chances are 50% it'll pick the good one and won't even notice the bad one. Thus, while btrfs may randomly bump into a bad block and rewrite it with the good copy, scrub is the only way to systematically detect and (if there's a good copy) fix these checksum errors. It's not that btrfs doesn't do it if it finds them, it's that the chances of finding them are relatively low, unless you do a scrub, which systematically checks the entire filesystem (well, other than files marked nocsum, or nocow, which implies nocsum, or files written when mounted with nodatacow or nodatasum). At least that's the way it /should/ work. I guess it's possible that btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but if so, that's the first /I/ remember reading of it. 
I'm not 100% certain, but I believe it doesn't actually fix things on disk when it detects an error during a read, I know it doesn't if the fs is mounted ro (even if the media is writable), because I did some testing to see how 'read-only' mounting a btrfs filesystem really is. Definitely it won't with a read-only mount. But then scrub shouldn't be able to write to a read-only mount either. The only way a read-only mount should be writable is if it's mounted (bind-mounted or btrfs-subvolume-mounted) read-write elsewhere, and the write occurs to that mount, not the read-only mounted location. In theory yes, but there are caveats to this, namely: * atime updates still happen unless you have mounted the fs with noatime * The superblock gets updated if there are 'any' writes * The free space cache 'might' be updated if there are any writes All in all, a BTRFS filesystem mounted ro is much more read-only than say ext4 (which at least updates the sb, and old versions replayed the journal, in addition to the atime updates). There's even debate about replaying the journal or doing orphan-delete on read-only mounts (at least on-media, the change could, and arguably should, occur in RAM and be cached, marking the cache dirty at the same time so it's appropriately flushed if/when the filesystem goes writable), with some arguing read-only means just that, don't write /anything/ to it until it's read-write mounted. But writable-mounted, detected checksum errors (with a good copy available) should be rewritten as far as I know. If not, I'd call it a bug. The problem is in the detection, not in the rewriting. Scrub's the only way to reliably detect these errors since it's the only thing that systematically checks /everything/. Also, that's a much better description of how multiple copies work than I could probably have ever given. Thanks. =:^)
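Duncan's point about the ~50% detection odds on ordinary reads versus a scrub can be sketched in a few lines of C. This is a toy model, not btrfs code — the names and the checksum function are invented — but it shows why an ordinary read only repairs the bad mirror when it happens to land on it first, while a scrub always finds it:

```c
#include <stdlib.h>

/* One 'block' per mirror: payload plus stored checksum. */
struct block { unsigned data; unsigned csum; };

/* Toy checksum standing in for btrfs's crc32c. */
static unsigned toy_csum(unsigned data) { return data * 2654435761u; }

/* Ordinary read: pick one mirror at random; only if its checksum fails do
 * we fall back to the other copy and rewrite the bad one. */
static int read_block(struct block mirror[2], unsigned *out)
{
    int first = rand() % 2;
    for (int i = 0; i < 2; i++) {
        struct block *b = &mirror[(first + i) % 2];
        if (toy_csum(b->data) == b->csum) {
            *out = b->data;
            if (i == 1)             /* we tripped over the bad copy... */
                mirror[first] = *b; /* ...so it gets repaired in passing */
            return 0;
        }
    }
    return -1; /* both copies bad: uncorrectable */
}

/* Scrub: check every copy systematically, repair from any good one. */
static int scrub(struct block mirror[2])
{
    int good = -1, fixed = 0;
    for (int i = 0; i < 2; i++)
        if (toy_csum(mirror[i].data) == mirror[i].csum)
            good = i;
    if (good < 0)
        return -1;
    for (int i = 0; i < 2; i++)
        if (toy_csum(mirror[i].data) != mirror[i].csum) {
            mirror[i] = mirror[good];
            fixed++;
        }
    return fixed;
}
```

With one silently corrupted copy, read_block() repairs it only on the calls that happen to pick it first; scrub() repairs it every time, which is exactly why regular scrubs are the reliable way to flush out latent bad copies.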
Re: What is the vision for btrfs fs repair?
Austin S Hemmelgarn posted on Thu, 09 Oct 2014 09:18:22 -0400 as excerpted: On 2014-10-09 08:34, Duncan wrote: The only way a read-only mount should be writable is if it's mounted (bind-mounted or btrfs-subvolume-mounted) read-write elsewhere, and the write occurs to that mount, not the read-only mounted location. In theory yes, but there are caveats to this, namely: * atime updates still happen unless you have mounted the fs with noatime I've been mounting noatime for well over a decade now, exactly due to such problems. But I believe at least /some/ filesystems are truly read-only when they're mounted as such, and atime updates don't happen on them. These days I actually apply a patch that changes the default relatime to noatime, so I don't even have to have it in my mount-options. =:^) * The superblock gets updated if there are 'any' writes Yeah. At least in theory, there shouldn't be, however. As I said, in theory, even journal replay and orphan delete shouldn't hit media, altho handling it in memory and dirtying the cache, so if the filesystem is ever remounted read-write they get written, is reasonable. * The free space cache 'might' be updated if there are any writes Makes sense. But of course that's what I'm arguing, there shouldn't /be/ any writes. Read-only should mean exactly that, don't touch media, period. I remember at one point activating an mdraid1 degraded, read-only, just a single device of the 4-way raid1 I was running at the time, to recover data from it after the system it was running in died. The idea was don't write to the device at all, because I was still testing the new system, and in case I decided to try to reassemble the raid at some point. Read-only really NEEDS to be read-only, under such conditions. Similarly for forensic examination, of course. If there's a write, any write, it's evidence tampering. Read-only needs to MEAN read-only! -- Duncan - List replies preferred. No HTML msgs. 
deadlock with 3.16.3
running wheezy Debian 3.16.3-2~bpo70+1 system has locked up 2 nights in a row running rsync copying from remote to a ~100TB btrfs. Only job running on the server, no interactive users or anything. soft locks showed up in kern.log across many CPUs shortly before system became non-responsive. First lines of the call traces via: grep '8 17:12' /var/log/kern.log | cut -d --complement -f 1,2,3,4,5,6 | grep -A1 'Call Trace:' | egrep -v '\-\-|Call Trace:' I've gone back to 3.14, which I've never had issues with. sar report from 5 minutes prior looked pretty normal, no swap depletion. Let me know if you want more info [30328.735511] [a0227df8] ? btrfs_lookup_file_extent+0x38/0x40 [btrfs] [30328.747283] [a026d353] ? btrfs_tree_lock+0xd3/0x1e0 [btrfs] [30328.759289] [a026d353] ? btrfs_tree_lock+0xd3/0x1e0 [btrfs] [30328.775303] [a026d353] ? btrfs_tree_lock+0xd3/0x1e0 [btrfs] [30328.787311] [a026d353] ? btrfs_tree_lock+0xd3/0x1e0 [btrfs] [30328.799324] [a026cfaf] ? btrfs_tree_read_lock+0x3f/0x110 [btrfs] [30328.839351] [a026cfaf] ? btrfs_tree_read_lock+0x3f/0x110 [btrfs] [30328.855363] [a026d353] ? btrfs_tree_lock+0xd3/0x1e0 [btrfs] [30328.867373] [a026cecc] ? btrfs_clear_lock_blocking_rw+0x4c/0xf0 [btrfs] [30328.879384] [a026cfaf] ? btrfs_tree_read_lock+0x3f/0x110 [btrfs] [30328.891394] [a026cecc] ? btrfs_clear_lock_blocking_rw+0x4c/0xf0 [btrfs] [30328.907407] [a026d353] ? btrfs_tree_lock+0xd3/0x1e0 [btrfs] [30328.919396] [a026cfaf] ? btrfs_tree_read_lock+0x3f/0x110 [btrfs]
btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727
Hello, I have trouble finishing btrfs balance on a five disk raid10 fs. I added a disk to a 4x3TB raid10 fs and ran btrfs balance start /mnt/b3, which segfaulted after a few hours, probably because of the BUG below. btrfs check does not find any errors, both before the balance and after reboot (the fs becomes un-umountable). This was the second attempt at a balance run; the first one ended the same, see Comment 10 on bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=64961 There are ~7.5M files on /mnt/b3; one subvolume with 4.8M files has been snapshotted 85 times. root@fs0:~# uname -a Linux fs0 3.17.0 #10 SMP Mon Oct 6 11:31:13 CEST 2014 x86_64 GNU/Linux root@fs0:~# btrfs fi show /mnt/b3 Label: 'BTR3' uuid: f181dd81-c219-44fc-b113-3a8cfd0d3295 Total devices 5 FS bytes used 2.35TiB devid1 size 2.73TiB used 1.05TiB path /dev/sde devid2 size 2.73TiB used 1.05TiB path /dev/sdf devid3 size 2.73TiB used 1.05TiB path /dev/sdg devid4 size 2.73TiB used 1.05TiB path /dev/sdh devid5 size 3.64TiB used 524.03GiB path /dev/sdp Btrfs v3.16 root@fs0:~# btrfs fi df /mnt/b3 Data, RAID10: total=2.34TiB, used=2.34TiB System, RAID1: total=32.00MiB, used=304.00KiB Metadata, RAID1: total=15.00GiB, used=13.75GiB unknown, single: total=512.00MiB, used=496.00KiB [22717.728944] BTRFS info (device sdp): relocating block group 299458816 flags 65 [22735.276539] BTRFS info (device sdp): found 60187 extents [22744.233882] [ cut here ] [22744.238559] WARNING: CPU: 0 PID: 4211 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]() [22744.248953] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc xfs libcrc32c loop raid10 md_mod iTCO_wdt x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel iTCO_vendor_support aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper lpc_ich mfd_core i2c_i801 i2c_core pcspkr psmouse evdev microcode serio_raw battery ipmi_si ipmi_msghandler video tpm_tis tpm button acpi_cpufreq processor 
ie31200_edac edac_core btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid ahci libahci mpt2sas raid_class libata scsi_transport_sas crc32c_intel ehci_pci e1000e ehci_hcd ptp pps_core scsi_mod thermal fan thermal_sys usbcore usb_common [22744.312827] CPU: 0 PID: 4211 Comm: btrfs Not tainted 3.17.0 #10 [22744.318770] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012 [22744.326475] 0009 813a6a46 [22744.333983] 8103b591 c0593cc4 88027beef380 [22744.341503] 88028da8ff50 880136d1a000 [22744.349019] Call Trace: [22744.351493] [813a6a46] ? dump_stack+0x41/0x51 [22744.356827] [8103b591] ? warn_slowpath_common+0x78/0x90 [22744.363037] [c0593cc4] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs] [22744.370471] [c0593cc4] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs] [22744.377912] [c059438f] ? walk_down_proc+0xaf/0x1e3 [btrfs] [22744.384373] [8110bc2a] ? kmem_cache_alloc+0x91/0x104 [22744.390321] [c05965b8] ? walk_down_tree+0x40/0xa9 [btrfs] [22744.396706] [c0598f3e] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs] [22744.403702] [c05e6297] ? merge_reloc_roots+0xf0/0x1ca [btrfs] [22744.410434] [c05e6972] ? relocate_block_group+0x445/0x4bd [btrfs] [22744.417520] [c05e6b39] ? btrfs_relocate_block_group+0x14f/0x267 [btrfs] [22744.425138] [c05c56b7] ? btrfs_relocate_chunk.isra.58+0x58/0x5e2 [btrfs] [22744.432862] [c0586786] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs] [22744.439851] [c05ba695] ? btrfs_get_token_64+0x76/0xc6 [btrfs] [22744.446590] [c05c190b] ? release_extent_buffer+0x9d/0xa4 [btrfs] [22744.453585] [c05c8186] ? btrfs_balance+0x9b0/0xb9d [btrfs] [22744.460064] [c05cf646] ? btrfs_ioctl_balance+0x21a/0x297 [btrfs] [22744.467057] [c05d2462] ? btrfs_ioctl+0x10f4/0x20a5 [btrfs] [22744.473531] [81121b9e] ? path_openat+0x233/0x4c5 [22744.479129] [81030620] ? __do_page_fault+0x339/0x3df [22744.485072] [810f2b9c] ? __vma_link_rb+0x58/0x73 [22744.490668] [810f2c22] ? vma_link+0x6b/0x8a [22744.495824] [811237f8] ? 
do_vfs_ioctl+0x3ec/0x435 [22744.501509] [8105b9e0] ? vtime_account_user+0x35/0x40 [22744.507539] [8112388a] ? SyS_ioctl+0x49/0x77 [22744.512781] [813aafac] ? tracesys+0x7e/0xe2 [22744.517935] [813ab00b] ? tracesys+0xdd/0xe2 [22744.523092] ---[ end trace fac5e12cd6384894 ]--- 22744.527735] [ cut here ] [22744.532378] kernel BUG at fs/btrfs/extent-tree.c:7727! [22744.537532] invalid opcode: [#1] SMP [22744.541684] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc xfs
Re: Fwd: Re: [PATCH] btrfs: add more superblock checks
On Tue, Oct 07, 2014 at 04:51:11PM +0800, Qu Wenruo wrote: + struct btrfs_super_block *sb = fs_info->super_copy; + int ret = 0; + + if (sb->root_level >= BTRFS_MAX_LEVEL) { + printk(KERN_ERR "BTRFS: tree_root level too big: %d >= %d\n", + sb->root_level, BTRFS_MAX_LEVEL); + ret = -EINVAL; + } + if (sb->chunk_root_level >= BTRFS_MAX_LEVEL) { + printk(KERN_ERR "BTRFS: chunk_root level too big: %d >= %d\n", + sb->chunk_root_level, BTRFS_MAX_LEVEL); + ret = -EINVAL; + } + if (sb->log_root_level >= BTRFS_MAX_LEVEL) { + printk(KERN_ERR "BTRFS: log_root level too big: %d >= %d\n", + sb->log_root_level, BTRFS_MAX_LEVEL); + ret = -EINVAL; + } + /* -* Placeholder for checks +* The common minimum, we don't know if we can trust the nodesize/sectorsize +* items yet, they'll be verified later. Issue just a warning. */ - return 0; + if (!IS_ALIGNED(sb->root, 4096)) + printk(KERN_WARNING "BTRFS: tree_root block unaligned: %llu\n", + sb->root); + if (!IS_ALIGNED(sb->chunk_root, 4096)) + printk(KERN_WARNING "BTRFS: chunk_root block unaligned: %llu\n", + sb->chunk_root); + if (!IS_ALIGNED(sb->log_root, 4096)) + printk(KERN_WARNING "BTRFS: log_root block unaligned: %llu\n", + sb->log_root); 1) It is better not to call IS_ALIGNED with an immediate value. Although the current btrfs implementation ensures that all sector sizes are larger than or equal to page_size, Chandan Rajendra is trying to support subpage-sized blocksize, which may cause false alerts later. The patch reflects the current state, so when the subpage blocksize patches are merged, this will have to be changed accordingly. It would be much better to use btrfs_super_sectorsize() instead to improve extendability. See the comment above, we don't trust the superblock yet and cannot use the sectorsize reliably. 2) Missing endian conversion. On big endian systems it would be a disaster, the btrfs_super_* macros should be used. Thanks, will fix it. 
+ if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_UUID_SIZE) != 0) { + printk(KERN_ERR "BTRFS: dev_item UUID does not match fsid: %pU != %pU\n", + fs_info->fsid, sb->dev_item.fsid); + ret = -EINVAL; + } + + /* +* Hint to catch really bogus numbers, bitflips or so, more exact checks are +* done later +*/ + if (sb->num_devices > (1UL << 31)) + printk(KERN_WARNING "BTRFS: suspicious number of devices: %llu\n", + sb->num_devices); What about also checking the devid against sb->num_devices too? Every valid devid should be less than or equal to sb->num_devices if I am right. Although iterating dev_item here may be overkill... This could be done of course, I've tried to keep the checks very small and using only directly accessible information. More is possible of course. + + if (sb->bytenr != BTRFS_SUPER_INFO_OFFSET) { + printk(KERN_ERR "BTRFS: super offset mismatch %llu != %u\n", + sb->bytenr, BTRFS_SUPER_INFO_OFFSET); + ret = -EINVAL; + } + + /* +* The generation is a global counter, we'll trust it more than the others +* but it's still possible that it's the one that's wrong. +*/ + if (sb->generation < sb->chunk_root_generation) + printk(KERN_WARNING + "BTRFS: suspicious: generation < chunk_root_generation: %llu < %llu\n", + sb->generation, sb->chunk_root_generation); + if (sb->generation < sb->cache_generation && sb->cache_generation != (u64)-1) + printk(KERN_WARNING + "BTRFS: suspicious: generation < cache_generation: %llu < %llu\n", + sb->generation, sb->cache_generation); + + return ret; } Still the endian problem. Will fix, thanks. 
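For readers following the endianness point in this review: on-disk btrfs superblock fields are stored little-endian, so comparing them raw only works on little-endian hosts; the btrfs_super_*() accessors wrap the le*_to_cpu() conversion. A standalone sketch of what that conversion does (all names below are illustrative, not the kernel's implementation):

```c
#include <stdint.h>

/* Portable le64-to-host: assemble the value byte by byte so the result is
 * correct on both little- and big-endian hosts -- what le64_to_cpu() does,
 * minus the native-endian fast path. */
static uint64_t le64_to_host(const unsigned char b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Hypothetical check in the spirit of the patch: read an on-disk
 * little-endian field, convert it, then range-check the converted value. */
#define TOY_MAX_LEVEL 8
static int check_root_level(const unsigned char raw_le[8])
{
    uint64_t level = le64_to_host(raw_le);
    return level >= TOY_MAX_LEVEL ? -22 /* -EINVAL */ : 0;
}
```

Reading the struct field raw on a big-endian machine would interpret the bytes {3, 0, ..., 0} as 0x0300000000000000 instead of 3, which is exactly the "disaster" the review warns about.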
Re: [PATCH] btrfs: move struct btrfs_ioctl_defrag_range_args from ctree.h to linux/btrfs.h
On Wed, Oct 08, 2014 at 01:23:41AM +0100, Marios Titas wrote: include/uapi/linux/btrfs.h is a more logical place to put the struct btrfs_ioctl_defrag_range_args as it is being used by the BTRFS_IOC_DEFRAG_RANGE ioctl which is defined in that file. Additionally, this is where btrfs-progs defines that struct. Thus this patch reduces the gap between the btrfs-progs headers and the kernel headers. Signed-off-by: Marios Titas red...@gmx.com Reviewed-by: David Sterba dste...@suse.cz
Re: What is the vision for btrfs fs repair?
On 10/9/14 8:49 AM, Duncan wrote: Austin S Hemmelgarn posted on Thu, 09 Oct 2014 09:18:22 -0400 as excerpted: On 2014-10-09 08:34, Duncan wrote: The only way a read-only mount should be writable is if it's mounted (bind-mounted or btrfs-subvolume-mounted) read-write elsewhere, and the write occurs to that mount, not the read-only mounted location. In theory yes, but there are caveats to this, namely: * atime updates still happen unless you have mounted the fs with noatime Getting off the topic a bit, but that really shouldn't happen: #define IS_NOATIME(inode) __IS_FLG(inode, MS_RDONLY|MS_NOATIME) and in touch_atime(): if (IS_NOATIME(inode)) return; -Eric
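The macro Eric quotes can be modeled in isolation to see why a read-only mount should suppress atime updates: MS_RDONLY is folded into the same test as MS_NOATIME. The flag values below follow the traditional mount-flag constants, but the helpers are simplified stand-ins, not the kernel's code:

```c
#include <stdbool.h>

#define MS_RDONLY  (1u << 0)   /* traditional values; the real ones live */
#define MS_NOATIME (1u << 10)  /* in the kernel headers */

/* Stand-in for the IS_NOATIME() test quoted above: a read-only mount is
 * treated exactly like an explicit noatime mount. */
static bool is_noatime(unsigned mnt_flags)
{
    return (mnt_flags & (MS_RDONLY | MS_NOATIME)) != 0;
}

/* touch_atime() analogue: returns true only if the atime would actually
 * be written back. */
static bool would_update_atime(unsigned mnt_flags)
{
    if (is_noatime(mnt_flags))
        return false;
    return true;
}
```

So per this VFS-level check, an observed atime write on an ro mount would have to come from somewhere that bypasses touch_atime(), which is Eric's point.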
Confusion with newly converted filesystem
I will try to explain what I have done, but also try to keep it fairly short. I installed a Linux distro that does not support installing to btrfs, so I installed to an ext4 partition. I ran dist-upgrade to ensure that I have the latest btrfs-tools. I upgraded the Debian kernel from 3.13 to 3.16.3. When all this was completed, there was something like 900 MB used on a 40 GB partition. Next, I booted to another distro (Arch Linux) which also has the latest kernel and btrfs-progs. I ran btrfs-convert /dev/sda6. When I rebooted to the new Debian system, the btrfs was mounted read-only. btrfs fi show / showed all 40 GB as used. I did some internet research, then I remounted the filesystem as rw and added another 40 GB partition on a separate disk drive. Then I ran btrfs balance start -dusage=30. This seemed to stabilize the filesystem to the point that it is usable. I proceeded with my original plan, which was to make it a two-drive RAID filesystem, using -dconvert=raid0 -mconvert=raid1. This succeeded, but the data and metadata usage stats still look all out of whack. After several rebalance attempts, my usage stats look like the following: btrfs fi show / shows a total usage of 1.76 GB, with 40 GB allocated and 14.03 GB used on each device. btrfs fi df / shows total data of 2 GB allocated with 1.69 GB used and metadata of 13 GB total with 72.41 MB used. Why is 13 GB needed for 72 MB of metadata? Is there any understandable way to fix this? I am not a newbie, but am by no means an expert with btrfs. Thank you, Tim
Re: Two uncorrectable errors across RAID1 at same logical block?
On 10/9/2014 12:13 AM, Liu Bo wrote: sudo ./btrfs inspect-internal logical-resolve -v 58464632832 / $ sudo ./btrfs inspect-internal logical-resolve -v 58464632832 / ioctl ret=0, total_size=4096, bytes_left=4080, bytes_missing=0, cnt=0, missed=0 I also tried -P and -s 1 Also did this: $ sudo ./btrfs-map-logical -l 58464632832 -o /tmp/58464632832 /dev/sdf3 mirror 1 logical 58464632832 physical 1536393216 device /dev/sdg3 mirror 2 logical 58464632832 physical 58464632832 device /dev/sdf3 And looked at the 4k block. strings doesn't show anything useful: +V0T file(1) doesn't recognize it as anything in particular. Weird. I have one other clue which I think is irrelevant. I had another error on a different drive/different fs and it turned out to be the vmem file for a virtual machine under vmware workstation. I deleted the file since it was just the memory image and the error went away. It was easy to map the bad block to the file from dmesg and the inode. I may have also created a vm at some point on this drive we're looking at now and then moved it. So I think that information is not relevant... but maybe you've seen this before.
Fwd: Confusion with newly converted filesystem
Never mind. I have stumbled my way into a solution. I ran btrfs subv delete /ext2_saved. Then I ran btrfs balance start /. That relocated 15 of 15 chunks. Now fi show shows 2.03 GB used on each device and fi df shows 1 GB of metadata total. Apparently, that saved ext4 subvolume was a real mess. Tim -- Forwarded message -- From: Tim Cuthbertson ratch...@gmail.com Date: Thu, Oct 9, 2014 at 11:41 AM Subject: Confusion with newly converted filesystem To: linux-btrfs@vger.kernel.org
[PATCH 2/2] Btrfs: make inode.c:compress_file_range() return void
Its return value is useless, its single caller ignores it and can't do anything with it anyway, since it's a workqueue task and not the task calling filemap_fdatawrite_range (writepages) nor filemap_fdatawait_range(). Failure is communicated to such functions via start and end of writeback with the respective pages tagged with an error and AS_EIO flag set in the inode's mapping. Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/inode.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b91a171..aef0fa3 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -382,7 +382,7 @@ static inline int inode_need_compress(struct inode *inode) * are written in the same order that the flusher thread sent them * down. */ -static noinline int compress_file_range(struct inode *inode, +static noinline void compress_file_range(struct inode *inode, struct page *locked_page, u64 start, u64 end, struct async_cow *async_cow, @@ -621,8 +621,7 @@ cleanup_and_bail_uncompressed: *num_added += 1; } -out: - return ret; + return; free_pages_out: for (i = 0; i < nr_pages_ret; i++) { page_cache_release(pages[i]); } kfree(pages); - - goto out; } static void free_async_extent_pages(struct async_extent *async_extent) -- 1.9.1
[PATCH 1/2] Btrfs: report error after failure inlining extent in compressed write path
If cow_file_range_inline() failed, when called from compress_file_range(), we were tagging the locked page for writeback, end its writeback and unlock it, but not marking it with an error nor setting AS_EIO in inode's mapping flags. This made it impossible for a caller of filemap_fdatawrite_range (writepages) or filemap_fdatawait_range() to know that an error happened. And the return value of compress_file_range() is useless because it's returned to a workqueue task and not to the task calling filemap_fdatawrite_range (writepages). This change applies on top of the previous patchset starting at the patch titled: [1/5] Btrfs: set page and mapping error on compressed write failure Which changed extent_clear_unlock_delalloc() to use SetPageError and mapping_set_error(). Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7635b1d..b91a171 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -538,6 +538,7 @@ cont: clear_flags, PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK | +PAGE_SET_ERROR | PAGE_END_WRITEBACK); goto free_pages_out; } -- 1.9.1
[PATCH 1/2] Btrfs: correctly flush compressed data before/after direct IO
For compressed writes, after doing the first filemap_fdatawrite_range() we don't get the pages tagged for writeback immediately. Instead we create a workqueue task, which is run by another kthread, and keep the pages locked. That other kthread compresses data, creates the respective ordered extent/s, tags the pages for writeback and unlocks them. Therefore we need a second call to filemap_fdatawrite_range() if we have compressed writes, as this second call will wait for the pages to become unlocked, then see they became tagged for writeback and finally wait for the writeback to finish. Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/file.c | 12 +++- fs/btrfs/inode.c | 16 +--- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 29b147d..82c7229 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1692,8 +1692,18 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, err = written_buffered; goto out; } + /* +* Ensure all data is persisted. We want the next direct IO read to be +* able to read what was just written. 
+*/ endbyte = pos + written_buffered - 1; - err = filemap_write_and_wait_range(file->f_mapping, pos, endbyte); + err = filemap_fdatawrite_range(file->f_mapping, pos, endbyte); + if (!err && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, + &BTRFS_I(file_inode(file))->runtime_flags)) + err = filemap_fdatawrite_range(file->f_mapping, pos, endbyte); + if (err) + goto out; + err = filemap_fdatawait_range(file->f_mapping, pos, endbyte); if (err) goto out; written += written_buffered; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index aef0fa3..752ff18 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7052,9 +7052,19 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend, btrfs_put_ordered_extent(ordered); } else { /* Screw you mmap */ - ret = filemap_write_and_wait_range(inode->i_mapping, - lockstart, - lockend); + ret = filemap_fdatawrite_range(inode->i_mapping, + lockstart, + lockend); + if (!ret && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, + &BTRFS_I(inode)->runtime_flags)) + ret = filemap_fdatawrite_range(inode->i_mapping, + lockstart, + lockend); + if (ret) + break; + ret = filemap_fdatawait_range(inode->i_mapping, + lockstart, + lockend); if (ret) break; -- 1.9.1
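The reason for the second filemap_fdatawrite_range() call can be modeled in user space. Nothing below is kernel code — the struct and functions are mocks — but it captures the invariant the patch relies on: with an async (compressed) extent pending, one flush returns before pages are tagged for writeback, and only a second flush, which blocks on the page locks, guarantees they are:

```c
#include <stdbool.h>

/* Mock of an address_space, for this sketch only. */
struct mock_mapping {
    bool has_async_extent;   /* analogue of BTRFS_INODE_HAS_ASYNC_EXTENT */
    int  flush_calls;
    bool writeback_tagged;
};

/* filemap_fdatawrite_range() stand-in: with an async extent pending, the
 * first call returns before the compression worker tags the pages; a
 * second call has to wait on the page locks, by which time tagging has
 * happened. */
static int mock_fdatawrite(struct mock_mapping *m)
{
    m->flush_calls++;
    if (!m->has_async_extent || m->flush_calls >= 2)
        m->writeback_tagged = true;
    return 0;
}

/* The pattern the patch introduces (and a follow-up wraps into a helper):
 * flush, then flush again iff async extents may exist. */
static int flush_range(struct mock_mapping *m)
{
    int ret = mock_fdatawrite(m);
    if (!ret && m->has_async_extent)
        ret = mock_fdatawrite(m);
    return ret;
}
```

A single flush would leave the compressed case untagged, and the subsequent filemap_fdatawait_range() would return without actually waiting for the data — which is the bug the patch fixes.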
[PATCH 2/2] Btrfs: add helper btrfs_fdatawrite_range
To avoid duplicating this double filemap_fdatawrite_range() call for inodes with async extents (compressed writes) so often. Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/ctree.h | 1 + fs/btrfs/file.c | 36 fs/btrfs/inode.c | 9 + fs/btrfs/ordered-data.c | 24 ++-- 4 files changed, 32 insertions(+), 38 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 089f6da..4e0ad8c 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3896,6 +3896,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode, struct page **pages, size_t num_pages, loff_t pos, size_t write_bytes, struct extent_state **cached); +int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); /* tree-defrag.c */ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 82c7229..2df1dce 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1697,10 +1697,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, * able to read what was just written. 
*/ endbyte = pos + written_buffered - 1; - err = filemap_fdatawrite_range(file->f_mapping, pos, endbyte); - if (!err && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, - &BTRFS_I(file_inode(file))->runtime_flags)) - err = filemap_fdatawrite_range(file->f_mapping, pos, endbyte); + err = btrfs_fdatawrite_range(file_inode(file), pos, endbyte); if (err) goto out; err = filemap_fdatawait_range(file->f_mapping, pos, endbyte); @@ -1864,10 +1861,7 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end) int ret; atomic_inc(&BTRFS_I(inode)->sync_writers); - ret = filemap_fdatawrite_range(inode->i_mapping, start, end); - if (!ret && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, - &BTRFS_I(inode)->runtime_flags)) - ret = filemap_fdatawrite_range(inode->i_mapping, start, end); + ret = btrfs_fdatawrite_range(inode, start, end); atomic_dec(&BTRFS_I(inode)->sync_writers); return ret; @@ -2820,3 +2814,29 @@ int btrfs_auto_defrag_init(void) return 0; } + +int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end) +{ + int ret; + + /* +* So with compression we will find and lock a dirty page and clear the +* first one as dirty, setup an async extent, and immediately return +* with the entire range locked but with nobody actually marked with +* writeback. So we can't just filemap_write_and_wait_range() and +* expect it to work since it will just kick off a thread to do the +* actual work. So we need to call filemap_fdatawrite_range _again_ +* since it will wait on the page lock, which won't be unlocked until +* after the pages have been marked as writeback and so we're good to go +* from there. We have to do this otherwise we'll miss the ordered +* extents and that results in badness. Please Josef, do not think you +* know better and pull this out at some point in the future, it is +* right and you are wrong. 
+*/ + ret = filemap_fdatawrite_range(inode->i_mapping, start, end); + if (!ret && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, + &BTRFS_I(inode)->runtime_flags)) + ret = filemap_fdatawrite_range(inode->i_mapping, start, end); + + return ret; +} diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 752ff18..be955481 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7052,14 +7052,7 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend, btrfs_put_ordered_extent(ordered); } else { /* Screw you mmap */ - ret = filemap_fdatawrite_range(inode->i_mapping, - lockstart, - lockend); - if (!ret && test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, - &BTRFS_I(inode)->runtime_flags)) - ret = filemap_fdatawrite_range(inode->i_mapping, - lockstart, - lockend); + ret = btrfs_fdatawrite_range(inode, lockstart, lockend); if (ret) break; ret = filemap_fdatawait_range(inode->i_mapping, diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index ac734ec..1401b1a 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -725,30
Re: What is the vision for btrfs fs repair?
On Oct 8, 2014, at 3:11 PM, Eric Sandeen sand...@redhat.com wrote: I was looking at Marc's post: http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html and it feels like there isn't exactly a cohesive, overarching vision for repair of a corrupted btrfs filesystem. It's definitely confusing compared to any other filesystem I've used on four different platforms. And that's when excluding scraping and the functions unique to any multiple device volume: scrubs, degraded mount. To be fair, mdadm doesn't even have a scrub command, it's done via 'echo check > /sys/block/mdX/md/sync_action'. And meanwhile LVM has pvck, vgck, and for scrubs it's lvchange --syncaction {check|repair}. These are also completely non-obvious. * mount -o recovery Enable autorecovery attempts if a bad tree root is found at mount time. I'm confused why it's not the default yet. Maybe it's continuing to evolve at a pace that suggests something could sneak in that makes things worse? It is almost an oxymoron in that I'm manually enabling an autorecovery. If true, maybe the closest indication we'd get of btrfs stability is the default enabling of autorecovery. * btrfs-zero-log remove the log tree if log tree is corrupt * btrfs rescue Recover a damaged btrfs filesystem chunk-recover super-recover How does this relate to btrfs check? * btrfs check repair a btrfs filesystem --repair --init-csum-tree --init-extent-tree How does this relate to btrfs rescue? These three translate into eight combinations of repairs; adding -o recovery there are 9 combinations. I think this is the main source of confusion, there are just too many options, but also it's completely non-obvious which one to use in which situation. My expectation is that eventually these get consolidated into just check and check --repair. As the repair code matures, it'd go into the kernel autorecovery code. That's a guess on my part, but it's consistent with design goals. 
> It feels like recovery tools have been badly splintered, and if
> there's an overarching design or vision for btrfs fs repair, I can't
> tell what it is. Can anyone help me?

I suspect it's unintended splintering, and is an artifact that will go away. I'd rather the convoluted, fractured nature of repair go away before the scary experimental warnings do.

Chris Murphy
Re: Fwd: Confusion with newly converted filesystem
Tim Cuthbertson posted on Thu, 09 Oct 2014 13:58:58 -0500 as excerpted:

> I ran btrfs subv delete /ext2_saved. Then I ran btrfs balance start /.
> That relocated 15 of 15 chunks. Now fi show shows 2.03 GB used on each
> device and fi df shows 1 GB of metadata total. Apparently, that saved
> ext4 subvolume was a real mess.

Yes and no. The problem is that ext4 and btrfs work rather differently from each other, and btrfs can't manage the saved ext4 subvolume as it would normal btrfs subvolumes, because doing so would break the ext4 side, killing the ability to roll back to ext4, which is the whole point of keeping that dedicated subvolume. So once you are sure you aren't going to roll back, deleting the saved ext4 subvolume, thus allowing btrfs to manage the entire filesystem without the previously-ext4 stuff getting in the way, is a high priority.

IOW, the conversion, like many conversions, is a compromise. It serves a certain purpose, but until the legacy stuff is gone, the new stuff is hobbled and can't be used to full effect.

So yes, any btrfs converted from ext4 is going to be a real mess, in btrfs terms, until that saved ext4 subvolume is deleted, because btrfs simply can't manage it like native btrfs, since doing so would break the ability to roll back to ext4. But it's an expected mess, and it's only a mess because the native formats differ. The ext4 image can be just fine as ext4, and once it is removed, the btrfs is normally just fine as well. It's only the btrfs with the ext4 image still present that's a problem, and that only because the ext4 image isn't playing by btrfs-native rules, so btrfs can't properly manage it.
BTW, if it was letting you balance without an error, then you probably didn't run into a particular problem that often happens with ext* conversions, likely because the filesystem was new and held basically all relatively small (under 1 GiB) distro files. But it's worth knowing about and doing the one additional step, just to be sure, plus for possible future conversions.

With ext4, extent size is effectively unlimited. A full 4.7 GiB DVD ISO image file, for instance, properly defragged, can appear as a single 4.7 GiB extent. No problem on ext4, and in fact that'd be the ideal. On btrfs, by contrast, data chunk size, and thus the largest possible extent size, is 1 GiB. That 4.7 GiB DVD ISO image would have to be broken up into at least five extents: four of a full GiB each, plus the sub-GiB remainder of the file. In practice it'd likely be six extents: the beginning of the file using what was left of the current data chunk, four complete 1 GiB data chunks, and whatever was left beginning a sixth data chunk that would eventually be filled with other file data as well.

Of course the same thing applies to other large files, whatever their content: big VM images are one example, big database files another, big multi-gig archive files yet another, big non-ISO media files again another.

As a result, people with these sorts of large files on their originating ext4 filesystem tend to run into problems with btrfs balance, etc., after the conversion, because btrfs balance expects to see extents no larger than the btrfs-native 1 GiB data chunk and doesn't know what to do with these >1 GiB super-extents. On a converted btrfs with this sort of file, balance will simply error out while the saved ext4 subvolume remains, and normally even after it is gone, until a btrfs filesystem defrag is run on the former ext4 content to break up these super-extents into the 1-GiB-maximum native extents that btrfs in general and btrfs balance in particular can actually handle.
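The 4.7 GiB DVD ISO arithmetic above can be checked with a bit of shell; the sizes are the ones from the paragraph above, expressed in MiB to keep the arithmetic integral.

```shell
#!/bin/sh
# Minimum number of btrfs extents for a large file, given the 1 GiB
# data chunk (and thus maximum extent) size discussed above.
# 4.7 GiB ~= 4813 MiB, 1 GiB = 1024 MiB.
file_mib=4813
chunk_mib=1024

# Ceiling division: four full 1 GiB extents plus the sub-GiB remainder.
min_extents=$(( (file_mib + chunk_mib - 1) / chunk_mib ))
echo "minimum extents: $min_extents"            # -> 5

# In practice the head of the file usually lands in a partially used
# chunk, adding one more extent.
echo "likely extents:  $(( min_extents + 1 ))"  # -> 6
```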
Since you didn't run into this problem, you evidently either didn't have any of these >1 GiB files, not surprising on a fresh install, or if you did, they were already fragmented enough for btrfs balance to handle. However, I'd still recommend doing a proper btrfs filesystem defrag and then another balance, the combination of which should ensure that every last bit of what remains of the ext4 formatting is properly converted to btrfs native. Given that you already completed a balance, the defrag and rebalance may not matter, but better to do them unnecessarily now and be sure than to run into problems and /wish/ you had done so later. Additionally, doing it now, before you add too many additional files to the filesystem, will be easier and take less time than doing it later.

One more tip while we're talking about defrag: if you don't have any big (> half a GiB) files to deal with, or if you do but they're all essentially static files (like already-written media files that aren't going to be edited in place), I'd strongly recommend using btrfs' autodefrag mount option, which I use on all my btrfs here. OTOH, for large files with internal-rewrite patterns, such as active VM image files, big database files, even big torrented files until they're fully downloaded
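The post-conversion cleanup recommended in this thread boils down to three commands. A dry-run sketch, with the assumption that the recursive -r flag of btrfs filesystem defragment is available in your btrfs-progs; RUN=echo only prints the commands, and the / mount point is the one from Tim's setup.

```shell
#!/bin/sh
# Finish an ext4 -> btrfs conversion once rollback is no longer wanted:
# delete the saved ext4 subvolume, defrag the former ext4 content to
# break up >1 GiB super-extents, then rebalance. RUN=echo keeps this a
# dry run that only prints the commands.
RUN=echo
MNT=/   # mount point used in the thread

$RUN btrfs subvolume delete "$MNT/ext2_saved"
$RUN btrfs filesystem defragment -r "$MNT"
$RUN btrfs balance start "$MNT"
```

Drop the RUN=echo wrapper to execute for real; the order matters, since balance can error out on super-extents until the defrag has run.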
Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
From: Filipe David Manana fdman...@gmail.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: Oct 9, 2014, 18:27

On Thu, Oct 9, 2014 at 1:28 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

-------- Original Message --------
Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
From: Filipe David Manana fdman...@gmail.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: Oct 8, 2014, 20:08

On Fri, Sep 19, 2014 at 1:31 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

-------- Original Message --------
Subject: Re: [PATCH] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
From: Filipe David Manana fdman...@gmail.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: Sep 18, 2014, 21:16

On Wed, Sep 17, 2014 at 4:53 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

The following commit enhanced merge_extent_mapping() to reduce fragmentation in the extent map tree, but it can't handle the case where the existing extent lies before map_start:
51f39 btrfs: Use right extent length when inserting overlap extent map.

[BUG]
When the existing extent map's start is before map_start, em->len will go negative, which corrupts the extent map and makes the insertion of the new extent map fail. This happens when someone gets a large extent map, but by the time it is to be inserted into the extent map tree, someone else has already committed some writes and split the huge extent into small parts.

This sounds very deterministic to me. Any reason not to add tests to the sanity tests that exercise this case/these cases?

Yes, thanks for the reminder. Will add the test case for it soon.

Hi Qu,

Any progress on the test? This is a very important one IMHO, not only because of the bad consequences of the bug (extent map corruption, leading to all sorts of chaos), but also because this problem was not found by the full xfstests suite on several developer machines.
thanks

Still trying to reproduce it under the xfstests framework.

That's the problem: it apparently wasn't reproducible (or at least detectable) by anyone with xfstests.

I'll try to build a C program that behaves the same as filebench, to see whether it works. At least with filebench it can be triggered within 60s, with 100% probability of reproducing. But even following the filebench randomrw behavior (one thread doing random reads and one thread doing random writes on preallocated space), I still failed to reproduce it. Still investigating how to reproduce it. Worst case, maybe add a new C program into the src dir of xfstests?

How about the sanity tests (fs/btrfs/tests/*.c)? Create an empty extent map tree, add some extent maps, then try to merge some new extent maps that used to fail before this fix. Seems simple, no?

thanks
Qu

It needs concurrent reads and writes (commits) to trigger, and I am not sure it can be reproduced in the sanity tests, since they don't commit anything and lack a multithreading facility. I'll give the filebench-behavior C program a try first, and then the sanity tests if the former doesn't work at all.

Thanks,
Qu

[REPRODUCER]
It is very easy to trigger using filebench with the randomrw personality. It reproduces with about 100% probability in a 60s randomrw test using an 8G preallocated file.

[FIX]
This patch can now handle any existing extent position. Since it does not directly use existing->start, it now finds the previous and next extents around map_start, so the old existing->start > map_start bug will never happen again.

[ENHANCE]
This patch inserts the best-fitted extent map into the extent map tree, rather than the oldest [map_start, map_start + sectorsize) or the relatively newer but still imperfect [map_start, existing->start). The patch first searches for an existing extent that does not intersect the desired map range [map_start, map_start + len).
The existing extent will be either before or behind map_start, and based on the existing extent, we can find the previous and next extents around map_start. So the best-fitted extent would be [prev->end, next->start). If prev or next is not found, em->start would be prev->end and em->end would be next->start.

With this patch, fragmentation in the extent map tree should be reduced much more than with the 51f39 commit, and an unneeded extent map tree search is avoided as well.

Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 79 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 016c403..8039021 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6191,21 +6191,60 @@ out_fail_inode:
 	goto out_fail;
 }
 
+/* Find next extent map of a given extent map, caller needs to ensure locks */
+static struct extent_map *next_extent_map(struct extent_map *em)
+{
+	struct rb_node *next;
+
+	next =
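The best-fit clamping described above can be illustrated with made-up byte offsets; the numbers below are hypothetical, not taken from the patch. The inserted map is simply clamped to the gap between the neighboring extents.

```shell
#!/bin/sh
# Illustration of the best-fitted extent range [prev->end, next->start)
# described in the patch text, using hypothetical byte offsets: a
# previous extent ending at 8192 and a next extent starting at 65536
# leave a gap that the new extent map is clamped to.
prev_end=8192      # end of the extent before map_start (hypothetical)
next_start=65536   # start of the extent after map_start (hypothetical)

em_start=$prev_end
em_len=$(( next_start - prev_end ))

echo "em->start = $em_start"   # -> 8192
echo "em->len   = $em_len"     # -> 57344
```

Because the clamped range never extends past either neighbor, em->len stays non-negative, which is exactly the invariant the old existing->start > map_start path violated.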
Re: What is the vision for btrfs fs repair?
Chris Murphy posted on Thu, 09 Oct 2014 21:58:53 -0400 as excerpted:

> I suspect it's unintended splintering, and is an artifact that will go
> away. I'd rather the convoluted, fractured nature of repair go away
> before the scary experimental warnings do.

Heh, agreed with everything[1], but too late for this: the experimental warnings are peeled off, while the experimental, or at least horribly immature, /behavior/ remains. =:^(

---
[1] ... and a much more logically cohesive and well-structured reply than I could have managed, as my own thoughts simply weren't that well organized.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: Confusion with newly converted filesystem
On Oct 9, 2014, at 2:58 PM, Tim Cuthbertson ratch...@gmail.com wrote:

> Never mind. I have stumbled my way into a solution. I ran btrfs subv
> delete /ext2_saved. Then I ran btrfs balance start /. That relocated
> 15 of 15 chunks. Now fi show shows 2.03 GB used on each device and fi
> df shows 1 GB of metadata total. Apparently, that saved ext4 subvolume
> was a real mess.

Not a mess; it's just a side effect of the conversion while it's still reversible. It's kinda both ext3/4 and Btrfs at the same time after conversion. You can even still mount the snapshot as ext3/4. Once you're ready to commit to Btrfs and not use the ext rollback snapshot, deleting it and balancing completes the conversion. It's more of a metadata duplication via in-line migration.

Chris Murphy