Re: Is stability a joke?
On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
> That is exactly the same reason I don't edit the wiki myself. I could of
> course get it started and hopefully someone will correct what I write, but I
> feel that if I start this off I don't have deep enough knowledge to do a
> proper start. Perhaps I will change my mind about this.

My first edits to the wiki were made when I had barely started with btrfs
myself, simply to write down answers to questions I had asked on the list
that were not yet on the wiki.

You don't have to be 100% right about everything; if something is wrong,
it'll likely bother someone and they'll go edit your changes, which is more
motivating and less work for them than writing the page from scratch. You
can also add a small disclaimer such as "to the best of my knowledge".

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs on top of dmcrypt with SSD. No corruption iff write cache off?
Howdy,

I'm considering using btrfs for my new laptop install. Encryption is however
a requirement, and ecryptfs doesn't quite cut it for me, so that leaves me
with dmcrypt, which is what I've been using with ext3/ext4 for years.

https://btrfs.wiki.kernel.org/articles/g/o/t/Gotchas.html still states that
'dm-crypt block devices require write-caching to be turned off on the
underlying HDD'. While the report was for 2.6.33, I'll assume it's still
true.

I was considering migrating to a recent 256GB SSD and a 3.2.x kernel.

First, I'd like to check whether the 'turn off write cache' comment is still
accurate, and whether it also applies to SSDs.

Second, I was wondering if anyone is running btrfs over dmcrypt on an SSD
and what the performance is like with the write cache turned off (I'm
actually not too sure what the impact is for SSDs, considering that writing
to flash can actually be slower than writing to a hard drive).

Thanks,
Marc
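For context, the write cache being discussed here is the drive's volatile cache, toggled with `hdparm -W`. A small hedged sketch of how one might check its state from a script by parsing hdparm's output (the device name `/dev/sda` and the helper names are illustrative, not from this thread):

```python
import re
import subprocess

def parse_hdparm_w(output: str) -> bool:
    """Parse the 'write-caching =  1 (on)' line of `hdparm -W` output."""
    m = re.search(r"write-caching\s*=\s*(\d)", output)
    if not m:
        raise ValueError("no write-caching line found in hdparm output")
    return m.group(1) == "1"

def write_cache_enabled(device: str) -> bool:
    """Query the drive's volatile write-cache state via hdparm."""
    out = subprocess.run(["hdparm", "-W", device],
                        capture_output=True, text=True, check=True).stdout
    return parse_hdparm_w(out)

# Turning the cache off (what the old Gotchas entry asked for) would be:
#   hdparm -W0 /dev/sda
```

As the rest of the thread establishes, with kernels 3.2+ disabling the cache should no longer be necessary, since barriers are passed down correctly.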
Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?
On Wed, Feb 01, 2012 at 12:56:24PM -0500, Chris Mason wrote:
> > Second, I was wondering if anyone is running btrfs over dmcrypt on an SSD
> > and what the performance is like with write cache turned off (I'm actually
> > not too sure what the impact is for SSDs considering that writing to flash
> > can actually be slower than writing to a hard drive).
> 
> Performance without the cache on is going to vary wildly from one SSD to
> another. Some really need it to give them nice fat writes while others
> do better on smaller writes. It's best to just test yours and see.
> 
> With a 3.2 kernel (it really must be 3.2 or higher), both btrfs and dm
> are doing the right thing for barriers.

Thanks for the answer.
Can you confirm that I still must disable the write cache on the SSD to
avoid corruption with btrfs on top of dmcrypt, or is there a chance that it
just works now?

Thanks,
Marc
Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?
On Thu, Feb 02, 2012 at 07:27:22AM -0800, Marc MERLIN wrote:
> On Thu, Feb 02, 2012 at 07:42:41AM -0500, Chris Mason wrote:
> > On Wed, Feb 01, 2012 at 07:23:45PM -0800, Marc MERLIN wrote:
> > > Can you confirm that I still must disable write cache on the SSD to
> > > avoid corruption with btrfs on top of dmcrypt, or is there a chance
> > > that it just works now?
> > 
> > No, with 3.2 or higher it is expected to work. dm-crypt is doing the
> > barriers correctly and as of 3.2 btrfs is sending them down correctly.
> 
> Thanks for confirming, I'll give this a shot
> (no warranty implied of course :) ).

Actually I had one more question. I read this page:
http://www.redhat.com/archives/dm-devel/2011-July/msg00042.html

I'm not super clear on whether, with a 3.2.5 kernel, I need to pass the
special allow_discards option for btrfs and dm-crypt to be safe together, or
whether they now talk through an API and everything "just works" :)

Thanks,
Marc
Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?
On Mon, Feb 13, 2012 at 12:47:54AM +0100, Milan Broz wrote:
> On 02/12/2012 11:32 PM, Marc MERLIN wrote:
> > I'm not super clear if with 3.2.5 kernel, I need to pass the special
> > allow_discards option for btrfs and dm-crypt to be safe together, or
> > whether they now talk through an API and everything "just works" :)
> 
> If you want discards to be supported in dmcrypt, you have to enable it
> manually.
> 
> The most comfortable way is to just use a recent cryptsetup and add the
> --allow-discards option to the luksOpen command.
> 
> It will never be enabled by default in dmcrypt, for security reasons:
> http://asalor.blogspot.com/2011/08/trim-dm-crypt-problems.html

Thanks for the answer.

I knew that it created some security problems, but I had not yet found the
page you just gave, which effectively states that TRIM isn't actually that
big a win on recent SSDs (until now I thought it was actually pretty
important to use on them).

Considering that I have a fairly new Crucial 256GB SSD, I'm going to assume
that this bit applies to me:
"On the other side, TRIM is usually overrated. Drive itself should keep good
performance even without TRIM, either by using internal garbage collecting
process or by some area reserved for optimal writes handling."

So it sounds like I should just not give the "ssd" mount option to btrfs,
and not worry about TRIM.

My main concern was to make sure I wasn't risking corruption with btrfs on
top of dm-crypt, which is now true with 3.2.x, and I now understand that it
is true regardless of whether I use --allow-discards in cryptsetup, so I'm
just not going to use it given the warning page you just posted.

Thanks for the answer,
Marc
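As an aside, whether a running dm-crypt mapping was opened with --allow-discards shows up as the `allow_discards` optional flag in its device-mapper table (`dmsetup table <name>`). A small sketch of checking for it (the example table line below is made up, with a dummy key):

```python
def discards_enabled(table_line: str) -> bool:
    """True if a device-mapper crypt table line carries the
    allow_discards optional flag.

    Get the line with:  dmsetup table cryptroot
    """
    return "allow_discards" in table_line.split()

# Example crypt table line (key and device numbers are made up):
with_flag = "0 500118192 crypt aes-xts-plain64 00112233 0 8:2 4096 1 allow_discards"
print(discards_enabled(with_flag))  # -> True
```

Note that real `dmsetup table` output for a crypt target normally masks or omits the key unless you pass `--showkeys`.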
Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?
On Wed, Feb 15, 2012 at 10:42:43AM -0500, Calvin Walton wrote:
> On Sun, 2012-02-12 at 16:14 -0800, Marc MERLIN wrote:
> > So it sounds like I should just not give the "ssd" mount option to btrfs,
> > and not worry about TRIM.
> 
> The 'ssd' option on btrfs is actually completely unrelated to trim
> support. Instead, it changes how blocks are allocated on the device,
> taking advantage of the improved random read/write speed. The 'ssd'
> option should be autodetected on most SSDs, but I don't know if this is
> handled correctly when you're using dm-crypt. (Btrfs prints a message at
> mount time when it autodetects this.) It shouldn't hurt to leave it.

Yes, I found out more after I got my laptop back up (I had limited search
ability while I was rebuilding it). Thanks for clearing up my incorrect
guess at the time :)

The good news is that ssd mode is autodetected through dmcrypt:
[   23.130486] device label btrfs_pool1 devid 1 transid 732 /dev/mapper/cryptroot
[   23.130854] btrfs: disk space caching is enabled
[   23.175547] Btrfs detected SSD devices, enabling SSD mode

> Discard is handled with a separate mount option on btrfs (called
> 'discard'), and is disabled by default even if you have the 'ssd' option
> enabled, because of the negative performance impact it has had on some
> SSDs.

That's what I read up on later. It's a bit counterintuitive, after all the
work that went into TRIM, to then figure out that there are actually more
reasons not to bother with it than to use it :)

On the plus side, it means SSDs are getting better and don't need special
code that makes data recovery harder should you ever need it.
I tried updating the wiki pages, because
https://btrfs.wiki.kernel.org/articles/f/a/q/FAQ_1fe9.html says nothing
about:
- trim/discard
- dmcrypt
while https://btrfs.wiki.kernel.org/articles/g/o/t/Gotchas.html still states:
'btrfs volumes on top of dm-crypt block devices (and possibly LVM) require
write-caching to be turned off on the underlying HDD. Failing to do so, in
the event of a power failure, may result in corruption not yet handled by
btrfs code. (2.6.33)'

I'm happy to fix both pages, but the login link of course doesn't work, and
I'm not sure where the canonical copy to edit actually is, or whether I can
get access. That said, if someone else can fix it too, that's great :)

Thanks,
Marc
Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?
On Sat, Feb 18, 2012 at 01:39:24PM +0100, Martin Steigerwald wrote:
> To Milan Broz: Well now I noticed that you linked to your own blog entry.

He did not, I'm the one who did. I asked a bunch of questions since the
online docs didn't address them for me. Some of you answered those for me, I
asked for access to the wiki, and I updated the wiki to have the information
you gave me. While I have no inherent bias one way or another, obviously I
did put some of your opinions on the wiki :)

> Please do not take my statements below personally - I might have written
> them a bit harshly. Actually I do not really know whether your statement
> that TRIM is overrated is correct, but before believing that TRIM does not
> give much of an advantage, I would like to see at least some evidence of
> some sort, because my explanation below that it should make a difference
> at least seems logical to me.

That sounds like a reasonable request to me. In the meantime I changed the
page to:

'Does Btrfs support TRIM/discard?
"-o discard" is supported, but can have some negative consequences on
performance on some SSDs (or at least whether it adds worthwhile performance
is up for debate, depending on who you ask), and makes undeletion/recovery
near impossible while being a security problem if you use dm-crypt
underneath (see http://asalor.blogspot.com/2011/08/trim-dm-crypt-problems.html),
therefore it is not enabled by default. You are welcome to run your own
benchmarks and post them here, with the caveat that they'll be very SSD
firmware specific.'

I'll leave further edits to others ;)

Marc
Creating backup snapshots (8 per filesystem) causes No space left on device?
Howdy,

I have a little script that creates hourly/daily/weekly snapshots on a
device that otherwise has plenty of disk space free:

gandalfthegreat:~# df -h | grep cryptroot
/dev/mapper/cryptroot  232G  144G   85G  63% /
/dev/mapper/cryptroot  232G  144G   85G  63% /usr
/dev/mapper/cryptroot  232G  144G   85G  63% /var
/dev/mapper/cryptroot  232G  144G   85G  63% /home
/dev/mapper/cryptroot  232G  144G   85G  63% /tmp
/dev/mapper/cryptroot  232G  144G   85G  63% /mnt/btrfs_pool1

I have kernel 3.3.1. The FAQ of course talks about the topic:
https://btrfs.wiki.kernel.org/articles/f/a/q/FAQ_1fe9.html
but I can't get the filesystem show command to output anything useful:

gandalfthegreat:~# btrfs filesystem show /dev/mapper/cryptroot
Btrfs Btrfs v0.19
gandalfthegreat:~#

and btrfs filesystem df seems to show that I'm OK:

gandalfthegreat:~# btrfs filesystem df /home
Data: total=169.01GB, used=134.70GB
System, DUP: total=8.00MB, used=28.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=5.88GB, used=4.39GB
Metadata: total=8.00MB, used=0.00
gandalfthegreat:~#

I read about rebalance, but it's a mostly new filesystem with little churn,
and I'm not anywhere close to a full filesystem yet.

So far, when this happened, I've had to delete a set of older snapshots.
This would make sense if I were close to full, but at 63% I'm nowhere near
that.

Any idea what's going on, how I can debug further, and more specifically
what I should capture next time I get a "no space left" error in userspace?
Thanks,
Marc

gandalfthegreat:/mnt/btrfs_pool1# l
total 4
dr-xr-xr-x 1 root root 2210 Apr 15 08:00 ./
drwxr-xr-x 1 root root  112 Feb 12 17:38 ../
drwxr-xr-x 1 root root   12 Feb 12 17:57 home/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120412_00:01:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120413_00:01:02/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120414_00:01:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120415_00:01:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_weekly_20120415_00:02:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_daily_20120412_00:01:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_daily_20120414_00:01:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_daily_20120415_00:01:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_weekly_20120415_00:02:01/
drwxrwxrwt 1 root root 7476 Apr 15 08:05 tmp/
drwxrwxrwt 1 root root 7156 Apr 12 00:01 tmp_daily_20120412_00:01:01/
drwxrwxrwt 1 root root 7130 Apr 13 00:01 tmp_daily_20120413_00:01:02/
drwxrwxrwt 1 root root 7236 Apr 14 00:01 tmp_daily_20120414_00:01:01/
drwxrwxrwt 1 root root 7368 Apr 15 00:01 tmp_daily_20120415_00:01:01/
drwxrwxrwt 1 root root 7368 Apr 15 06:00 tmp_hourly_20120415_06:00:01/
drwxrwxrwt 1 root root 7368 Apr 15 07:00 tmp_hourly_20120415_07:00:01/
drwxrwxrwt 1 root root 7476 Apr 15 08:00 tmp_hourly_20120415_08:00:01/
drwxrwxrwt 1 root root 7368 Apr 15 00:02 tmp_weekly_20120415_00:02:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120412_00:01:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120413_00:01:02/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120414_00:01:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120415_00:01:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_weekly_20120415_00:02:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120412_00:01:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120413_00:01:02/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120414_00:01:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120415_00:01:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_weekly_20120415_00:02:01/
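For reference, the retention logic such a snapshot script typically implements (keeping the newest N per volume and frequency, deleting the rest) can be sketched like this. This is illustrative, not Marc's actual script; the grouping scheme and keep count are assumptions:

```python
from collections import defaultdict

def snapshots_to_delete(names, keep):
    """Return snapshots beyond the newest `keep` per (volume, frequency).

    Names look like 'home_hourly_20120415_06:00:01'; the embedded
    timestamp sorts lexicographically, so no date parsing is needed.
    Assumes keep >= 1.
    """
    groups = defaultdict(list)
    for name in names:
        volume, freq, _stamp = name.split("_", 2)
        groups[(volume, freq)].append(name)
    doomed = []
    for snaps in groups.values():
        snaps.sort()
        doomed.extend(snaps[:-keep])  # everything older than the newest `keep`
    return doomed

hourlies = ["home_hourly_20120415_06:00:01",
            "home_hourly_20120415_07:00:01",
            "home_hourly_20120415_08:00:01"]
print(snapshots_to_delete(hourlies, keep=2))
# -> ['home_hourly_20120415_06:00:01']
```

Each name returned would then be removed with `btrfs subvolume delete`.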
Re: Creating backup snapshots (8 per filesystem) causes No space left on device?
(replying on list)

On Sun, Apr 15, 2012 at 05:52:05PM +0200, Bart Noordervliet wrote:
> Hi Marc,
> 
> there's a known regression causing early "Out of space" errors in
> kernel 3.3. A patch for stable has been queued I think, but it's not
> in 3.3.1 yet. So your best bet would be to either downgrade to 3.2 or
> use a 3.4-rc kernel. Otherwise you'd have to apply the patch in
> question yourself. It's been discussed on this list very recently.

I'll watch for 3.3.x updates (I see nothing in 3.3.2 yet), thanks.

Or is it just a matter of reverting this patch?
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5500cdbe14d7435e04f66ff3cfb8ecd8b8e44ebf

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dc083f5..079e5a1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4108,7 +4108,7 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
 	num_bytes += div64_u64(data_used + meta_used, 50);
 
 	if (num_bytes * 3 > meta_used)
-		num_bytes = div64_u64(meta_used, 3);
+		num_bytes = div64_u64(meta_used, 3) * 2;
 
 	return ALIGN(num_bytes, fs_info->extent_root->leafsize << 10);
 }

On Sun, Apr 15, 2012 at 10:19:30AM -0600, cwillu wrote:
> > but I can't get the filesystem show command to output anything useful:
> > gandalfthegreat:~# btrfs filesystem show /dev/mapper/cryptroot
> > Btrfs Btrfs v0.19
> 
> You need to run that as root.

That was run as root :) (note the '#' prompt).

Thanks for the replies,
Marc
Re: Creating backup snapshots (8 per filesystem) causes No space left on device?
On Sun, Apr 15, 2012 at 09:27:27AM -0700, Marc MERLIN wrote:
> I'll watch for 3.3.x updates (I see nothing in 3.3.2 yet), thanks.
> 
> Or is it just a matter of reverting this patch?
> https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5500cdbe14d7435e04f66ff3cfb8ecd8b8e44ebf

After I knew what to look for, I searched the archives some more and they
only seemed to point to this patch. I have reverted it, but I'm still seeing
the same problem on my laptop.

It sounds like I'll have to downgrade back to 3.2.x, unless there is some
other patch to revert that I missed.

Marc
How to change/fix 'Received UUID'
Howdy,

I did a bunch of copies and moving around of subvolumes between disks, and
at some point I snapshotted dir1/Win_ro.20180205_21:18:31 to
dir2/Win_ro.20180205_21:18:31.

As a result, I lost the ro flag and apparently the 'Received UUID', which is
now preventing me from restarting the btrfs send/receive. I changed the
snapshot back to 'ro', but that's not enough:

Source:
	Name: 			Win_ro.20180205_21:18:31
	UUID: 			23ccf2bd-f494-e348-b34e-1f28486b2540
	Parent UUID: 		-
	Received UUID: 		3cc327e1-358f-284e-92e2-4e4fde92b16f
	Creation time: 		2018-02-15 20:14:42 -0800
	Subvolume ID: 		964
	Generation: 		4062
	Gen at creation: 	459
	Parent ID: 		5
	Top level ID: 		5
	Flags: 			readonly

Dest:
	Name: 			Win_ro.20180205_21:18:31
	UUID: 			a1e8777c-c52b-af4e-9ce2-45ca4d4d2df8
	Parent UUID: 		-
	Received UUID: 		-
	Creation time: 		2018-02-17 22:20:25 -0800
	Subvolume ID: 		94826
	Generation: 		250714
	Gen at creation: 	250540
	Parent ID: 		89160
	Top level ID: 		89160
	Flags: 			readonly

If I absolutely know that the data is the same on both sides, how do I
either:
1) force back in a 'Received UUID' value on the destination
2) force a btrfs receive to work despite the lack of a matching 'Received UUID'

Yes, I could discard everything and start over, but my 2nd such subvolume is
8TB, so I'd really rather not :)

Any ideas?

Thanks,
Marc
Re: How to change/fix 'Received UUID'
On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote:
> > If I absolutely know that the data is the same on both sides, how do I
> > either
> > 1) force back in a 'Received UUID' value on the destination
> 
> I suppose the simplest is to write a small program that does it using
> BTRFS_IOC_SET_RECEIVED_SUBVOL.

Understood.
Given that I have not worked with the code at all, what is the best tool in
btrfs-progs to add this to?

btrfstune?
btrfs property set?
other?

David, is this something you'd be willing to add support for?
(To be honest, it'll be quicker for someone who knows the code to add than
for me, but if no one has the time, I'll see if I can have a shot at it.)

Thanks,
Marc
Re: How to change/fix 'Received UUID'
On Tue, Mar 06, 2018 at 08:12:15PM +0100, Hans van Kranenburg wrote:
> On 05/03/2018 20:47, Marc MERLIN wrote:
> > Given that I have not worked with the code at all, what is the best
> > tool in btrfs-progs to add this to?
> 
> If you want something right now that works, so you can continue doing
> your backups, python-btrfs also has the ioctl, since v9, together with
> an example of using it:
> 
> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d

Well, I had never heard about it until now, thank you.
I'll see if I can make it work when I get a bit of time.

Dear btrfs-progs folks, this would be great to add to the canonical
btrfs-progs too :)

Thanks,
Marc
Re: How to change/fix 'Received UUID'
On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> > https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d
> 
> Well, I had never heard about it until now, thank you.
> I'll see if I can make it work when I get a bit of time.

Sorry, I missed the fact that there was no code to write at all.

gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
Current subvolume information:
  subvol_id: 94887
  received_uuid: ----
  stime: 0.0 (1970-01-01T00:00:00)
  stransid: 0
  rtime: 0.0 (1970-01-01T00:00:00)
  rtransid: 0

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 94887
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
  rtransid: 255755

gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true

ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 Video_ro.20180307_22:03:03 | btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180307_22:03:03
At snapshot Video_ro.20180307_22:03:03
ERROR: cannot find parent subvolume

gargamel:/mnt/btrfs_pool1# btrfs subvolume show /mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
Video_ro.20180220_21:03:41
	Name: 			Video_ro.20180220_21:03:41
	UUID: 			2afc7a5e-107f-d54b-8929-197b80b70828
	Parent UUID: 		e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
	Received UUID: 		0e220a4f-6426-4745-8399-0da0084f8b23
	Creation time: 		2018-02-20 21:03:42 -0800
	Subvolume ID: 		11228
	Generation: 		4174
	Gen at creation: 	4150
	Parent ID: 		5
	Top level ID: 		5
	Flags: 			readonly
	Snapshot(s):
				Video_rw.20180220_21:03:41
				Video

Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the
destination? Doesn't that look ok now? Is there something else I'm missing?

gargamel:/mnt/btrfs_pool1# btrfs subvolume show /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
DS1/Video_ro.20180220_21:03:41
	Name: 			Video_ro.20180220_21:03:41
	UUID: 			cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
	Parent UUID: 		0e220a4f-6426-4745-8399-0da0084f8b23
	Received UUID: 		2afc7a5e-107f-d54b-8929-197b80b70828
	Creation time: 		2018-02-20 21:13:36 -0800
	Subvolume ID: 		94887
	Generation: 		250689
	Gen at creation: 	250689
	Parent ID: 		89160
	Top level ID: 		89160
	Flags: 			readonly
	Snapshot(s):

Thanks,
Marc
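When comparing source and destination like this, it helps to pull the three UUID fields out of `btrfs subvolume show` output programmatically rather than eyeballing them. A small hedged sketch (the helper name is mine; it only assumes the `Key: value` layout shown above):

```python
import re

def subvol_uuids(show_output: str) -> dict:
    """Extract UUID, Parent UUID and Received UUID from
    `btrfs subvolume show` output. A value of '-' means unset."""
    fields = {}
    for key in ("UUID", "Parent UUID", "Received UUID"):
        # Anchor at line start so 'UUID' does not match 'Parent UUID' etc.
        m = re.search(rf"^\s*{key}:\s+(\S+)", show_output, re.MULTILINE)
        fields[key] = m.group(1) if m else None
    return fields

source = """\
    Name:           Video_ro.20180220_21:03:41
    UUID:           2afc7a5e-107f-d54b-8929-197b80b70828
    Parent UUID:    e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
    Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23
"""
print(subvol_uuids(source)["Received UUID"])
# -> 0e220a4f-6426-4745-8399-0da0084f8b23
```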
Re: How to change/fix 'Received UUID'
On Thu, Mar 08, 2018 at 09:34:45AM +0300, Andrei Borzenkov wrote:
> 08.03.2018 09:06, Marc MERLIN wrote:
> > ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15
> > Video_ro.20180307_22:03:03 | btrfs receive /mnt/btrfs_bigbackup/DS1//.
> > failed
> > ERROR: cannot find parent subvolume
> 
> Not sure I understand how this subvolume is related. You send
> differences between Video_ro.20180205_21:05:15 and
> Video_ro.20180307_22:03:03, so you need to have (a replica of)
> Video_ro.20180205_21:05:15 on the destination. How exactly does
> Video_ro.20180220_21:03:41 come into the picture here?

Sorry, I pasted the wrong thing.

ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180220_21:03:41 Video_ro.20180308_07:50:06 | btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180308_07:50:06
At snapshot Video_ro.20180308_07:50:06
ERROR: cannot find parent subvolume

Same problem basically, I just copied the wrong attempt, sorry about that.

Do I need to make sure of more than having, on the destination:
DS1/Video_ro.20180220_21:03:41
	Received UUID: 		2afc7a5e-107f-d54b-8929-197b80b70828
be equal to, on the source:
	Name: 			Video_ro.20180220_21:03:41
	UUID: 			2afc7a5e-107f-d54b-8929-197b80b70828

Thanks,
Marc
Re: How to change/fix 'Received UUID'
On Thu, Mar 08, 2018 at 09:36:49PM +0300, Andrei Borzenkov wrote:
> Yes. Your source has Received UUID. In this case btrfs send will
> transmit received UUID instead of subvolume UUID as reference to base
> snapshot. You need to either clear received UUID on source or set
> received UUID on destination to received UUID of source (not to
> subvolume UUID of source).

gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 0e220a4f-6426-4745-8399-0da0084f8b23 31337 1234.5678 /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
Current subvolume information:
  subvol_id: 94887
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
  rtransid: 255755

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 94887
  received_uuid: 0e220a4f-6426-4745-8399-0da0084f8b23
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520537034.890253770 (2018-03-08T19:23:54.890254)
  rtransid: 256119

gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true

This worked fine, thank you so much. I now have an incremental send going that will take a few dozen minutes instead of days for 8TB+ :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
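Andrei's rule is worth writing down: when sending with -p, btrfs send identifies the parent snapshot by the source's Received UUID if one is set, and only otherwise by the source subvolume's own UUID; the destination's Received UUID must match that value. A short illustrative sketch of the decision (the function name and the all-zero sentinel are mine for illustration; real code would read these values from `btrfs subvolume show` or python-btrfs, not hard-code them):

```python
# Unset Received UUID, as reported by btrfs tooling.
NULL_UUID = "00000000-0000-0000-0000-000000000000"

def required_received_uuid(src_uuid, src_received_uuid):
    """Return the UUID the destination's Received UUID must be set to
    so that 'btrfs receive' can find the parent of an incremental send:
    the source's Received UUID if set, else the source's own UUID."""
    if src_received_uuid and src_received_uuid != NULL_UUID:
        return src_received_uuid
    return src_uuid

# Marc's case: the source snapshot itself carried a Received UUID,
# so that is what the destination must be stamped with -- not the
# source subvolume's UUID.
print(required_received_uuid(
    "2afc7a5e-107f-d54b-8929-197b80b70828",   # source subvolume UUID
    "0e220a4f-6426-4745-8399-0da0084f8b23"))  # source Received UUID
```

This is exactly why the first repair attempt above failed: the destination had been given the source's subvolume UUID instead of the source's Received UUID.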
Re: How to change/fix 'Received UUID'
Thanks all for the help again. I just wrote a blog post to explain the process to others, should anyone need this later:
http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
5.1.21: fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921
This happened right after a resume from suspend to disk. It's the first corruption and read-only remount I've gotten in a very long time.
Could they be related?

[26062.126505] [ cut here ]
[26062.126524] WARNING: CPU: 7 PID: 12394 at fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921
[26062.126526] Modules linked in: msr ccm ipt_MASQUERADE ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack nf_log_ipv4 nf_log_common xt_LOG iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter rfcomm ax25 bnep pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) autofs4 binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc nls_utf8 nls_cp437 vfat fat cuse ecryptfs bbswitch(OE) configs input_polldev loop firewire_sbp2 firewire_core crc_itu_t ppdev parport_pc lp parport uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media btusb btrtl btbcm btintel bluetooth ecdh_generic hid_generic usbhid hid joydev arc4 coretemp x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic intel_wmi_thunderbolt iTCO_wdt wmi_bmof mei_hdcp iTCO_vendor_support rtsx_pci_sdmmc snd_hda_intel
[26062.126561] snd_hda_codec iwlmvm crct10dif_pclmul snd_hda_core crc32_pclmul mac80211 snd_hwdep thinkpad_acpi snd_pcm ghash_clmulni_intel nvram ledtrig_audio intel_cstate deflate snd_seq efi_pstore iwlwifi snd_seq_device snd_timer intel_rapl_perf psmouse pcspkr efivars wmi hwmon snd ac battery cfg80211 mei_me soundcore xhci_pci xhci_hcd rtsx_pci i2c_i801 rfkill sg nvidiafb intel_pch_thermal usbcore vgastate fb_ddc pcc_cpufreq sata_sil24 r8169 libphy mii fuse fan raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common crc32c_intel bcache crc64 aesni_intel input_leds i915
aes_x86_64 crypto_simd cryptd ptp glue_helper serio_raw pps_core thermal evdev [last unloaded: e1000e] [26062.126597] CPU: 7 PID: 12394 Comm: btrfs-transacti Tainted: GW OE 5.1.21-amd64-preempt-sysrq-20190816 #5 [26062.126599] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS N1DET95W (2.21 ) 12/13/2017 [26062.126604] RIP: 0010:__btrfs_free_extent+0x18b/0x921 [26062.126606] Code: 00 8b 45 40 44 29 e0 83 f8 05 0f 8f 2e 05 00 00 41 ff cc eb a5 83 f8 fe 0f 85 29 07 00 00 48 c7 c7 f8 67 f0 89 e8 f6 cb dd ff <0f> 0b 48 8b 7d 00 e8 e5 54 00 00 4c 89 fa 48 c7 c6 85 e0 f4 89 41 [26062.126608] RSP: 0018:b2d9c46e7c88 EFLAGS: 00010246 [26062.126611] RAX: 0024 RBX: 9abca20884e0 RCX: [26062.126613] RDX: RSI: 9abccf5d6558 RDI: 9abccf5d6558 [26062.126617] RBP: 9ab5a4545460 R08: 0001 R09: 8a80c7af [26062.126618] R10: 0002 R11: b2d9c46e7b2f R12: 0169 [26062.126622] R13: fffe R14: 0104 R15: 006ac918e000 [26062.126625] FS: () GS:9abccf5c() knlGS: [26062.126627] CS: 0010 DS: ES: CR0: 80050033 [26062.126629] CR2: 199a9fb4d000 CR3: 00016c20e006 CR4: 003606e0 [26062.126633] DR0: DR1: DR2: [26062.126634] DR3: DR6: fffe0ff0 DR7: 0400 [26062.126636] Call Trace: [26062.126647] __btrfs_run_delayed_refs+0x750/0xc36 [26062.126653] ? __switch_to_asm+0x41/0x70 [26062.126655] ? __switch_to_asm+0x35/0x70 [26062.126658] ? __switch_to_asm+0x41/0x70 [26062.126662] ? __switch_to+0x13d/0x3d5 [26062.126668] btrfs_run_delayed_refs+0x5d/0x132 [26062.126672] btrfs_commit_transaction+0x55/0x7c8 [26062.126676] ? start_transaction+0x347/0x3cb [26062.126679] transaction_kthread+0xc9/0x135 [26062.126683] ? btrfs_cleanup_transaction+0x403/0x403 [26062.126688] kthread+0xeb/0xf0 [26062.126692] ? 
kthread_create_worker_on_cpu+0x65/0x65 [26062.126695] ret_from_fork+0x35/0x40 [26062.126698] ---[ end trace 4c1a6b3749a2f650 ]--- [26062.126703] BTRFS info (device dm-2): leaf 510067163136 gen 2427077 total ptrs 130 free space 4329 owner 2 [26062.126706] item 0 key (458630676480 168 65536) itemoff 16217 itemsize 66 [26062.126708] extent refs 2 gen 2369265 flags 1 [26062.126709] ref#0: extent data backref root 456 objectid 72925787 offset 5472256 count 1 [26062.126711] ref#1: shared data backref parent 437615230976 count 1 [26062.126714] item 1 key (458630856704 168 69632) itemoff 16151 itemsize 66 [26062.126715] extent refs 2 gen 2369025 flags 1 [26062.126716] ref#0: extent data backref root 456 objectid 72925787 offset 4796416 count 1 [26062.126718] ref#1: shared data backref parent 437615230976 count 1 [26062.126720] item 2 key (458631012352 168 16384) itemoff
Re: 5.1.21: fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921
8192 Fixed discount file extents for inode: 75432801 in root: 456 root 456 inode 75432801 errors 100, file extent discount Found file extent holes: start: 0, len: 4096 Fixed discount file extents for inode: 75432807 in root: 456 Fixed discount file extents for inode: 75432817 in root: 456 Fixed discount file extents for inode: 75432829 in root: 456 Fixed discount file extents for inode: 75432860 in root: 456 root 456 inode 75432860 errors 100, file extent discount Found file extent holes: start: 0, len: 4096 Fixed discount file extents for inode: 75432862 in root: 456 Fixed discount file extents for inode: 75432863 in root: 456 Fixed discount file extents for inode: 75432869 in root: 456 Fixed discount file extents for inode: 75432870 in root: 456 Fixed discount file extents for inode: 75432871 in root: 456 Fixed discount file extents for inode: 75432872 in root: 456 Fixed discount file extents for inode: 75432875 in root: 456 Fixed discount file extents for inode: 75432877 in root: 456 Fixed discount file extents for inode: 75432882 in root: 456 Fixed discount file extents for inode: 75432883 in root: 456 Fixed discount file extents for inode: 75432893 in root: 456 Fixed discount file extents for inode: 75432894 in root: 456 Fixed discount file extents for inode: 75432897 in root: 456 Fixed discount file extents for inode: 75432899 in root: 456 Fixed discount file extents for inode: 75432900 in root: 456 Fixed discount file extents for inode: 75432905 in root: 456 Fixed discount file extents for inode: 75432906 in root: 456 Fixed discount file extents for inode: 75432916 in root: 456 Fixed discount file extents for inode: 75432917 in root: 456 Fixed discount file extents for inode: 75432919 in root: 456 Fixed discount file extents for inode: 75432920 in root: 456 Fixed discount file extents for inode: 75432923 in root: 456 Fixed discount file extents for inode: 75432942 in root: 456 Fixed discount file extents for inode: 75432944 in root: 456 Fixed discount file 
extents for inode: 75432948 in root: 456 root 456 inode 75432948 errors 100, file extent discount Found file extent holes: start: 0, len: 8192 Fixed discount file extents for inode: 75432949 in root: 456 root 456 inode 75432949 errors 100, file extent discount Found file extent holes: start: 0, len: 8192 and it loops forever on 456 On Thu, Oct 17, 2019 at 07:56:04PM -0700, Marc MERLIN wrote: > This happened almost after a resume from suspend to disk. > First corruption and read only I got a very long time. > > Could they be related? > > [26062.126505] [ cut here ] > [26062.126524] WARNING: CPU: 7 PID: 12394 at fs/btrfs/extent-tree.c:7100 > __btrfs_free_extent+0x18b/0x921 > [26062.126526] Modules linked in: msr ccm ipt_MASQUERADE ipt_REJECT > nf_reject_ipv4 xt_tcpudp xt_conntrack nf_log_ipv4 nf_log_common xt_LOG > iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle > ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables > x_tables bpfilter rfcomm ax25 bnep pci_stub vboxpci(O) vboxnetadp(O) > vboxnetflt(O) vboxdrv(O) autofs4 binfmt_misc uinput nfsd auth_rpcgss nfs_acl > nfs lockd grace fscache sunrpc nls_utf8 nls_cp437 vfat fat cuse ecryptfs > bbswitch(OE) configs input_polldev loop firewire_sbp2 firewire_core crc_itu_t > ppdev parport_pc lp parport uvcvideo videobuf2_vmalloc videobuf2_memops > videobuf2_v4l2 videobuf2_common videodev media btusb btrtl btbcm btintel > bluetooth ecdh_generic hid_generic usbhid hid joydev arc4 coretemp > x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass > snd_hda_codec_realtek snd_hda_codec_generic intel_wmi_thunderbolt iTCO_wdt > wmi_bmof mei_hdcp iTCO_vendor_support rtsx_pci_sdmmc snd_hda_intel > [26062.126561] snd_hda_codec iwlmvm crct10dif_pclmul snd_hda_core > crc32_pclmul mac80211 snd_hwdep thinkpad_acpi snd_pcm ghash_clmulni_intel > nvram ledtrig_audio intel_cstate deflate snd_seq efi_pstore iwlwifi > snd_seq_device snd_timer intel_rapl_perf psmouse pcspkr efivars wmi 
hwmon snd > ac battery cfg80211 mei_me soundcore xhci_pci xhci_hcd rtsx_pci i2c_i801 > rfkill sg nvidiafb intel_pch_thermal usbcore vgastate fb_ddc pcc_cpufreq > sata_sil24 r8169 libphy mii fuse fan raid456 multipath mmc_block mmc_core > dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod > async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 > blowfish_common crc32c_intel bcache crc64 aesni_intel input_leds i915 > aes_x86_64 crypto_simd cryptd ptp glue_helper serio_raw pps_core thermal > evdev [last unloaded: e1000e] > [26062.126597] CPU: 7 PID: 12394 Comm: btrfs-transacti Tainted: GW > OE 5.1.21-amd64-preempt-sysrq-20190816 #5 > [26062.126599] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS N1DET95W > (2.21 ) 12/13/2017 > [26062.126604]
Re: 5.1.21: fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921
On Fri, Oct 18, 2019 at 08:07:28PM -0700, Marc MERLIN wrote: > Ok, so before blowing the filesystem away after it was apparently badly > damaged by a suspend to disk, I tried check --repair and I hit an > infinite loop. > > Let me know if you'd like anything off the FS before I delete it. I heard nothing back, so I deleted the FS and restored from backup. But now I'm scared of ever doing a suspend to disk again. Could someone please look at the logs and give me some idea of what happened, if at all possible? Non recoverable data corruption on my laptop when I travel and backups/restores are complicated, is a bit unnerving... Thanks, Marc > Thanks, > Marc > > enabling repair mode > repair mode will force to clear out log tree, are you sure? [y/N]: y > Checking filesystem on /dev/mapper/pool1 > UUID: fda628bc-1ca4-49c5-91c2-4260fe967a23 > checking extents > Backref 415334400 parent 36028797198598144 not referenced back 0x5648ef1870e0 > Backref 415334400 parent 179634176 root 179634176 not found in extent tree > Incorrect global backref count on 415334400 found 2 wanted 1 > backpointer mismatch on [415334400 16384] > repair deleting extent record: key 415334400 169 0 > adding new tree backref on start 415334400 len 16384 parent 179634176 root > 179634176 > Repaired extent references for 415334400 > ref mismatch on [101995261952 4096] extent item 36028797018963969, found 1 > repair deleting extent record: key 101995261952 168 4096 > adding new data backref on 101995261952 root 456 owner 74455677 offset > 64892928 found 1 > Repaired extent references for 101995261952 > Incorrect local backref count on 458640384000 root 456 owner 81409181 offset > 17039360 found 0 wanted 1 back 0x5648eefd3d10 > Backref disk bytenr does not match extent record, bytenr=458640384000, ref > bytenr=0 > Backref 458640384000 root 456 owner 73020573 offset 17039360 num_refs 0 not > found in extent tree > Incorrect local backref count on 458640384000 root 456 owner 73020573 offset > 17039360 
found 1 wanted 0 back 0x5648b32a9600 > backpointer mismatch on [458640384000 86016] > repair deleting extent record: key 458640384000 168 86016 > adding new data backref on 458640384000 parent 438017720320 owner 0 offset 0 > found 1 > adding new data backref on 458640384000 root 456 owner 73020573 offset > 17039360 found 1 > Repaired extent references for 458640384000 > Fixed 0 roots. > checking free space cache > cache and super generation don't match, space cache will be invalidated > checking fs roots > Deleting bad dir index [10138517,96,436945] root 456 > Deleting bad dir index [10138518,96,646273] root 456 > Deleting bad dir index [10138517,96,437016] root 456 > Deleting bad dir index [10138518,96,808999] root 456 > Deleting bad dir index [10215134,96,149427] root 456 > Deleting bad dir index [10240541,96,268037] root 456 > Deleting bad dir index [10138517,96,540247] root 456 > Deleting bad dir index [10138518,96,825234] root 456 > Deleting bad dir index [10138517,96,736673] root 456 > Deleting bad dir index [10138518,96,1118221] root 456 > Deleting bad dir index [10240541,96,439703] root 456 > Deleting bad dir index [10138517,96,752282] root 456 > root 456 inode 75431563 errors 100, file extent discount > Found file extent holes: > start: 4096, len: 4096 > root 456 inode 75431568 errors 100, file extent discount > Found file extent holes: > start: 0, len: 1638400 > root 456 inode 75431583 errors 100, file extent discount > Found file extent holes: > start: 0, len: 147456 > root 456 inode 75431585 errors 100, file extent discount > Found file extent holes: > start: 0, len: 208896 > root 456 inode 75431591 errors 100, file extent discount > Found file extent holes: > start: 0, len: 2523136 > root 456 inode 75431730 errors 100, file extent discount > Found file extent holes: > start: 0, len: 208896 > root 456 inode 75431744 errors 100, file extent discount > Found file extent holes: > start: 0, len: 2084864 > root 456 inode 75431751 errors 100, file extent 
discount > Found file extent holes: > start: 0, len: 172032 > root 456 inode 75431756 errors 100, file extent discount > Found file extent holes: > start: 0, len: 8192 > root 456 inode 75431760 errors 100, file extent discount > Found file extent holes: > start: 0, len: 12288 > root 456 inode 75431765 errors 100, file extent discount > Found file extent holes: > start: 0, len: 32768 > root 456 inode 75431773 errors 100, file extent discount > Found file extent holes: > start: 0, len: 90112 > Fixed discount file extents for inode: 75432421 in root: 456 > Fixed discount file extents for inode: 75432429 in root: 4
Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)
On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote:
> On 2018年07月12日 01:09, Chris Murphy wrote:
> > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN wrote:
> >> Thanks to Su and Qu, I was able to get my filesystem to a point that
> >> it's mountable.
> >> I then deleted loads of snapshots and I'm down to 26.
> >>
> >> It now looks like this:
> >> gargamel:~# btrfs fi show /mnt/mnt
> >> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> >> 	Total devices 1 FS bytes used 12.30TiB
> >> 	devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >>
> >> gargamel:~# btrfs fi df /mnt/mnt
> >> Data, single: total=13.57TiB, used=12.19TiB
> >> System, DUP: total=32.00MiB, used=1.55MiB
> >> Metadata, DUP: total=124.50GiB, used=115.62GiB
> >> Metadata, single: total=216.00MiB, used=0.00B
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>
> >> Problems:
> >> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> >> server, despite my deleting lots of snapshots.
> >> Is it because I have too many files then?
> >
> > I think original mode needs most of the metadata in memory.
> >
> > I'm not understanding why btrfs check won't use swap like at least
> > xfs_repair and pretty sure e2fsck will as well.
> 
> I don't understand either.
> 
> Isn't memory from malloc() swappable?

I never looked at the code or at why/how it crashes, but my guess was that it somehow causes the kernel to grab a lot of memory in the btrfs driver, and that is what is crashing the system.
If it were just malloc() in the btrfs user space tool, it should be both swappable like you said, and should also get OOM'ed.
I suppose I can still be completely wrong, but I can't find another logical explanation.

I just tried running it again to trigger the problem, but because I freed a lot of snapshots, btrfs check --repair goes back to only using 10GB instead of 32GB, so I wasn't able to replicate OOM for you.
Incidentally, it died with:
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) compared to the found root node (139061)
ERROR: failed to repair root items: Invalid argument

That said, when it was using a fair amount of RAM, I captured this:
USER  PID  %CPU %MEM  VSZ     RSS     TTY    STAT START TIME COMMAND
root  1376 1.4  25.2  8256368 8240392 pts/18 R+   14:52 1:07 btrfs check --repair /dev/mapper/dshelf2

I don't know how to read /proc/meminfo, but this is what it said:
MemTotal:       32643792 kB
MemFree:         1367516 kB
MemAvailable:   15554836 kB
Buffers:         3491672 kB
Cached:         15900320 kB
SwapCached:         2092 kB
Active:         14577228 kB
Inactive:       15028608 kB
Active(anon):   12122180 kB
Inactive(anon):  2643176 kB
Active(file):    2455048 kB
Inactive(file): 12385432 kB
Unevictable:        8068 kB
Mlocked:            8068 kB
SwapTotal:      15616764 kB  < swap was totally unused and stays unused when I get the system to crash
SwapFree:       15578020 kB
Dirty:             71956 kB
Writeback:            64 kB
AnonPages:      10219976 kB
Mapped:          4033568 kB
Shmem:           4545552 kB
Slab:             713300 kB
SReclaimable:     395508 kB
SUnreclaim:       317792 kB
KernelStack:       11788 kB
PageTables:        52592 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31938660 kB
Committed_AS:   20070736 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:          16384 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1207572 kB
DirectMap2M:    32045056 kB

Does it help figure out where the memory was going and whether kernel memory was being used?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
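For anyone else who "doesn't know how to read /proc/meminfo": the fields are plain "Name: value kB" pairs and are easy to pull apart programmatically. A small illustrative parser (my own helper, not part of btrfs-progs), fed a few lines from the dump above:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value [kB]' lines into a dict of ints (kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        fields = rest.split()
        # First field is the numeric value; unit ('kB') may or may not follow.
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

sample = """\
MemTotal:       32643792 kB
MemFree:         1367516 kB
SwapTotal:      15616764 kB
SwapFree:       15578020 kB
"""
mi = parse_meminfo(sample)
# Swap barely touched, matching the report above: SwapTotal - SwapFree is tiny.
print(mi["SwapTotal"] - mi["SwapFree"])  # -> 38744
```

On a live system the same function can be pointed at open("/proc/meminfo").read() to watch fields like MemAvailable and SwapFree while btrfs check runs.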
task btrfs-transacti:921 blocked for more than 120 seconds during check repair
I got the following on 4.17.6 while running btrfs check --repair on an unmounted filesystem (not the lowmem version) I understand that btrfs check is userland only, although it seems that it caused these FS hangs on a different filesystem (the trace of course does not provide info on which FS) Any idea what happened here? I'm going to wait a few hours without running btrfs check to see if it happens again and then if running btrfs check will re-create this issue, but other suggestions (if any), are welcome: [ 2538.566952] Workqueue: btrfs-endio-write btrfs_endio_write_helper [ 2538.616484] Call Trace: [ 2538.623828] ? __schedule+0x53e/0x59b [ 2538.634802] schedule+0x7f/0x98 [ 2538.644214] wait_current_trans+0x9b/0xd8 [ 2538.656229] ? add_wait_queue+0x3a/0x3a [ 2538.668239] start_transaction+0x1ce/0x325 [ 2538.680556] btrfs_finish_ordered_io+0x240/0x5d3 [ 2538.694414] normal_work_helper+0x118/0x277 [ 2538.706984] process_one_work+0x19c/0x281 [ 2538.719036] ? rescuer_thread+0x279/0x279 [ 2538.731064] worker_thread+0x197/0x246 [ 2538.742322] kthread+0xeb/0xf0 [ 2538.751492] ? kthread_create_worker_on_cpu+0x66/0x66 [ 2538.76] ret_from_fork+0x35/0x40 [ 2538.777403] INFO: task kworker/u16:11:369 blocked for more than 120 seconds. [ 2538.799025] Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4 [ 2538.818109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2538.841640] kworker/u16:11 D0 369 2 0x8000 [ 2538.858112] Workqueue: btrfs-endio-write btrfs_endio_write_helper [ 2538.876401] Call Trace: [ 2538.883770] ? __schedule+0x53e/0x59b [ 2538.894760] schedule+0x7f/0x98 [ 2538.904192] wait_current_trans+0x9b/0xd8 [ 2538.916242] ? add_wait_queue+0x3a/0x3a [ 2538.927772] start_transaction+0x1ce/0x325 [ 2538.940081] btrfs_finish_ordered_io+0x240/0x5d3 [ 2538.953973] normal_work_helper+0x118/0x277 [ 2538.966523] process_one_work+0x19c/0x281 [ 2538.978546] ? 
rescuer_thread+0x279/0x279 [ 2538.990560] worker_thread+0x197/0x246 [ 2539.001797] kthread+0xeb/0xf0 [ 2539.010986] ? kthread_create_worker_on_cpu+0x66/0x66 [ 2539.026137] ret_from_fork+0x35/0x40 [ 2539.037666] INFO: task btrfs-transacti:921 blocked for more than 120 seconds. [ 2539.059851] Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4 [ 2539.079733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2539.104007] btrfs-transacti D0 921 2 0x8000 [ 2539.121257] Call Trace: [ 2539.129377] ? __schedule+0x53e/0x59b [ 2539.141171] schedule+0x7f/0x98 [ 2539.151370] btrfs_tree_lock+0xa6/0x19d [ 2539.163621] ? add_wait_queue+0x3a/0x3a [ 2539.175876] btrfs_search_slot+0x5aa/0x756 [ 2539.188899] lookup_inline_extent_backref+0x11a/0x485 [ 2539.204781] ? fixup_slab_list.isra.43+0x1b/0x72 [ 2539.219360] __btrfs_free_extent+0xf1/0xa72 [ 2539.232597] ? btrfs_merge_delayed_refs+0x18b/0x1a7 [ 2539.247922] ? __mutex_trylock_or_owner+0x43/0x54 [ 2539.262708] __btrfs_run_delayed_refs+0xad8/0xc40 [ 2539.277504] btrfs_run_delayed_refs+0x6e/0x16a [ 2539.291519] btrfs_commit_transaction+0x42/0x710 [ 2539.306043] ? start_transaction+0x295/0x325 [ 2539.319516] transaction_kthread+0xc9/0x135 [ 2539.332757] ? btrfs_cleanup_transaction+0x3ee/0x3ee [ 2539.348327] kthread+0xeb/0xf0 [ 2539.358155] ? kthread_create_worker_on_cpu+0x66/0x66 [ 2539.373977] ret_from_fork+0x35/0x40 [ 2539.385394] INFO: task vnstatd:6338 blocked for more than 120 seconds. [ 2539.405667] Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4 Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs check (not lowmem) and OOM-like hangs (4.17.6)
On Tue, Jul 17, 2018 at 10:50:32AM -0700, Marc MERLIN wrote:
> I got the following on 4.17.6 while running btrfs check --repair on an
> unmounted filesystem (not the lowmem version)
> 
> I understand that btrfs check is userland only, although it seems that
> it caused these FS hangs on a different filesystem (the trace of course
> does not provide info on which FS)
> 
> Any idea what happened here?
> I'm going to wait a few hours without running btrfs check to see if it
> happens again and then if running btrfs check will re-create this issue,
> but other suggestions (if any), are welcome:

Hi Qu,

I know we were talking about this last week, and then btrfs check just worked for me, so I wasn't able to reproduce. Now I'm able to reproduce again.
I tried again; it's definitely triggered by btrfs check --repair.
I tried to capture what happens: memory didn't dip to 0, but the system got very slow and things started failing. btrfs was never killed, though, while ssh was.
Is there a chance that maybe btrfs is in some kernel OOM exclude list?
Here is what I got when the system was not doing well (it took minutes to run):

             total       used       free     shared    buffers     cached
Mem:      32643788   32070952     572836          0     102160    4378772
-/+ buffers/cache:   27590020    5053768
Swap:     15616764     973596   14643168

gargamel:~# cat /proc/meminfo
MemTotal:       32643788 kB
MemFree:         2726276 kB
MemAvailable:    2502200 kB
Buffers:           12360 kB
Cached:          1676388 kB
SwapCached:     11048580 kB
Active:         16443004 kB
Inactive:       12010456 kB
Active(anon):   16287780 kB
Inactive(anon): 11651692 kB
Active(file):     155224 kB
Inactive(file):   358764 kB
Unevictable:        5776 kB
Mlocked:            5776 kB
SwapTotal:      15616764 kB
SwapFree:         294592 kB
Dirty:              3032 kB
Writeback:         76064 kB
AnonPages:      15723272 kB
Mapped:           612124 kB
Shmem:           1171032 kB
Slab:             399824 kB
SReclaimable:      84568 kB
SUnreclaim:       315256 kB
KernelStack:       20576 kB
PageTables:        94268 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31938656 kB
Committed_AS:   37909452 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     98304 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:          16384 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      355604 kB
DirectMap2M:    32897024 kB

and console:
[ 9184.345329] INFO: task zmtrigger.pl:9981 blocked for more than 120 seconds.
[ 9184.366258] Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 9184.385323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9184.408803] zmtrigger.pl D 0 9981 9804 0x20020080
[ 9184.425249] Call Trace:
[ 9184.432580]  ? __schedule+0x53e/0x59b
[ 9184.443551]  schedule+0x7f/0x98
[ 9184.452960]  io_schedule+0x16/0x38
[ 9184.463154]  wait_on_page_bit_common+0x10c/0x199
[ 9184.476996]  ? file_check_and_advance_wb_err+0xd7/0xd7
[ 9184.493339]  shmem_getpage_gfp+0x2dd/0x975
[ 9184.506558]  shmem_fault+0x188/0x1c3
[ 9184.518199]  ?
filemap_map_pages+0x6f/0x295 [ 9184.531680] __do_fault+0x1d/0x6e [ 9184.542505] __handle_mm_fault+0x675/0xa61 [ 9184.555653] ? list_move+0x21/0x3a [ 9184.566737] handle_mm_fault+0x11c/0x16b [ 9184.579355] __do_page_fault+0x324/0x41c [ 9184.591996] ? page_fault+0x8/0x30 [ 9184.603059] page_fault+0x1e/0x30 [ 9184.613846] RIP: 0023:0xf7d2d022 [ 9184.624366] RSP: 002b:ffeb9fe8 EFLAGS: 00010202 [ 9184.640868] RAX: f7eed000 RBX: 567e6000 RCX: 0004 [ 9184.663095] RDX: 587fecb0 RSI: 5876538c RDI: 0004 [ 9184.685308] RBP: 58185160 R08: R09: [ 9184.707524] R10: R11: 0286 R12: [ 9184.729757] R13: R14: R15: [ 9184.751988] INFO: task /usr/sbin/apach:11868 blocked for more than 120 seconds. [ 9184.775106] Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4 [ 9184.795072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 9184.819423] /usr/sbin/apach D0 11868 11311 0x20020080 [ 9184.836748] Call Trace: [ 9184.844926] ? __schedule+0x53e/0x59b [ 9184.856811] schedule+0x7f/0x98 [ 9184.867075] io_schedule+0x16/0x38 [ 9184.878114] wait_on_page_bit_common+0x10c/0x199 [ 9184.892807] ? file_check_and_advance_wb_err+0xd7/0xd7 [ 9184.909036] shmem_getpage_gfp+0x2dd/0x975 [ 9184.922157] shmem_fault+0x188/0x1c3 [ 9184.933667] ? filemap_map_pages+0x6f/0x29
Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)
Ok, I did more testing. Qu is right that btrfs check does not crash the kernel. It just takes all the memory until linux hangs everywhere, and somehow (no idea why) the OOM killer never triggers. Details below:

On Tue, Jul 17, 2018 at 01:32:57PM -0700, Marc MERLIN wrote:
> Here is what I got when the system was not doing well (it took minutes to run):
> 
>              total       used       free     shared    buffers     cached
> Mem:      32643788   32070952     572836          0     102160    4378772
> -/+ buffers/cache:   27590020    5053768
> Swap:     15616764     973596   14643168

ok, the reason it was not that close to 0 was due to /dev/shm it seems. I cleared that, and now I can get it to go to near 0 again.

I was wrong about the system being fully crashed; it's not, it's just very close to being hung. I can type killall -9 btrfs in the serial console and wait a few minutes. The system eventually recovers, but it's impossible to fix anything via ssh, apparently because networking does not get to run when I'm in this state.

I'm not sure why my system reproduces this easily while Qu's system does not, but Qu was right that the kernel is not dead and that it's merely a problem of userspace taking all the RAM and somehow not being killed by OOM.

I checked the PID and don't see why it's not being killed:
gargamel:/proc/31006# grep . oom*
oom_adj:0
oom_score:221  << this increases a lot, but OOM never kills it
oom_score_adj:0

I have these variables:
/proc/sys/vm/oom_dump_tasks:1
/proc/sys/vm/oom_kill_allocating_task:0
/proc/sys/vm/overcommit_kbytes:0
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50  << is this bad? (seems default)

Here is my system when it virtually died:
USER  PID   %CPU %MEM  VSZ      RSS      TTY    STAT START TIME COMMAND
root  31006 21.2 90.7  29639020 29623180 pts/19 D+   13:49 1:35 ./btrfs check /dev/mapper/dshelf2

             total       used       free     shared    buffers     cached
Mem:      32643788   32180100     463688          0      44664     119508
-/+ buffers/cache:   32015928     627860
Swap:     15616764     443676   15173088

MemTotal:       32643788 kB
MemFree:          463440 kB
MemAvailable:      44864 kB
Buffers:           44664 kB
Cached:           120360 kB
SwapCached:        87064 kB
Active:         30381404 kB
Inactive:         585952 kB
Active(anon):   30334696 kB
Inactive(anon):   474624 kB
Active(file):      46708 kB
Inactive(file):   111328 kB
Unevictable:        5616 kB
Mlocked:            5616 kB
SwapTotal:      15616764 kB
SwapFree:       15173088 kB
Dirty:              1636 kB
Writeback:             4 kB
AnonPages:      30734240 kB
Mapped:            67236 kB
Shmem:              3036 kB
Slab:             267884 kB
SReclaimable:      51528 kB
SUnreclaim:       216356 kB
KernelStack:       10144 kB
PageTables:        69284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31938656 kB
Committed_AS:   32865492 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:          16384 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      560404 kB
DirectMap2M:    32692224 kB

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
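Since the OOM killer never fires in this situation, one pragmatic stopgap is a userspace watchdog that polls the RSS of btrfs check and kills it before the box wedges. A hedged sketch of the decision logic only (the helper names and the 90% threshold are my own, not an existing tool):

```python
import os

def rss_kb(pid, statm_text=None):
    """RSS of a process in kB, read from /proc/<pid>/statm.
    statm's second field is resident pages; multiply by the page size."""
    if statm_text is None:
        with open(f"/proc/{pid}/statm") as f:
            statm_text = f.read()
    pages = int(statm_text.split()[1])
    return pages * os.sysconf("SC_PAGE_SIZE") // 1024

def should_kill(rss_now_kb, mem_total_kb, limit_pct=90):
    """Flag a process using more than limit_pct of physical RAM --
    roughly the state btrfs check reached here before the hang."""
    return rss_now_kb * 100 > mem_total_kb * limit_pct

# The ps output above: btrfs check at 29623180 kB RSS of 32643788 kB total (90.7%).
print(should_kill(29623180, 32643788))  # -> True
```

A real watchdog would loop over rss_kb() for the btrfs check pid every few seconds and send SIGKILL once should_kill() trips, automating the killall -9 btrfs that otherwise has to be typed at the serial console.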
Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)
On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote:
> No OOM triggers? That's a little strange.
> Maybe it's related to how kernel handles memory over-commit?

Yes, I think you are correct.

> And for the hang, I think it's related to some memory allocation failure
> and error handler just didn't handle it well, so it's causing deadlock
> for certain page.

That indeed matches what I'm seeing.

> ENOMEM handling is pretty common but hardly verified, so it's not that
> strange, but we must locate the problem.

I seem to be getting deadlocks in the kernel, so I'm hoping that at least
it's checked there, but maybe not?

> In my system, at least I'm not using btrfs as root fs, and for the
> memory eating program I normally ensure it's eating all the memory +
> swap, so OOM killer is always triggered, maybe that's the cause.
>
> So in your case, maybe it's btrfs not really taking up all memory, thus
> OOM killer not triggered.

Correct, the swap is not used.

> Any kernel dmesg about OOM killer triggered?

Nothing at all. It never gets triggered.

> > Here is my system when it virtually died:
> > USER       PID %CPU %MEM      VSZ      RSS TTY    STAT START  TIME COMMAND
> > root     31006 21.2 90.7 29639020 29623180 pts/19 D+   13:49  1:35 ./btrfs check /dev/mapper/dshelf2

See how btrfs check was taking 29GB in that ps output (that's before it
takes everything and I can't even type ps anymore). Note that VSZ is
almost equal to RSS: nothing gets swapped.

Then see the free output:

> >              total     used     free   shared  buffers   cached
> > Mem:      32643788 32180100   463688        0    44664   119508
> > -/+ buffers/cache:          32015928   627860
> > Swap:     15616764   443676 15173088

> For swap, it looks like only some other program's memory is swapped out,
> not btrfs'.

That's exactly correct. btrfs check never goes to swap, I'm not sure why,
and because there is virtual memory free, maybe that's why OOM does not
trigger?
So I guess I can probably "fix" my problem by removing swap, but
ultimately it would be useful to know why memory taken by btrfs check
does not end up in swap.

> And unfortunately, I'm not so familiar with OOM/MM code outside of
> filesystem.
> Any help from other experienced developers would definitely help to
> solve why memory of 'btrfs check' is not swapped out or why OOM killer
> is not triggered.

Do you have someone from linux-mm you might be able to ask, or should we
Cc this thread there?

Thanks,
Marc
Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)
On Wed, Jul 18, 2018 at 10:42:21PM +0300, Andrei Borzenkov wrote:
> > Any help from other experienced developers would definitely help to
> > solve why memory of 'btrfs check' is not swapped out or why OOM killer
> > is not triggered.
>
> Almost all used memory is marked as "active" and active pages are not
> swapped. A page is active if it was accessed recently. Is it possible
> that btrfs logic does frequent scans across all allocated memory?
> >>
> >> Active:         30381404 kB
> >> Inactive:         585952 kB

That is a very good find. Yes, the linux kernel VM may be smart enough
not to swap pages that got used recently, and when btrfs check slurps all
the extents to cross check everything, I think it does cross reference
them all many times. This is why it can run in a few hours when btrfs
check lowmem requires days to run in a similar situation.

I'm not sure if there is a good way around this, but it's good to know
that btrfs check --repair can effectively abuse the linux VM in a way
that it'll take everything down without OOM having a chance to trigger.

Marc
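Andrei's observation can be quantified directly from the meminfo dump: nearly everything is on the active LRU list, so the reclaim code has almost nothing it considers cold enough to swap out. A quick calculation with the two values quoted above (my own arithmetic, not from the thread):

```shell
# Share of LRU pages the kernel considers "active" (recently referenced),
# using the Active/Inactive values from the meminfo dump.
awk 'BEGIN {
    active   = 30381404   # kB
    inactive = 585952     # kB
    printf "active: %.0f%% of LRU pages\n", 100 * active / (active + inactive)
}'
```

That comes out to 98% active, which explains why the swap stays idle: the VM only reclaims from the inactive list, and btrfs check keeps touching its working set fast enough that almost nothing ever becomes inactive.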
Have 15GB missing in btrfs filesystem.
Normally btrfs fi show will show lost space when your trees aren't
balanced. Balance usually reclaims that space, or most of it. In this
case, not so much. Kernel 4.17.6:

saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
	Total devices 1 FS bytes used 186.89GiB
	devid    1 size 228.67GiB used 207.60GiB path /dev/mapper/pool1

Ok, I have 21GB of difference between what is used by the FS and what is
used in the block layer.

saruman:/mnt/btrfs_pool1# btrfs balance start -dusage=40 -v .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=40
Done, had to relocate 1 out of 210 chunks

saruman:/mnt/btrfs_pool1# btrfs balance start -musage=60 -v .
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=60
  SYSTEM (flags 0x2): balancing, usage=60
Done, had to relocate 4 out of 209 chunks

saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
	Total devices 1 FS bytes used 186.91GiB
	devid    1 size 228.67GiB used 205.60GiB path /dev/mapper/pool1

That didn't help much, the delta is now 19GB.

saruman:/mnt/btrfs_pool1# btrfs balance start -dusage=80 -v .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=80
Done, had to relocate 8 out of 207 chunks

saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
	Total devices 1 FS bytes used 187.03GiB
	devid    1 size 228.67GiB used 201.54GiB path /dev/mapper/pool1

Ok, now the delta is 14GB.

saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v .
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=80
  SYSTEM (flags 0x2): balancing, usage=80
Done, had to relocate 5 out of 202 chunks

saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
	Total devices 1 FS bytes used 188.24GiB
	devid    1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1

and it's back to 15GB :-/

How can I get 188.24 and 203.54 to converge further? Where has all that
space gone?

Thanks,
Marc
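The gap being chased here is simply the `devid ... used` figure (space allocated to chunks) minus `FS bytes used`. A hypothetical throwaway helper — my own sketch, not a tool from the thread — that pulls both numbers out of single-device `btrfs fi show` output and prints the delta:

```shell
# Hypothetical: compute allocated-but-unused space from `btrfs fi show`
# output. Assumes a single device and sizes reported in GiB, as in the
# runs pasted above (sample output inlined for illustration).
fi_show='Total devices 1 FS bytes used 188.24GiB
devid 1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1'

printf '%s\n' "$fi_show" | awk '
    {
        # grab the value following the word "used" on each line
        for (i = 1; i < NF; i++)
            if ($i == "used") { v = $(i + 1); sub(/GiB/, "", v) }
        if (/FS bytes used/) fsused = v; else alloc = v
    }
    END { printf "allocated but unused: %.2f GiB\n", alloc - fsused }
'
```

For the numbers above this prints the 15.30 GiB delta the thread is asking about; on a live system you would pipe `btrfs fi show <mnt>` into the same awk.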
Re: Have 15GB missing in btrfs filesystem.
On Wed, Oct 24, 2018 at 01:07:25PM +0800, Qu Wenruo wrote:
> > saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v .
> > Dumping filters: flags 0x6, state 0x0, force is off
> >   METADATA (flags 0x2): balancing, usage=80
> >   SYSTEM (flags 0x2): balancing, usage=80
> > Done, had to relocate 5 out of 202 chunks
> > saruman:/mnt/btrfs_pool1# btrfs fi show .
> > Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
> > 	Total devices 1 FS bytes used 188.24GiB
> > 	devid    1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1
> >
> > and it's back to 15GB :-/
> >
> > How can I get 188.24 and 203.54 to converge further? Where has all
> > that space gone?
>
> Your original chunks are already pretty compact.
> Thus really no need to do extra balance.
>
> You may get some extra space by doing full system balance (no usage=
> filter), but that's really not worth it in my opinion.
>
> Maybe you could try defrag to free some space wasted by CoW instead?
> (If you're not using many snapshots)

Thanks for the reply. So right now, I have:

saruman:~# btrfs fi show /mnt/btrfs_pool1/
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
	Total devices 1 FS bytes used 188.25GiB
	devid    1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1

saruman:~# btrfs fi df /mnt/btrfs_pool1/
Data, single: total=192.48GiB, used=184.87GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=5.50GiB, used=3.38GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I've been using btrfs for a long time now, but I've never had a
filesystem where 15GB (7%) was apparently unusable after a balance.

I can't drop all the snapshots, since at least two are used for btrfs
send/receive backups. However, if I delete more snapshots and do a full
balance, do you think it'll free up more space?

I can try a defrag next, but since I have CoW for snapshots, it's not
going to help much, correct?

Thanks,
Marc
Re: Have 15GB missing in btrfs filesystem.
On Sat, Oct 27, 2018 at 02:12:02PM -0400, Remi Gauvin wrote:
> On 2018-10-27 01:42 PM, Marc MERLIN wrote:
> >
> > I've been using btrfs for a long time now, but I've never had a
> > filesystem where 15GB (7%) was apparently unusable after a balance.
>
> The space isn't unusable. It's just allocated (it's used in the sense
> that it's reserved for data chunks). Start writing data to the drive,
> and the data will fill that space before more gets allocated. (Unless
> you are using an older kernel and the filesystem gets mounted with the
> ssd option, in which case you'll want to add the nossd option to prevent
> that behaviour.)
>
> You can use btrfs fi usage to display that more clearly.

Got it. I have disk space free alerts based on df, which I know doesn't
mean that much on btrfs. Maybe I'll just need to change that alert code
to make it btrfs aware.

> > I can try a defrag next, but since I have CoW for snapshots, it's not
> > going to help much, correct?
>
> The defrag will end up using more space, as the fragmented parts of
> files will get duplicated. That being said, if you have the luxury to
> defrag *before* taking new snapshots, that would be the time to do it.

Thanks for confirming. Because I always have snapshots for btrfs
send/receive, defrag will duplicate as you say, but once the older
snapshots get freed up, the duplicate blocks should go away, correct?

Back to usage, thanks for pointing out that command:

saruman:/mnt/btrfs_pool1# btrfs fi usage .
Overall:
    Device size:                 228.67GiB
    Device allocated:            203.54GiB
    Device unallocated:           25.13GiB
    Device missing:                  0.00B
    Used:                        192.01GiB
    Free (estimated):             32.44GiB  (min: 19.88GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,single: Size:192.48GiB, Used:185.16GiB
   /dev/mapper/pool1  192.48GiB

Metadata,DUP: Size:5.50GiB, Used:3.42GiB
   /dev/mapper/pool1   11.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/pool1   64.00MiB

Unallocated:
   /dev/mapper/pool1   25.13GiB

I'm still seeing that I'm using 192GB, but have 203GB allocated. Do I
have 25GB usable:
  Device unallocated: 25.13GiB
or 35GB usable:
  Device size: 228.67GiB - Used: 192.01GiB = 36GB?

Yes, I know that I shouldn't get close to filling up the device; I'm
just trying to clear up whether I should stay below 25GB or below 35GB.

Thanks,
Marc
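For what it's worth, the "Free (estimated)" line in that output can be reproduced from the other numbers: unallocated space plus the unused tail of the already-allocated data chunks; the "min" variant assumes any future chunk allocation is DUP and therefore counts unallocated space at half value. A sketch of that arithmetic with the GiB figures above (my own reconstruction of how the estimate is derived, not output from the tool):

```shell
# Free (estimated) = unallocated + (data chunk size - data chunk used)
# min estimate     = unallocated/2 + (data chunk size - data chunk used)
#                    (pessimistic case: all new chunks allocated as DUP)
awk 'BEGIN {
    unalloc   = 25.13    # Device unallocated, GiB
    data_size = 192.48   # Data,single Size, GiB
    data_used = 185.16   # Data,single Used, GiB
    printf "free estimate: %.2f GiB (min: %.2f GiB)\n",
        unalloc + data_size - data_used,
        unalloc / 2 + data_size - data_used
}'
```

This lands within rounding of the reported 32.44GiB / 19.88GiB (the tool works in bytes, not two-decimal GiB). So the answer to the 25-vs-35 question is roughly: ~32GiB is usable for data in the best case, but only 25.13GiB is truly unallocated and available to any chunk type.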
Re: Have 15GB missing in btrfs filesystem.
On Sun, Oct 28, 2018 at 07:27:22AM +0800, Qu Wenruo wrote:
> > I can't drop all the snapshots, since at least two are used for btrfs
> > send/receive backups.
> > However, if I delete more snapshots and do a full balance, do you
> > think it'll free up more space?
>
> No.
>
> You're already too worried about a non-existent problem.
> Your fs looks pretty healthy.

Thanks both for the answers. I'll go back and read them more carefully
later to see how I can adjust my monitoring. Basically, I hit the 90%
space used in df alert, and I know that once I get close to full, or
completely full, very bad things happen with btrfs, sometimes making the
system so unusable that it's very hard to reclaim space and fix the
issue. That's not counting that if you have btrfs send snapshots, you're
forced to break the snapshot relationship and start over, since deleting
data does not reclaim blocks that are still marked as used by the last
snapshot that was sent to the backup server.

Long story short, I try very hard to not ever hit this problem again :)

Marc
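Since the thread's conclusion is that df-based alerts mislead on btrfs, a btrfs-aware alert would watch unallocated space instead of df's "free". A minimal sketch of such a check — the function name, threshold, and field positions are my own assumptions about `btrfs fi usage -b` output, not code from the thread (a canned sample is inlined so the sketch runs anywhere):

```shell
# Hypothetical df replacement: alert when unallocated space drops below
# a threshold, parsing `btrfs fi usage -b <mnt>` style output.
check_unallocated() {
    # $1: captured `btrfs fi usage -b` output, $2: threshold in bytes
    unalloc=$(printf '%s\n' "$1" | awk '/Device unallocated:/ { print $3 }')
    if [ "$unalloc" -lt "$2" ]; then
        echo "ALERT: only $unalloc bytes unallocated"
    else
        echo "OK: $unalloc bytes unallocated"
    fi
}

# Sample output standing in for: btrfs fi usage -b /mnt/btrfs_pool1
sample='Overall:
    Device size:               245531213824
    Device unallocated:         26983346176'

check_unallocated "$sample" $((20 * 1024 * 1024 * 1024))
```

With the sample above (about 25GiB unallocated vs a 20GiB threshold) it prints the OK line; in a real cron job you would capture `btrfs fi usage -b` for each pool and mail on ALERT.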
4.15.6 crash: BUG at fs/btrfs/ctree.c:1862
static noinline struct extent_buffer *read_node_slot(
				struct btrfs_fs_info *fs_info,
				struct extent_buffer *parent, int slot)
{
	int level = btrfs_header_level(parent);
	struct extent_buffer *eb;

	if (slot < 0 || slot >= btrfs_header_nritems(parent))
		return ERR_PTR(-ENOENT);

	BUG_ON(level == 0);

BTRFS info (device dm-2): relocating block group 13404622290944 flags data
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): relocating block group 13403548549120 flags data
[ cut here ]
kernel BUG at fs/btrfs/ctree.c:1862!
invalid opcode: [#1] PREEMPT SMP PTI
CPU: 5 PID: 8103 Comm: btrfs Tainted: G U 4.15.6-amd64-preempt-sysrq-20171018 #3
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
RIP: 0010:read_node_slot+0x3c/0x9e
RSP: 0018:becfaa0b7b58 EFLAGS: 00210246
RAX: 00a0 RBX: 000c RCX: 0003
RDX: 000c RSI: 9a60e9d9de78 RDI: 00052f6e
RBP: 9a60e9d9de78 R08: 0001 R09: becfaa0b7bf6
R10: 9a64988bd7e9 R11: 9a64988bd7c8 R12: e003d4bdb800
R13: 9a64a481 R14: R15:
FS: 7fba34c9c8c0() GS:9a64de34() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 5a8b9c9a CR3: 0001446c6004 CR4: 001606e0
Call Trace:
 tree_advance+0xb1/0x11e
 btrfs_compare_trees+0x1c2/0x4d6
 ? process_extent+0xdcf/0xdcf
 btrfs_ioctl_send+0x81e/0xc70
 ? __kmalloc_track_caller+0xfb/0x10f
 _btrfs_ioctl_send+0xbc/0xe6
 ? paravirt_sched_clock+0x5/0x8
 ? set_task_rq+0x2f/0x80
 ? task_rq_unlock+0x22/0x36
 btrfs_ioctl+0x162f/0x1dc8
 ? select_task_rq_fair+0xb65/0xb7a
 ? update_load_avg+0x16d/0x442
 ? list_add+0x15/0x2e
 ? cfs_rq_throttled.isra.30+0x9/0x18
 ? vfs_ioctl+0x1b/0x28
 vfs_ioctl+0x1b/0x28
 do_vfs_ioctl+0x4f4/0x53f
 ? __audit_syscall_entry+0xbf/0xe3
 SyS_ioctl+0x52/0x76
 do_syscall_64+0x72/0x81
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fba34d835e7
RSP: 002b:7ffc32cf4cb8 EFLAGS: 0202 ORIG_RAX: 0010
RAX: ffda RBX: 523f RCX: 7fba34d835e7
RDX: 7ffc32cf4d40 RSI: 40489426 RDI: 0004
RBP: 0004 R08: R09: 7fba34c9b700
R10: 7fba34c9b9d0 R11: 0202 R12: 0003
R13: 563a30b87020 R14: 0001 R15: 0001
Code: f5 53 4c 8b a6 98 00 00 00 89 d3 4c 89 e7 e8 67 fd ff ff 85 db 78 63 4c 89 e7 41 88 c6 e8 92 fb ff ff 39 d8 76 54 45 84 f6 75 02 <0f> 0b 89 de 48 89 ef e8 2e ff ff ff 89 de 49 89 c4 48 89 ef e8
RIP: read_node_slot+0x3c/0x9e RSP: becfaa0b7b58
---[ end trace a24e7de6b77b5cb1 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1900 from 0x8100 (relocation range: 0x8000-0xbfff)
Re: 4.15.6 crash: BUG at fs/btrfs/ctree.c:1862
On Tue, May 15, 2018 at 09:36:11AM +0100, Filipe Manana wrote:
> We got a fix for this recently: https://patchwork.kernel.org/patch/10396523/

Thanks very much for the notice, sorry that I missed it.

Marc
btrfs balance did not progress after 12H
So, I ran this:

gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . &
[1] 24450
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=60

gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
0 out of about 0 chunks balanced (0 considered), -nan% left
Balance on '.' is running
0 out of about 73 chunks balanced (2 considered), 100% left
Balance on '.' is running

After about 20mn, it changed to this:
1 out of about 73 chunks balanced (6724 considered), 99% left
Balance on '.' is running

Now, 12H later, it's still there, only 1 out of 73.

gargamel:/mnt/btrfs_pool2# btrfs fi show .
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
	Total devices 1 FS bytes used 12.72TiB
	devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=13.57TiB, used=12.60TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=121.50GiB, used=116.53GiB
GlobalReserve, single: total=512.00MiB, used=848.00KiB

kernel: 4.16.8

Is that expected? Should I be ready to wait days, possibly, for this
balance to finish?

Thanks,
Marc
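As a cross-check on those numbers, the `used 13.81TiB` from fi show reconciles with the fi df totals once the DUP profiles are counted twice on disk. A quick sanity-check of that bookkeeping with the values above (my own arithmetic, not output from the thread):

```shell
# Allocated bytes on disk = data total (single, counted once)
#                         + 2 x metadata total (DUP) + 2 x system total (DUP)
awk 'BEGIN {
    data = 13.57              # TiB, Data single total
    meta = 121.50 / 1024      # TiB, Metadata DUP total
    sys  = 32.0 / 1048576     # TiB, System DUP total
    printf "allocated ~= %.2f TiB\n", data + 2 * meta + 2 * sys
}'
```

That works out to ~13.81 TiB, matching the fi show device figure, so the filesystem accounting itself is consistent; only the balance progress is stuck.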
Re: btrfs balance did not progress after 12H
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:
> So, I ran this:
> gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . &
> [1] 24450
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=60
>
> gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
> 0 out of about 0 chunks balanced (0 considered), -nan% left
> Balance on '.' is running
> 0 out of about 73 chunks balanced (2 considered), 100% left
> Balance on '.' is running
>
> After about 20mn, it changed to this:
> 1 out of about 73 chunks balanced (6724 considered), 99% left
> Balance on '.' is running
>
> Now, 12H later, it's still there, only 1 out of 73.
>
> gargamel:/mnt/btrfs_pool2# btrfs fi show .
> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> 	Total devices 1 FS bytes used 12.72TiB
> 	devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
>
> gargamel:/mnt/btrfs_pool2# btrfs fi df .
> Data, single: total=13.57TiB, used=12.60TiB
> System, DUP: total=32.00MiB, used=1.55MiB
> Metadata, DUP: total=121.50GiB, used=116.53GiB
> GlobalReserve, single: total=512.00MiB, used=848.00KiB
>
> kernel: 4.16.8
>
> Is that expected? Should I be ready to wait days possibly for this
> balance to finish?

It's now been 2 days, and it's still stuck at 1%:
1 out of about 73 chunks balanced (6724 considered), 99% left

Marc
Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still
On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote:
> > In your situation, I would run "btrfs pause ", wait to hear from
> > a btrfs developer, and not use the volume whatsoever in the meantime.
>
> I would say this is probably good advice. I don't really know what's
> going on here myself actually, though it looks like the balance got
> stuck (the output hasn't changed for over 36 hours; unless you've got an
> insanely slow storage array, that's extremely unusual, it should only be
> moving at most 3GB of data per chunk).

I didn't hear from any developer, so I had to continue.
- btrfs scrub cancel did not work (hang)
- at reboot, mounting the filesystem hung, even with 4.17, which is
  disappointing (it should not hang)
- mount -o recovery still hung
- mount -o ro did not hang though

Sigh, why is my FS corrupted again?

Anyway, back to btrfs check --repair, and it took all my 32GB of RAM on a
system I can't add more RAM to, so I'm hosed. I'll note in passing (and
it's not ok at all) that check --repair, after a 20 to 30mn pause, takes
all the kernel RAM more quickly than the system can OOM or log anything,
and just deadlocks it. This is repeatable and totally not ok :(

I'm now left with btrfs-progs git master, and lowmem, which finally does
a bit of repair. So far:

gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]

At the rate it's going, it'll probably take days; it's already been 36H.

Marc
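With thousands of lines like these scrolling past, a quick way to see which subvolume roots are affected is to tally the mismatches per root. The snippet below is my own throwaway sketch against the ERROR format above, not a btrfs-progs feature (two of the lines above are inlined as sample input so it runs standalone):

```shell
# Hypothetical: tally "referencer count mismatch" errors per root id
# from a saved btrfs check log read on stdin.
tally() {
    awk '/referencer count mismatch/ {
             if (match($0, /root: [0-9]+/))
                 count[substr($0, RSTART + 6, RLENGTH - 6)]++
         }
         END { for (r in count) print r, count[r] }' | sort
}

printf '%s\n' \
  'ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4' \
  'ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4' | tally
```

On the real machine you would capture the run with `btrfs check ... 2>&1 | tee check.log` and pipe the log through `tally`; for the output pasted above, everything clusters on roots 21872 and 22911, i.e. two related snapshots.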
Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still
On Mon, Jun 25, 2018 at 06:24:37PM +0200, Hans van Kranenburg wrote:
> >> output hasn't changed for over 36 hours, unless you've got an insanely slow
> >> storage array, that's extremely unusual (it should only be moving at most
> >> 3GB of data per chunk)).
> >
> > I didn't hear from any developer, so I had to continue.
> > - btrfs scrub cancel did not work (hang)
>
> Did you mean balance cancel? It waits until the current block group is
> finished.

Yes, I meant that, thanks for correcting me. And you're correct that
because it was hung, cancel wasn't going to go anywhere. At least my
filesystem was still working at the time (as in, IO was going on just
fine).

> > - at reboot mounting the filesystem hung, even with 4.17, which is
> > disappointing (it should not hang)
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> >
> > Sigh, why is my FS corrupted again?
>
> Again? Do you think balance is corrupting the filesystem? Or have there
> been previous btrfs check --repair operations which made smaller
> problems bigger in the past?

Honestly, I don't fully remember at this point. I keep notes, but not
detailed enough, and it's been a little while. I know I've had to
delete/recreate this filesystem twice already over the last years, but
I'm not fully certain I remember when this one was last wiped.

Yes, I do run balance along with scrub once a month:

btrfs balance start -musage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
# After metadata, let's do data:
btrfs balance start -dusage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
btrfs balance start -dusage=20 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
echo btrfs scrub start -Bd $mountpoint
ionice -c 3 nice -10 btrfs scrub start -Bd $mountpoint

Hard to say if balance has damaged my filesystem over time, but it's
definitely possible.

> Am I right to interpret the messages below, and see that you have
> extents that are referenced hundreds of times?
I'm not certain, but it's a backup server with many blocks that are the
same, so it could be some CoW stuff, even if I didn't run any dedupe
commands myself.

> Is there heavy snapshotting or deduping going on in this filesystem? If
> so, it's not surprising balance will get a hard time moving extents
> around, since it has to update all of the metadata for each extent again
> in hundreds of places.

There is some snapshotting, but maybe around 20 or so per subvolume, not
hundreds.

> Did you investigate what balance was doing if it takes long? Is it
> using cpu all the time, or is it reading from disk slowly (random
> reads), or is it writing to disk all the time at full speed?

I couldn't see what it was doing, but it's running in the kernel, is it
not? (or can you just strace the user space command?)
Either way, it's too late for that now, and given that it didn't make
progress of a single block in 36H, I'm assuming it was well deadlocked.

Thanks for the reply.

Marc
Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still
On Mon, Jun 25, 2018 at 01:07:10PM -0400, Austin S. Hemmelgarn wrote:
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
>
> One tip here specifically: if you had to reboot during a balance and the
> FS hangs when it mounts, try mounting with `-o skip_balance`. That
> should pause the balance instead of resuming it on mount, at which point
> you should also be able to cancel it without it hanging.

Very good tip, I have this in all my mountpoints :)

#LABEL=dshelf2 /mnt/btrfs_pool2 btrfs defaults,compress=lzo,skip_balance,noatime

Marc
So, does btrfs check lowmem take days? weeks?
Regular btrfs check --repair has a nice progress option. It wasn't
perfect, but it showed something. But then it also takes all your memory
quicker than the linux kernel can defend itself, and reliably completely
kills my 32GB server quicker than it can OOM anything.

lowmem repair seems to be going still, but it's been days, and -p seems
to do absolutely nothing.

My filesystem is "only" 10TB or so, albeit with a lot of files.

2 things that come to mind:
1) can lowmem have some progress reporting so that I know if I'm looking
   at days, weeks, or even months before it will be done?
2) non lowmem is more efficient obviously when it doesn't completely
   crash your machine, but could lowmem be given an amount of memory to
   use for caching, or maybe use some heuristics based on RAM free, so
   that it's not so excruciatingly slow?

Thanks,
Marc
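Until check grows a memory knob, one possible workaround for the "kills the box before OOM can react" failure mode — my suggestion, not something proposed in the thread — is to cap the process's address space with ulimit, so a runaway check fails with ENOMEM instead of wedging the machine:

```shell
# Cap the address space seen by btrfs check so a runaway run dies with
# ENOMEM instead of dragging the whole machine down. 24 GiB is an
# example cap; run in a subshell so the limit doesn't stick to your
# login shell.
(
    ulimit -v $((24 * 1024 * 1024))    # ulimit -v takes kB
    ulimit -v                          # show the cap now in effect
    # btrfs check /dev/mapper/dshelf2  # would run under this cap
)
```

Whether check survives hitting the cap depends on how well its ENOMEM paths are handled (which, per earlier in this thread, is exactly what's in doubt), but a dead userspace process is still far cheaper to recover from than a deadlocked kernel.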
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
> > lowmem repair seems to be going still, but it's been days and -p seems
> > to do absolutely nothing.
>
> I'm afraid you hit a bug in the lowmem repair code.
> By all means, --repair shouldn't really be used unless you're pretty
> sure the problem is something btrfs check can handle.
>
> That's also why --repair is still marked as dangerous.
> Especially when it's combined with experimental lowmem mode.

Understood, but btrfs got corrupted (by itself or not, I don't know).
I cannot mount the filesystem read/write.
I cannot btrfs check --repair it since that code will kill my machine.
What do I have left?

> > My filesystem is "only" 10TB or so, albeit with a lot of files.
>
> Unless you have tons of snapshots and reflinked (deduped) files, it
> shouldn't take so long.

I may have a fair amount.

gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]
ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
Add one extent data backref [156909494272 55320576]
ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
Add one extent data backref [156909494272 55320576]

The last two ERROR lines took over a day to get generated, so I'm not
sure if it's still working, but just slowly. For what it's worth, non
lowmem check used to take 12 to 24H on that filesystem back when it
still worked.

> > 2 things that come to mind
> > 1) can lowmem have some progress reporting so that I know if I'm
> > looking at days, weeks, or even months before it will be done?
>
> It's hard to estimate, especially when every cross check involves a lot
> of disk IO.
> But at least, we could add such indicator to show we're doing something.

Yes, anything to show that I should still wait is still good :)

> > 2) non lowmem is
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 01:35:06PM +0800, Su Yue wrote:
> > It's hard to estimate, especially when every cross check involves a lot
> > of disk IO.
> >
> > But at least, we could add such indicator to show we're doing something.
>
> Maybe we can count all roots in the root tree first, then before checking
> each tree, report i/num_roots. So users can see whether the check is doing
> something meaningful or is stuck in a silly dead loop.

Sounds reasonable. Do you want to submit something to git master for
btrfs-progs? I'll pull it and just run my btrfs check again.

In the meantime, how sane does the output I just posted look?

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 01:48:17PM +0800, Qu Wenruo wrote:
> Just normal btrfs check, and post the output.
> If normal check eats up all your memory, btrfs check --mode=lowmem.

Does check without --repair eat less RAM?

> --repair should be considered as the last method.

If --repair doesn't work, check is useless to me, sadly.
I know that for FS analysis and bug reporting, you want to have the FS
without changing it to something maybe worse, but for my use, if it can't
be mounted and can't be fixed, then it gets deleted, which is even worse
than check doing the wrong thing.

> > The last two ERROR lines took over a day to get generated, so I'm not sure
> > if it's still working, but just slowly.
>
> OK, that explains something.
> One extent is referred to hundreds of times, no wonder it will take a long time.
>
> Just one tip here, there are really too many snapshots/reflinked files.
> It's highly recommended to keep the number of snapshots to a reasonable
> number (lower two digits).
> Although btrfs snapshot is super fast, it puts a lot of pressure on its
> extent tree, so there is no free lunch here.

Agreed, though I doubt I have over or much over 100 snapshots (but I
can't check right now).
Sadly I'm not allowed to mount even read-only while check is running:
gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

> > I see. Is there any reasonably easy way to check on this running process?
>
> GDB attach would be good.
> Interrupt and check the inode number if it's checking the fs tree.
> Check the extent bytenr number if it's checking the extent tree.
>
> But considering how many snapshots there are, it's really hard to determine.
>
> In this case, the super large extent tree is causing a lot of problems;
> maybe it's a good idea to allow btrfs check to skip the extent tree check?

I only see --init-extent-tree in the man page; which option did you have
in mind?
> > Then again, maybe it already fixed enough that I can mount my filesystem
> > again.
>
> This needs the initial btrfs check report and the kernel messages on how it
> fails to mount.

The mount command hangs, and the kernel does not show anything special
outside of disk access hanging:

Jun 23 17:23:26 gargamel kernel: [  341.802696] BTRFS warning (device dm-2): 'recovery' is deprecated, use 'usebackuproot' instead
Jun 23 17:23:26 gargamel kernel: [  341.828743] BTRFS info (device dm-2): trying to use backup root at mount time
Jun 23 17:23:26 gargamel kernel: [  341.850180] BTRFS info (device dm-2): disk space caching is enabled
Jun 23 17:23:26 gargamel kernel: [  341.869014] BTRFS info (device dm-2): has skinny extents
Jun 23 17:23:26 gargamel kernel: [  342.206289] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jun 23 17:26:26 gargamel kernel: [  521.571392] BTRFS info (device dm-2): enabling ssd optimizations
Jun 23 17:55:58 gargamel kernel: [ 2293.914867] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Jun 23 17:56:22 gargamel kernel: [ 2317.718406] BTRFS info (device dm-2): disk space caching is enabled
Jun 23 17:56:22 gargamel kernel: [ 2317.737277] BTRFS info (device dm-2): has skinny extents
Jun 23 17:56:22 gargamel kernel: [ 2318.069461] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jun 23 17:59:22 gargamel kernel: [ 2498.256167] BTRFS info (device dm-2): enabling ssd optimizations
Jun 23 18:05:23 gargamel kernel: [ 2859.107057] BTRFS info (device dm-2): disk space caching is enabled
Jun 23 18:05:23 gargamel kernel: [ 2859.125883] BTRFS info (device dm-2): has skinny extents
Jun 23 18:05:24 gargamel kernel: [ 2859.448018] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jun 23 18:08:23 gargamel kernel: [ 3039.023305] BTRFS info (device dm-2): enabling ssd optimizations
Jun 23 18:13:41 gargamel kernel: [ 3356.626037] perf: interrupt took too long (3143 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
Jun 23 18:17:23 gargamel kernel: [ 3578.937225] Process accounting resumed
Jun 23 18:33:47 gargamel kernel: [ 4563.356252] JFS: nTxBlock = 8192, nTxLock = 65536
Jun 23 18:33:48 gargamel kernel: [ 4563.446715] ntfs: driver 2.1.32 [Flags: R/W MODULE].
Jun 23 18:42:20 gargamel kernel: [ 5075.995254] INFO: task sync:20253 blocked for more than 120 seconds.
Jun 23 18:42:20 gargamel kernel: [ 5076.015729]       Not tainted 4.17.2-amd64-preempt-sysrq-20180817 #1
Jun 23 18:42:20 gargamel kernel: [ 5076.036141] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 23 18:42:20 gargamel kernel: [ 5076.060637] sync            D    0 20253  15327 0x20020080
Jun 23 18:42:20 gargamel kernel: [ 5076.078032] Call Trace:
Jun 23 18:42:20 gargamel kernel: [ 5076.086366]  ? __schedule+0x53e/0x59b
Jun 23 18:42:20 gargamel kernel: [ 5076.098311]  schedule+0x7f/0x98
Ju
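Qu's suggestion above (attach GDB and look at the inode or extent bytenr being processed) can be scripted. This is a minimal sketch, assuming a single running `btrfs` process and a btrfs-progs binary with symbols; the sampling interval is arbitrary:

```shell
# Hedged sketch: sample the running check's backtrace every 5 minutes,
# so you can tell whether the tree walk is advancing or looping.
while sleep 300; do
    pid=$(pidof -s btrfs) || break            # stop once check exits
    gdb -q -batch -ex bt -p "$pid" 2>/dev/null | head -n 20
done
```

Comparing a few consecutive backtraces shows whether the check is still in the extent tree cross-checks or stuck on one record.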
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 02:02:19PM +0800, Su Yue wrote:
> I have figured out the bug: lowmem check can't deal with shared tree blocks
> in a reloc tree. The fix is simple; you can try the following repo:
>
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Not sure I understand what you meant here.

> Please run lowmem check without '--repair' first, to be sure whether
> your filesystem is fine.

The filesystem is not fine: it caused btrfs balance to hang, though
whether balance actually broke it further or caused the breakage, I
can't say.
Then mount hangs, even with recovery, unless I use ro.
This filesystem is trash to me and will require over a week to rebuild
manually if I can't repair it.
Running check without repair for likely several days just to learn that
my filesystem is not clean (I already know this) isn't useful :)
Or am I missing something?

> Though the bug and phenomenon are clear enough, before sending my patch
> I have to make a test image. I have spent a week studying btrfs balance,
> but it seems a little hard for me.

Thanks for having a look, either way.

Marc
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> >
> > Not sure I understand what you meant here.
>
> Sorry for my unclear words.
> Simply speaking, I suggest you stop the currently running check.
> Then clone the branch above, compile the binary, and run
> 'btrfs check --mode=lowmem $dev'.

I understand, I'll build and try it.

> > This filesystem is trash to me and will require over a week to rebuild
> > manually if I can't repair it.
>
> Understood your anxiety. A log of check without '--repair' will help
> us figure out what's wrong with your filesystem.

Ok, I'll run your new code without repair and report back. It will
likely take over a day though.

Marc
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 02:29:10PM +0800, Qu Wenruo wrote:
> > If --repair doesn't work, check is useless to me sadly.
>
> Not exactly.
> Although it's time consuming, I have manually patched several users' fs,
> which normally ends pretty well.

Ok, I understand now.

> > Agreed, I doubt I have over or much over 100 snapshots though (but I
> > can't check right now).
> > Sadly I'm not allowed to mount even read only while check is running:
> > gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
> > mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

Ok, so I just checked now: 270 snapshots, but not because I'm crazy,
because I use btrfs send a lot :)

> This looks like super block corruption?
>
> What about "btrfs inspect dump-super -fFa /dev/mapper/dshelf2"?

Sure, there you go: https://pastebin.com/uF1pHTsg

> And what about the "skip_balance" mount option?

I have this in my fstab :)

> Another problem is, with so many snapshots, balance is also hugely
> slowed, thus I'm not 100% sure if it's really a hang.

I sent another thread about this last week; balance got hung after 2
days of doing nothing and just moving a single chunk.

Ok, I was able to remount the filesystem read-only. I was wrong earlier,
I have 270 snapshots:
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup/'
74
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup-btrfssend/'
196

It's a backup server; I use btrfs send for many machines, and for each
btrfs send I keep history, maybe 10 or so backups. So it adds up in the
end. Is btrfs unable to deal with this well enough?

> For that usage, btrfs-restore would fit your use case more.
> Unfortunately it needs extra disk space and isn't good at restoring
> subvolumes/snapshots.
> (Although it's much faster than repairing the possibly corrupted extent
> tree.)

It's a backup server, it only contains data from other machines.
If the filesystem cannot be recovered to a working state, I will need
over a week to restart the many btrfs send commands from many servers.
This is why anything other than --repair is useless to me: I don't need
the data back, it's still on the original machines; I need the
filesystem to work again so that I don't waste a week recreating the
many btrfs send/receive relationships.

> > Is that possible at all?
>
> At least for file recovery (fs tree repair), we have such behavior.
>
> However, the problem you hit (and a lot of users hit) is all about
> extent tree repair, which doesn't even go to file recovery.
>
> All the hassle is in the extent tree, and for the extent tree it's just
> good or bad. Any corruption in the extent tree may lead to later bugs.
> The only way to avoid extent tree problems is to mount the fs RO.
>
> So, I'm afraid it is at least impossible for recent years.

Understood, thanks for answering.
Does the pastebin help, and is 270 snapshots ok enough?

Thanks,
Marc
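To make the recovery cost concrete: rebuilding one lost send/receive relationship means a fresh full send before incrementals can resume. The following is an illustrative sketch with hypothetical paths (not commands from this thread); multiply by dozens of source machines to see why this takes a week:

```shell
# Re-seed one send/receive relationship after the destination fs is lost.
# SRC/DST paths are hypothetical placeholders.
SRC=/mnt/src; DST=/mnt/backup
btrfs subvolume snapshot -r "$SRC" "$SRC/.seed"       # new read-only baseline
btrfs send "$SRC/.seed" | btrfs receive "$DST"        # full copy: the slow part
btrfs subvolume snapshot -r "$SRC" "$SRC/.next"
btrfs send -p "$SRC/.seed" "$SRC/.next" | btrfs receive "$DST"  # incrementals resume
```

Only after the full send completes does the cheap incremental (-p) path become available again.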
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> On Thu, 28 Jun 2018 23:59:03 -0700
> Marc MERLIN wrote:
>
> > I don't waste a week recreating the many btrfs send/receive relationships.
>
> Consider not using send/receive, and switching to regular rsync instead.
> Send/receive is very limiting and cumbersome, including because of what you
> described. And it doesn't gain you much over an incremental rsync.

Err, sorry, but I cannot agree with you here, at all :)

btrfs send/receive is pretty much the only reason I use btrfs.
rsync takes hours on big filesystems, scanning every single inode on
both sides, seeing what changed, and only then sending the differences.
It's super inefficient.
btrfs send knows in seconds what needs to be sent, and works on it right
away.

Marc
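The difference Marc describes is visible in the commands themselves; a sketch with hypothetical paths:

```shell
# rsync: stats every inode on both sides on every run to discover
# the changes, then transfers the deltas it found.
rsync -aHAX --delete /mnt/src/ /mnt/backup/

# btrfs send: the changed extents are derived from snapshot metadata
# (no full tree walk of either side), then streamed to the receiver.
btrfs subvolume snapshot -r /mnt/src /mnt/src/.snap.new
btrfs send -p /mnt/src/.snap.prev /mnt/src/.snap.new | btrfs receive /mnt/backup
```

The rsync scan cost grows with the total number of inodes, while the send cost grows only with the amount of changed data.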
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 03:20:42PM +0800, Qu Wenruo wrote:
> If certain btrfs specific operations are involved, it's definitely not OK:
> 1) Balance
> 2) Quota
> 3) Btrfs check

Ok, I understand. I'll try to balance almost never then.
My problems did indeed start because I ran balance and it got stuck for
2 days with 0 progress. That still seems like a bug though. I'm ok with
slow, but stuck for 2 days with only 270 snapshots or so means there is
a bug, or the algorithm is so expensive that 270 snapshots could cause
it to take days or weeks to proceed.

> > It's a backup server, it only contains data from other machines.
> > If the filesystem cannot be recovered to a working state, I will need
> > over a week to restart the many btrfs send commands from many servers.
> > This is why anything other than --repair is useless to me, I don't need
> > the data back, it's still on the original machines, I need the
> > filesystem to work again so that I don't waste a week recreating the
> > many btrfs send/receive relationships.
>
> Now I totally understand why you need to repair the fs.

I also understand that my use case is atypical :)
But I guess this also means that using btrfs for a lot of send/receive
on a backup server is not going to work well, unfortunately :-/
Now I'm wondering if I'm the only person even doing this.

> > Does the pastebin help and is 270 snapshots ok enough?
>
> The super dump doesn't show anything wrong.
>
> So the problem may be in the super large extent tree.
>
> In this case, a plain check result with Su's patch would help more,
> rather than the not so interesting super dump.
First I tried to mount with skip_balance after the partial repair, and
it hung a long time:

[445635.716318] BTRFS info (device dm-2): disk space caching is enabled
[445635.736229] BTRFS info (device dm-2): has skinny extents
[445636.101999] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[445825.053205] BTRFS info (device dm-2): enabling ssd optimizations
[446511.006588] BTRFS info (device dm-2): disk space caching is enabled
[446511.026737] BTRFS info (device dm-2): has skinny extents
[446511.325470] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[446699.593501] BTRFS info (device dm-2): enabling ssd optimizations
[446964.077045] INFO: task btrfs-transacti:9211 blocked for more than 120 seconds.
[446964.099802]       Not tainted 4.17.2-amd64-preempt-sysrq-20180818 #3
[446964.120004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

So I rebooted, and will now run Su's btrfs check without repair and
report back.

Thanks to you both for your help.

Marc
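For reference, the skip_balance option Marc says he has in fstab would look something like the fragment below (device name, mount point, and the other options are illustrative, not taken from his system):

```shell
# /etc/fstab fragment: skip_balance stops an interrupted balance from
# resuming automatically at mount time, so the mount itself can succeed.
/dev/mapper/dshelf2  /mnt/btrfs_pool2  btrfs  defaults,noatime,skip_balance  0  0
```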
Re: btrfs send/receive vs rsync
On Fri, Jun 29, 2018 at 10:04:02AM +0200, Lionel Bouton wrote:
> Hi,
>
> On 29/06/2018 09:22, Marc MERLIN wrote:
> > On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> >> On Thu, 28 Jun 2018 23:59:03 -0700
> >> Marc MERLIN wrote:
> >>
> >>> I don't waste a week recreating the many btrfs send/receive relationships.
> >> Consider not using send/receive, and switching to regular rsync instead.
> >> Send/receive is very limiting and cumbersome, including because of what you
> >> described. And it doesn't gain you much over an incremental rsync. As for
> > Err, sorry but I cannot agree with you here, at all :)
> >
> > btrfs send/receive is pretty much the only reason I use btrfs.
> > rsync takes hours on big filesystems scanning every single inode on both
> > sides and then seeing what changed, and only then sends the differences
> > It's super inefficient.
> > btrfs send knows in seconds what needs to be sent, and works on it right
> > away.
>
> I've not yet tried send/receive, but I feel the pain of rsyncing
> millions of files (I had to use lsyncd to limit the problem to the time
> the origin servers reboot, which is a relatively rare event), so this
> thread piqued my attention. Looking at the whole thread, I wonder if you
> could get a more manageable solution by splitting the filesystem.

So, let's be clear: I did backups with rsync for 10+ years. It was slow
and painful. On my laptop, an hourly rsync between 2 drives slowed my
machine to a crawl while everything was being stat'ed; it took forever.
Now with btrfs send/receive, it just works, and I don't even see it
happening in the background.
Here is a page I wrote about it in 2014:
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
Here is a talk I gave in 2014 too; scroll to the bottom of the page and
the bottom of the talk outline:
http://marc.merlins.org/perso/btrfs/2014-05.html#My-Btrfs-Talk-at-Linuxcon-JP-2014
and click on 'Btrfs send/receive'.

> If instead of using a single BTRFS filesystem you used LVM volumes
> (maybe with thin provisioning and monitoring of the volume group free
> space) for each of your servers to back up, with one BTRFS filesystem
> per volume, you would have fewer snapshots per filesystem and isolate
> problems in case of corruption. If you eventually decide to start from
> scratch again this might help a lot in your case.

So, I already have problems due to too many block layers:
- raid 5 + ssd
- bcache
- dmcrypt
- btrfs
I get occasional deadlocks due to upper layers sending more data to the
lower layer (bcache) than it can process. I'm a bit wary of adding yet
another layer (LVM), but you're otherwise correct that keeping smaller
btrfs filesystems would help with performance and contain possible
damage.
Has anyone actually done this? :)

Marc
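Lionel's suggestion could be sketched with LVM thin provisioning as follows. All names and sizes are hypothetical, and this sits on top of the existing dmcrypt layer rather than replacing it:

```shell
# Sketch: one thin LV + one btrfs filesystem per backed-up client, so
# snapshot pressure and any corruption stay contained per filesystem.
vgcreate backup /dev/mapper/cryptbackup        # VG on top of dmcrypt
lvcreate -L 8T -T backup/pool                  # shared thin pool
for host in server1 server2 laptop1; do
    lvcreate -V 2T -T backup/pool -n "$host"   # overcommitted thin volume
    mkfs.btrfs -L "backup-$host" "/dev/backup/$host"
done
```

The trade-off Marc raises stands: this adds one more block layer, and the thin pool's free space must be monitored since the volumes are overcommitted.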
Re: So, does btrfs check lowmem take days? weeks?
On Fri, Jun 29, 2018 at 12:28:31AM -0700, Marc MERLIN wrote:
> So, I rebooted, and will now run Su's btrfs check without repair and
> report back.

As expected, it will likely still take days. Here's the start:

gargamel:~# btrfs check --mode=lowmem -p /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 180, have: 240
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 301, have: 431
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 160, have: 240
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 169, have: 249
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 347, have: 418
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 1452

Mmmh, these look similar (but not identical) to the last run earlier in
this thread:

ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete b
Re: So, does btrfs check lowmem take days? weeks?
Well, there goes that. After about 18H:

ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 1452
backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0
btrfs(+0x3a232)[0x56091704f232]
btrfs(+0x3ab46)[0x56091704fb46]
btrfs(+0x3b9f5)[0x5609170509f5]
btrfs(btrfs_find_all_roots+0x9)[0x560917050a45]
btrfs(+0x572ff)[0x56091706c2ff]
btrfs(+0x60b13)[0x560917075b13]
btrfs(cmd_check+0x2634)[0x56091707d431]
btrfs(main+0x88)[0x560917027260]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93aa508561]
btrfs(_start+0x2a)[0x560917026dfa]
Aborted

That's https://github.com/Damenly/btrfs-progs.git

Whoops, I didn't use the tmp1 branch. Let me try again with that and
report back, although the problem above is still going to be there,
since I think the only difference will be this, correct?
https://github.com/Damenly/btrfs-progs/commit/b5851513a12237b3e19a3e71f3ad00b966d25b3a

Marc
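For anyone following along, building and running the right branch looks roughly like this. The btrfs-progs build dependencies vary by distribution, so the configure step may ask for extra packages; the device path is the one from this thread:

```shell
# Clone the tmp1 branch directly (rather than the default branch,
# which is the mistake described above), build, and rerun the check.
git clone -b tmp1 https://github.com/Damenly/btrfs-progs.git
cd btrfs-progs
./autogen.sh && ./configure --disable-documentation
make -j"$(nproc)"
# run the freshly built binary, without --repair first:
./btrfs check --mode=lowmem -p /dev/mapper/dshelf2
```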
Re: So, does btrfs check lowmem take days? weeks?
On Sat, Jun 30, 2018 at 10:49:07PM +0800, Qu Wenruo wrote:
> But the last abort looks pretty possible to be the culprit.
>
> Would you try to dump the extent tree?
> # btrfs inspect dump-tree -t extent | grep -A50 156909494272

Sure, there you go:

item 25 key (156909494272 EXTENT_ITEM 55320576) itemoff 14943 itemsize 24
	refs 19715 gen 31575 flags DATA
item 26 key (156909494272 EXTENT_DATA_REF 571620086735451015) itemoff 14915 itemsize 28
	extent data backref root 21641 objectid 374857 offset 235175936 count 1452
item 27 key (156909494272 EXTENT_DATA_REF 1765833482087969671) itemoff 14887 itemsize 28
	extent data backref root 23094 objectid 374857 offset 235175936 count 1442
item 28 key (156909494272 EXTENT_DATA_REF 1807626434455810951) itemoff 14859 itemsize 28
	extent data backref root 21503 objectid 374857 offset 235175936 count 1454
item 29 key (156909494272 EXTENT_DATA_REF 1879818091602916231) itemoff 14831 itemsize 28
	extent data backref root 21462 objectid 374857 offset 235175936 count 1454
item 30 key (156909494272 EXTENT_DATA_REF 3610854505775117191) itemoff 14803 itemsize 28
	extent data backref root 23134 objectid 374857 offset 235175936 count 1442
item 31 key (156909494272 EXTENT_DATA_REF 3754675454231458695) itemoff 14775 itemsize 28
	extent data backref root 23052 objectid 374857 offset 235175936 count 1442
item 32 key (156909494272 EXTENT_DATA_REF 5060494667839714183) itemoff 14747 itemsize 28
	extent data backref root 23174 objectid 374857 offset 235175936 count 1440
item 33 key (156909494272 EXTENT_DATA_REF 5476627808561673095) itemoff 14719 itemsize 28
	extent data backref root 22911 objectid 374857 offset 235175936 count 1
item 34 key (156909494272 EXTENT_DATA_REF 6378484416458011527) itemoff 14691 itemsize 28
	extent data backref root 23012 objectid 374857 offset 235175936 count 1442
item 35 key (156909494272 EXTENT_DATA_REF 7338474132555182983) itemoff 14663 itemsize 28
	extent data backref root 21872 objectid 374857 offset 235175936 count 1
item 36 key (156909494272 EXTENT_DATA_REF 7516565391717970823) itemoff 14635 itemsize 28
	extent data backref root 21826 objectid 374857 offset 235175936 count 1452
item 37 key (156909494272 SHARED_DATA_REF 14871537025024) itemoff 14631 itemsize 4
	shared data backref count 10
item 38 key (156909494272 SHARED_DATA_REF 14871617568768) itemoff 14627 itemsize 4
	shared data backref count 73
item 39 key (156909494272 SHARED_DATA_REF 14871619846144) itemoff 14623 itemsize 4
	shared data backref count 59
item 40 key (156909494272 SHARED_DATA_REF 14871623270400) itemoff 14619 itemsize 4
	shared data backref count 68
item 41 key (156909494272 SHARED_DATA_REF 14871623532544) itemoff 14615 itemsize 4
	shared data backref count 70
item 42 key (156909494272 SHARED_DATA_REF 14871626383360) itemoff 14611 itemsize 4
	shared data backref count 76
item 43 key (156909494272 SHARED_DATA_REF 14871635132416) itemoff 14607 itemsize 4
	shared data backref count 60
item 44 key (156909494272 SHARED_DATA_REF 14871649533952) itemoff 14603 itemsize 4
	shared data backref count 79
item 45 key (156909494272 SHARED_DATA_REF 14871862378496) itemoff 14599 itemsize 4
	shared data backref count 70
item 46 key (156909494272 SHARED_DATA_REF 14909667098624) itemoff 14595 itemsize 4
	shared data backref count 72
item 47 key (156909494272 SHARED_DATA_REF 14909669720064) itemoff 14591 itemsize 4
	shared data backref count 58
item 48 key (156909494272 SHARED_DATA_REF 14909734567936) itemoff 14587 itemsize 4
	shared data backref count 73
item 49 key (156909494272 SHARED_DATA_REF 14909920477184) itemoff 14583 itemsize 4
	shared data backref count 79
item 50 key (156909494272 SHARED_DATA_REF 14942279335936) itemoff 14579 itemsize 4
	shared data backref count 79
item 51 key (156909494272 SHARED_DATA_REF 14942304862208) itemoff 14575 itemsize 4
	shared data backref count 72
item 52 key (156909494272 SHARED_DATA_REF 14942348378112) itemoff 14571 itemsize 4
	shared data backref count 67
item 53 key (156909494272 SHARED_DATA_REF 14942366138368) itemoff 14567 itemsize 4
	shared data backref count 51
item 54 key (156909494272 SHARED_DATA_REF 14942384799744) itemoff 14563 itemsize 4
	shared data backref count 64
item 55 key (156909494272 SHARED_DATA_REF 14978234613760) it
Re: Incremental send/receive broken after snapshot restore
Sorry that I missed the beginning of this discussion, but I think this is what I documented here after hitting the same problem: http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html Marc On Sun, Jul 01, 2018 at 01:03:37AM +0200, Hannes Schweizer wrote: > On Sat, Jun 30, 2018 at 10:02 PM Andrei Borzenkov wrote: > > > > 30.06.2018 21:49, Andrei Borzenkov wrote: > > > 30.06.2018 20:49, Hannes Schweizer wrote: > > ... > > >> > > >> I've tested a few restore methods beforehand, and simply creating a > > >> writeable clone from the restored snapshot does not work for me, eg: > > >> # create some source snapshots > > >> btrfs sub create test_root > > >> btrfs sub snap -r test_root test_snap1 > > >> btrfs sub snap -r test_root test_snap2 > > >> > > >> # send a full and incremental backup to external disk > > >> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external > > >> btrfs sub snap -r test_root test_snap3 > > >> btrfs send -c test_snap2 test_snap3 | btrfs receive > > >> /run/media/schweizer/external > > >> > > >> # simulate disappearing source > > >> btrfs sub del test_* > > >> > > >> # restore full snapshot from external disk > > >> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive . > > >> > > >> # create writeable clone > > >> btrfs sub snap test_snap3 test_root > > >> > > >> # try to continue with backup scheme from source to external > > >> btrfs sub snap -r test_root test_snap4 > > >> > > >> # this fails!! > > >> btrfs send -c test_snap3 test_snap4 | btrfs receive > > >> /run/media/schweizer/external > > >> At subvol test_snap4 > > >> ERROR: parent determination failed for 2047 > > >> ERROR: empty stream is not considered valid > > >> > > > > > > Yes, that's expected. Incremental stream always needs valid parent - > > > this will be cloned on destination and incremental changes applied to > > > it.
"-c" option is just additional sugar on top of it which might reduce > > > size of stream, but in this case (i.e. without "-p") it also attempts to > > > guess parent subvolume for test_snap4 and this fails because test_snap3 > > > and test_snap4 do not have common parent so test_snap3 is rejected as > > > valid parent snapshot. You can restart incremental-forever chain by > > > using explicit "-p" instead: > > > > > > btrfs send -p test_snap3 test_snap4 > > > > > > Subsequent snapshots (test_snap5 etc) will all have common parent with > > > immediate predecessor again so "-c" will work. > > > > > > Note that technically "btrfs send" with single "-c" option is entirely > > > equivalent to "btrfs -p". Using "-p" would have avoided this issue. :) > > > Although this implicit check for common parent may be considered a good > > > thing in this case. > > > > > > P.S. looking at the above, it probably needs to be in manual page for > > > btrfs-send. It took me quite some time to actually understand the > > > meaning of "-p" and "-c" and behavior if they are present. > > > > > ... > > >> > > >> Is there some way to reset the received_uuid of the following snapshot > > >> on online? > > >> ID 258 gen 13742 top level 5 parent_uuid - > > >>received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid > > >> 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo > > >> > > > > > > There is no "official" tool but this question came up quite often. > > > Search this list, I believe recently one-liner using python-btrfs was > > > posted. Note that also patch that removes received_uuid when "ro" > > > propery is removed was suggested, hopefully it will be merged at some > > > point. Still I personally consider ability to flip read-only property > > > the very bad thing that should have never been exposed in the first place. > > > > > > > Note that if you remove received_uuid (explicitly or - in the future - > > implicitly) you will not be able to restart incremental send anymore. 
> > Without received_uuid there will be no way to match source test_snap3 > > with destination test_snap3. So you *must* preserve it and start with > > writable clone. > > > > received_uuid is misnomer. I wish it would be named "content_uuid" or > > "snap_uuid" with semantic > > > > 1. When read-only snapshot of writable volume is created, content_uuid > > is initialized > > > > 2. Read-only snapshot of read-only snapshot inherits content_uuid > > > > 3. destination of "btrfs send" inherits content_uuid > > > > 4. writable snapshot of read-only snapshot clears content_uuid > > > > 5. clearing read-only property clears content_uuid > > > > This would make it more straightforward to cascade and restart > > replication by having single subvolume property to match against. > > Indeed, the current terminology is a bit confusing, and the patch > removing the received_uuid when manually switching ro to false should > definitely be merged. As recommended, I'll simply create a writeable > clone of the restored snapshot and use -p instead of -c when restoring > again (
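[Editor's note] The matching step described above can be sketched as shell commands. Paths are hypothetical; `btrfs subvolume show` does print a "Received UUID" field on reasonably recent btrfs-progs, which is the value incremental send/receive matching depends on.

```shell
# Verify the snapshot pair still matches on both sides (paths hypothetical).
btrfs subvolume show /mnt/pool/test_snap3 | grep -i 'received uuid'
btrfs subvolume show /run/media/schweizer/external/test_snap3 | grep -i 'received uuid'

# Restart the chain once with an explicit parent (-p), as advised above;
# later snapshots can go back to using -c.
btrfs send -p test_snap3 test_snap4 | btrfs receive /run/media/schweizer/external
```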
btrfs check of a raid0?
Howdy, I have a btrfs filesystem made out of 2 devices: [ 75.141414] BTRFS: device label btrfs_space devid 1 transid 429220 /dev/bcache3 [ 75.164745] BTRFS: device label btrfs_space devid 2 transid 429220 /dev/bcache2 One of the 2 devices had a hardware error (not btrfs' fault): [201504.939659] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, rd 39, flush 1, corrupt 0, gen 0 [201504.995967] BTRFS warning (device bcache3): bcache3 checksum verify failed on 38976 wanted F3019EEA found E6A97DC4 level 0 [201505.032209] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, rd 40, flush 1, corrupt 0, gen 0 [201505.062447] BTRFS error (device bcache3): parent transid verify failed on 38976 wanted 434763 found 434245 [201600.262142] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, rd 41, flush 1, corrupt 0, gen 0 I unmounted it, and I'm trying to check the filesystem now. How is it supposed to work when you have multiple devices for a btrfs filesystem? gargamel:~# btrfs check --repair -p /dev/bcache2 enabling repair mode ERROR: mount check: cannot open /dev/bcache2: No such device or address ERROR: could not check mount status: No such device or address gargamel:~# btrfs check --repair -p /dev/bcache3 enabling repair mode ERROR: cannot open device '/dev/bcache3': Device or resource busy ERROR: cannot open file system [205248.299528] BTRFS info (device bcache3): disk space caching is enabled [205248.320335] BTRFS error (device bcache3): Remounting read-write after error is not allowed Yes, rebooting should likely get around the problem, but I'd rather not reboot, I have long running stuff I would rather not stop. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. 
Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
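[Editor's note] To answer the "how is it supposed to work" question above: btrfs check is pointed at any one member device of a multi-device filesystem while the whole filesystem is unmounted; the remaining members are found via device scan. A sketch with the device names from this thread (only valid once the filesystem is actually released, which is what failed here):

```shell
umount /mnt/btrfs_space           # mountpoint hypothetical
btrfs device scan                 # make sure all member devices are registered
btrfs check -p /dev/bcache3       # read-only check first; any member device works
# btrfs check --repair /dev/bcache3   # only after reviewing the read-only output
```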
Re: btrfs check of a raid0?
On Sun, Jul 01, 2018 at 01:15:09PM -0600, Chris Murphy wrote: > > How is it supposed to work when you have multiple devices for a btrfs > > filesystem? > > > > gargamel:~# btrfs check --repair -p /dev/bcache2 > > enabling repair mode > > ERROR: mount check: cannot open /dev/bcache2: No such device or address > > ERROR: could not check mount status: No such device or address > > gargamel:~# btrfs check --repair -p /dev/bcache3 > > enabling repair mode > > ERROR: cannot open device '/dev/bcache3': Device or resource busy > > ERROR: cannot open file system > > > > [205248.299528] BTRFS info (device bcache3): disk space caching is enabled > > [205248.320335] BTRFS error (device bcache3): Remounting read-write after > > error is not allowed > > If it's successfully unmounted, I don't understand the error messages > that it can't be opened. Is umount hung? Sounds to me like btrfs check > thinks it's still mounted. I spent more time on this and apparently, because the underlying device had a hardware fault (fell off the bus), its dmcrypt device is still there but not working. In turn, I can't dmsetup rm it because it's in use by bcache, which didn't free it, but bcache won't let me free it because it got removed. So, I'm stuck with a reboot in the end, oh well... Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
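[Editor's note] The stuck stack described above normally has to be torn down top-down. A sketch of the order, with hypothetical device names; whether it actually succeeds depends on the kernel still letting each layer go, which it did not in this case:

```shell
# bcache must release its backing device before the dm-crypt mapping can go away.
echo 1 > /sys/block/bcache2/bcache/stop   # stop the bcache device via sysfs
dmsetup remove cryptdev                   # then remove the dm-crypt mapping
```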
Re: So, does btrfs check lowmem take days? weeks?
On Thu, Jun 28, 2018 at 11:43:54PM -0700, Marc MERLIN wrote: > On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote: > > > > https://github.com/Damenly/btrfs-progs/tree/tmp1 > > > > > > Not sure if I understand what you meant, here. > > > > > Sorry for my unclear words. > > Simply speaking, I suggest you stop the currently running check. > > Then clone the above branch, compile the binary, and run > > 'btrfs check --mode=lowmem $dev'. > I understand, I'll build and try it. > > > This filesystem is trash to me and will require over a week to rebuild > > > manually if I can't repair it. > > > > Understood your anxiety; a log of check without '--repair' will help > > us figure out what's wrong with your filesystem. > Ok, I'll run your new code without repair and report back. It will > likely take over a day though. Well, it got stuck for over a day, and then I had to reboot :(
saruman:/var/local/src/btrfs-progs.sy# git remote -v
origin https://github.com/Damenly/btrfs-progs.git (fetch)
origin https://github.com/Damenly/btrfs-progs.git (push)
saruman:/var/local/src/btrfs-progs.sy# git branch
  master
* tmp1
saruman:/var/local/src/btrfs-progs.sy# git pull
Already up to date.
saruman:/var/local/src/btrfs-progs.sy# make
Making all in Documentation
make[1]: Nothing to be done for 'all'.
However, it still got stuck here:
gargamel:~# btrfs check --mode=lowmem -p /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 2, have: 3
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 180, have: 181
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 67, have: 68
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 114, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 301, have: 302
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 160, have: 161
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 169, have: 170
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 347, have: 348
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 556
What should I try next? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: So, does btrfs check lowmem take days? weeks?
On Mon, Jul 02, 2018 at 10:02:33AM +0800, Su Yue wrote: > Could you try follow dumps? They shouldn't cost much time. > > #btrfs inspect dump-tree -t 21872 | grep -C 50 "374857 > EXTENT_DATA " > > #btrfs inspect dump-tree -t 22911 | grep -C 50 "374857 > EXTENT_DATA " Ok, that's 29MB, so it doesn't fit on pastebin: http://marc.merlins.org/tmp/dshelf2_inspect.txt Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: So, does btrfs check lowmem take days? weeks?
On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote: > > Ok, that's 29MB, so it doesn't fit on pastebin: > > http://marc.merlins.org/tmp/dshelf2_inspect.txt > > > Sorry Marc. After offline communication with Qu, both > of us think the filesystem is hard to repair. > The filesystem is too large to debug step by step. > Every check and debug cycle is too expensive, > and it has already cost several days. > > Sadly, I am afraid that you have to recreate the filesystem > and back up your data again. :( > > Sorry again, and thanks for your reports and patience. I appreciate your help. Honestly, I only wanted to help you find out why the tools aren't working. Fixing filesystems by hand (and remotely via email on top of that) is way too time consuming, like you said. Is the btrfs design flawed in a way that repair tools just cannot repair on their own? I understand that data can be lost, but I don't understand how the tools just either keep crashing for me, go into infinite loops, or otherwise fail to give me back a stable filesystem, even if some data is missing after that. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: how to best segment a big block device in resizeable btrfs filesystems?
Hi Qu, I'll split this part into a new thread: > 2) Don't keep unrelated snapshots in one btrfs. >I totally understand that maintain different btrfs would hugely add >maintenance pressure, but as explains, all snapshots share one >fragile extent tree. Yes, I understand that this is what I should do given what you explained. My main problem is knowing how to segment things so I don't end up with filesystems that are full while others are almost empty :) Am I supposed to put LVM thin volumes underneath so that I can share the same single 10TB raid5? If I do this, I would have software raid 5 < dmcrypt < bcache < lvm < btrfs That's a lot of layers, and that's also starting to make me nervous :) Is there any other way that does not involve me creating smaller block devices for multiple btrfs filesystems and hope that they are the right size because I won't be able to change it later? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
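[Editor's note] The thin-provisioning approach asked about above looks roughly like the sketch below, with hypothetical device and volume names. Thin volumes can be given virtual sizes larger than their "fair share" because pool space is only allocated as data is written, which removes the guess-the-right-size problem:

```shell
pvcreate /dev/mapper/cryptdev
vgcreate vg0 /dev/mapper/cryptdev
lvcreate --type thin-pool -L 9T -n pool0 vg0
# Virtual sizes may overcommit the pool; space is drawn on demand.
lvcreate --thin -V 2T -n backup1 vg0/pool0
lvcreate --thin -V 2T -n backup2 vg0/pool0
mkfs.btrfs /dev/vg0/backup1
mkfs.btrfs /dev/vg0/backup2
```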
Re: So, does btrfs check lowmem take days? weeks?
Hi Qu, thanks for the detailed and honest answer. A few comments inline. On Mon, Jul 02, 2018 at 10:42:40PM +0800, Qu Wenruo wrote: > For full, it depends. (but for most real-world cases, it's still flawed) > We have small and crafted images as test cases, which btrfs check can > repair without any problem at all. > But such images are *SMALL*, and only have *ONE* type of corruption, > which can't represent real-world cases at all. right, they're just unittest images, I understand. > 1) Too large fs (especially too many snapshots) > The use case (too many snapshots and shared extents, a lot of extents > get shared over 1000 times) is in fact a super large challenge for > lowmem mode check/repair. > It needs O(n^2) or even O(n^3) to check each backref, which hugely > slows the progress and makes it hard for us to locate the real bug. So, the non-lowmem version would work better, but it's a problem if it doesn't fit in RAM. I've always considered it a grave bug that btrfs check repair can use so much kernel memory that it will crash the entire system. This should not be possible. While it won't help me here, can btrfs check be improved not to suck up all the kernel memory, and ideally even allow using swap space if the RAM is not enough? Is btrfs check regular mode still being maintained? I think it's still better than lowmem, correct? > 2) Corruption in extent tree and our objective is to mount RW > Extent tree is almost useless if we just want to read data. > But when we do any write, we need it, and if it goes wrong even a > tiny bit, your fs could be damaged really badly. > > For other corruption, like some fs tree corruption, we could do > something to discard some corrupted files, but if it's extent tree, > we either mount RO and grab anything we have, or hope the > almost-never-working --init-extent-tree can work (that's mostly > a miracle). I understand that it's the weak point of btrfs, thanks for explaining. > 1) Don't keep too many snapshots. > Really, this is the core.
> For send/receive backup, IIRC it only needs the parent subvolume > to exist; there is no need to keep the whole history of all those > snapshots. You are correct on history. The reason I keep history is that I may want to recover a file from last week or 2 weeks ago after I finally notice that it's gone. I have terabytes of space on the backup server, so it's easier to keep history there than on the client, which may not have enough space to keep a month's worth of history. As you know, back when we did tape backups, we also kept history of at least several weeks (usually several months, but that's too much for btrfs snapshots). > Keeping the number of snapshots minimal greatly improves the > possibility (of both manual patching and check repair) of a successful > repair. > Normally I would suggest 4 hourly snapshots, 7 daily snapshots, 12 > monthly snapshots. I actually have fewer snapshots than this per filesystem, but I back up more than 10 filesystems. If I used as many snapshots as you recommend, that would already be 230 snapshots for 10 filesystems :) Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
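[Editor's note] A retention policy like the one recommended above is usually enforced with a small pruning loop. A sketch, assuming snapshots are named by tier and sort lexically by date (the naming scheme is hypothetical):

```shell
# Keep the newest 7 daily snapshots, delete the rest (GNU head's
# "-n -7" prints everything but the last 7 lines).
keep=7
for snap in $(ls -d /mnt/backup/daily.* | sort | head -n -"$keep"); do
    btrfs subvolume delete "$snap"
done
```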
Re: how to best segment a big block device in resizeable btrfs filesystems?
On Mon, Jul 02, 2018 at 12:59:02PM -0400, Austin S. Hemmelgarn wrote: > > Am I supposed to put LVM thin volumes underneath so that I can share > > the same single 10TB raid5? > > Actually, because of the online resize ability in BTRFS, you don't > technically _need_ to use thin provisioning here. It makes the maintenance > a bit easier, but it also adds a much more complicated layer of indirection > than just doing regular volumes. You're right that I can use btrfs resize, but then I still need an LVM device underneath, correct? So, if I have 10 backup targets, I need 10 LVM LVs, I give them 10% each of the full size available (as a guess), and then I'd have to:
- btrfs resize down one that's bigger than I need
- LVM shrink the LV
- LVM grow the other LV
- btrfs resize up the other filesystem
and I think LVM resize and btrfs resize are not linked, so I have to do them separately and hope to type the right numbers each time, correct? (or is that easier now?) I kind of liked the thin provisioning idea because it's hands-off, which is appealing. Any reason against it? > You could (in theory) merge the LVM and software RAID5 layers, though that > may make handling of the RAID5 layer a bit complicated if you choose to use > thin provisioning (for some reason, LVM is unable to do on-line checks and > rebuilds of RAID arrays that are acting as thin pool data or metadata). Does LVM do built-in raid5 now? Is it as good/trustworthy as mdadm raid5? But yeah, if it's incompatible with thin provisioning, it's not that useful. > Alternatively, you could increase your array size, remove the software RAID > layer, and switch to using BTRFS in raid10 mode so that you could eliminate > one of the layers, though that would probably reduce the effectiveness of > bcache (you might want to get a bigger cache device if you do this). Sadly that won't work. I have more data than will fit on raid10. Thanks for your suggestions though.
Still need to read up on whether I should do thin provisioning, or not. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
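[Editor's note] The resize dance described above, sketched with hypothetical names and sizes. Order matters: shrink the filesystem before the LV and grow the LV before the filesystem; `btrfs filesystem resize max` at the end removes most of the type-the-right-number risk:

```shell
btrfs filesystem resize 90G /mnt/backup1   # shrink the fs first, below the LV target
lvreduce -L 100G vg0/backup1               # then shrink the LV down to the target
btrfs filesystem resize max /mnt/backup1   # grow the fs back to exactly fill the LV
lvextend -L +100G vg0/backup2              # grow the other LV...
btrfs filesystem resize max /mnt/backup2   # ...then let its fs claim the new space
```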
Re: So, does btrfs check lowmem take days? weeks?
On Mon, Jul 02, 2018 at 10:33:09PM +0500, Roman Mamedov wrote: > On Mon, 2 Jul 2018 08:19:03 -0700 > Marc MERLIN wrote: > > > I actually have fewer snapshots than this per filesystem, but I backup > > more than 10 filesystems. > > If I used as many snapshots as you recommend, that would already be 230 > > snapshots for 10 filesystems :) > > (...once again me with my rsync :) > > If you didn't use send/receive, you wouldn't be required to keep a separate > snapshot trail per filesystem backed up, one trail of snapshots for the entire > backup server would be enough. Rsync everything to subdirs within one > subvolume, then do timed or event-based snapshots of it. You only need more > than one trail if you want different retention policies for different datasets > (e.g. in my case I have 91 and 31 days). This is exactly how I used to do backups before btrfs. I did:
cp -al backup.olddate backup.newdate
rsync -avSH src/ backup.newdate/
You don't even need snapshots or btrfs anymore. Also, sorry to say, but I have different data retention needs for different backups. Some need to rotate more quickly than others, but if you're using rsync, the method I gave above works fine at any rotation interval you need. It is almost as efficient as btrfs on space, but as I said, the time penalty on all those stats for many files was what killed it for me. If I go back to rsync backups (and I'm really unlikely to), then I'd also go back to ext4. There would be no point in dealing with the complexity and fragility of btrfs anymore. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
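[Editor's note] The cp -al + rsync pair shown above is commonly collapsed into a single command with rsync's --link-dest option, which hardlinks unchanged files against the previous backup itself (the path is interpreted relative to the destination directory):

```shell
rsync -avSH --link-dest=../backup.olddate src/ backup.newdate/
```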
Re: how to best segment a big block device in resizeable btrfs filesystems?
On Mon, Jul 02, 2018 at 02:35:19PM -0400, Austin S. Hemmelgarn wrote: > >I kind of linked the thin provisioning idea because it's hands off, > >which is appealing. Any reason against it? > No, not currently, except that it adds a whole lot more stuff between > BTRFS and whatever layer is below it. That increase in what's being > done adds some overhead (it's noticeable on 7200 RPM consumer SATA > drives, but not on decent consumer SATA SSD's). > > There used to be issues running BTRFS on top of LVM thin targets which > had zero mode turned off, but AFAIK, all of those problems were fixed > long ago (before 4.0). I see, thanks for the heads up. > >Does LVM do built in raid5 now? Is it as good/trustworthy as mdadm > >radi5? > Actually, it uses MD's RAID5 implementation as a back-end. Same for > RAID6, and optionally for RAID0, RAID1, and RAID10. Ok, that makes me feel a bit better :) > >But yeah, if it's incompatible with thin provisioning, it's not that > >useful. > It's technically not incompatible, just a bit of a pain. Last time I > tried to use it, you had to jump through hoops to repair a damaged RAID > volume that was serving as an underlying volume in a thin pool, and it > required keeping the thin pool offline for the entire duration of the > rebuild. Argh, not good :( / thanks for the heads up. > If you do go with thin provisioning, I would encourage you to make > certain to call fstrim on the BTRFS volumes on a semi regular basis so > that the thin pool doesn't get filled up with old unused blocks, That's a very good point/reminder, thanks for that. I guess it's like running on an ssd :) > preferably when you are 100% certain that there are no ongoing writes on > them (trimming blocks on BTRFS gets rid of old root trees, so it's a bit > dangerous to do it while writes are happening). Argh, that will be harder, but I'll try. Given what you said, it sounds like I'll still be best off with separate layers to avoid the rebuild problem you mentioned. 
So it'll be swraid5 / dmcrypt / bcache / lvm dm thin / btrfs Hopefully that will work well enough. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
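[Editor's note] The periodic trim suggested above can be as simple as a loop over the mounted btrfs volumes (mount points hypothetical), run from cron or a systemd timer during a quiet window:

```shell
for m in /mnt/backup1 /mnt/backup2; do
    fstrim -v "$m"    # -v reports how much space was released to the thin pool
done
```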
Re: how to best segment a big block device in resizeable btrfs filesystems?
On Tue, Jul 03, 2018 at 12:51:30AM +, Paul Jones wrote: > You could combine bcache and lvm if you are happy to use dm-cache instead > (which lvm uses). > I use it myself (but without thin provisioning) and it works well. Interesting point. So, I used to use lvm and then lvm2 many years ago until I got tired of its performance, especially as soon as I took even a single snapshot. But that was a long time ago now, just saying that I'm a bit rusty on LVM itself. That being said, if I have raid5 / dm-cache / dm-crypt / dm-thin, that's still 4 block layers under btrfs. Am I any better off using dm-cache instead of bcache? My understanding is that it only replaces one block layer, and one codebase, with another. Mmmh, a bit of reading shows that dm-cache is now used as lvmcache, which might change things, or not. I'll admit that setting up and maintaining bcache is a bit of a pain; I only used it at the time because it seemed more ready then, but we're a few years later now. So, what do you recommend nowadays, assuming you've used both? (given that it's literally going to take days to recreate my array, I'd rather do it once and the right way the first time :) ) Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
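[Editor's note] For comparison, an lvmcache setup would replace the bcache layer with something along these lines (volume and device names hypothetical); writethrough is the cautious cache mode for a setup like this:

```shell
vgextend vg0 /dev/ssd1                    # add the SSD to the existing volume group
lvcreate --type cache-pool -L 200G -n cpool vg0 /dev/ssd1
lvconvert --type cache --cachepool vg0/cpool \
          --cachemode writethrough vg0/backup1
```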
Re: how to best segment a big block device in resizeable btrfs filesystems?
On Tue, Jul 03, 2018 at 09:37:47AM +0800, Qu Wenruo wrote: > > If I do this, I would have > > software raid 5 < dmcrypt < bcache < lvm < btrfs > > That's a lot of layers, and that's also starting to make me nervous :) > > If you could keep the number of snapshots to a minimum (less than 10) for > each btrfs (and the number of send sources less than 5), one big btrfs > may work in that case. Well, we kind of discussed this already. If btrfs falls over once you reach 100 snapshots or so, and it sure seems to in my case, I won't be much better off. Having btrfs check --repair fail because 32GB of RAM is not enough, and it's unable to use swap, is a big deal in my case. You also confirmed that btrfs check lowmem does not scale to filesystems like mine, so this translates into "if regular btrfs check repair can't fit in 32GB, I am completely out of luck if anything happens to the filesystem". You're correct that I could tweak my backups and snapshot rotation to get from 250 or so down to 100, but it seems that I'll just be hoping to avoid the problem by being just under the limit, until I'm not, and it'll be too late to do anything the next time I'm in trouble, putting me back right in the same spot I'm in now. Is all this fair to say, or did I misunderstand? > BTW, IMHO the bcache is not really helping for a backup system, which is > more write oriented. That's a good point. So, what I didn't explain is that I still have some old filesystems that do get backed up with rsync instead of btrfs send (going into the same filesystem, but not the same subvolume). Because rsync is so painfully slow when it needs to scan both sides before it'll even start doing any work, bcache helps there. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: So, does btrfs check lowmem take days? weeks?
On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote: > So the idea behind journaled file systems is that journal replay > enabled mount time "repair" that's faster than an fsck. Already Btrfs > use cases with big, but not huge, file systems makes btrfs check a > problem. Either running out of memory or it takes too long. So already > it isn't scaling as well as ext4 or XFS in this regard. > > So what's the future hold? It seems like the goal is that the problems > must be avoided in the first place rather than to repair them after > the fact. > > Are the problem's Marc is running into understood well enough that > there can eventually be a fix, maybe even an on-disk format change, > that prevents such problems from happening in the first place? > > Or does it make sense for him to be running with btrfs debug or some > subset of btrfs integrity checking mask to try to catch the problems > in the act of them happening? Those are all good questions. To be fair, I cannot claim that btrfs was at fault for whatever filesystem damage I ended up with. It's very possible that it happened due to a flaky Sata card that kicked drives off the bus when it shouldn't have. Sure in theory a journaling filesystem can recover from unexpected power loss and drives dropping off at bad times, but I'm going to guess that btrfs' complexity also means that it has data structures (extent tree?) that need to be updated completely "or else". I'm obviously ok with a filesystem check being necessary to recover in cases like this, afterall I still occasionally have to run e2fsck on ext4 too, but I'm a lot less thrilled with the btrfs situation where basically the repair tools can either completely crash your kernel, or take days and then either get stuck in an infinite loop or hit an algorithm that can't scale if you have too many hardlinks/snapshots. It sounds like there may not be a fix to this problem with the filesystem's design, outside of "do not get there, or else". 
It would even be useful for btrfs tools to start computing heuristics
and output warnings like "you have more than 100 snapshots on this
filesystem, this is not recommended, please read http://url/".

Qu, Su, does that sound both reasonable and doable?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
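The heuristic proposed above can be sketched in a few lines of shell. This is a hypothetical illustration only, not anything btrfs-progs ships: the threshold, message, and wiki URL are made up, and in real use the count would come from something like `btrfs subvolume list -s /mnt | wc -l`.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed snapshot-count warning.
# The count is passed as an argument here so the logic is testable;
# a real tool would derive it from 'btrfs subvolume list -s'.
warn_snapshot_count() {
  count="$1"
  threshold="${2:-100}"   # made-up default threshold
  if [ "$count" -gt "$threshold" ]; then
    echo "warning: $count snapshots on this filesystem, more than $threshold is not recommended, see https://btrfs.wiki.kernel.org/"
  else
    echo "ok: $count snapshots"
  fi
}
```

For example, `warn_snapshot_count 150 100` prints the warning, while `warn_snapshot_count 20 100` stays quiet apart from an "ok" line.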
Re: how to best segment a big block device in resizeable btrfs filesystems?
On Tue, Jul 03, 2018 at 04:26:37AM +, Paul Jones wrote:
> I don't have any experience with this, but since it's the internet let me
> tell you how I'd do it anyway 😝

That's the spirit :)

> raid5
> dm-crypt
> lvm (using thin provisioning + cache)
> btrfs
>
> The cache mode on lvm requires you to set up all your volumes first, then
> add caching to those volumes last. If you need to modify the volume then
> you have to remove the cache, make your changes, then re-add the cache. It
> sounds like a pain, but having the cache separate from the data is quite
> handy.

I'm ok enough with that.

> Given you are running a backup server I don't think the cache would
> really do much unless you enable writeback mode. If you can split up your
> filesystem a bit to the point that btrfs check doesn't OOM that will
> seriously help performance as well. Rsync might be feasible again.

I'm a bit wary of write caching with the issues I've had. I may do
write-through, but not writeback :)

But caching helps indeed for my older filesystems that are still backed
up via rsync, because the source fs is ext4 and not btrfs.

Thanks for the suggestions,
Marc
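The remove/modify/re-add cycle Paul describes maps onto `lvconvert`'s split/attach operations (see lvmcache(7)). The sketch below is a dry run under assumed names: the VG `vg0`, LV `backups`, cache pool `cachepool`, and the `+500G` resize are all made up. It only prints the commands it would run; nothing is touched.

```shell
#!/bin/sh
# Dry-run sketch of the detach / modify / re-attach cycle for an
# LVM cache-pool setup. VG/LV/pool names are hypothetical; it echoes
# the commands instead of running them -- drop the 'echo's for real use.
recache_cycle() {
  vg="$1"; lv="$2"; pool="$3"
  echo "lvconvert --splitcache $vg/$lv"                       # detach the cache, origin LV stays intact
  echo "lvresize -L +500G $vg/$lv"                            # ...modify the origin volume...
  echo "lvconvert --type cache --cachepool $vg/$pool $vg/$lv" # re-attach the cache
}
recache_cycle vg0 backups cachepool
```

The appeal of this layout, as Paul notes, is exactly that `--splitcache` leaves the origin volume usable on its own while you resize or restructure it.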
Re: So, does btrfs check lowmem take days? weeks?
On Tue, Jul 03, 2018 at 04:50:48PM +0800, Qu Wenruo wrote:
> > It sounds like there may not be a fix to this problem with the filesystem's
> > design, outside of "do not get there, or else".
> > It would even be useful for btrfs tools to start computing heuristics and
> > output warnings like "you have more than 100 snapshots on this filesystem,
> > this is not recommended, please read http://url/"
>
> This looks pretty doable, but maybe it's better to add some warning at
> btrfs progs (both "subvolume snapshot" and "receive").

This is what I meant to say, correct.

Marc
Re: So, does btrfs check lowmem take days? weeks?
On Tue, Jul 03, 2018 at 03:34:45PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:34 AM, Su Yue wrote:
>
> > Yes, extent tree is the hardest part for lowmem mode. I'm quite
> > confident the tool can deal well with file trees (which record metadata
> > about file and directory names, relationships).
> > As for extent tree, I have little confidence due to its complexity.
>
> I have to ask again if there's some metadata integrity mask option Marc
> should use to try to catch the corruption cause in the first place?
>
> His use case really can't afford either mode of btrfs check. And also
> check is only backward looking, it doesn't show what was happening at
> the time. And for big file systems, check rapidly doesn't scale at all
> anyway.
>
> And now he's modifying his layout to avoid the problem from happening
> again which makes it less likely to catch the cause, and get it fixed.
> I think if he's willing to build a kernel with integrity checker
> enabled, it should be considered but only if it's likely to reveal why
> the problem is happening, even if it can't repair the problem once
> it's happened. He's already in that situation so masked integrity
> checking is no worse, at least it gives a chance to improve Btrfs
> rather than it being a mystery how it got corrupt.

Yeah, I'm fine waiting a few more days with this down and gathering data
if that helps. But due to the size, a full btrfs image may be a bit
larger than we want, not counting some confidential data in some
filenames.

Marc
Re: So, does btrfs check lowmem take days? weeks?
On Tue, Jul 03, 2018 at 03:46:59PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo wrote:
> >
> > There must be something wrong, however due to the size of the fs, and
> > the complexity of extent tree, I can't tell.
>
> Right, which is why I'm asking if any of the metadata integrity
> checker mask options might reveal what's going wrong?
>
> I guess the big issues are:
> a. compile kernel with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y is necessary
> b. it can come with a high resource burden depending on the mask and
> where the log is being written (write system logs to a different file
> system for sure)
> c. the granularity offered in the integrity checker might not be enough.
> d. might take a while before corruptions are injected before
> corruption is noticed and flagged.

Back to where I'm at right now: I'm going to delete this filesystem and
start over very soon, tomorrow or the day after. I'm happy to get more
data off it if someone wants it for posterity, but I indeed need to
recover soon, since being with a dead backup server is not a good place
to be in :)

Thanks,
Marc
Re: So, does btrfs check lowmem take days? weeks?
On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote:
> Ok, this is where I am now:
> WARNING: debug: end of checking extent item[18457780273152 169 1]
> type: 176 offset: 2
> checking extent items [18457780273152/18457780273152]
> ERROR: errors found in extent allocation tree or chunk allocation
> checking fs roots
> ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
> EXTENT_DATA[25937109 4033]
>
> The expected end is not even aligned to sectorsize.
>
> I think there is something wrong.
> Dump tree on this INODE would definitely help in this case.
>
> Marc, would you please try dump using the following command?
>
> # btrfs ins dump-tree -t 17592 | grep -C 40 25937109

Sure, there you go:

gargamel:~# btrfs ins dump-tree -t 17592 /dev/mapper/dshelf2 | grep -C 40 25937109
		extent data disk byte 3259370151936 nr 114688
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 144 key (2009526 EXTENT_DATA 1179648) itemoff 7931 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370266624 nr 118784
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 145 key (2009526 EXTENT_DATA 1310720) itemoff 7878 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370385408 nr 118784
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 146 key (2009526 EXTENT_DATA 1441792) itemoff 7825 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370504192 nr 118784
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 147 key (2009526 EXTENT_DATA 1572864) itemoff 7772 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370622976 nr 114688
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 148 key (2009526 EXTENT_DATA 1703936) itemoff 7719 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370737664 nr 118784
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 149 key (2009526 EXTENT_DATA 1835008) itemoff 7666 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370856448 nr 118784
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 150 key (2009526 EXTENT_DATA 1966080) itemoff 7613 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259370975232 nr 118784
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 151 key (2009526 EXTENT_DATA 2097152) itemoff 7560 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371094016 nr 114688
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 152 key (2009526 EXTENT_DATA 2228224) itemoff 7507 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371208704 nr 114688
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 153 key (2009526 EXTENT_DATA 2359296) itemoff 7454 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371323392 nr 110592
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 154 key (2009526 EXTENT_DATA 2490368) itemoff 7401 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371433984 nr 114688
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 155 key (2009526 EXTENT_DATA 2621440) itemoff 7348 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371548672 nr 110592
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 156 key (2009526 EXTENT_DATA 2752512) itemoff 7295 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371659264 nr 114688
		extent data offset 0 nr 131072 ram 131072
		extent compression 1 (zlib)
	item 157 key (2009526 EXTENT_DATA 2883584) itemoff 7242 itemsize 53
		generation 18462 type 1 (regular)
		extent data disk byte 3259371773952 nr 106496
		extent dat
Re: So, does btrfs check lowmem take days? weeks?
To fill in for the spectators on the list :) Su gave me a modified
version of btrfsck lowmem that was able to clean most of my filesystem.
It's not a general-case solution, since it had some hardcoding specific
to my filesystem's problems, but still a great success. Email quoted
below, along with responses to Qu.

On Tue, Jul 10, 2018 at 09:09:33AM +0800, Qu Wenruo wrote:
>
> On 2018-07-10 01:48, Marc MERLIN wrote:
> > Success!
> > Well done Su, this is a huge improvement to the lowmem code. It went from
> > days to less than 3 hours.
>
> Awesome work!
>
> > I'll paste the logs below.
> >
> > Questions:
> > 1) I assume I first need to delete a lot of snapshots. What is the limit in
> > your opinion?
> > 100? 150? other?
>
> My personal recommendation is just 20. Not 150, not even 100.

I see. Then, I may be forced to recreate multiple filesystems anyway. I
have about 25 btrfs send/receive relationships, and I have around 10
historical snapshots for each.

In the future, can't we segment extents/snapshots per subvolume, making
subvolumes mini filesystems within the bigger filesystem?

> But snapshot deletion will take time (and it's delayed, you won't know
> if something wrong happened just after "btrfs subv delete") and even
> requires a healthy extent tree.
> If all extent tree errors are just false alerts, that should not be a big
> problem at all.
>
> > 2) my filesystem is somewhat misbalanced. Which balance options do you
> > think are safe to use?
>
> I would recommend to manually check the extent tree for BLOCK_GROUP_ITEM,
> which will tell how big a block group is and how much space is used.
> And gives you an idea on which block group can be relocated.
> Then use vrange= to specify the exact block group to relocate.
>
> One example would be:
>
> # btrfs ins dump-tree -t extent | grep -A1 BLOCK_GROUP_ITEM |\
>   tee block_group_dump
>
> Then the output contains:
> item 1 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 16206 itemsize 24
> 	block group used 262144 chunk_objectid 256 flags DATA
>
> The "13631488" is the bytenr of the block group.
> The "8388608" is the length of the block group.
> The "262144" is the used bytes of the block group.
>
> The less used space, the higher priority it should be relocated (and
> the faster it is to relocate).
> You could write a small script to do it, or there should be some tool to
> do the calculation for you.

I usually use something simpler:
Label: 'btrfs_boot'  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
	Total devices 1 FS bytes used 30.19GiB
	devid    1 size 79.93GiB used 78.01GiB path /dev/mapper/cryptroot

This is bad: I have 30GB of data, but 78 out of 80GB is already
allocated to chunks. This is bad news and calls for a balance, correct?
If so, I always struggle as to what value I should give to dusage and
musage...

> And only relocate one block group each time, to avoid possible problems.
>
> Last but not least, it's highly recommended to do the relocation
> only after unused snapshots are completely deleted.
> (Or it would be super super slow to relocate)

Thank you for the advice. Hopefully this helps someone else too, and
maybe someone can write some relocation helper tool if I don't have the
time to do it myself.

> > 3) Should I start a scrub now (takes about 1 day) or anything else to
> > check that the filesystem is hopefully not damaged anymore?
>
> I would normally recommend to use btrfs check, but neither mode really
> works here.
> And scrub only checks csums, it doesn't check the internal cross
> references (like the content of the extent tree).
>
> Maybe Su could skip the whole extent tree check and let lowmem check
> the fs trees only; with --check-data-csum it should do a better job
> than scrub.

I will wait to hear back from Su, but I think the current situation is
that I still have some problems on my FS, they are just
1) not important enough to block mounting rw (now it works again)
2) currently ignored by the modified btrfsck I have, but they would
cause problems if I used the real btrfsck.
Correct?

> > 4) should btrfs check reset the corrupt counter?
> > bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > for now, should I reset it manually?
>
> It could be pretty easy to implement if not already implemented.

Seems like it's not, given that Su's btrfsck --repair ran to completion
and I still have corrupt set to '2' :)

Marc
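The "small script" Qu mentions can be sketched with awk: rank block groups by usage ratio, least-used first, since those are the cheapest to relocate with balance's vrange= filter. The field positions below assume the exact `dump-tree` line format quoted above; this is an illustrative sketch, not a tool from btrfs-progs.

```shell
#!/bin/sh
# Rank block groups by usage ratio, least-used first. Reads output in the
# style of 'btrfs ins dump-tree -t extent | grep -A1 BLOCK_GROUP_ITEM'
# on stdin; field positions match the item/block-group lines shown above.
rank_block_groups() {
  awk '
    /BLOCK_GROUP_ITEM/ {
      bytenr = $4; sub(/^\(/, "", bytenr)    # "(13631488" -> "13631488"
      len = $6;    sub(/\)$/, "", len)       # "8388608)"  -> "8388608"
    }
    /block group used/ {
      used = $4 + 0
      if (len + 0 > 0)
        printf "%3d%% used  bytenr=%s length=%s\n", int(100 * used / len), bytenr, len
    }
  ' | sort -n
}
```

On Qu's example input, a group at bytenr 13631488 with length 8388608 and 262144 bytes used comes out as 3% used; it could then be relocated on its own with something like `btrfs balance start -dvrange=13631488..22020096 /mnt` (the block group's byte range), one group at a time as Qu advises.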
btrfs check lowmem, take 2
Thanks to Su and Qu, I was able to get my filesystem to a point that
it's mountable. I then deleted loads of snapshots and I'm down to 26.
It now looks like this:

gargamel:~# btrfs fi show /mnt/mnt
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
	Total devices 1 FS bytes used 12.30TiB
	devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:~# btrfs fi df /mnt/mnt
Data, single: total=13.57TiB, used=12.19TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=124.50GiB, used=115.62GiB
Metadata, single: total=216.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

Problems:
1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
server, despite my deleting lots of snapshots. Is it because I have too
many files then?

2) I tried Su's master git branch for btrfs-progs to try and see how a
normal check would go, and I'm stuck on this:

gargamel:/var/local/src/btrfs-progs.sy# time ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) compared to the found root node (139061)
ERROR: failed to repair root items: Invalid argument

real	75m8.046s
user	0m14.591s
sys	0m52.431s

I understand what the message means; I just need to switch to the newer
root, but honestly I'm not quite sure how to do this from the
btrfs-check man page. This didn't work:

time ./btrfsck --mode=lowmem --repair --chunk-root=18446744073709551607 /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
WARNING: chunk_root_bytenr 18446744073709551607 is unaligned to 4096, ignore it

How do I address the error above?

Thanks,
Marc
Re: btrfs check lowmem, take 2
On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > Problems
> > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > server, despite my deleting lots of snapshots.
> > Is it because I have too many files then?
>
> Yes. Original check first gathers all information about the extent tree
> and your files in RAM, then processes them one by one.
> But deleting still counts, it does speed lowmem check up.

Understood.

> > 2) I tried Su's master git branch for btrfs-progs to try and see how
> Oh.. No... My master branch is still 4.14. The true master branch is
> David's here:
> https://github.com/kdave/btrfs-progs
> But the master branch has a known bug which I fixed yesterday, please
> see the mail.

So, if I git sync it now, it should have your fix, and I can run it,
correct?

Thanks,
Marc
Re: btrfs check lowmem, take 2
On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
>
> On 07/11/2018 08:58 AM, Marc MERLIN wrote:
> > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > > > Problems
> > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > > > server, despite my deleting lots of snapshots.
> > > > Is it because I have too many files then?
> > > >
> > > Yes. Original check first gathers all information about the extent tree
> > > and your files in RAM, then processes them one by one.
> > > But deleting still counts, it does speed lowmem check up.
> >
> > Understood.
> >
> > > > 2) I tried Su's master git branch for btrfs-progs to try and see how
> > > Oh.. No... My master branch is still 4.14. The true master branch is
> > > David's here:
> > > https://github.com/kdave/btrfs-progs
> > > But the master branch has a known bug which I fixed yesterday, please
> > > see the mail.
> >
> > So, if I git sync it now, it should have your fix, and I can run it,
> > correct?
>
> Yes, please.

Ok, I am now running:
gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
using git master from https://github.com/kdave/btrfs-progs

I will report back how long it takes with extent tree check and whether
it returns clean, or not.

Marc
Re: btrfs check lowmem, take 2
On Wed, Jul 11, 2018 at 09:58:36AM +0800, Su Yue wrote:
>
> On 07/11/2018 09:44 AM, Marc MERLIN wrote:
> > On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
> > >
> > > On 07/11/2018 08:58 AM, Marc MERLIN wrote:
> > > > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > > > > > Problems
> > > > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes
> > > > > > the server, despite my deleting lots of snapshots.
> > > > > > Is it because I have too many files then?
> > > > > >
> > > > > Yes. Original check first gathers all information about the extent
> > > > > tree and your files in RAM, then processes them one by one.
> > > > > But deleting still counts, it does speed lowmem check up.
> > > >
> > > > Understood.
> > > >
> > > > > > 2) I tried Su's master git branch for btrfs-progs to try and see how
> > > > > Oh.. No... My master branch is still 4.14. The true master branch is
> > > > > David's here:
> > > > > https://github.com/kdave/btrfs-progs
> > > > > But the master branch has a known bug which I fixed yesterday,
> > > > > please see the mail.
> > > >
> > > > So, if I git sync it now, it should have your fix, and I can run it,
> > > > correct?
> > > >
> > > Yes, please.
> >
> > Ok, I am now running:
> > gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
> > using git master from https://github.com/kdave/btrfs-progs
>
> Please stop the check, please.
>
> The branch 'it' which I mean is
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Ok, sorry, I thought you said you had pushed your changes to
https://github.com/kdave/btrfs-progs yesterday.

So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
I'm running it without the extra options you added with hardcoded stuff:

gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2

Marc
Re: btrfs check lowmem, take 2
On Wed, Jul 11, 2018 at 12:07:05PM +0800, Su Yue wrote:
> > So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
> > I'm running it without the extra options you added with hardcoded stuff:
> > gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem
> > --repair /dev/mapper/dshelf2
>
> This is okay. Let's wait to see the result.

Sadly, it crashes quickly:

Starting program: /var/local/src/btrfs-progs.sy-test/btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
checking extents

Program received signal SIGSEGV, Segmentation fault.
check_tree_block_backref (fs_info=fs_info@entry=0x55825e10, root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, level=level@entry=1) at check/mode-lowmem.c:3744
3744		if (btrfs_header_bytenr(node) != bytenr) {
(gdb) bt
#0  check_tree_block_backref (fs_info=fs_info@entry=0x55825e10, root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, level=level@entry=1) at check/mode-lowmem.c:3744
#1  0x555cb1f9 in check_extent_item (fs_info=fs_info@entry=0x55825e10, path=path@entry=0x7fffdc60) at check/mode-lowmem.c:4194
#2  0x555d06e9 in check_leaf_items (account_bytes=1, nrefs=0x7fffdb80, path=0x7fffdc60, root=0x558262f0) at check/mode-lowmem.c:4654
#3  walk_down_tree (check_all=1, nrefs=0x7fffdb80, level=<optimized out>, path=0x7fffdc60, root=0x558262f0) at check/mode-lowmem.c:4790
#4  check_btrfs_root (root=root@entry=0x558262f0, check_all=check_all@entry=1) at check/mode-lowmem.c:5114
#5  0x555d144f in check_chunks_and_extents_lowmem (fs_info=fs_info@entry=0x55825e10) at check/mode-lowmem.c:5475
#6  0x555b44b1 in do_check_chunks_and_extents (fs_info=0x55825e10) at check/main.c:8369
#7  cmd_check (argc=<optimized out>, argv=<optimized out>) at check/main.c:9899
#8  0x55567510 in main (argc=4, argv=0x7fffe390) at btrfs.c:302

Would you like anything off gdb? (Feel free to email me directly or
point me to an online chat platform you have access to.)

Marc
Re: btrfs check mode normal still hard crash-hanging systems
On Wed, Jul 11, 2018 at 11:09:56AM -0600, Chris Murphy wrote:
> On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN wrote:
> > Thanks to Su and Qu, I was able to get my filesystem to a point that
> > it's mountable.
> > I then deleted loads of snapshots and I'm down to 26.
> >
> > It now looks like this:
> > gargamel:~# btrfs fi show /mnt/mnt
> > Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> > 	Total devices 1 FS bytes used 12.30TiB
> > 	devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >
> > gargamel:~# btrfs fi df /mnt/mnt
> > Data, single: total=13.57TiB, used=12.19TiB
> > System, DUP: total=32.00MiB, used=1.55MiB
> > Metadata, DUP: total=124.50GiB, used=115.62GiB
> > Metadata, single: total=216.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> > Problems
> > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > server, despite my deleting lots of snapshots.
> > Is it because I have too many files then?
>
> I think original check needs most of the metadata in memory.
>
> I'm not understanding why btrfs check won't use swap like at least
> xfs_repair and pretty sure e2fsck will as well.
>
> Using 128G swap on nvme with original check is still gonna be faster
> than lowmem mode.

Yeah, that's been a concern/question of mine all these years too, even
if Su isn't working on that code and is likely the wrong person to ask.

Personally, my take is that if btrfs wants to be taken seriously, at
the very least its fsck tool should not hard-crash a system you run it
on. (And it really does the worst kind of hard crash I've ever seen:
OOM can't trigger fast enough, Linux doesn't panic so it can't
self-reboot either, it just hard dies and hangs.)

Maybe David knows?

Marc
btrfs check --repair: ERROR: cannot read chunk root
I have a filesystem on top of md raid5 that got a few problems due to
the underlying block layer (bad data cable). The filesystem mounts
fine, but had a few issues.

Scrub runs (I didn't let it finish, it takes a _long_ time), but check
--repair won't even run at all:

myth:~# btrfs --version
btrfs-progs v4.7.3
myth:~# uname -r
4.8.5-ia32-20161028
myth:~# btrfs check -p --repair /dev/mapper/crypt_bcache0 2>&1 | tee /var/spool/repair
bytenr mismatch, want=13835462344704, have=0
ERROR: cannot read chunk root
Couldn't open file system
enabling repair mode
myth:~#
myth:~# btrfs rescue super-recover -v /dev//mapper/crypt_bcache0
All Devices:
	Device: id = 1, name = /dev//mapper/crypt_bcache0
Before Recovering:
	[All good supers]:
		device name = /dev//mapper/crypt_bcache0
		superblock bytenr = 65536
		device name = /dev//mapper/crypt_bcache0
		superblock bytenr = 67108864
		device name = /dev//mapper/crypt_bcache0
		superblock bytenr = 274877906944
	[All bad supers]:
All supers are valid, no need to recover

I don't care about the data, it's a backup array, but I'd still like to
know if I can recover from this state and do a repair to see how much
data got damaged.

Thanks,
Marc
Re: btrfs check --repair: ERROR: cannot read chunk root
On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> Your chunk root is corrupted, and since chunk tree provides the
> underlying disk layout, even for single device, so if we failed to read
> it, then it will never be able to be mounted.

That's the thing though, I can mount the filesystem just fine :)

> You could try to use backup chunk root.
>
> "btrfs inspect-internal dump-super -f" to find the backup chunk root,
> and use "btrfs check --chunk-root " to have
> another try.

Am I doing this right? It doesn't seem to work:

myth:~# btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0 2>&1 | tee /var/spool/repair2
bytenr mismatch, want=13835462344704, have=0
ERROR: cannot read chunk root
Couldn't open file system
enabling repair mode

myth:~# btrfs inspect-internal dump-super -f /dev/mapper/crypt_bcache0 | less
superblock: bytenr=65536, device=/dev/mapper/crypt_bcache0
-
csum_type		0 (crc32c)
csum_size		4
csum			0x3814e4a0 [match]
bytenr			65536
flags			0x1 ( WRITTEN )
magic			_BHRfS_M [match]
fsid			6692cf4c-93d9-438c-ac30-5db6381dc4f2
label			DS5
generation		51176
root			13845513109504
sys_array_size		129
chunk_root_generation	51135
root_level		1
chunk_root		13835462344704
chunk_root_level	1
log_root		0
log_root_transid	0
log_root_level		0
total_bytes		16002599346176
bytes_used		14584560160768
sectorsize		4096
nodesize		16384
leafsize		16384
stripesize		4096
root_dir		6
num_devices		1
compat_flags		0x0
compat_ro_flags		0x0
incompat_flags		0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
cache_generation	51176
uuid_tree_generation	51176
dev_item.uuid		0cf779be-8e16-4982-b7d7-f8241deea0d1
dev_item.fsid		6692cf4c-93d9-438c-ac30-5db6381dc4f2 [match]
dev_item.type		0
dev_item.total_bytes	16002599346176
dev_item.bytes_used	14691011133440
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		1
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0
sys_chunk_array[2048]:
	item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13835461197824)
		chunk length 33554432 owner 2 stripe_len 65536
		type SYSTEM|DUP num_stripes 2
		stripe 0 devid 1 offset 13500327919616
		dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
		stripe 1 devid 1 offset 13500361474048
		dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
backup_roots[4]:
	backup 0:
		backup_tree_root:	12801101791232	gen: 51174	level: 1
		backup_chunk_root:	13835462344704	gen: 51135	level: 1
		backup_extent_root:	12801124352000	gen: 51174	level: 3
		backup_fs_root:		10548133724160	gen: 51172	level: 0
		backup_dev_root:	11125467824128	gen: 51172	level: 1
		backup_csum_root:	12801133953024	gen: 51174	level: 3
		backup_total_bytes:	16002599346176
		backup_bytes_used:	14584560160768
		backup_num_devices:	1
	backup 1:
		backup_tree_root:	13842532810752	gen: 51175	level: 1
		backup_chunk_root:	13835462344704	gen: 51135	level: 1
		backup_extent_root:	13843784695808	gen: 51175	level: 3
		backup_fs_root:		10548133724160	gen: 51172	level: 0
		backup_dev_root:	11125467824128	gen: 51172	level: 1
		backup_csum_root:	13842542362624	gen: 51175	level: 3
		backup_total_bytes:	16002599346176
		backup_bytes_used:	14584560160768
		backup_num_devices:	1
	backup 2:
		backup_tree_root:	13845513109504	gen: 51176	level: 1
		backup_chunk_root:	13835462344704	gen: 51135	level: 1
		backup_extent_root:	13845513191424	gen: 51176	level: 3
		backup_fs_root:		10548133724160	gen: 51172	level: 0
		backup_dev_root:	11125467824128	gen: 51172	level: 1
		backup_csum_root:	13852180938752	gen: 51176	level: 3
		backup_total_bytes:
Re: btrfs check --repair: ERROR: cannot read chunk root
On Sun, Oct 30, 2016 at 07:06:16PM -0700, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> > Your chunk root is corrupted, and since chunk tree provides the
> > underlying disk layout, even for single device, so if we failed to read
> > it, then it will never be able to be mounted.
>
> That's the thing though, I can mount the filesystem just fine :)

Actually, has anyone ever seen a configuration where the kernel can mount a filesystem without ro or recovery options (it just mounts read/write), and yet btrfs check --repair can't open it? This kind of sounds like a bug in check --repair IMO.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair: ERROR: cannot read chunk root
On Mon, Oct 31, 2016 at 01:27:56PM +0800, Qu Wenruo wrote:
> Would you please dump the following bytes?
> That's the chunk root tree block on your disk.
>
> offset: 13500329066496 length: 16384
> offset: 13500330213376 length: 16384

Sorry for asking, am I doing this wrong?
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s

> According to your fsck error output, I assume btrfs-progs fails to read
> the first copy of chunk root, and due to a bug, it doesn't continue to
> read 2nd copy.
>
> While kernel continues to read the 2nd copy and everything goes on.

Ah, that would make sense. But from what you're saying, I should be able to do recovery by pointing to the 2nd copy of the chunk root, but somehow I haven't typed the right command to do so yet, correct?
Should I try a different offset than in
btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0
?
Or are you saying the btrfs-progs bug causes it to fail to even try to read the 2nd copy of the chunk root, even though it was given on the command line?

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair: ERROR: cannot read chunk root
On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >Sorry for asking, am I doing this wrong?
> >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32
> >skip=26367830208
> >dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >0+0 records in
> >0+0 records out
> >0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
>
> So, the underlying MD RAID5 are complaining about some wrong data, and
> refuse to read out.
>
> It seems that btrfs-progs can't handle read failure?
> Maybe dm-error could emulate it.
>
> And what about the 2nd range?

They both fail the same way, but I wasn't sure if I had typed the wrong dd command or not.

myth:~# btrfs fi df /mnt/mnt
Data, single: total=13.22TiB, used=13.19TiB
System, DUP: total=32.00MiB, used=1.42MiB
Metadata, DUP: total=74.00GiB, used=72.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

myth:~# btrfs fi show
Label: 'DS5'  uuid: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
	Total devices 1 FS bytes used 13.26TiB
	devid 1 size 14.55TiB used 13.36TiB path /dev/mapper/crypt_bcache0

For now, I mounted the filesystem and I'm running scrub on it to see how much damage there is.
It will take all night:
BTRFS warning (device dm-0): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache0, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887009792 on dev /dev/mapper/crypt_bcache0
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache0, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS warning (device dm-0): checksum error at logical 27885830144 on dev /dev/mapper/crypt_bcache0, sector 56578048, root 9461, inode 45837, offset 15459041280, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885830144 on dev /dev/mapper/crypt_bcache0
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885961216 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27887013888 on dev /dev/mapper/crypt_bcache0, sector 56580360, root 9461, inode 45837, offset 15460225024, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887013888 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885834240 on dev /dev/mapper/crypt_bcache0, sector 56578056, root 9461, inode 45837, offset 15459045376, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885834240 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27887017984 on dev /dev/mapper/crypt_bcache0, sector 56580368, root 9461, inode 45837, offset 15460229120, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887017984 on dev /dev/mapper/crypt_bcache0

So far, it looks like minor damage limited to one file; I'll see tomorrow morning after it's done reading the whole array.

> And further more, all backup chunk roots are in fact pointing to the
> current chunk root, so --chunk-root doesn't work at all.

Ah, ok, so there is nothing I can do at the moment until I get a new btrfs-progs, correct?

Thanks for your answers
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair: ERROR: cannot read chunk root
On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote:
>
> At 10/31/2016 02:25 PM, Marc MERLIN wrote:
> >On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >>>Sorry for asking, am I doing this wrong?
> >>>myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32
> >>>skip=26367830208
> >>>dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >>>0+0 records in
> >>>0+0 records out
> >>>0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
> >>
> >>So, the underlying MD RAID5 are complaining about some wrong data, and
> >>refuse to read out.
> >>
> >>It seems that btrfs-progs can't handle read failure?
> >>Maybe dm-error could emulate it.
> >>
> >>And what about the 2nd range?
> >
> >They both fail the same way, but I wasn't sure if I typed the wrong dd
> >command or not.
>
> Strange, your command seems OK to me.
>
> Does it have anything to do with your security setup or something like that?
> Or is it related to dm-crypt or bcache?
>
> But this reminds me, if dd can't read it, maybe btrfs-progs is the same.
>
> Maybe only kernel can read dm-crypt device while user space tools can't
> access dm-crypt devices directly?

It can; it's just that the offset seems wrong:

myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000421662 s, 0.0 kB/s

If I divide by 1000, it works:
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830
32+0 records in
32+0 records out
16384 bytes (16 kB) copied, 0.139005 s, 118 kB/s

So that's why I was asking you if I had counted the offset wrong. I took the value you gave and divided it by 512, but the result seems too big:
13500329066496 / 512 = 26367830208

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
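The division in the message above can be sanity-checked quickly. A small Python sketch, using only numbers quoted in this thread (the dd variant in the comment is GNU dd's way of taking a byte offset directly):

```python
# Numbers quoted in the thread: the chunk-root copy Qu asked for, and the
# filesystem size (total_bytes) from dump-super.
byte_offset = 13500329066496
total_bytes = 16002599346176

# dd's skip= is counted in units of bs (512 bytes here).
sectors = byte_offset // 512
assert byte_offset % 512 == 0     # divides evenly, so no rounding error
assert sectors == 26367830208     # the value that was passed to dd
assert byte_offset < total_bytes  # well inside the device, not out of range

# GNU dd can also be handed the byte offset directly and do the
# conversion itself:
#   dd if=/dev/mapper/crypt_bcache0 bs=512 count=32 \
#      skip=13500329066496 iflag=skip_bytes
print("sector offset:", sectors)  # → sector offset: 26367830208
```

So the arithmetic is fine and the offset is in range; whatever makes the read fail is not a miscomputed skip value.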
Re: btrfs check --repair: ERROR: cannot read chunk root
On Mon, Oct 31, 2016 at 08:44:12AM +, Hugo Mills wrote:
> > Any idea on special dm setup which can make us fail to read out some
> > data range?
>
>    I've seen both btrfs check and btrfs dump-super give wrong answers
> (particularly, some addresses end up larger than the device, for some
> reason) when run on a mounted filesystem. Worth ruling that one out.

I just finished running my scrub overnight, and it failed around 10%:
[115500.316921] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
[115500.332354] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
[115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: errno=-5 IO failure
[115500.332629] BTRFS info (device dm-0): forced readonly
[115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: errno=-5 IO failure
[115500.436002] btrfs_printk: 550 callbacks suppressed
[115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
[115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure

myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
(...)
scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
	scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 09:03:11
	total bytes scrubbed: 1.15TiB with 512 errors
	error details: csum=512
	corrected errors: 0, uncorrectable errors: 512, unverified errors: 0

Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it means that btrfs had physical read errors from the underlying block layer?

Do I have some weird mismatch between the size of my md array and the size of my filesystem (as per dd apparently thinking parts of it are out of bounds)?
Yet, the sizes seem to match:

myth:~# mdadm --query --detail /dev/md5
/dev/md5:
        Version : 1.2
  Creation Time : Tue Jan 21 10:35:52 2014
     Raid Level : raid5
     Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
  Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Oct 31 07:56:07 2016
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : gargamel.svh.merlins.org:5
           UUID : ec672af7:a66d9557:2f00d76c:38c9f705
         Events : 147992

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       6       8      113        1      active sync   /dev/sdh1
       2       8       81        2      active sync   /dev/sdf1
       3       8       65        3      active sync   /dev/sde1
       5       8       49        4      active sync   /dev/sdd1

myth:~# btrfs fi df /mnt/mnt
Data, single: total=13.22TiB, used=13.19TiB
System, DUP: total=32.00MiB, used=1.42MiB
Metadata, DUP: total=75.00GiB, used=72.82GiB
GlobalReserve, single: total=512.00MiB, used=6.73MiB

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

signature.asc
Description: Digital signature
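The "sizes seem to match" observation above can be made precise: mdadm's Array Size is reported in 1 KiB units. A quick check with the numbers quoted above; attributing the few-MiB remainder to bcache/dm-crypt headers and mkfs rounding is an assumption on my part, the point is only that the difference is tiny:

```python
# mdadm --detail reports "Array Size" in 1 KiB blocks.
md_kib = 15627542528
md_bytes = md_kib * 1024

# total_bytes from btrfs inspect-internal dump-super.
btrfs_bytes = 16002599346176

diff = md_bytes - btrfs_bytes
assert md_bytes == 16002603548672
assert diff == 4202496            # ~4 MiB, not a multi-TiB mismatch

print(f"md5 is {diff} bytes larger than the filesystem")
```

So the array and the filesystem agree to within ~4 MiB; a size mismatch cannot explain terabytes of the device being unreadable from user space.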
Re: btrfs check --repair: ERROR: cannot read chunk root
So, I'm willing to wait 2 more days before I wipe this filesystem and start over if I can't get check --repair to work on it. If you need longer, please let me know you have an upcoming patch for me to try and I'll wait. Thanks, Marc On Mon, Oct 31, 2016 at 08:04:22AM -0700, Marc MERLIN wrote: > On Mon, Oct 31, 2016 at 08:44:12AM +, Hugo Mills wrote: > > > Any idea on special dm setup which can make us fail to read out some > > > data range? > > > >I've seen both btrfs check and btrfs dump-super give wrong answers > > (particularly, some addresses end up larger than the device, for some > > reason) when run on a mounted filesystem. Worth ruling that one out. > > I just finished running my scrub overnight, and it failed around 10%: > [115500.316921] BTRFS error (device dm-0): bad tree block start > 8461247125784585065 17619396231168 > [115500.332354] BTRFS error (device dm-0): bad tree block start > 8461247125784585065 17619396231168 > [115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: > errno=-5 IO failure > [115500.332629] BTRFS info (device dm-0): forced readonly > [115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: > errno=-5 IO failure > [115500.436002] btrfs_printk: 550 callbacks suppressed > [115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted > transaction. > [115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: > errno=-5 IO failure > > > myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt > (...) > scrub device /dev/mapper/crypt_bcache0 (id 1) canceled > scrub started at Sun Oct 30 22:52:59 2016 and was aborted after > 09:03:11 > total bytes scrubbed: 1.15TiB with 512 errors > error details: csum=512 > corrected errors: 0, uncorrectable errors: 512, unverified errors: 0 > > Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it > means > that btrfs had physical read errors from the underlying block layer? 
> > Do I have some weird mismatch between the size of my md array and the size of > my filesystem > (as per dd apparently thinking parts of it are out of bounds?) > Yet, the sizes seem to match: > > > myth:~# mdadm --query --detail /dev/md5 > /dev/md5: > Version : 1.2 > Creation Time : Tue Jan 21 10:35:52 2014 > Raid Level : raid5 > Array Size : 15627542528 (14903.59 GiB 16002.60 GB) > Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB) >Raid Devices : 5 > Total Devices : 5 > Persistence : Superblock is persistent > > Intent Bitmap : Internal > > Update Time : Mon Oct 31 07:56:07 2016 > State : clean > Active Devices : 5 > Working Devices : 5 > Failed Devices : 0 > Spare Devices : 0 > > Layout : left-symmetric > Chunk Size : 512K > >Name : gargamel.svh.merlins.org:5 >UUID : ec672af7:a66d9557:2f00d76c:38c9f705 > Events : 147992 > > Number Major Minor RaidDevice State > 0 8 97 0 active sync /dev/sdg1 > 6 8 113 1 active sync /dev/sdh1 > 2 8 81 2 active sync /dev/sdf1 > 3 8 65 3 active sync /dev/sde1 > 5 8 49 4 active sync /dev/sdd1 > > myth:~# btrfs fi df /mnt/mnt > Data, single: total=13.22TiB, used=13.19TiB > System, DUP: total=32.00MiB, used=1.42MiB > Metadata, DUP: total=75.00GiB, used=72.82GiB > GlobalReserve, single: total=512.00MiB, used=6.73MiB > > Thanks, > Marc > -- > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > Microsoft is to operating systems > what McDonalds is to gourmet > cooking > Home page: http://marc.merlins.org/ -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair: ERROR: cannot read chunk root
On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote: > Would you try to locate the range where we starts to fail to read? > > I still think the root problem is we failed to read the device in user > space. Understood. I'll run this then: myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M & [2] 21108 myth:~# while :; do killall -USR1 dd; sleep 1200; done 275+0 records in 274+0 records out 287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s This will take a while to run, I'll report back on how far it goes. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
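As an aside, if the failure pattern really is "everything past one boundary is unreadable", the boundary can be found in a few dozen probes instead of a multi-day sequential dd. This is a hypothetical sketch, not an existing tool; `probe` stands in for a function that attempts a small `os.pread` at a byte offset and returns True on success:

```python
def first_bad_offset(probe, size, block=1 << 20):
    """Return the lowest block-aligned offset at which probe() fails,
    or None if every probed block is readable.

    Assumes a single readable-prefix boundary: probe(off) is True for
    all offsets below the boundary and False at or above it, which is
    the shape a size/offset limit in some layer would produce.
    """
    lo, hi = 0, (size + block - 1) // block   # search over block indices
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid * block):
            lo = mid + 1                      # still readable, look higher
        else:
            hi = mid                          # failed; boundary at or below mid
    off = lo * block
    return off if off < size else None
```

With `probe` doing a 4 KiB `os.pread` wrapped in try/except OSError, a ~14 TB device needs only about 24 probes (log2 of the block count) to pin down the boundary.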
Re: btrfs check --repair: ERROR: cannot read chunk root
On Mon, Oct 31, 2016 at 09:21:40PM -0700, Marc MERLIN wrote:
> On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> > Would you try to locate the range where we starts to fail to read?
> >
> > I still think the root problem is we failed to read the device in user
> > space.
>
> Understood.
>
> I'll run this then:
> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M &
> [2] 21108
> myth:~# while :; do killall -USR1 dd; sleep 1200; done
> 275+0 records in
> 274+0 records out
> 287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s
>
> This will take a while to run, I'll report back on how far it goes.

Well, turns out you were right. My array is 14TB and dd was only able to copy 8.8TB out of it.

I wonder if it's a bug with bcache and source devices that are too big?

8782434271232 bytes (8.8 TB) copied, 214809 s, 40.9 MB/s
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
8388608+0 records in
8388608+0 records out
8796093022208 bytes (8.8 TB) copied, 215197 s, 40.9 MB/s
[2]+  Exit 1    dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M

What's vexing is that absolutely nothing has been logged in the kernel dmesg buffer about this read error.

Basically I have this:
sde                           8:64   0  3.7T  0
└─sde1                        8:65   0  3.7T  0
  └─md5                       9:5    0 14.6T  0
    └─bcache0               252:0    0 14.6T  0
      └─crypt_bcache0 (dm-0)
                            253:0    0 14.6T  0

I'll try dd'ing the md5 directly now, but that's going to take another 2 days :(

That said, given that almost half the device is not readable from user space for some reason, that would explain why btrfs check is failing. Obviously it can't do its job if it can't read blocks.

I'll report back on what I find out with this problem, but if you have suggestions on what to look for, let me know :)

Thanks.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair: ERROR: cannot read chunk root
On Fri, Nov 04, 2016 at 02:00:43PM +0500, Roman Mamedov wrote: > On Fri, 4 Nov 2016 01:01:13 -0700 > Marc MERLIN wrote: > > > Basically I have this: > > sde 8:64 0 3.7T 0 > > └─sde1 8:65 0 3.7T 0 > > └─md5 9:5 0 14.6T 0 > > └─bcache0 252:0 0 14.6T 0 > > └─crypt_bcache0 (dm-0) 253:0 0 14.6T 0 > > > > I'll try dd'ing the md5 directly now, but that's going to take another 2 > > days :( > > > > That said, given that almost half the device is not readable from user space > > for some reason, that would explain why btrfs check is failing. Obviously it > > can't do its job if it can't read blocks. > > I don't see anything to support the notion that "half is unreadable", maybe > just a 512-byte sector is unreadable -- but that would be enough to make > regular dd bail out -- which is why you should be using dd_rescue for this, > not regular dd. Assuming you just want to copy over as much data as possible, > and not simply test if dd fails or not (but in any case dd_rescue at least > would not fail instantly and would tell you precise count of how much is > unreadable). Thanks for the plug on ddrescue, I have used it to rescue drives in the past. Here, however, everything after the 8.8TB mark is unreadable, so there is nothing to skip. Because the underlying drives are fine, I'm not entirely sure where the issue is, although it has to be on the mdadm side and not related to btrfs. And of course the mdadm array shows clean, and I have already disabled the mdadm per-drive bad block (mis-)feature which probably is responsible for all the problems I've had here. 
myth:~# mdadm --examine-badblocks /dev/sd[defgh]1
No bad-blocks list configured on /dev/sdd1
No bad-blocks list configured on /dev/sde1
No bad-blocks list configured on /dev/sdf1
No bad-blocks list configured on /dev/sdg1
No bad-blocks list configured on /dev/sdh1

I'm also still perplexed as to why, despite the read error I'm getting, absolutely nothing is logged in the kernel :-/
I'll pursue that further and post a summary on the thread here if I find something interesting.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote:
> > Well, turns out you were right. My array is 14TB and dd was only able to
> > copy 8.8TB out of it.
> >
> > I wonder if it's a bug with bcache and source devices that are too big?
>
> At least we know it's not a problem of btrfs-progs.
>
> And for bcache/soft raid/encryption, unfortunately I'm not familiar with any
> of them.
>
> I would recommend to report it to bcache/mdadm/encryption ML after locating
> the layer which returns EINVAL.

So, Neil Brown found the problem.

myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 37.0785 s, 57.9 MB/s

myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190 count=3 iflag=direct
3+0 records in
3+0 records out

On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
> EINVAL from a read() system call is surprising in this context.
>
> do_generic_file_read can return it:
> 	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> 		return -EINVAL;
>
> s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
>
> #define MAX_LFS_FILESIZE	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
>
> That is 2^(12+31) or 2^43 or 8TB.
>
> Is this a 32bit system you are using? Such systems can only support
> buffered IO up to 8TB. If you use iflag=direct to avoid buffering, you
> should get access to the whole device.

I am indeed using a 32bit system, and now we know why the kernel can mount and use my filesystem just fine while btrfs check --repair fails to deal with it. The filesystem is more than 8TB on a 32bit kernel with 32bit userland.

Since iflag=direct fixes the issue with dd, it sounds like something similar could be done for btrfs-progs, to support filesystems bigger than 8TB on 32bit systems. However, could you confirm that filesystems of more than 8TB are supported by the kernel code itself on 32bit systems? 
(I think so, but just wanting to make sure) Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
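The numbers in Neil's explanation line up exactly with where dd stopped; checking the arithmetic from the quoted #define:

```python
# The quoted kernel define, evaluated for a 32-bit build:
#   #define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
PAGE_SIZE = 4096        # 2^12 on x86
BITS_PER_LONG = 32
max_lfs_filesize = (PAGE_SIZE << (BITS_PER_LONG - 1)) - 1

# 2^(12+31) - 1 = 2^43 - 1: buffered reads stop at the 8 TiB mark.
assert max_lfs_filesize == 2**43 - 1

# dd copied exactly 8796093022208 bytes before EINVAL: that is 2^43,
# the first byte offset past the limit.
assert 8796093022208 == 2**43

# The chunk-root copy at byte 13500329066496 (~12.3 TiB) lies well past
# the limit, which is why the earlier user-space dd of it got EINVAL too.
assert 13500329066496 > max_lfs_filesize
```

So every symptom in the thread (dd dying at exactly 8.8 TB, btrfs check failing on a mountable filesystem, nothing in dmesg) is consistent with the buffered-I/O cutoff rather than bad media.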
Re: btrfs support for filesystems >8TB on 32bit architectures
(sorry for the bad subject line from the mdadm list on the previous mail)

On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
> I'm totally wrong here.
>
> DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
> aligned.
>
> While we are using a lot of stack memory and normal malloc()/calloc()
> allocated memory, which are seldom aligned to 512 bytes.
>
> So to *workaround* the problem in btrfs-progs, we may need to change any
> pread() caller to use aligned memory allocation.
>
> I really don't think David will accept such huge change for a workaround...

Thanks for looking into it.
So basically, should we just document that btrfs filesystems past 8TB in size are not supported on 32bit architectures?
(as in, you can mount them and use them I believe, but you cannot create or repair them)

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
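To illustrate the aligned-buffer workaround Qu describes, here is a minimal user-space sketch in Python rather than the C that btrfs-progs would need; `aligned_pread` is a hypothetical helper, not btrfs-progs code. The alignment comes from an anonymous mmap, which is page-aligned and therefore satisfies the typical 512-byte O_DIRECT requirement (a C version would use posix_memalign instead):

```python
import mmap
import os

def aligned_pread(path, offset, length, direct=True):
    """Read length bytes at offset through an O_DIRECT-compatible buffer.

    O_DIRECT requires the file offset, the length and the buffer address
    to be aligned, typically to the 512-byte logical sector size. An
    anonymous mmap is page-aligned, which more than satisfies that.
    """
    assert offset % 512 == 0 and length % 512 == 0
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    try:
        buf = mmap.mmap(-1, length)       # anonymous mapping, page-aligned
        n = os.preadv(fd, [buf], offset)  # positioned read into that buffer
        return bytes(buf[:n])
    finally:
        os.close(fd)
```

The direct=False path keeps the same code usable for ordinary buffered reads, which is convenient for testing on regular files; the cost Qu refers to is not this helper but touching every pread() call site in btrfs-progs to route through something like it.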
Re: btrfs support for filesystems >8TB on 32bit architectures
On Mon, Nov 07, 2016 at 02:16:37PM +0800, Qu Wenruo wrote: > > > At 11/07/2016 01:36 PM, Marc MERLIN wrote: > > (sorry for the bad subject line from the mdadm list on the previous mail) > > > > On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote: > > > I'm totally wrong here. > > > > > > DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes > > > aligned. > > > > > > While we are using a lot of stack memory and normal malloc()/calloc() > > > allocated memory, which are seldom aligned to 512 bytes. > > > > > > So to *workaround* the problem in btrfs-progs, we may need to change any > > > pread() caller to use aligned memory allocation. > > > > > > I really don't think David will accept such huge change for a > > > workaround... > > > > Thanks for looking into it. > > So basically should we just document that btrfs filesystems past 8TB in > > size are not supported on 32bit architectures? > > (as in you can mount them and use them I believe, but you cannot create, > > or repair them) > > > > Marc > > > Add David to this thread. > > For create, it should be OK. As at create time, we hardly write beyond 3G. > So it won't be a big problem. > > For repair, we do have a possibility that btrfsck can't handle it. > > Anyway, I'd like to see how David thinks what we should do to handle the > problem. Understood. One big thing (for me) I forgot to confirm: 1) btrfs receive 2) btrfs scrub should both be able to work because the IO operations are done directly inside the kernel and not from user space, correct? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs support for filesystems >8TB on 32bit architectures
On Tue, Nov 08, 2016 at 08:35:54AM +0800, Qu Wenruo wrote: > >Understood. One big thing (for me) I forgot to confirm: > >1) btrfs receive > > Unfortunately, receive is completely done in userspace. > Only send works inside kernel. right, I've confirmed that btrfs receive fails. It looks like btrfs balance is also failing, which is more surprising. Isn't that one in the kernel? Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs support for filesystems >8TB on 32bit architectures
On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> That's strange, balance is done completely in kernel space.
>
> Unless we're calling vfs_* function we won't go through the extra check.
>
> What's the error reported?

See below. Note however that it may be because btrfs receive messed up the filesystem first.

BTRFS info (device dm-0): use zlib compression
BTRFS info (device dm-0): disk space caching is enabled
BTRFS info (device dm-0): has skinny extents
BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 512, gen 0
BTRFS info (device dm-0): detected SSD devices, enabling SSD mode
BTRFS info (device dm-0): continuing balance
BTRFS info (device dm-0): The free space cache file (1593999097856) is invalid. skip it
BTRFS info (device dm-0): The free space cache file (1671308509184) is invalid. skip it
BTRFS info (device dm-0): relocating block group 13835461197824 flags 34
[ cut here ]
WARNING: CPU: 0 PID: 22825 at fs/btrfs/disk-io.c:520 btree_csum_one_bio.isra.39+0xf7/0x100
Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd usbcore usb_common
CPU: 0 PID: 22825 Comm: 
kworker/u9:2 Tainted: GW 4.8.5-ia32-20161028 #2 Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604 07/16/2008 Workqueue: btrfs-worker-high btrfs_worker_helper 00200286 00200286 d3d81e48 df414827 dfa12da5 d3d81e78 df05677a df9ed884 5929 dfa12da5 0208 df2cf067 0208 f7463fa0 f401a080 d3d81e8c df05684a 0009 d3d81eb4 Call Trace: [] dump_stack+0x58/0x81 [] __warn+0xea/0x110 [] ? btree_csum_one_bio.isra.39+0xf7/0x100 [] warn_slowpath_null+0x2a/0x30 [] btree_csum_one_bio.isra.39+0xf7/0x100 [] __btree_submit_bio_start+0x15/0x20 [] run_one_async_start+0x30/0x40 [] btrfs_scrubparity_helper+0xcd/0x2d0 [] ? run_one_async_free+0x20/0x20 [] btrfs_worker_helper+0xd/0x10 [] process_one_work+0x10b/0x400 [] worker_thread+0x37/0x4b0 [] ? process_one_work+0x400/0x400 [] kthread+0x9b/0xb0 [] ret_from_kernel_thread+0xe/0x24 [] ? kthread_stop+0x100/0x100 ---[ end trace f461faff989bf258 ]--- BTRFS: error (device dm-0) in btrfs_commit_transaction:2232: errno=-5 IO failure (Error while writing out transaction) BTRFS info (device dm-0): forced readonly BTRFS warning (device dm-0): Skipping commit of aborted transaction. 
[ cut here ] WARNING: CPU: 0 PID: 22318 at fs/btrfs/transaction.c:1854 btrfs_commit_transaction+0x2f5/0xcc0 BTRFS: Transaction aborted (error -5) Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd usbcore usb_common CPU: 0 PID: 22318 Comm: btrfs-balance Tainted: GW 4.8.5-ia32-20161028 #2 Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604 07/16/2008 0286 0286 d74a3ca4 df414827 d74a3ce8 dfa132ab d74a3cd4 df05677a dfa075cc d74a3d04 572e dfa132ab 073e df2d7de5 073e f698dc00 e9173e70 fffb d74a3cf0 df0567db 0009 d74a3ce8 dfa075cc Call Trace: [] dump_stack+0x58/0x81 [] __warn+0xea/0x110 [] ? btrfs_commit_transaction+0x2f5/0xcc0 [] warn_slowpath_fmt+0x3b/0x40 [] btrfs_commit_transaction+0x2f5/0xcc0 [] ? prepare_to_wait_event+0xd0/0xd0 [] prepare_to_r
Re: btrfs support for filesystems >8TB on 32bit architectures
On Tue, Nov 08, 2016 at 09:17:43AM +0800, Qu Wenruo wrote:
> At 11/08/2016 09:06 AM, Marc MERLIN wrote:
> > On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> > > That's strange, balance is done completely in kernel space.
> > >
> > > Unless we're calling vfs_* function we won't go through the extra check.
> > >
> > > What's the error reported?
> >
> > See below. Note however that is may be because btrfs received messed up the
> > filesystem first.
>
> If receive can easily screw up the fs, then fsstress can also screw up
> btrfs easily.
>
> So I didn't think that's the case. (Several years ago it's possible)

So now I'm even more confused. I put the array back in my 64bit system and check --repair comes back clean, but scrub does not. Is that supposed to be possible?

gargamel:~# btrfs check -p --repair /dev/mapper/crypt_bcache2 2>&1 | tee /mnt/dshelf1/other/btrfs2
enabling repair mode
Checking filesystem on /dev/mapper/crypt_bcache2
UUID: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
checking extents [.]
Fixed 0 roots.
cache and super generation don't match, space cache will be invalidated checking fs roots [o] checking csums checking root refs found 14622791987200 bytes used err is 0 total csum bytes: 14200176492 total tree bytes: 78239416320 total fs tree bytes: 59524497408 total extent tree bytes: 3236872192 btree space waste bytes: 10068589919 file data blocks allocated: 18101311373312 referenced 18038641020928 Nov 8 06:55:40 gargamel kernel: [35631.988896] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 513, gen 0 Nov 8 06:55:40 gargamel kernel: [35631.988897] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 514, gen 0 Nov 8 06:55:40 gargamel kernel: [35631.988899] BTRFS warning (device dm-6): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache2, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db) Nov 8 06:55:40 gargamel kernel: [35631.988900] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 515, gen 0 Nov 8 06:55:40 gargamel kernel: [35631.988903] BTRFS warning (device dm-6): checksum error at logical 27887534080 on dev /dev/mapper/crypt_bcache2, sector 56581376, root 9461, inode 45837, offset 15460745216, length 4096, links 1 (path: system/mlocate/mlocate.db) Nov 8 06:55:40 gargamel kernel: [35631.988904] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887009792 on dev /dev/mapper/crypt_bcache2 Nov 8 06:55:40 gargamel kernel: [35631.988905] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache2 Nov 8 06:55:40 gargamel kernel: [35631.988906] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 516, gen 0 Nov 8 06:55:40 gargamel kernel: [35631.988907] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887837184 on dev 
/dev/mapper/crypt_bcache2 Nov 8 06:55:40 gargamel kernel: [35631.988908] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 517, gen 0 Nov 8 06:55:40 gargamel kernel: [35631.988909] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 518, gen 0 Nov 8 06:55:40 gargamel kernel: [35631.988910] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27885830144 on dev /dev/mapper/crypt_bcache2 Nov 8 06:55:40 gargamel kernel: [35631.988911] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27885961216 on dev /dev/mapper/crypt_bcache2 Nov 8 06:55:40 gargamel kernel: [35631.988912] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887534080 on dev /dev/mapper/crypt_bcache2 Nov 8 06:55:40 gargamel kernel: [35631.92] BTRFS warning (device dm-6): checksum error at logical 27887403008 on dev /dev/mapper/crypt_bcache2, sector 56581120, root 9461, inode 45837, offset 15460614144, length 4096, links 1 (path: system/mlocate/mlocate.db) Nov 8 06:55:40 gargamel kernel: [35631.95] BTRFS warning (device dm-6): checksum error at logical 27887009792 on dev /dev/mapper/crypt_bcache2, sector 56580352, root 9461, inode 45837, offset 15460220928, length 4096, links 1 (path: system/mlocate/mlocate.db) Nov 8 06:55:40 gargamel kernel: [35631.97] BTRFS warning (device dm-6): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache2, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db) Nov 8 06:55:40 gargamel kernel: [35631.988890] BTRFS warning (device dm-6): checksum error at logical 27887837184 on dev /dev/mapper/crypt_bcache2, sect
Re: btrfs support for filesystems >8TB on 32bit architectures
On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> Yeah, quite possible!
>
> The truth is, current btrfs check only checks:
> 1) Metadata
>    while --check-data-csum option will check data, but still
>    follow the restriction 3).
> 2) Crossing reference of metadata (contents of metadata)
> 3) The first good mirror/backup
>
> So quite a lot of problems can't be detected by btrfs check:
> 1) Data corruption (csum mismatch)
> 2) 2nd mirror corruption(DUP/RAID0/10) or parity error(RAID5/6)
>
> For btrfsck to check all mirror and data, you could try out-of-tree
> offline scrub patchset:
> https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
>
> Which implements the kernel scrub equivalent in btrfs-progs.

I see, thanks for the answer. Note that this is very confusing to the end user: if check --repair returns success, the filesystem should be clean. Hopefully that patchset can be included in btrfs-progs.

But sure enough, I'm seeing a lot of these:

BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev /dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, offset 77824, length 4096, links 5 (path: magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)

This is bad because I would expect check --repair to find them all and either offer to remove all the corrupted files after giving me a list of what I've lost, or just recompute the checksums so they match again: the file would then be known to be corrupted but "clean", and I would have the option of keeping it as is (ok-ish for a video file) or restoring it from backup.
The worst part with scrub is that I have to find all these files, and then find all the snapshots they're in (maybe 10 or 20) and delete them all, and then some of those snapshots are read only because they are btrfs send sources, so I need to destroy those snapshots, lose my btrfs send relationship, and am forced to recreate it (maybe 2 to 6 days of syncing over a slow-ish link).

When data is corrupted, no solution is perfect, but hopefully check --repair will indeed be able to restore the entire filesystem to a clean state, even if some data must be lost in the process.

Thanks for considering.

Marc
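For the read-only-snapshot part specifically: rather than destroying a send-source snapshot, its ro property can be flipped with `btrfs property set -ts <snap> ro false`, the bad file removed, and the property flipped back, though doing that to a send source risks breaking incremental send, so treat it as a last resort. Below is a small sketch of finding which snapshots contain a given damaged path; the directory layout and names are illustrative, not from this thread.

```shell
# List every snapshot under a backup directory that contains the same
# relative path as a file scrub flagged as corrupted.
list_snapshots_with() {
  snapdir="$1" relpath="$2"
  for snap in "$snapdir"/*/; do
    [ -e "$snap$relpath" ] && printf '%s\n' "$snap$relpath"
  done
}

# e.g. with the subvolume-relative path from a scrub warning:
#   list_snapshots_with /mnt/mnt/DS2/backup \
#     'gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg'
# then, per affected read-only snapshot (last resort, breaks incremental send):
#   btrfs property set -ts "$snap" ro false && rm "$snap$relpath" && \
#     btrfs property set -ts "$snap" ro true
```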
Re: btrfs support for filesystems >8TB on 32bit architectures
On Tue, Nov 08, 2016 at 06:05:19PM -0800, Marc MERLIN wrote:
> On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> > Yeah, quite possible!
> >
> > The truth is, current btrfs check only checks:
> > 1) Metadata
> >    while --check-data-csum option will check data, but still
> >    follow the restriction 3).
> > 2) Crossing reference of metadata (contents of metadata)
> > 3) The first good mirror/backup
> >
> > So quite a lot of problems can't be detected by btrfs check:
> > 1) Data corruption (csum mismatch)
> > 2) 2nd mirror corruption(DUP/RAID0/10) or parity error(RAID5/6)
> >
> > For btrfsck to check all mirror and data, you could try out-of-tree
> > offline scrub patchset:
> > https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
> >
> > Which implements the kernel scrub equivalent in btrfs-progs.
>
> I see, thanks for the answer.
> Note that this is very confusing to the end user.
> If check --repair returns success, the filesystem should be clean.
> Hopefully that patchset can be included in btrfs-progs
>
> But sure enough, I'm seeing a lot of these:
> BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev /dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, offset 77824, length 4096, links 5 (path: magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)

So, I ran check --repair, then I ran scrub and I deleted all the files that were referenced by pathname and failed scrub.
Now I have this: BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785128960 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1545, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785133056 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1546, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785137152 on dev /dev/mapper/crypt_bcache2 BTRFS warning (device dm-6): checksum error at logical 269784580096 on dev /dev/mapper/crypt_bcache2, sector 529036544, root 17564, inode 1225903, offset 16384: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 269784584192 on dev /dev/mapper/crypt_bcache2, sector 529036552, root 17564, inode 1225903, offset 20480: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 269784588288 on dev /dev/mapper/crypt_bcache2, sector 529036560, root 17564, inode 1225903, offset 24576: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 269784592384 on dev /dev/mapper/crypt_bcache2, sector 529036568, root 17564, inode 1225903, offset 28672: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 269784596480 on dev /dev/mapper/crypt_bcache2, sector 529036576, root 17564, inode 1225903, offset 32768: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 269784600576 on dev /dev/mapper/crypt_bcache2, sector 529036584, root 17564, inode 1225903, offset 36864: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 269784604672 on dev /dev/mapper/crypt_bcache2, sector 529036592, root 17564, inode 1225903, offset 40960: path resolving failed with ret=-2 BTRFS warning (device dm-6): checksum error at logical 
269784608768 on dev /dev/mapper/crypt_bcache2, sector 529036600, root 17564, inode 1225903, offset 45056: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784612864 on dev /dev/mapper/crypt_bcache2, sector 529036608, root 17564, inode 1225903, offset 49152: path resolving failed with ret=-2

How am I supposed to deal with those?

Thanks,
Marc
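Scrub reports these by (root, inode) pairs, and each inode produces many warnings, so the first step of triage is deduplicating them. A small sketch, assuming the exact kernel message wording shown above (adjust the pattern if your kernel words these lines differently):

```shell
# Reduce dmesg-style scrub warnings to the unique (root, inode) pairs
# they complain about, printed one pair per line.
scrub_inodes() {
  grep -o 'root [0-9]*, inode [0-9]*' |
    awk '{ gsub(/,/, ""); print $2, $4 }' |
    sort -u
}

# Typical use (as root):
#   dmesg | scrub_inodes
```

Each output line gives a subvolume id and an inode number to chase down.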
Re: when btrfs scrub reports errors and btrfs check --repair does not
On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> It seems to be orphan inodes.
> Btrfs doesn't remove all the contents of an inode at rm time.
> It just unlink the inode and put it into a state called orphan inodes.(Can't
> be referred from any directory).

BTRFS warning (device dm-6): checksum error at logical 269783928832 on dev /dev/mapper/crypt_bcache2, sector 529035272, root 17564, inode 1225897, offset 20480: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269783932928 on dev /dev/mapper/crypt_bcache2, sector 529035280, root 17564, inode 1225897, offset 24576: path resolving failed with ret=-2

Do you mean I should be using find /mnt/mnt -inum ?
Well, how about that, you're right:
gargamel:/mnt/mnt/DS2/backup# find /mnt/mnt -inum 1225897
/mnt/mnt/DS2/backup/debian64_rw.20160713_03:21:57/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg

So basically the breakage in my filesystem is enough that the backlink from the inode to the pathname is gone? That's not good :-/

> And then free their data extents in next several trans.
>
> Try to find these inodes using inode number in specified subvolume.
> If not found, then they are orphan inodes, nothing to worry.
> These wrong data extent will disappear soon or later.
>
> Or you can use "btrfs fi sync" to make sure orphan inodes are really removed
> from tree.

So, I ran btrfs fi sync /mnt/mnt, but it returned instantly.
scrub after that, still returns: btrfs scrub start -Bd /mnt/mnt BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1793, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785628672 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1794, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784580096 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1795, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785632768 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1796, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785104384 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1797, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784584192 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1798, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785636864 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1799, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785108480 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1800, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784588288 on dev /dev/mapper/crypt_bcache2 BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1801, gen 0 BTRFS error (device dm-6): unable to fixup (regular) error at 
logical 269784055808 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1802, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785640960 on dev /dev/mapper/crypt_bcache2

What am I supposed to do about these? I'm not even clear where this corruption is located or how to clear it. I understand you're saying that this does not seem to affect any remaining data, but if scrub is not clean, can't even see which file an inode is linked to, and that inode still hasn't been cleaned up two days later, then my filesystem is in a bad state that check --repair should fix, is it not?

Yes, I can wipe it and start over, but I'm trying to use this as a learning experience as well as seeing if the tools are working as they should.

Thanks,
Marc
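For errors reported only by logical byte address, btrfs-progs has `btrfs inspect-internal logical-resolve <logical> <mount>`, which maps a logical address back to the file paths that reference it; it fails with an error for addresses belonging to orphaned or already-freed extents, which is itself informative. Below is a sketch that turns messages like the above into commands to run, printing rather than executing them (the mount point is illustrative):

```shell
# Turn "unable to fixup ... at logical N" errors into the
# logical-resolve invocations that would name the affected files.
fixup_to_commands() {
  mnt="$1"
  grep -o 'unable to fixup (regular) error at logical [0-9]*' |
    awk -v mnt="$mnt" '{ print "btrfs inspect-internal logical-resolve", $NF, mnt }' |
    sort -u
}

# Typical use (then run the printed commands as root):
#   dmesg | fixup_to_commands /mnt/mnt
```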
Re: when btrfs scrub reports errors and btrfs check --repair does not
On Fri, Nov 11, 2016 at 07:17:08PM -0800, Marc MERLIN wrote: > On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote: > > It seems to be orphan inodes. > > Btrfs doesn't remove all the contents of an inode at rm time. > > It just unlink the inode and put it into a state called orphan inodes.(Can't > > be referred from any directory). > > BTRFS warning (device dm-6): checksum error at logical 269783928832 on dev > /dev/mapper/crypt_bcache2, sector 529035272, root 17564, inode 1225897, > offset 20480: path resolving failed with ret=-2 > BTRFS warning (device dm-6): checksum error at logical 269783932928 on dev > /dev/mapper/crypt_bcache2, sector 529035280, root 17564, inode 1225897, > offset 24576: path resolving failed with ret=-2 > > Do you mean I should be using find /mnt/mnt -inum ? > Well, how about that, you're right: > gargamel:/mnt/mnt/DS2/backup# find /mnt/mnt -inum 1225897 > /mnt/mnt/DS2/backup/debian64_rw.20160713_03:21:57/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg > So basically the breakage in my filesystem is enough that the backlink > from the inode to the pathname is gone? That's not good :-/ Mmmn, been doing find -inum, deleting hits, running scrub, and then scrub still fails with more, and now I'm seeing this; gargamel:~# find /mnt/mnt -inum 1225897 /mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg (...) 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg And then I see this: gargamel:~# ls -li /mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg 1225897 -rw-r--r-- 5 merlin merlin 13794 Jan 7 2012 /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg 1225898 -rw-r--r-- 5 merlin merlin 13048 Jan 7 2012 /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg 1225897 -rw-r--r-- 5 merlin merlin 13794 Jan 7 2012 /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg 1225898 -rw-r--r-- 5 merlin merlin 13048 Jan 7 2012 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
1225913 -rw-r--r-- 5 merlin merlin 15247 Jan 7 2012 /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg
1225897 lrwxrwxrwx 1 merlin merlin 35 Aug 1 2010 /mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg -> ../33_Diving/BestOf/Dive5-2_139.jpg

So first:
a) find -inum returns some inodes that don't match
b) but argh, multiple (very different) files have the same inode number, so finding files by inode number after scrub flagged an inode bad isn't going to work :(

At this point, I'm starting to lose patience (and running out of time), so I'm going to wipe this filesystem after I hear back from you, but basically scrub and repair are still not up to what they should be IMO (as per my previous comment): one should be able to fully repair an unclean filesystem with check --repair, and scrub should give me things I can either fi
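Part of the confusion here is expected behavior: inode numbers in btrfs are only unique within a subvolume, so a `find -inum` over a mount full of snapshots will match unrelated files. The `root NNNN` field in the scrub warning names the subvolume, and btrfs-progs can scope the lookup with its `subvolid-resolve` and `inode-resolve` subcommands. A hedged sketch, assuming a btrfs-progs recent enough to have both subcommands (must run as root on a mounted filesystem; paths illustrative):

```shell
# Resolve one (root, inode) pair from a scrub warning to a path inside
# the correct subvolume only, instead of matching every snapshot.
resolve_one() {
  mnt="$1" rootid="$2" inum="$3"
  # Map the root id from the warning to a subvolume path under the mount.
  subvol="$(btrfs inspect-internal subvolid-resolve "$rootid" "$mnt")" || return
  # Resolve the inode number within that subvolume alone. A "no such file"
  # error here suggests an orphan inode awaiting cleanup, not a live file.
  btrfs inspect-internal inode-resolve "$inum" "$mnt/$subvol"
}

# e.g. for "root 17564, inode 1225903" on a filesystem mounted at /mnt/mnt:
#   resolve_one /mnt/mnt 17564 1225903
```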
Re: when btrfs scrub reports errors and btrfs check --repair does not
On Sun, Nov 13, 2016 at 08:13:29PM +0500, Roman Mamedov wrote:
> On Sun, 13 Nov 2016 07:06:30 -0800
> Marc MERLIN wrote:
>
> > So first:
> > a) find -inum returns some inodes that don't match
> > b) but argh, multiple files (very different) have the same inode number, so finding
> > files by inode number after scrub flagged an inode bad, isn't going to work :(
>
> I wonder why do you even need scrub to verify file readability. Just try
> reading all files by using e.g. "cfv -Crr", the read errors produced will
> point you directly to files which are unreadable, without the need to lookup
> them in a backward way via inum. Then just restore those from backups.

I could read the files, but we're talking about maybe 100 million files? That would take a while... (and most of them are COW copies of the same physical data), so scrub is _much_ faster. Scrub is also reporting issues not related to files but to data structures, it seems, while repair is not finding them.

As for the data, it's a backup device, so I can just wipe it, but again, I'm using this as an example of how I would simply bring a drive back to a clean state, and that's not pretty right now.

Marc
Re: 4.8.8, bcache deadlock and hard lockup
+btrfs mailing list, see below why

On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> On Mon, 27 Nov 2016, Coly Li wrote:
> >
> > Yes, too many work queues... I guess the locking might be caused by some
> > very obscure reference of closure code. I cannot have any clue if I
> > cannot find a stable procedure to reproduce this issue.
> >
> > Hmm, if there is a tool to clone all the meta data of the back end cache
> > and whole cached device, there might be a method to replay the oops much
> > easier.
> >
> > Eric, do you have any hint ?
>
> Note that the backing device doesn't have any metadata, just a superblock.
> You can easily dd that off onto some other volume without transferring the
> data. By default, data starts at 8k, or whatever you used in `make-bcache
> -w`.

Ok, Linus helped me find a workaround for this problem:
https://lkml.org/lkml/2016/11/29/667
namely:
echo 2 > /proc/sys/vm/dirty_ratio
echo 1 > /proc/sys/vm/dirty_background_ratio
(it's a 24GB system, so the defaults of 20 and 10 were creating too many requests in the buffers)

Note that this is only a workaround, not a fix. When I did this and retried my big copy, I still got 100+ kernel work queues, but apparently the underlying swraid5 was able to unblock and satisfy the write requests before too many accumulated and crashed the kernel.

I'm not a kernel coder, but it seems to me that bcache needs a way to throttle incoming requests if there are too many, so that it does not end up in a state where things blow up due to too many piled-up requests. You should be able to reproduce this by taking 5 spinning rust drives, putting raid5 on top, then dmcrypt, then bcache, and hopefully any filesystem (although I used btrfs), and sending lots of requests. Actually, to be honest, the problems have mostly been happening when I do btrfs scrub and btrfs send/receive, which both generate I/O from within the kernel instead of user space.
So here, btrfs may be a contributor to the problem too, but while btrfs still trashes my system if I remove the caching device on bcache (and with the default dirty ratio values), it doesn't crash the kernel. I'll start another separate thread with the btrfs folks on how much pressure is put on the system, but on your side it would be good to help ensure that bcache doesn't crash the system altogether if too many requests are allowed to pile up.

Thanks,
Marc
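A side note on the workaround itself: on a large-memory machine the percentage knobs are coarse (1% of 24GB still allows roughly 250MB of dirty pages), and the kernel also exposes byte-granular equivalents, vm.dirty_bytes and vm.dirty_background_bytes, which take precedence over the ratios when set non-zero. A sketch of the arithmetic, with illustrative (not recommended) limits for persisting the setting:

```shell
# What the ratio-based workaround means in bytes on a 24GB machine.
ram_bytes=$((24 * 1024 * 1024 * 1024))
dirty_limit=$((ram_bytes * 2 / 100))        # vm.dirty_ratio=2
background_limit=$((ram_bytes * 1 / 100))   # vm.dirty_background_ratio=1
echo "hard limit:      $dirty_limit bytes"
echo "writeback start: $background_limit bytes"

# Byte-granular equivalent, persisted across reboots (values illustrative):
#   cat > /etc/sysctl.d/90-dirty.conf <<'SYSCTL'
#   vm.dirty_bytes = 536870912
#   vm.dirty_background_bytes = 268435456
#   SYSCTL
#   sysctl --system
```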
Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote:
> +btrfs mailing list, see below why
>
> Ok, Linus helped me find a workaround for this problem:
> https://lkml.org/lkml/2016/11/29/667
> namely:
> echo 2 > /proc/sys/vm/dirty_ratio
> echo 1 > /proc/sys/vm/dirty_background_ratio
> (it's a 24GB system, so the defaults of 20 and 10 were creating too many
> requests in the buffers)

I'll remove the bcache list on this followup since I want to concentrate here on the fact that btrfs does behave badly with the default dirty_ratio values.

As a reminder, it's a btrfs send/receive copy between 2 swraid5 arrays on spinning rust:
swraid5 < bcache < dmcrypt < btrfs

Copying with btrfs send/receive causes massive hangs on the system. Please see this explanation from Linus on why the workaround was suggested:
https://lkml.org/lkml/2016/11/29/667

The hangs that I'm getting with the bcache cache turned off (i.e. passthrough) are now very likely due to btrfs alone, and they mess up anything doing file I/O that ends up timing out, break USB as reads time out in the middle of USB requests, lose interrupts, and so forth. All of this mostly went away with Linus' suggestion:
echo 2 > /proc/sys/vm/dirty_ratio
echo 1 > /proc/sys/vm/dirty_background_ratio

But that's hiding the symptom, which I think is that btrfs piles up too many I/O requests during btrfs send/receive and btrfs scrub (probably balance too) without looking at the resulting impact on system health. Is there a way to stop flooding the entire system with I/O and causing so much strain on it?
(I realize that if there is a caching layer underneath that just takes requests and says thank you without giving other clues that underneath bad things are happening, it may be hard, but I'm asking anyway :) [10338.968912] perf: interrupt took too long (3927 > 3917), lowering kernel.perf_event_max_sample_rate to 50750 [12971.047705] ftdi_sio ttyUSB15: usb_serial_generic_read_bulk_callback - urb stopped: -32 [17761.122238] usb 4-1.4: USB disconnect, device number 39 [17761.141063] usb 4-1.4: usbfs: USBDEVFS_CONTROL failed cmd hub-ctrl rqt 160 rq 6 len 1024 ret -108 [17761.263252] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd [17761.938575] usb 4-1.4: new SuperSpeed USB device number 40 using xhci_hcd [24130.574425] hpet1: lost 2306 rtc interrupts [24156.034950] hpet1: lost 1628 rtc interrupts [24173.314738] hpet1: lost 1104 rtc interrupts [24180.129950] hpet1: lost 436 rtc interrupts [24257.557955] hpet1: lost 4954 rtc interrupts [24267.522656] hpet1: lost 637 rtc interrupts [28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds. [28034.975471] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12 [28035.000964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [28035.025429] btrfs D 91154d33fc70 0 5618 5372 0x0080 [28035.047717] 91154d33fc70 00200246 911842f880c0 9115a4cf01c0 [28035.071020] 91154d33fc58 91154d34 91165493bca0 9115623773f0 [28035.094252] 1000 0001 91154d33fc88 b86cf1a6 [28035.117538] Call Trace: [28035.125791] [] schedule+0x8b/0xa3 [28035.141550] [] btrfs_start_ordered_extent+0xce/0x122 [28035.162457] [] ? 
wake_up_atomic_t+0x2c/0x2c [28035.180891] [] btrfs_wait_ordered_range+0xa9/0x10d [28035.201723] [] btrfs_truncate+0x40/0x24b [28035.219269] [] btrfs_setattr+0x1da/0x2d7 [28035.237032] [] notify_change+0x252/0x39c [28035.254566] [] do_truncate+0x81/0xb4 [28035.271057] [] vfs_truncate+0xd9/0xf9 [28035.287782] [] do_sys_truncate+0x63/0xa7 [28155.781987] INFO: task btrfs:5618 blocked for more than 120 seconds. [28155.802229] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12 [28155.827894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [28155.852479] btrfs D 91154d33fc70 0 5618 5372 0x0080 [28155.874761] 91154d33fc70 00200246 911842f880c0 9115a4cf01c0 [28155.898059] 91154d33fc58 91154d34 91165493bca0 9115623773f0 [28155.921464] 1000 0001 91154d33fc88 b86cf1a6 [28155.944720] Call Trace: [28155.953176] [] schedule+0x8b/0xa3 [28155.968945] [] btrfs_start_ordered_extent+0xce/0x122 [28155.989811] [] ? wake_up_atomic_t+0x2c/0x2c [28156.008195] [] btrfs_wait_ordered_range+0xa9/0x10d [28156.028498] [] btrfs_truncate+0x40/0x24b [28156.046081] [] btrfs_setattr+0x1da/0x2d7 [28156.063621] [] notify_change+0x252/0x39c [28156.081667] [] do_truncate+0x81/0xb4 [28156.098732] [] vfs_truncate+0xd9/0xf9 [28156.115489] [] do_sys_truncate+0x63/0xa7 [28156.133389] [] SyS_truncate+0xe/0x10 [28156.149831] [] do_syscall_64+0x61/0x72 [28156.167179] [] entry_SYSCALL64_slow_path+0x25/0x2