Re: Is stability a joke?

2016-09-11 Thread Marc MERLIN
On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
> That is exactly the same reason I don't edit the wiki myself. I could of
> course get it started and hopefully someone will correct what I write, but I
> feel that if I start this off I don't have deep enough knowledge to do a
> proper start. Perhaps I will change my mind about this.

My first edits to the wiki were made when I had barely started with btrfs
myself, simply to write down answers to questions I had asked on the list
that were not yet covered on the wiki.

You don't have to be 100% right about everything. If something is wrong,
it'll likely bother someone and they'll go edit your changes, which is
more motivating and less work for them than writing the page from
scratch.
You can also add a small disclaimer "to the best of my knowledge",
etc...

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-01-29 Thread Marc MERLIN
Howdy,

I'm considering using btrfs for my new laptop install.

Encryption is however a requirement, and ecryptfs doesn't quite cut it for
me, so that leaves me with dmcrypt which is what I've been using with
ext3/ext4 for years.

https://btrfs.wiki.kernel.org/articles/g/o/t/Gotchas.html 
still states that 
'dm-crypt block devices require write-caching to be turned off on the
underlying HDD'
While the report was for 2.6.33, I'll assume it's still true.
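For reference, turning the write cache off would be something like this (a
sketch; sdX is a placeholder for the underlying drive, and the setting
typically has to be reapplied at every boot, e.g. via /etc/hdparm.conf on
Debian-style systems):

hdparm -W 0 /dev/sdX    # disable the drive's write cache
hdparm -W /dev/sdX      # query the current write-cache setting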


I was considering migrating to a recent 256GB SSD and 3.2.x kernel.

First, I'd like to check if the 'turn off write cache' comment is still
accurate and if it does apply to SSDs too.

Second, I was wondering if anyone is running btrfs over dmcrypt on an SSD
and what the performance is like with write cache turned off (I'm actually
not too sure what the impact is for SSDs considering that writing to flash
can actually be slower than writing to a hard drive).

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-01 Thread Marc MERLIN
On Wed, Feb 01, 2012 at 12:56:24PM -0500, Chris Mason wrote:
> > Second, I was wondering if anyone is running btrfs over dmcrypt on an SSD
> > and what the performance is like with write cache turned off (I'm actually
> > not too sure what the impact is for SSDs considering that writing to flash
> > can actually be slower than writing to a hard drive).
> 
> Performance without the cache on is going to vary wildly from one SSD to
> another.  Some really need it to give them nice fat writes while others
> do better on smaller writes.  It's best to just test yours and see.
> 
> With a 3.2 kernel (it really must be 3.2 or higher), both btrfs and dm
> are doing the right thing for barriers.

Thanks for the answer.
Can you confirm that I still must disable write cache on the SSD to avoid
corruption with btrfs on top of dmcrypt, or is there a chance that it just
works now?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-12 Thread Marc MERLIN
On Thu, Feb 02, 2012 at 07:27:22AM -0800, Marc MERLIN wrote:
> On Thu, Feb 02, 2012 at 07:42:41AM -0500, Chris Mason wrote:
> > On Wed, Feb 01, 2012 at 07:23:45PM -0800, Marc MERLIN wrote:
> > > On Wed, Feb 01, 2012 at 12:56:24PM -0500, Chris Mason wrote:
> > > > > Second, I was wondering if anyone is running btrfs over dmcrypt on an 
> > > > > SSD
> > > > > and what the performance is like with write cache turned off (I'm 
> > > > > actually
> > > > > not too sure what the impact is for SSDs considering that writing to 
> > > > > flash
> > > > > can actually be slower than writing to a hard drive).
> > > > 
> > > > Performance without the cache on is going to vary wildly from one SSD to
> > > > another.  Some really need it to give them nice fat writes while others
> > > > do better on smaller writes.  It's best to just test yours and see.
> > > > 
> > > > With a 3.2 kernel (it really must be 3.2 or higher), both btrfs and dm
> > > > are doing the right thing for barriers.
> > > 
> > > Thanks for the answer.
> > > Can you confirm that I still must disable write cache on the SSD to avoid
> > > corruption with btrfs on top of dmcrypt, or is there a chance that it just
> > > works now?
> > 
> > No, with 3.2 or higher it is expected to work.  dm-crypt is doing the
> > barriers correctly and as of 3.2 btrfs is sending them down correctly.
> 
> Thanks for confirming, I'll give this a shot.
> (no warranty implied of course :) ).

Actually I had one more question.

I read this page:
http://www.redhat.com/archives/dm-devel/2011-July/msg00042.html

I'm not super clear on whether, with a 3.2.5 kernel, I need to pass the
special allow_discards option for btrfs and dm-crypt to be safe together,
or whether they now talk through an API and everything "just works" :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-12 Thread Marc MERLIN
On Mon, Feb 13, 2012 at 12:47:54AM +0100, Milan Broz wrote:
> On 02/12/2012 11:32 PM, Marc MERLIN wrote:
> >Actually I had one more question.
> >
> >I read this page:
> >http://www.redhat.com/archives/dm-devel/2011-July/msg00042.html
> >
> >I'm not super clear on whether, with a 3.2.5 kernel, I need to pass the
> >special allow_discards option for btrfs and dm-crypt to be safe together,
> >or whether they now talk through an API and everything "just works" :)
> 
> If you want discards to be supported in dmcrypt, you have to enable it 
> manually.
> 
> The most comfortable way is just use recent cryptsetup and add
> --allow-discards option to luksOpen command.
> 
> It will never be enabled by default in dmcrypt, for security reasons:
> http://asalor.blogspot.com/2011/08/trim-dm-crypt-problems.html

Thanks for the answer.
I knew that it created some security problems, but I had not yet found the
page you just gave, which effectively states that TRIM isn't actually that
big a win on recent SSDs (until now, I thought it was pretty important to
use it on them).

Considering that I have a fairly new Crucial 256GB SSD, I'm going to assume
that this bit applies to me:
"On the other side, TRIM is usually overrated. Drive itself should keep good
performance even without TRIM, either by using internal garbage collecting
process or by some area reserved for optimal writes handling."

So it sounds like I should just not give the "ssd" mount option to btrfs,
and not worry about TRIM. 
My main concern was to make sure I wasn't risking corruption with btrfs on
top of dm-crypt; that is now addressed with 3.2.x, and I now understand it
holds regardless of whether I use --allow-discards in cryptsetup, so I'm
just not going to use it given the warning page you just posted.
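For completeness, enabling discards would look something like this (a
sketch, assuming cryptsetup 1.4 or newer as Milan said; device and mapping
names are placeholders):

cryptsetup luksOpen --allow-discards /dev/sdX2 cryptroot

or, on distributions whose crypttab supports it, adding the 'discard'
option to the relevant /etc/crypttab line.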

Thanks for the answer.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-15 Thread Marc MERLIN
On Wed, Feb 15, 2012 at 10:42:43AM -0500, Calvin Walton wrote:
> On Sun, 2012-02-12 at 16:14 -0800, Marc MERLIN wrote:
> > Considering that I have a fairly new Crucial 256GB SSD, I'm going to assume
> > that this bit applies to me:
> > "On the other side, TRIM is usually overrated. Drive itself should keep good
> > performance even without TRIM, either by using internal garbage collecting
> > process or by some area reserved for optimal writes handling."
> > 
> > So it sounds like I should just not give the "ssd" mount option to btrfs,
> > and not worry about TRIM. 
> 
> The 'ssd' option on btrfs is actually completely unrelated to trim
> support. Instead, it changes how blocks are allocated on the device,
> taking advantage of the improved random read/write speed. The 'ssd'
> option should be autodetected on most SSDs, but I don't know if this is
> handled correctly when you're using dm-crypt. (Btrfs prints a message at
> mount time when it autodetects this.) It shouldn't hurt to leave it.
 
Yes, I found out more after I got my laptop back up (I had limited ability
to search while I was rebuilding it). Thanks for clearing up my incorrect
guess at the time :)

The good news is that ssd mode is autodetected through dmcrypt:
[   23.130486] device label btrfs_pool1 devid 1 transid 732 
/dev/mapper/cryptroot
[   23.130854] btrfs: disk space caching is enabled
[   23.175547] Btrfs detected SSD devices, enabling SSD mode

> Discard is handled with a separate mount option on btrfs (called
> 'discard'), and is disabled by default even if you have the 'ssd' option
> enabled, because of the negative performance impact it has had on some
> SSDs.

That's what I read later. It's a bit counterintuitive, after all the work
that went into TRIM, to find out that there are actually more reasons not
to bother with it than to use it :)
On the plus side, it means SSDs are getting better and don't need special
code that makes data recovery harder should you ever need it.
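For reference, this is roughly what the knobs look like (a sketch; the
device and mountpoint are placeholders, and 'discard' only makes sense if
you accept the dm-crypt caveats above):

mount -o ssd /dev/mapper/cryptroot /mnt    # usually autodetected anyway
mount -o remount,discard /mnt              # enable online TRIM
dmesg | grep -i 'ssd mode'                 # confirm autodetection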

I tried updating the wiki pages, because:
https://btrfs.wiki.kernel.org/articles/f/a/q/FAQ_1fe9.html
says nothing about
- trim/discard
- dmcrypt

while
https://btrfs.wiki.kernel.org/articles/g/o/t/Gotchas.html
still states 'btrfs volumes on top of dm-crypt block devices (and possibly
LVM) require write-caching to be turned off on the underlying HDD. Failing
to do so, in the event of a power failure, may result in corruption not yet
handled by btrfs code. (2.6.33) '

I'm happy to fix both pages, but the login link of course doesn't work and
I'm not sure where the canonical copy to edit actually is or if I can get
access.

That said if someone else can fix it too, that's great :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-18 Thread Marc MERLIN
On Sat, Feb 18, 2012 at 01:39:24PM +0100, Martin Steigerwald wrote:
> To Milan Broz: Well now I noticed that you linked to your own blog entry. 

He did not, I'm the one who did.
I asked a bunch of questions since the online docs didn't address them for
me. Some of you answered them, I asked for access to the wiki, and I
updated the wiki with the information you gave me.

While I have no inherent bias one way or another, obviously I did put some
of your opinions on the wiki :)

> Please do not take my below statements personally - I might have written 
> them a bit harshly. Actually I do not really know whether your statement 
> that TRIM is overrated is correct, but before believing that TRIM does not 
> give much advantage, I would like to see at least some evidence of any 
> sort, because my explanation below of why it should make a difference 
> at least seems logical to me.

That sounds like a reasonable request to me.

In the meantime I changed the page to 
'Does Btrfs support TRIM/discard?

"-o discard" is supported, but it can hurt performance on some SSDs (or at
least, whether it adds worthwhile performance is up for debate depending on
who you ask). It also makes undeletion/recovery near impossible, and is a
security problem if you use dm-crypt underneath (see
http://asalor.blogspot.com/2011/08/trim-dm-crypt-problems.html ). It is
therefore not enabled by default. You are welcome to run your own
benchmarks and post them here, with the caveat that they'll be very SSD
firmware specific.'
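(A middle ground I did not put on the wiki: instead of mounting with
"-o discard", you can run a batched TRIM occasionally, assuming your kernel
supports the FITRIM ioctl on btrfs:

fstrim -v /

The same dm-crypt security caveats apply to this, of course.)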

I'll leave further edits to others ;)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Creating backup snapshots (8 per filesystem) causes No space left on device?

2012-04-15 Thread Marc MERLIN
Howdy,

I have a little script that creates hourly/daily/weekly snapshots on a
device that otherwise has plenty of disk space free:
gandalfthegreat:~# df -h | grep cryptroot
/dev/mapper/cryptroot 232G  144G   85G  63% /
/dev/mapper/cryptroot 232G  144G   85G  63% /usr
/dev/mapper/cryptroot 232G  144G   85G  63% /var
/dev/mapper/cryptroot 232G  144G   85G  63% /home
/dev/mapper/cryptroot 232G  144G   85G  63% /tmp
/dev/mapper/cryptroot 232G  144G   85G  63% /mnt/btrfs_pool1

I have kernel 3.3.1.

The FAQ of course talks about the topic:
https://btrfs.wiki.kernel.org/articles/f/a/q/FAQ_1fe9.html

but I can't get the filesystem show command to output anything useful:
gandalfthegreat:~# btrfs filesystem show /dev/mapper/cryptroot 
Btrfs Btrfs v0.19
gandalfthegreat:~# 

and btrfs df seems to show that I'm ok:
gandalfthegreat:~# btrfs filesystem df /home
Data: total=169.01GB, used=134.70GB
System, DUP: total=8.00MB, used=28.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=5.88GB, used=4.39GB
Metadata: total=8.00MB, used=0.00
gandalfthegreat:~# 

I read about rebalancing, but it's a mostly new filesystem with little
churn, and I'm not anywhere close to a full filesystem yet.

So far, when this happened, I've had to delete a set of older snapshots.
This would make sense if I were close to full, but at 63% I'm nowhere near
that.

Any idea what's going on and how I can debug further and more specifically
what I should capture next time I get a no free space error in userspace?

Thanks,
Marc


gandalfthegreat:/mnt/btrfs_pool1# l
total 4
dr-xr-xr-x 1 root root 2210 Apr 15 08:00 ./
drwxr-xr-x 1 root root  112 Feb 12 17:38 ../
drwxr-xr-x 1 root root   12 Feb 12 17:57 home/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120412_00:01:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120413_00:01:02/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120414_00:01:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_daily_20120415_00:01:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root   12 Feb 12 17:57 home_weekly_20120415_00:02:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_daily_20120412_00:01:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_daily_20120414_00:01:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_daily_20120415_00:01:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root  436 Apr  3 07:26 root_weekly_20120415_00:02:01/
drwxrwxrwt 1 root root 7476 Apr 15 08:05 tmp/
drwxrwxrwt 1 root root 7156 Apr 12 00:01 tmp_daily_20120412_00:01:01/
drwxrwxrwt 1 root root 7130 Apr 13 00:01 tmp_daily_20120413_00:01:02/
drwxrwxrwt 1 root root 7236 Apr 14 00:01 tmp_daily_20120414_00:01:01/
drwxrwxrwt 1 root root 7368 Apr 15 00:01 tmp_daily_20120415_00:01:01/
drwxrwxrwt 1 root root 7368 Apr 15 06:00 tmp_hourly_20120415_06:00:01/
drwxrwxrwt 1 root root 7368 Apr 15 07:00 tmp_hourly_20120415_07:00:01/
drwxrwxrwt 1 root root 7476 Apr 15 08:00 tmp_hourly_20120415_08:00:01/
drwxrwxrwt 1 root root 7368 Apr 15 00:02 tmp_weekly_20120415_00:02:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120412_00:01:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120413_00:01:02/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120414_00:01:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_daily_20120415_00:01:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root  206 Mar 31 11:07 usr_weekly_20120415_00:02:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120412_00:01:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120413_00:01:02/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120414_00:01:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_daily_20120415_00:01:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_hourly_20120415_06:00:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_hourly_20120415_07:00:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_hourly_20120415_08:00:01/
drwxr-xr-x 1 root root  130 Feb 12 23:52 var_weekly_20120415_00:02:01/
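For context, the script that creates these boils down to something like
this (a simplified sketch, not the exact script):

for vol in home root tmp usr var; do
    btrfs subvolume snapshot /mnt/btrfs_pool1/$vol \
        /mnt/btrfs_pool1/${vol}_hourly_$(date +%Y%m%d_%H:%M:%S)
done

plus matching 'btrfs subvolume delete' calls to expire the oldest copies.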

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

Re: Creating backup snapshots (8 per filesystem) causes No space left on device?

2012-04-15 Thread Marc MERLIN
(replying on list)

On Sun, Apr 15, 2012 at 05:52:05PM +0200, Bart Noordervliet wrote:
> Hi Marc,
> 
> there's a known regression causing early "Out of space"-errors in
> kernel 3.3. A patch for stable has been queued I think, but it's not
> in 3.3.1 yet. So your best bet would be to either downgrade to 3.2 or
> use a 3.4-rc kernel. Otherwise you'd have to apply the patch in
> question yourself. It's been discussed on this list very recently.

I'll watch for 3.3.x updates (I see nothing in 3.3.2 yet), thanks.

Or is it just a matter of reverting this patch?
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5500cdbe14d7435e04f66ff3cfb8ecd8b8e44ebf
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dc083f5..079e5a1 100644 (file)
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4108,7 +4108,7 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
num_bytes += div64_u64(data_used + meta_used, 50);
 
if (num_bytes * 3 > meta_used)
-   num_bytes = div64_u64(meta_used, 3);
+   num_bytes = div64_u64(meta_used, 3) * 2;
 
return ALIGN(num_bytes, fs_info->extent_root->leafsize << 10);
 }
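(For the record, reverting it locally would be something along these lines,
assuming a git checkout of the kernel tree:

git revert 5500cdbe14d7
# then rebuild and install the kernel as usual
)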


On Sun, Apr 15, 2012 at 10:19:30AM -0600, cwillu wrote:
> > but I can't get the filesystem show command to output anything useful:
> > gandalfthegreat:~# btrfs filesystem show /dev/mapper/cryptroot
> > Btrfs Btrfs v0.19
> 
> You need to run that as root.
 
That was run as root :)  '#'

Thanks for the replies,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: Creating backup snapshots (8 per filesystem) causes No space left on device?

2012-04-17 Thread Marc MERLIN
On Sun, Apr 15, 2012 at 09:27:27AM -0700, Marc MERLIN wrote:
> I'll watch for 3.3.x updates (I see nothing in 3.3.2 yet), thanks.
> 
> Or is it just a matter of reverting this patch?
> https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5500cdbe14d7435e04f66ff3cfb8ecd8b8e44ebf
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index dc083f5..079e5a1 100644 (file)
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4108,7 +4108,7 @@ static u64 calc_global_metadata_size(struct btrfs_fs_info *fs_info)
> num_bytes += div64_u64(data_used + meta_used, 50);
>  
> if (num_bytes * 3 > meta_used)
> -   num_bytes = div64_u64(meta_used, 3);
> +   num_bytes = div64_u64(meta_used, 3) * 2;
>  
> return ALIGN(num_bytes, fs_info->extent_root->leafsize << 10);
>  }

After I knew what to look for, I searched the archives some more and
they only seemed to point to this patch.
I have reverted it, but I'm still seeing the same problem on my laptop.

It sounds like I'll have to downgrade back to 3.2.x unless there is
some other patch to revert that I missed.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


How to change/fix 'Received UUID'

2018-03-05 Thread Marc MERLIN
Howdy,

I did a bunch of copying and moving of subvolumes between disks, and at
some point I did a snapshot of dir1/Win_ro.20180205_21:18:31 to
dir2/Win_ro.20180205_21:18:31.
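(i.e. essentially this, a plain read-write snapshot rather than one created
with -r:

btrfs subvolume snapshot dir1/Win_ro.20180205_21:18:31 dir2/Win_ro.20180205_21:18:31
)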

As a result, I lost the ro flag, and apparently 'Received UUID' which is
now preventing me from restarting the btrfs send/receive.

I changed the snapshot back to 'ro' but that's not enough:

Source:
Name:   Win_ro.20180205_21:18:31
UUID:   23ccf2bd-f494-e348-b34e-1f28486b2540
Parent UUID:-
Received UUID:  3cc327e1-358f-284e-92e2-4e4fde92b16f
Creation time:  2018-02-15 20:14:42 -0800
Subvolume ID:   964
Generation: 4062
Gen at creation:459
Parent ID:  5
Top level ID:   5
Flags:  readonly

Dest:
Name:   Win_ro.20180205_21:18:31
UUID:   a1e8777c-c52b-af4e-9ce2-45ca4d4d2df8
Parent UUID:-
Received UUID:  -
Creation time:  2018-02-17 22:20:25 -0800
Subvolume ID:   94826
Generation: 250714
Gen at creation:250540
Parent ID:  89160
Top level ID:   89160
Flags:  readonly

If I absolutely know that the data is the same on both sides, how do I
either
1) force back in a 'Received UUID' value on the destination
2) force a btrfs receive to work despite the lack of matching 'Received
UUID' 

Yes, I could discard and start over, but my 2nd such subvolume is 8TB,
so I'd really rather not :)

Any ideas?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-05 Thread Marc MERLIN
On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote:
> > If I absolutely know that the data is the same on both sides, how do I
> > either
> > 1) force back in a 'Received UUID' value on the destination
> 
> I suppose the most simple is to write small program that does it using
> BTRFS_IOC_SET_RECEIVED_SUBVOL.

Understood.
Given that I have not worked with the code at all, what is the best 
tool in btrfs-progs to add this to?

btrfstune?
btrfs property set?
other?

David, is this something you'd be willing to add support for?
(to be honest, it'll be quicker for someone who knows the code to add than
for me, but if no one has the time, I'll see if I can have a shot at it)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How to change/fix 'Received UUID'

2018-03-06 Thread Marc MERLIN
On Tue, Mar 06, 2018 at 08:12:15PM +0100, Hans van Kranenburg wrote:
> On 05/03/2018 20:47, Marc MERLIN wrote:
> > On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote:
> >>> If I absolutely know that the data is the same on both sides, how do I
> >>> either
> >>> 1) force back in a 'Received UUID' value on the destination
> >>
> >> I suppose the most simple is to write small program that does it using
> >> BTRFS_IOC_SET_RECEIVED_SUBVOL.
> > 
> > Understood.
> > Given that I have not worked with the code at all, what is the best 
> > tool in btrfs-progs to add this to?
> > 
> > btrfstune?
> > btrfs property set?
> > other?
> > 
> > David, is this something you'd be willing to add support for?
> > (to be honest, it'll be quicker for someone who knows the code to add than
> > for me, but if no one has the time, I'll see if I can have a shot at it)
> 
> If you want something right now that works, so you can continue doing
> your backups, python-btrfs also has the ioctl, since v9, together with
> an example of using it:
> 
> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d

Well, I had never heard about it until now, thank you.

I'll see if I can make it work when I get a bit of time.

Dear btrfs-progs folks, this would be great to add to the canonical
btrfs-progs too :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-07 Thread Marc MERLIN
On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> > https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d
> 
> Well, I had never heard about it until now, thank you.
> 
> I'll see if I can make it work when I get a bit of time.

Sorry, I missed the fact that there was no code to write at all.
gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
Current subvolume information:
  subvol_id: 94887
  received_uuid: ----
  stime: 0.0 (1970-01-01T00:00:00)
  stransid: 0  
  rtime: 0.0 (1970-01-01T00:00:00)
  rtransid: 0  

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 94887
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
  rtransid: 255755

gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true


ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 
Video_ro.20180307_22:03:03 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180307_22:03:03
At snapshot Video_ro.20180307_22:03:03
ERROR: cannot find parent subvolume

gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
/mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
Video_ro.20180220_21:03:41
Name:   Video_ro.20180220_21:03:41
UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
Parent UUID:e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23
Creation time:  2018-02-20 21:03:42 -0800
Subvolume ID:   11228
Generation: 4174
Gen at creation:4150
Parent ID:  5
Top level ID:   5
Flags:  readonly
Snapshot(s):
Video_rw.20180220_21:03:41
Video


Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the 
destination?

Doesn't that look ok now? Is there something else I'm missing?
gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
/mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
DS1/Video_ro.20180220_21:03:41
Name:   Video_ro.20180220_21:03:41
UUID:   cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
Parent UUID:0e220a4f-6426-4745-8399-0da0084f8b23
Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
Creation time:  2018-02-20 21:13:36 -0800
Subvolume ID:   94887
Generation: 250689
Gen at creation:250689
Parent ID:  89160
Top level ID:   89160
Flags:  readonly
Snapshot(s):

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN
On Thu, Mar 08, 2018 at 09:34:45AM +0300, Andrei Borzenkov wrote:
> 08.03.2018 09:06, Marc MERLIN wrote:
> > On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> >>> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d
> >>
> >> Well, I had never heard about it until now, thank you.
> >>
> >> I'll see if I can make it work when I get a bit of time.
> > 
> > Sorry, I missed the fact that there was no code to write at all.
> > gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 
> > 2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
> > Current subvolume information:
> >   subvol_id: 94887
> >   received_uuid: ----
> >   stime: 0.0 (1970-01-01T00:00:00)
> >   stransid: 0  
> >   rtime: 0.0 (1970-01-01T00:00:00)
> >   rtransid: 0  
> > 
> > Setting received subvolume...
> > 
> > Resulting subvolume information:
> >   subvol_id: 94887
> >   received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
> >   stime: 1234.5678 (1970-01-01T00:20:34.567800)
> >   stransid: 31337
> >   rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
> >   rtransid: 255755
> > 
> > gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true
> > 
> > 
> > ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 
> > Video_ro.20180307_22:03:03 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. 
> > failed
> > At subvol Video_ro.20180307_22:03:03
> > At snapshot Video_ro.20180307_22:03:03
> > ERROR: cannot find parent subvolume
> > 
> > gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
> > /mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
> > Video_ro.20180220_21:03:41
> 
> Not sure I understand how this subvolume is related. You send
> differences between Video_ro.20180205_21:05:15 and
> Video_ro.20180307_22:03:03, so you need to have (replica of)
> Video_ro.20180205_21:05:15 on destination. How exactly
> Video_ro.20180220_21:03:41 comes in picture here?
 
Sorry, I pasted the wrong thing.
ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180220_21:03:41 
Video_ro.20180308_07:50:06 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180308_07:50:06
At snapshot Video_ro.20180308_07:50:06
ERROR: cannot find parent subvolume

Same problem basically, just copied the wrong attempt, sorry about that.

Do I need to make sure of anything more than the destination's
DS1/Video_ro.20180220_21:03:41
Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828

being equal to the source's
Name:   Video_ro.20180220_21:03:41
UUID:   2afc7a5e-107f-d54b-8929-197b80b70828

Thanks,
Marc


> > Name:   Video_ro.20180220_21:03:41
> > UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
> > Parent UUID:e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
> > Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23
> > Creation time:  2018-02-20 21:03:42 -0800
> > Subvolume ID:   11228
> > Generation: 4174
> > Gen at creation:4150
> > Parent ID:  5
> > Top level ID:   5
> > Flags:  readonly
> > Snapshot(s):
> > Video_rw.20180220_21:03:41
> > Video
> > 
> > 
> > Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the 
> > destination?
> > 
> > Doesn't that look ok now? Is there something else I'm missing?
> > gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
> > DS1/Video_ro.20180220_21:03:41
> > Name:   Video_ro.20180220_21:03:41
> > UUID:   cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
> > Parent UUID:0e220a4f-6426-4745-8399-0da0084f8b23
> > Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
> > Creation time:  2018-02-20 21:13:36 -0800
> > Subvolume ID:   94887
> > Generation: 250689
> > Gen at creation:250689
> > Parent ID:  89160
> > Top level ID:   89160
> > Flags:  readonly
> > Snapshot(s):
> > 
> > Thanks,
> > Marc
> > 
> 
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN
On Thu, Mar 08, 2018 at 09:36:49PM +0300, Andrei Borzenkov wrote:
> Yes. Your source has Received UUID. In this case btrfs send will
> transmit received UUID instead of subvolume UUID as reference to base
> snapshot. You need to either clear received UUID on source or set
> received UUID on destination to received UUID of source (not to
> subvolume UUID of source).

gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 0e220a4f-6426-4745-8399-0da0084f8b23 31337 1234.5678 /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
Current subvolume information:
  subvol_id: 94887
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
  rtransid: 255755

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 94887
  received_uuid: 0e220a4f-6426-4745-8399-0da0084f8b23
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520537034.890253770 (2018-03-08T19:23:54.890254)
  rtransid: 256119

gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true

This worked fine, thank you so much.
I now have an incremental send going that will take a few dozen minutes
instead of days for 8TB+ :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How to change/fix 'Received UUID'

2018-03-10 Thread Marc MERLIN
Thanks all for the help again.
I just wrote a blog post to explain the process to others should anyone
need this later.

http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


5.1.21: fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921

2019-10-17 Thread Marc MERLIN
This happened almost immediately after a resume from suspend to disk.
It's the first corruption and read-only remount I've gotten in a very long
time.

Could they be related?

[26062.126505] [ cut here ]
[26062.126524] WARNING: CPU: 7 PID: 12394 at fs/btrfs/extent-tree.c:7100 
__btrfs_free_extent+0x18b/0x921
[26062.126526] Modules linked in: msr ccm ipt_MASQUERADE ipt_REJECT 
nf_reject_ipv4 xt_tcpudp xt_conntrack nf_log_ipv4 nf_log_common xt_LOG 
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle 
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables 
x_tables bpfilter rfcomm ax25 bnep pci_stub vboxpci(O) vboxnetadp(O) 
vboxnetflt(O) vboxdrv(O) autofs4 binfmt_misc uinput nfsd auth_rpcgss nfs_acl 
nfs lockd grace fscache sunrpc nls_utf8 nls_cp437 vfat fat cuse ecryptfs 
bbswitch(OE) configs input_polldev loop firewire_sbp2 firewire_core crc_itu_t 
ppdev parport_pc lp parport uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_common videodev media btusb btrtl btbcm btintel 
bluetooth ecdh_generic hid_generic usbhid hid joydev arc4 coretemp 
x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass 
snd_hda_codec_realtek snd_hda_codec_generic intel_wmi_thunderbolt iTCO_wdt 
wmi_bmof mei_hdcp iTCO_vendor_support rtsx_pci_sdmmc snd_hda_intel
[26062.126561]  snd_hda_codec iwlmvm crct10dif_pclmul snd_hda_core crc32_pclmul 
mac80211 snd_hwdep thinkpad_acpi snd_pcm ghash_clmulni_intel nvram 
ledtrig_audio intel_cstate deflate snd_seq efi_pstore iwlwifi snd_seq_device 
snd_timer intel_rapl_perf psmouse pcspkr efivars wmi hwmon snd ac battery 
cfg80211 mei_me soundcore xhci_pci xhci_hcd rtsx_pci i2c_i801 rfkill sg 
nvidiafb intel_pch_thermal usbcore vgastate fb_ddc pcc_cpufreq sata_sil24 r8169 
libphy mii fuse fan raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio 
dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq 
async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common crc32c_intel 
bcache crc64 aesni_intel input_leds i915 aes_x86_64 crypto_simd cryptd ptp 
glue_helper serio_raw pps_core thermal evdev [last unloaded: e1000e]
[26062.126597] CPU: 7 PID: 12394 Comm: btrfs-transacti Tainted: GW  OE  
   5.1.21-amd64-preempt-sysrq-20190816 #5
[26062.126599] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS N1DET95W (2.21 
) 12/13/2017
[26062.126604] RIP: 0010:__btrfs_free_extent+0x18b/0x921
[26062.126606] Code: 00 8b 45 40 44 29 e0 83 f8 05 0f 8f 2e 05 00 00 41 ff cc 
eb a5 83 f8 fe 0f 85 29 07 00 00 48 c7 c7 f8 67 f0 89 e8 f6 cb dd ff <0f> 0b 48 
8b 7d 00 e8 e5 54 00 00 4c 89 fa 48 c7 c6 85 e0 f4 89 41
[26062.126608] RSP: 0018:b2d9c46e7c88 EFLAGS: 00010246
[26062.126611] RAX: 0024 RBX: 9abca20884e0 RCX: 
[26062.126613] RDX:  RSI: 9abccf5d6558 RDI: 9abccf5d6558
[26062.126617] RBP: 9ab5a4545460 R08: 0001 R09: 8a80c7af
[26062.126618] R10: 0002 R11: b2d9c46e7b2f R12: 0169
[26062.126622] R13: fffe R14: 0104 R15: 006ac918e000
[26062.126625] FS:  () GS:9abccf5c() 
knlGS:
[26062.126627] CS:  0010 DS:  ES:  CR0: 80050033
[26062.126629] CR2: 199a9fb4d000 CR3: 00016c20e006 CR4: 003606e0
[26062.126633] DR0:  DR1:  DR2: 
[26062.126634] DR3:  DR6: fffe0ff0 DR7: 0400
[26062.126636] Call Trace:
[26062.126647]  __btrfs_run_delayed_refs+0x750/0xc36
[26062.126653]  ? __switch_to_asm+0x41/0x70
[26062.126655]  ? __switch_to_asm+0x35/0x70
[26062.126658]  ? __switch_to_asm+0x41/0x70
[26062.126662]  ? __switch_to+0x13d/0x3d5
[26062.126668]  btrfs_run_delayed_refs+0x5d/0x132
[26062.126672]  btrfs_commit_transaction+0x55/0x7c8
[26062.126676]  ? start_transaction+0x347/0x3cb
[26062.126679]  transaction_kthread+0xc9/0x135
[26062.126683]  ? btrfs_cleanup_transaction+0x403/0x403
[26062.126688]  kthread+0xeb/0xf0
[26062.126692]  ? kthread_create_worker_on_cpu+0x65/0x65
[26062.126695]  ret_from_fork+0x35/0x40
[26062.126698] ---[ end trace 4c1a6b3749a2f650 ]---
[26062.126703] BTRFS info (device dm-2): leaf 510067163136 gen 2427077 total 
ptrs 130 free space 4329 owner 2
[26062.126706]  item 0 key (458630676480 168 65536) itemoff 16217 itemsize 66
[26062.126708]  extent refs 2 gen 2369265 flags 1
[26062.126709]  ref#0: extent data backref root 456 objectid 72925787 
offset 5472256 count 1
[26062.126711]  ref#1: shared data backref parent 437615230976 count 1
[26062.126714]  item 1 key (458630856704 168 69632) itemoff 16151 itemsize 66
[26062.126715]  extent refs 2 gen 2369025 flags 1
[26062.126716]  ref#0: extent data backref root 456 objectid 72925787 
offset 4796416 count 1
[26062.126718]  ref#1: shared data backref parent 437615230976 count 1
[26062.126720]  item 2 key (458631012352 168 16384) itemoff 

Re: 5.1.21: fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921

2019-10-18 Thread Marc MERLIN
8192
Fixed discount file extents for inode: 75432801 in root: 456
root 456 inode 75432801 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4096
Fixed discount file extents for inode: 75432807 in root: 456
Fixed discount file extents for inode: 75432817 in root: 456
Fixed discount file extents for inode: 75432829 in root: 456
Fixed discount file extents for inode: 75432860 in root: 456
root 456 inode 75432860 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4096
Fixed discount file extents for inode: 75432862 in root: 456
Fixed discount file extents for inode: 75432863 in root: 456
Fixed discount file extents for inode: 75432869 in root: 456
Fixed discount file extents for inode: 75432870 in root: 456
Fixed discount file extents for inode: 75432871 in root: 456
Fixed discount file extents for inode: 75432872 in root: 456
Fixed discount file extents for inode: 75432875 in root: 456
Fixed discount file extents for inode: 75432877 in root: 456
Fixed discount file extents for inode: 75432882 in root: 456
Fixed discount file extents for inode: 75432883 in root: 456
Fixed discount file extents for inode: 75432893 in root: 456
Fixed discount file extents for inode: 75432894 in root: 456
Fixed discount file extents for inode: 75432897 in root: 456
Fixed discount file extents for inode: 75432899 in root: 456
Fixed discount file extents for inode: 75432900 in root: 456
Fixed discount file extents for inode: 75432905 in root: 456
Fixed discount file extents for inode: 75432906 in root: 456
Fixed discount file extents for inode: 75432916 in root: 456
Fixed discount file extents for inode: 75432917 in root: 456
Fixed discount file extents for inode: 75432919 in root: 456
Fixed discount file extents for inode: 75432920 in root: 456
Fixed discount file extents for inode: 75432923 in root: 456
Fixed discount file extents for inode: 75432942 in root: 456
Fixed discount file extents for inode: 75432944 in root: 456
Fixed discount file extents for inode: 75432948 in root: 456
root 456 inode 75432948 errors 100, file extent discount
Found file extent holes:
start: 0, len: 8192
Fixed discount file extents for inode: 75432949 in root: 456
root 456 inode 75432949 errors 100, file extent discount
Found file extent holes:
start: 0, len: 8192
and it loops forever on 456

On Thu, Oct 17, 2019 at 07:56:04PM -0700, Marc MERLIN wrote:
> This happened almost immediately after a resume from suspend to disk.
> It's the first corruption and read-only remount I've gotten in a very long
> time.
> 
> Could they be related?
> 
> [26062.126505] [ cut here ]
> [26062.126524] WARNING: CPU: 7 PID: 12394 at fs/btrfs/extent-tree.c:7100 
> __btrfs_free_extent+0x18b/0x921
> [26062.126526] Modules linked in: msr ccm ipt_MASQUERADE ipt_REJECT 
> nf_reject_ipv4 xt_tcpudp xt_conntrack nf_log_ipv4 nf_log_common xt_LOG 
> iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle 
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables 
> x_tables bpfilter rfcomm ax25 bnep pci_stub vboxpci(O) vboxnetadp(O) 
> vboxnetflt(O) vboxdrv(O) autofs4 binfmt_misc uinput nfsd auth_rpcgss nfs_acl 
> nfs lockd grace fscache sunrpc nls_utf8 nls_cp437 vfat fat cuse ecryptfs 
> bbswitch(OE) configs input_polldev loop firewire_sbp2 firewire_core crc_itu_t 
> ppdev parport_pc lp parport uvcvideo videobuf2_vmalloc videobuf2_memops 
> videobuf2_v4l2 videobuf2_common videodev media btusb btrtl btbcm btintel 
> bluetooth ecdh_generic hid_generic usbhid hid joydev arc4 coretemp 
> x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass 
> snd_hda_codec_realtek snd_hda_codec_generic intel_wmi_thunderbolt iTCO_wdt 
> wmi_bmof mei_hdcp iTCO_vendor_support rtsx_pci_sdmmc snd_hda_intel
> [26062.126561]  snd_hda_codec iwlmvm crct10dif_pclmul snd_hda_core 
> crc32_pclmul mac80211 snd_hwdep thinkpad_acpi snd_pcm ghash_clmulni_intel 
> nvram ledtrig_audio intel_cstate deflate snd_seq efi_pstore iwlwifi 
> snd_seq_device snd_timer intel_rapl_perf psmouse pcspkr efivars wmi hwmon snd 
> ac battery cfg80211 mei_me soundcore xhci_pci xhci_hcd rtsx_pci i2c_i801 
> rfkill sg nvidiafb intel_pch_thermal usbcore vgastate fb_ddc pcc_cpufreq 
> sata_sil24 r8169 libphy mii fuse fan raid456 multipath mmc_block mmc_core 
> dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod 
> async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 
> blowfish_common crc32c_intel bcache crc64 aesni_intel input_leds i915 
> aes_x86_64 crypto_simd cryptd ptp glue_helper serio_raw pps_core thermal 
> evdev [last unloaded: e1000e]
> [26062.126597] CPU: 7 PID: 12394 Comm: btrfs-transacti Tainted: GW  
> OE 5.1.21-amd64-preempt-sysrq-20190816 #5
> [26062.126599] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS N1DET95W 
> (2.21 ) 12/13/2017
> [26062.126604] 

Re: 5.1.21: fs/btrfs/extent-tree.c:7100 __btrfs_free_extent+0x18b/0x921

2019-10-22 Thread Marc MERLIN
On Fri, Oct 18, 2019 at 08:07:28PM -0700, Marc MERLIN wrote:
> Ok, so before blowing the filesystem away after it was apparently badly
> damaged by a suspend to disk, I tried check --repair and I hit an
> infinite loop.
> 
> Let me know if you'd like anything off the FS before I delete it.

I heard nothing back, so I deleted the FS and restored from backup.

But now I'm scared of ever doing a suspend to disk again.
Could someone please look at the logs and give me some idea of what
happened, if at all possible?

Non-recoverable data corruption on my laptop, at times when I travel and
backups/restores are complicated, is a bit unnerving...

Thanks,
Marc

> Thanks,
> Marc
> 
> enabling repair mode
> repair mode will force to clear out log tree, are you sure? [y/N]: y
> Checking filesystem on /dev/mapper/pool1
> UUID: fda628bc-1ca4-49c5-91c2-4260fe967a23
> checking extents
> Backref 415334400 parent 36028797198598144 not referenced back 0x5648ef1870e0
> Backref 415334400 parent 179634176 root 179634176 not found in extent tree
> Incorrect global backref count on 415334400 found 2 wanted 1
> backpointer mismatch on [415334400 16384]
> repair deleting extent record: key 415334400 169 0
> adding new tree backref on start 415334400 len 16384 parent 179634176 root 
> 179634176
> Repaired extent references for 415334400
> ref mismatch on [101995261952 4096] extent item 36028797018963969, found 1
> repair deleting extent record: key 101995261952 168 4096
> adding new data backref on 101995261952 root 456 owner 74455677 offset 
> 64892928 found 1
> Repaired extent references for 101995261952
> Incorrect local backref count on 458640384000 root 456 owner 81409181 offset 
> 17039360 found 0 wanted 1 back 0x5648eefd3d10
> Backref disk bytenr does not match extent record, bytenr=458640384000, ref 
> bytenr=0
> Backref 458640384000 root 456 owner 73020573 offset 17039360 num_refs 0 not 
> found in extent tree
> Incorrect local backref count on 458640384000 root 456 owner 73020573 offset 
> 17039360 found 1 wanted 0 back 0x5648b32a9600
> backpointer mismatch on [458640384000 86016]
> repair deleting extent record: key 458640384000 168 86016
> adding new data backref on 458640384000 parent 438017720320 owner 0 offset 0 
> found 1
> adding new data backref on 458640384000 root 456 owner 73020573 offset 
> 17039360 found 1
> Repaired extent references for 458640384000
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> Deleting bad dir index [10138517,96,436945] root 456
> Deleting bad dir index [10138518,96,646273] root 456
> Deleting bad dir index [10138517,96,437016] root 456
> Deleting bad dir index [10138518,96,808999] root 456
> Deleting bad dir index [10215134,96,149427] root 456
> Deleting bad dir index [10240541,96,268037] root 456
> Deleting bad dir index [10138517,96,540247] root 456
> Deleting bad dir index [10138518,96,825234] root 456
> Deleting bad dir index [10138517,96,736673] root 456
> Deleting bad dir index [10138518,96,1118221] root 456
> Deleting bad dir index [10240541,96,439703] root 456
> Deleting bad dir index [10138517,96,752282] root 456
> root 456 inode 75431563 errors 100, file extent discount
> Found file extent holes:
>   start: 4096, len: 4096
> root 456 inode 75431568 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 1638400
> root 456 inode 75431583 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 147456
> root 456 inode 75431585 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 208896
> root 456 inode 75431591 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 2523136
> root 456 inode 75431730 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 208896
> root 456 inode 75431744 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 2084864
> root 456 inode 75431751 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 172032
> root 456 inode 75431756 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 8192
> root 456 inode 75431760 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 12288
> root 456 inode 75431765 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 32768
> root 456 inode 75431773 errors 100, file extent discount
> Found file extent holes:
>   start: 0, len: 90112
> Fixed discount file extents for inode: 75432421 in root: 456
> Fixed discount file extents for inode: 75432429 in root: 4

Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)

2018-07-12 Thread Marc MERLIN
On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月12日 01:09, Chris Murphy wrote:
> > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN  wrote:
> >> Thanks to Su and Qu, I was able to get my filesystem to a point that
> >> it's mountable.
> >> I then deleted loads of snapshots and I'm down to 26.
> >>
> >> IT now looks like this:
> >> gargamel:~# btrfs fi show /mnt/mnt
> >> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> >> Total devices 1 FS bytes used 12.30TiB
> >> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >>
> >> gargamel:~# btrfs fi df /mnt/mnt
> >> Data, single: total=13.57TiB, used=12.19TiB
> >> System, DUP: total=32.00MiB, used=1.55MiB
> >> Metadata, DUP: total=124.50GiB, used=115.62GiB
> >> Metadata, single: total=216.00MiB, used=0.00B
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>
> >>
> >> Problems
> >> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> >> server, despite my deleting lots of snapshots.
> >> Is it because I have too many files then?
> > 
> > I think original mode needs most of the metadata in memory.
> > 
> > I'm not understanding why btrfs check won't use swap like at least
> > xfs_repair does, and I'm pretty sure e2fsck will as well.
> 
> I don't understand either.
> 
> Isn't memory from malloc() swappable?

I never looked at the code or at why/how it crashes, but my guess was
that it somehow causes the kernel to grab a lot of memory in the btrfs
driver, and that is what is crashing the system.
If it were just malloc() in the btrfs userspace tool, it should be both
swappable like you said, and should also get OOM'ed.

I suppose I can still be completely wrong, but I can't find another
logical explanation.

I just tried running it again to trigger the problem, but because I
freed a lot of snapshots, btrfs check --repair goes back to only using
10GB instead of 32GB, so I wasn't able to replicate OOM for you.

Incidentally, it died with:
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) 
compared to the found
 root node (139061)
ERROR: failed to repair root items: Invalid argument

That said, when it was using a fair amount of RAM, I captured this:
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root  1376  1.4 25.2 8256368 8240392 pts/18 R+  14:52   1:07 btrfs check 
--repair /dev/mapper/dshelf2

I don't know how to read /proc/meminfo, but here's what it said:
MemTotal:   32643792 kB
MemFree: 1367516 kB
MemAvailable:   15554836 kB
Buffers: 3491672 kB
Cached: 15900320 kB
SwapCached: 2092 kB
Active: 14577228 kB
Inactive:   15028608 kB
Active(anon):   12122180 kB
Inactive(anon):  2643176 kB
Active(file):2455048 kB
Inactive(file): 12385432 kB
Unevictable:8068 kB
Mlocked:8068 kB
SwapTotal:  15616764 kB   < swap was totally unused and stays unused when I 
get the system to crash 
SwapFree:   15578020 kB
Dirty: 71956 kB
Writeback:64 kB
AnonPages:  10219976 kB
Mapped:  4033568 kB
Shmem:   4545552 kB
Slab: 713300 kB
SReclaimable: 395508 kB
SUnreclaim:   317792 kB
KernelStack:   11788 kB
PageTables:52592 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938660 kB
Committed_AS:   20070736 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k: 1207572 kB
DirectMap2M:32045056 kB

Does it help figure out where the memory was going and whether kernel
memory was being used?
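(For next time, a crude way to watch userspace vs. kernel memory while
check runs; just a sketch:

while sleep 10; do
    ps -o rss=,comm= -C btrfs
    grep -E 'MemFree|Slab|SUnreclaim' /proc/meminfo
done
)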

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


task btrfs-transacti:921 blocked for more than 120 seconds during check repair

2018-07-17 Thread Marc MERLIN
I got the following on 4.17.6 while running btrfs check --repair on an
unmounted filesystem (not the lowmem version)

I understand that btrfs check is userland only, although it seems that
it caused these FS hangs on a different filesystem (the trace of course
does not provide info on which FS)

Any idea what happened here?
I'm going to wait a few hours without running btrfs check to see if it
happens again, and then see whether running btrfs check re-creates this
issue; other suggestions (if any) are welcome:

[ 2538.566952] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[ 2538.616484] Call Trace:
[ 2538.623828]  ? __schedule+0x53e/0x59b
[ 2538.634802]  schedule+0x7f/0x98
[ 2538.644214]  wait_current_trans+0x9b/0xd8
[ 2538.656229]  ? add_wait_queue+0x3a/0x3a
[ 2538.668239]  start_transaction+0x1ce/0x325
[ 2538.680556]  btrfs_finish_ordered_io+0x240/0x5d3
[ 2538.694414]  normal_work_helper+0x118/0x277
[ 2538.706984]  process_one_work+0x19c/0x281
[ 2538.719036]  ? rescuer_thread+0x279/0x279
[ 2538.731064]  worker_thread+0x197/0x246
[ 2538.742322]  kthread+0xeb/0xf0
[ 2538.751492]  ? kthread_create_worker_on_cpu+0x66/0x66
[ 2538.76]  ret_from_fork+0x35/0x40
[ 2538.777403] INFO: task kworker/u16:11:369 blocked for more than 120 seconds.
[ 2538.799025]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 2538.818109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
kworker/u16:11  D    0   369      2 0x8000
[ 2538.858112] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[ 2538.876401] Call Trace:
[ 2538.883770]  ? __schedule+0x53e/0x59b
[ 2538.894760]  schedule+0x7f/0x98
[ 2538.904192]  wait_current_trans+0x9b/0xd8
[ 2538.916242]  ? add_wait_queue+0x3a/0x3a
[ 2538.927772]  start_transaction+0x1ce/0x325
[ 2538.940081]  btrfs_finish_ordered_io+0x240/0x5d3
[ 2538.953973]  normal_work_helper+0x118/0x277
[ 2538.966523]  process_one_work+0x19c/0x281
[ 2538.978546]  ? rescuer_thread+0x279/0x279
[ 2538.990560]  worker_thread+0x197/0x246
[ 2539.001797]  kthread+0xeb/0xf0
[ 2539.010986]  ? kthread_create_worker_on_cpu+0x66/0x66
[ 2539.026137]  ret_from_fork+0x35/0x40
[ 2539.037666] INFO: task btrfs-transacti:921 blocked for more than 120 seconds.
[ 2539.059851]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 2539.079733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
btrfs-transacti D    0   921      2 0x8000
[ 2539.121257] Call Trace:
[ 2539.129377]  ? __schedule+0x53e/0x59b
[ 2539.141171]  schedule+0x7f/0x98
[ 2539.151370]  btrfs_tree_lock+0xa6/0x19d
[ 2539.163621]  ? add_wait_queue+0x3a/0x3a
[ 2539.175876]  btrfs_search_slot+0x5aa/0x756
[ 2539.188899]  lookup_inline_extent_backref+0x11a/0x485
[ 2539.204781]  ? fixup_slab_list.isra.43+0x1b/0x72
[ 2539.219360]  __btrfs_free_extent+0xf1/0xa72
[ 2539.232597]  ? btrfs_merge_delayed_refs+0x18b/0x1a7
[ 2539.247922]  ? __mutex_trylock_or_owner+0x43/0x54
[ 2539.262708]  __btrfs_run_delayed_refs+0xad8/0xc40
[ 2539.277504]  btrfs_run_delayed_refs+0x6e/0x16a
[ 2539.291519]  btrfs_commit_transaction+0x42/0x710
[ 2539.306043]  ? start_transaction+0x295/0x325
[ 2539.319516]  transaction_kthread+0xc9/0x135
[ 2539.332757]  ? btrfs_cleanup_transaction+0x3ee/0x3ee
[ 2539.348327]  kthread+0xeb/0xf0
[ 2539.358155]  ? kthread_create_worker_on_cpu+0x66/0x66
[ 2539.373977]  ret_from_fork+0x35/0x40
[ 2539.385394] INFO: task vnstatd:6338 blocked for more than 120 seconds.
[ 2539.405667]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
On Tue, Jul 17, 2018 at 10:50:32AM -0700, Marc MERLIN wrote:
> I got the following on 4.17.6 while running btrfs check --repair on an
> unmounted filesystem (not the lowmem version)
> 
> I understand that btrfs check is userland only, although it seems that
> it caused these FS hangs on a different filesystem (the trace of course
> does not provide info on which FS)
> 
> Any idea what happened here?
> I'm going to wait a few hours without running btrfs check to see if it
> happens again and then if running btrfs check will re-create this issue,
> but other suggestions (if any), are welcome:

Hi Qu, I know we were talking about this last week and then, btrfs check
just worked for me so I wasn't able to reproduce.
Now I'm able to reproduce again.

I tried again, it's definitely triggered by btrfs check --repair

I tried to capture what happens, and memory didn't dip to 0, but the system
got very slow and things started failing.
btrfs was never killed, though, while ssh was.
Is there a chance that maybe btrfs is in some kernel OOM exclude list?

Here is what I got when the system was not doing well (it took minutes to run):

             total       used       free     shared    buffers     cached
Mem:      32643788   32070952     572836          0     102160    4378772
-/+ buffers/cache:   27590020    5053768
Swap:     15616764     973596   14643168

gargamel:~# cat /proc/meminfo
MemTotal:   32643788 kB
MemFree: 2726276 kB
MemAvailable:2502200 kB
Buffers:   12360 kB
Cached:  1676388 kB
SwapCached: 11048580 kB
Active: 16443004 kB
Inactive:   12010456 kB
Active(anon):   16287780 kB
Inactive(anon): 11651692 kB
Active(file): 155224 kB
Inactive(file):   358764 kB
Unevictable:5776 kB
Mlocked:5776 kB
SwapTotal:  15616764 kB
SwapFree: 294592 kB
Dirty:  3032 kB
Writeback: 76064 kB
AnonPages:  15723272 kB
Mapped:   612124 kB
Shmem:   1171032 kB
Slab: 399824 kB
SReclaimable:  84568 kB
SUnreclaim:   315256 kB
KernelStack:   20576 kB
PageTables:94268 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938656 kB
Committed_AS:   37909452 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 98304 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k:  355604 kB
DirectMap2M:32897024 kB

and console:
[ 9184.345329] INFO: task zmtrigger.pl:9981 blocked for more than 120 seconds.
[ 9184.366258]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 9184.385323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
zmtrigger.pl    D    0  9981   9804 0x20020080
[ 9184.425249] Call Trace:
[ 9184.432580]  ? __schedule+0x53e/0x59b
[ 9184.443551]  schedule+0x7f/0x98
[ 9184.452960]  io_schedule+0x16/0x38
[ 9184.463154]  wait_on_page_bit_common+0x10c/0x199
[ 9184.476996]  ? file_check_and_advance_wb_err+0xd7/0xd7
[ 9184.493339]  shmem_getpage_gfp+0x2dd/0x975
[ 9184.506558]  shmem_fault+0x188/0x1c3
[ 9184.518199]  ? filemap_map_pages+0x6f/0x295
[ 9184.531680]  __do_fault+0x1d/0x6e
[ 9184.542505]  __handle_mm_fault+0x675/0xa61
[ 9184.555653]  ? list_move+0x21/0x3a
[ 9184.566737]  handle_mm_fault+0x11c/0x16b
[ 9184.579355]  __do_page_fault+0x324/0x41c
[ 9184.591996]  ? page_fault+0x8/0x30
[ 9184.603059]  page_fault+0x1e/0x30
[ 9184.613846] RIP: 0023:0xf7d2d022
[ 9184.624366] RSP: 002b:ffeb9fe8 EFLAGS: 00010202
[ 9184.640868] RAX: f7eed000 RBX: 567e6000 RCX: 0004
[ 9184.663095] RDX: 587fecb0 RSI: 5876538c RDI: 0004
[ 9184.685308] RBP: 58185160 R08:  R09: 
[ 9184.707524] R10:  R11: 0286 R12: 
[ 9184.729757] R13:  R14:  R15: 
[ 9184.751988] INFO: task /usr/sbin/apach:11868 blocked for more than 120 
seconds.
[ 9184.775106]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 9184.795072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
/usr/sbin/apach D    0 11868  11311 0x20020080
[ 9184.836748] Call Trace:
[ 9184.844926]  ? __schedule+0x53e/0x59b
[ 9184.856811]  schedule+0x7f/0x98
[ 9184.867075]  io_schedule+0x16/0x38
[ 9184.878114]  wait_on_page_bit_common+0x10c/0x199
[ 9184.892807]  ? file_check_and_advance_wb_err+0xd7/0xd7
[ 9184.909036]  shmem_getpage_gfp+0x2dd/0x975
[ 9184.922157]  shmem_fault+0x188/0x1c3
[ 9184.933667]  ? filemap_map_pages+0x6f/0x29

Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
Ok, I did more testing. Qu is right that btrfs check does not crash the kernel.
It just takes all the memory until linux hangs everywhere, and somehow
(no idea why) the OOM killer never triggers.
Details below:

On Tue, Jul 17, 2018 at 01:32:57PM -0700, Marc MERLIN wrote:
> Here is what I got when the system was not doing well (it took minutes to 
> run):
> 
>              total       used       free     shared    buffers     cached
> Mem:      32643788   32070952     572836          0     102160    4378772
> -/+ buffers/cache:   27590020    5053768
> Swap:     15616764     973596   14643168

ok, the reason it was not that close to 0 was due to /dev/shm it seems.
I cleared that, and now I can get it to go to near 0 again.
I'm wrong about the system being fully crashed, it's not, it's just very
close to being hung.
I can type killall -9 btrfs in the serial console and wait a few minutes.
The system eventually recovers, but it's impossible to fix anything via ssh 
apparently because networking does not get to run when I'm in this state.

I'm not sure why my system reproduces this easily while Qu's system does
not, but Qu was right that the kernel is not dead and that it's merely a
problem of userspace taking all the RAM and somehow not being killed by
the OOM killer.

I checked the PID and don't see why it's not being killed:
gargamel:/proc/31006# grep . oom*
oom_adj:0
oom_score:221   << this increases a lot, but OOM never kills it
oom_score_adj:0

I have these variables:
/proc/sys/vm/oom_dump_tasks:1
/proc/sys/vm/oom_kill_allocating_task:0
/proc/sys/vm/overcommit_kbytes:0
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50  << is this bad (seems default)
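One experiment I could try (untested in this exact scenario, just a
thought): bias the OOM killer heavily toward the check process so it
gets picked first if memory really does run out:

# make the OOM killer strongly prefer this process (range is -1000..1000)
echo 800 > /proc/$(pidof btrfs)/oom_score_adj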

Here is my system when it virtually died:
USER       PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
root 31006 21.2 90.7 29639020 29623180 pts/19 D+ 13:49   1:35 ./btrfs check 
/dev/mapper/dshelf2

 total   used   free  shared buffers  cached
Mem:  32643788   32180100 463688  0  44664 119508
-/+ buffers/cache:   32015928 627860
Swap: 15616764 443676   15173088

MemTotal:   32643788 kB
MemFree:  463440 kB
MemAvailable:  44864 kB
Buffers:   44664 kB
Cached:   120360 kB
SwapCached:87064 kB
Active: 30381404 kB
Inactive: 585952 kB
Active(anon):   30334696 kB
Inactive(anon):   474624 kB
Active(file):  46708 kB
Inactive(file):   111328 kB
Unevictable:5616 kB
Mlocked:5616 kB
SwapTotal:  15616764 kB
SwapFree:   15173088 kB
Dirty:  1636 kB
Writeback: 4 kB
AnonPages:  30734240 kB
Mapped:67236 kB
Shmem:  3036 kB
Slab: 267884 kB
SReclaimable:  51528 kB
SUnreclaim:   216356 kB
KernelStack:   10144 kB
PageTables:69284 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938656 kB
Committed_AS:   32865492 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k:  560404 kB
DirectMap2M:32692224 kB


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote:
> No OOM triggers? That's a little strange.
> Maybe it's related to how kernel handles memory over-commit?
 
Yes, I think you are correct.

> And for the hang, I think it's related to some memory allocation failure
> and error handler just didn't handle it well, so it's causing deadlock
> for certain page.

That indeed matches what I'm seeing.

> ENOMEM handling is pretty common but hardly verified, so it's not that
> strange, but we must locate the problem.

I seem to be getting deadlocks in the kernel, so I'm hoping that at least
it's checked there, but maybe not?

> In my system, at least I'm not using btrfs as root fs, and for the
> memory eating program I normally ensure it's eating all the memory +
> swap, so OOM killer is always triggered, maybe that's the cause.
> 
> So in your case, maybe it's btrfs not really taking up all memory, thus
> OOM killer not triggered.

Correct, the swap is not used.

> Any kernel dmesg about OOM killer triggered?
 
Nothing at all. It never gets triggered.

> > Here is my system when it virtually died:
> > USER       PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
> > root 31006 21.2 90.7 29639020 29623180 pts/19 D+ 13:49   1:35 ./btrfs 
> > check /dev/mapper/dshelf2

See how btrfs was taking 29GB in that ps output (that's before it takes
everything and I can't even type ps anymore)
Note that VSZ is almost equal to RSS. Nothing gets swapped.

Then see free output:

> >  total   used   free  shared buffers  cached
> > Mem:  32643788   32180100 463688  0  44664 119508
> > -/+ buffers/cache:   32015928 627860
> > Swap: 15616764 443676   15173088
> 
> For swap, it looks like only some other program's memory is swapped out,
> not btrfs'.

That's exactly correct. btrfs check never goes to swap, I'm not sure why,
and because there is virtual memory free, maybe that's why OOM does not
trigger?
So I guess I can probably "fix" my problem by removing swap, but ultimately
it would be useful to know why memory taken by btrfs check does not end up
in swap.
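In the meantime, a workaround I may test (a sketch, assuming a systemd
host with cgroup v2; on cgroup v1 the property would be MemoryLimit=):
run the check in a memory-capped scope, so the scope gets OOM-killed
instead of the whole box starving:

# cap the check at 24G; the scope dies rather than the machine
systemd-run --scope -p MemoryMax=24G btrfs check --repair /dev/mapper/dshelf2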

> And unfortunately, I'm not so familiar with OOM/MM code outside of
> filesystem.
> Any help from other experienced developers would definitely help to
> solve why memory of 'btrfs check' is not swapped out or why OOM killer
> is not triggered.

Do you have someone from linux-vm you might be able to ask, or should we Cc
this thread there?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-18 Thread Marc MERLIN
On Wed, Jul 18, 2018 at 10:42:21PM +0300, Andrei Borzenkov wrote:
> > Any help from other experienced developers would definitely help to
> > solve why memory of 'btrfs check' is not swapped out or why OOM killer
> > is not triggered.
> 
> Almost all used memory is marked as "active" and active pages are not
> swapped. Page is active if it was accessed recently. Is it possible that
> btrfs logic does frequent scans across all allocated memory?
> >>
> >> Active: 30381404 kB
> >> Inactive: 585952 kB

That is a very good find.

Yes, the linux kernel VM may be smart enough not to swap pages that were
used recently, and when btrfs check slurps all the extents to cross check
everything, I think it does cross reference them all many times.
This is why it can run in a few hours when btrfs check lowmem requires
days in a similar situation.

I'm not sure if there is a good way around this, but it's good to know that
btrfs repair can effectively abuse the linux VM in a way that it'll take
everything down without OOM having a chance to trigger.
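If that theory is right, a soft memory cap should confirm it: cgroup
v2's memory.high forces reclaim above the threshold, which should push
even recently used pages out to swap instead of locking up the box.
A sketch, assuming a cgroup v2 hierarchy with the memory controller
enabled:

# force reclaim above 20G instead of letting check starve the machine
mkdir -p /sys/fs/cgroup/btrfscheck
echo 20G > /sys/fs/cgroup/btrfscheck/memory.high
echo $$ > /sys/fs/cgroup/btrfscheck/cgroup.procs   # move this shell into it
btrfs check --repair /dev/mapper/dshelf2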

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Have 15GB missing in btrfs filesystem.

2018-10-23 Thread Marc MERLIN
Normally, btrfs fi show will show lost space because your trees aren't
balanced.
Balance usually reclaims that space, or most of it.
In this case, not so much.

kernel 4.17.6:

saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 186.89GiB
devid    1 size 228.67GiB used 207.60GiB path /dev/mapper/pool1

Ok, I have a 21GB gap between what's used by the FS and what's used at
the block layer.

saruman:/mnt/btrfs_pool1# btrfs balance start -dusage=40 -v .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=40
Done, had to relocate 1 out of 210 chunks
saruman:/mnt/btrfs_pool1# btrfs balance start -musage=60 -v .
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=60
  SYSTEM (flags 0x2): balancing, usage=60
Done, had to relocate 4 out of 209 chunks
saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 186.91GiB
devid    1 size 228.67GiB used 205.60GiB path /dev/mapper/pool1

That didn't help much, delta is now 19GB

saruman:/mnt/btrfs_pool1# btrfs balance start -dusage=80 -v .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=80
Done, had to relocate 8 out of 207 chunks
saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 187.03GiB
devid    1 size 228.67GiB used 201.54GiB path /dev/mapper/pool1

Ok, now delta is 14GB

saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v .
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=80
  SYSTEM (flags 0x2): balancing, usage=80
Done, had to relocate 5 out of 202 chunks
saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 188.24GiB
devid    1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1

and it's back to 15GB :-/

How can I get 188.24 and 203.54 to converge further? Where has all that
space gone?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Wed, Oct 24, 2018 at 01:07:25PM +0800, Qu Wenruo wrote:
> > saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v .
> > Dumping filters: flags 0x6, state 0x0, force is off
> >   METADATA (flags 0x2): balancing, usage=80
> >   SYSTEM (flags 0x2): balancing, usage=80
> > Done, had to relocate 5 out of 202 chunks
> > saruman:/mnt/btrfs_pool1# btrfs fi show .
> > Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
> > Total devices 1 FS bytes used 188.24GiB
> > devid    1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1
> > 
> > and it's back to 15GB :-/
> > 
> > How can I get 188.24 and 203.54 to converge further? Where is all that
> > space gone?
> 
> Your original chunks are already pretty compact.
> Thus really no need to do extra balance.
> 
> You may get some extra space by doing full system balance (no usage=
> filter), but that's really not worthy in my opinion.
> 
> Maybe you could try defrag to free some space wasted by CoW instead?
> (If you're not using many snapshots)

Thanks for the reply.

So right now, I have:
saruman:~# btrfs fi show /mnt/btrfs_pool1/
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 188.25GiB
devid    1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1

saruman:~# btrfs fi df /mnt/btrfs_pool1/
Data, single: total=192.48GiB, used=184.87GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=5.50GiB, used=3.38GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I've been using btrfs for a long time now but I've never had a
filesystem where I had 15GB apparently unusable (7%) after a balance.

I can't drop all the snapshots since at least two are used for btrfs
send/receive backups.
However, if I delete more snapshots, and do a full balance, you think
it'll free up more space?
I can try a defrag next, but since I have COW for snapshots, it's not
going to help much, correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Sat, Oct 27, 2018 at 02:12:02PM -0400, Remi Gauvin wrote:
> On 2018-10-27 01:42 PM, Marc MERLIN wrote:
> 
> > 
> > I've been using btrfs for a long time now but I've never had a
> > filesystem where I had 15GB apparently unusable (7%) after a balance.
> > 
> 
> The space isn't unusable.  It's just allocated.. (It's used in the sense
> that it's reserved for data chunks.).  Start writing data to the drive,
> and the data will fill that space before more gets allocated.. (Unless
> you are using an older kernel and the filesystem gets mounted with ssd
> option, in which case, you'll want to add nossd option to prevent that
> behaviour.)
> 
> You can use btrfs fi usage to display that more clearly.
 
Got it. I have disk space free alerts based on df, which I know doesn't
mean that much on btrfs. Maybe I'll just need to change that alert code
to make it btrfs aware.
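Something like this might do for the alert (a sketch; the 10GiB
threshold is made up, and the 'min:' parsing assumes the btrfs-progs
output format stays stable):

#!/bin/sh
# alert when a btrfs pool has less than 10GiB truly available
MNT=/mnt/btrfs_pool1
FREE=$(btrfs fi usage -b "$MNT" | awk -F'min: ' '/Free \(estimated\)/ {print $2+0}')
[ "$FREE" -lt $((10*1024*1024*1024)) ] && echo "ALERT: $MNT low on space"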
 
> > I can try a defrag next, but since I have COW for snapshots, it's not
> > going to help much, correct?
> 
> The defrag will end up using more space, as the fragmented parts of
> files will get duplicated.  That being said, if you have the luxury to
> defrag *before* taking new snapshots, that would be the time to do it.

Thanks for confirming. Because I always have snapshots for btrfs
send/receive, defrag will duplicate as you say, but once the older
snapshots get freed up, the duplicate blocks should go away, correct?

Back to usage, thanks for pointing out that command:
saruman:/mnt/btrfs_pool1# btrfs fi usage .
Overall:
Device size: 228.67GiB
Device allocated:203.54GiB
Device unallocated:   25.13GiB
Device missing:  0.00B
Used:192.01GiB
Free (estimated): 32.44GiB  (min: 19.88GiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,single: Size:192.48GiB, Used:185.16GiB
   /dev/mapper/pool1 192.48GiB

Metadata,DUP: Size:5.50GiB, Used:3.42GiB
   /dev/mapper/pool1  11.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/pool1  64.00MiB

Unallocated:
   /dev/mapper/pool1  25.13GiB


I'm still seeing that I'm using 192GB, but 203GB allocated.
Do I have 25GB usable:
Device unallocated:   25.13GiB

Or 35GB usable?
Device size: 228.67GiB
  -
Used:192.01GiB
  = 36GB ?

Yes, I know that I shouldn't get close to filling up the device, I'm just
trying to figure out if I should stay below 25GB or below 35GB.
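Actually, writing it out, the usage output above seems to reconcile as
(my reading, not gospel):

Free (estimated) = unallocated + unused-but-already-allocated data
                 = 25.13GiB + (192.48GiB - 185.16GiB)
                 = 32.45GiB   (matches the 32.44GiB shown, modulo rounding)

So ~25GiB is safe for anything (data or metadata chunks), and the extra
~7GiB is only usable for data, since it already sits inside data chunks.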

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Sun, Oct 28, 2018 at 07:27:22AM +0800, Qu Wenruo wrote:
> > I can't drop all the snapshots since at least two is used for btrfs
> > send/receive backups.
> > However, if I delete more snapshots, and do a full balance, you think
> > it'll free up more space?
> 
> No.
> 
> You're already too worried about an non-existing problem.
> Your fs looks pretty healthy.

Thanks both for the answers. I'll go back and read them more carefully
later to see how I can adjust my monitoring, but basically I hit the 90%
space used in df alert, and I know that once I get close to full, or
completely full, very bad things happen with btrfs. The system sometimes
becomes so unusable that it's very hard to reclaim space and fix the
issue (not counting that if you have btrfs send snapshots, you're forced
to break the snapshot relationship and start over, since deleting data
does not reclaim blocks that are still marked as used by the last
snapshot that was sent to the backup server).

Long story short, I try very hard to not ever hit this problem again :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-14 Thread Marc MERLIN
static noinline struct extent_buffer *
read_node_slot(struct btrfs_fs_info *fs_info, struct extent_buffer *parent,
   int slot)
{
int level = btrfs_header_level(parent);
struct extent_buffer *eb;

if (slot < 0 || slot >= btrfs_header_nritems(parent))
return ERR_PTR(-ENOENT);

BUG_ON(level == 0);
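	/* ^ presumably the BUG_ON at fs/btrfs/ctree.c:1862 in 4.15.6 that
	 * fires below (RIP is read_node_slot+0x3c): a leaf (level 0) showed
	 * up where an internal node was expected while send compared trees.
	 */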



BTRFS info (device dm-2): relocating block group 13404622290944 flags data
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): relocating block group 13403548549120 flags data
[ cut here ]
kernel BUG at fs/btrfs/ctree.c:1862!
invalid opcode:  [#1] PREEMPT SMP PTI
CPU: 5 PID: 8103 Comm: btrfs Tainted: G U   
4.15.6-amd64-preempt-sysrq-20171018 #3
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 
04/27/2013
RIP: 0010:read_node_slot+0x3c/0x9e
RSP: 0018:becfaa0b7b58 EFLAGS: 00210246
RAX: 00a0 RBX: 000c RCX: 0003
RDX: 000c RSI: 9a60e9d9de78 RDI: 00052f6e
RBP: 9a60e9d9de78 R08: 0001 R09: becfaa0b7bf6
R10: 9a64988bd7e9 R11: 9a64988bd7c8 R12: e003d4bdb800
R13: 9a64a481 R14:  R15: 
FS:  7fba34c9c8c0() GS:9a64de34() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 5a8b9c9a CR3: 0001446c6004 CR4: 001606e0
Call Trace:
 tree_advance+0xb1/0x11e
 btrfs_compare_trees+0x1c2/0x4d6
 ? process_extent+0xdcf/0xdcf
 btrfs_ioctl_send+0x81e/0xc70
 ? __kmalloc_track_caller+0xfb/0x10f
 _btrfs_ioctl_send+0xbc/0xe6
 ? paravirt_sched_clock+0x5/0x8
 ? set_task_rq+0x2f/0x80
 ? task_rq_unlock+0x22/0x36
 btrfs_ioctl+0x162f/0x1dc8
 ? select_task_rq_fair+0xb65/0xb7a
 ? update_load_avg+0x16d/0x442
 ? list_add+0x15/0x2e
 ? cfs_rq_throttled.isra.30+0x9/0x18
 ? vfs_ioctl+0x1b/0x28
 vfs_ioctl+0x1b/0x28
 do_vfs_ioctl+0x4f4/0x53f
 ? __audit_syscall_entry+0xbf/0xe3
 SyS_ioctl+0x52/0x76
 do_syscall_64+0x72/0x81
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fba34d835e7
RSP: 002b:7ffc32cf4cb8 EFLAGS: 0202 ORIG_RAX: 0010
RAX: ffda RBX: 523f RCX: 7fba34d835e7
RDX: 7ffc32cf4d40 RSI: 40489426 RDI: 0004
RBP: 0004 R08:  R09: 7fba34c9b700
R10: 7fba34c9b9d0 R11: 0202 R12: 0003
R13: 563a30b87020 R14: 0001 R15: 0001
Code: f5 53 4c 8b a6 98 00 00 00 89 d3 4c 89 e7 e8 67 fd ff ff 85 db 78 63 4c 
89 e7 41 88 c6 e8 92 fb ff ff 39 d8 76 54 45 84 f6 75 02 <0f> 0b 89 de 48 89 ef 
e8 2e ff ff ff 89 de 49 89 c4 48 89 ef e8
RIP: read_node_slot+0x3c/0x9e RSP: becfaa0b7b58
---[ end trace a24e7de6b77b5cb1 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1900 from 0x8100 (relocation range: 
0x8000-0xbfff)

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-15 Thread Marc MERLIN
On Tue, May 15, 2018 at 09:36:11AM +0100, Filipe Manana wrote:
> We got a fix for this recently:  https://patchwork.kernel.org/patch/10396523/

Thanks very much for the notice, sorry that I missed it.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs balance did not progress after 12H

2018-06-18 Thread Marc MERLIN
So, I ran this:
gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
[1] 24450
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=60
gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
0 out of about 0 chunks balanced (0 considered), -nan% left
Balance on '.' is running
0 out of about 73 chunks balanced (2 considered), 100% left
Balance on '.' is running

After about 20mn, it changed to this:
1 out of about 73 chunks balanced (6724 considered),  99% left
Balance on '.' is running

Now, 12H later, it's still there, only 1 out of 73.

gargamel:/mnt/btrfs_pool2# btrfs fi show .
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Total devices 1 FS bytes used 12.72TiB
devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=13.57TiB, used=12.60TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=121.50GiB, used=116.53GiB
GlobalReserve, single: total=512.00MiB, used=848.00KiB

kernel: 4.16.8

Is that expected? Should I be ready to wait days possibly for this
balance to finish?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H

2018-06-19 Thread Marc MERLIN
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:
> So, I ran this:
> gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
> [1] 24450
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=60
> gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
> 0 out of about 0 chunks balanced (0 considered), -nan% left
> Balance on '.' is running
> 0 out of about 73 chunks balanced (2 considered), 100% left
> Balance on '.' is running
> 
> After about 20mn, it changed to this:
> 1 out of about 73 chunks balanced (6724 considered),  99% left
> Balance on '.' is running
> 
> Now, 12H later, it's still there, only 1 out of 73.
> 
> gargamel:/mnt/btrfs_pool2# btrfs fi show .
> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Total devices 1 FS bytes used 12.72TiB
> devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> 
> gargamel:/mnt/btrfs_pool2# btrfs fi df .
> Data, single: total=13.57TiB, used=12.60TiB
> System, DUP: total=32.00MiB, used=1.55MiB
> Metadata, DUP: total=121.50GiB, used=116.53GiB
> GlobalReserve, single: total=512.00MiB, used=848.00KiB
> 
> kernel: 4.16.8
> 
> Is that expected? Should I be ready to wait days possibly for this
> balance to finish?
 
It's now been 2 days, and it's still stuck at 1%
1 out of about 73 chunks balanced (6724 considered),  99% left

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote:
> > In your situation, I would run "btrfs pause ", wait to hear from
> > a btrfs developer, and not use the volume whatsoever in the meantime.
> I would say this is probably good advice.  I don't really know what's going
> on here myself actually, though it looks like the balance got stuck (the
> output hasn't changed for over 36 hours, unless you've got an insanely slow
> storage array, that's extremely unusual (it should only be moving at most
> 3GB of data per chunk)).

I didn't hear from any developer, so I had to continue.
- btrfs scrub cancel did not work (hang)
- at reboot mounting the filesystem hung, even with 4.17, which is
  disappointing (it should not hang)
- mount -o recovery still hung
- mount -o ro did not hang though

Sigh, why is my FS corrupted again?
Anyway, back to 
btrfs check --repair
and, it took all my 32GB of RAM on a system I can't add more RAM to, so
I'm hosed. I'll note in passing (and it's not ok at all) that check
--repair, after a 20 to 30mn pause, takes all the kernel RAM more quickly
than the system can OOM or log anything, and just deadlocks it.
This is repeatable and totally not ok :(

I'm now left with btrfs-progs git master, and lowmem mode, which finally
does a bit of repair.
So far:
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2  
enabling repair mode  
WARNING: low-memory mode repair support is only partial  
Checking filesystem on /dev/mapper/dshelf2  
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d  
Fixed 0 roots.  
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]

At the rate it's going, it'll probably take days though, it's already been 36H

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Mon, Jun 25, 2018 at 06:24:37PM +0200, Hans van Kranenburg wrote:
> >> output hasn't changed for over 36 hours, unless you've got an insanely slow
> >> storage array, that's extremely unusual (it should only be moving at most
> >> 3GB of data per chunk)).
> > 
> > I didn't hear from any developer, so I had to continue.
> > - btrfs scrub cancel did not work (hang)
> 
> Did you mean balance cancel? It waits until the current block group is
> finished.
 
Yes, I meant that, thanks for correcting me.  And you're correct that
because it was hung, cancel wasn't going to go anywhere.
At least my filesystem was still working at the time (as in IO was going
on just fine)

> > - at reboot mounting the filesystem hung, even with 4.17, which is
> >   disappointing (it should not hang)
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> > 
> > Sigh, why is my FS corrupted again?
> 
> Again? Do you think balance is corrupting the filesystem? Or have there
> been previous btrfs check --repair operations which made smaller
> problems bigger in the past?

Honestly, I don't fully remember at this point; I keep notes, but not
detailed enough, and it's been a little while.
I know I've had to delete/recreate this filesystem twice already over
the last years, but I'm not fully certain I remember when this one was
last wiped.
Yes, I do run balance along with scrub once a month:

btrfs balance start -musage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
# After metadata, let's do data:
btrfs balance start -dusage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
btrfs balance start -dusage=20 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
echo btrfs scrub start -Bd $mountpoint
ionice -c 3 nice -10 btrfs scrub start -Bd $mountpoint

Hard to say if balance has damaged my filesystem over time, but it's
definitely possible.

> Am I right to interpret the messages below, and see that you have
> extents that are referenced hundreds of times?
 
I'm not certain, but it's a backup server with many blocks that are the
same, so it could be some COW stuff, even if I didn't run any dedupe
commands myself.

> Is there heavy snapshotting or deduping going on in this filesystem? If
> so, it's not surprising balance will get a hard time moving extents
> around, since it has to update all of the metadata for each extent again
> in hundreds of places.

There is some snapshotting, but maybe around 20 or so per subvolume, not 
hundreds.

> Did you investigate what balance was doing if it takes long? Is is using
> cpu all the time, or is it reading from disk slowly (random reads) or is
> it writing to disk all the time at full speed?

I couldn't see what it was doing, but it's running in the kernel, is it not?
(or can you just strace the user space command?)
Either way, it's too late for that now, and given that it didn't make
progress by a single block in 36H, I'm assuming it was well deadlocked.
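For next time, a couple of ways to peek at a kernel-side operation
without strace (sketched from memory, needs root):

# kernel stack of the stuck balance ioctl
cat /proc/$(pgrep -f 'btrfs balance')/stack
# or dump all blocked tasks to the kernel log
echo w > /proc/sysrq-trigger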

Thanks for the reply.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Mon, Jun 25, 2018 at 01:07:10PM -0400, Austin S. Hemmelgarn wrote:
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> One tip here specifically, if you had to reboot during a balance and the FS
> hangs when it mounts, try mounting with `-o skip_balance`.  That should
> pause the balance instead of resuming it on mount, at which point you should
> also be able to cancel it without it hanging.

Very good tip, I have this in all my mountpoints :)

#LABEL=dshelf2 /mnt/btrfs_pool2 btrfs defaults,compress=lzo,skip_balance,noatime

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
Regular btrfs check --repair has a nice progress option. It wasn't
perfect, but it showed something.

But then it also takes all your memory quicker than the linux kernel can
defend itself, and reliably kills my 32GB server before it can OOM
anything.

lowmem repair seems to be going still, but it's been days and -p seems
to do absolutely nothing.

My filesystem is "only" 10TB or so, albeit with a lot of files.

2 things that come to mind
1) can lowmem have some progress working so that I know if I'm looking
at days, weeks, or even months before it will be done?

2) non lowmem is more efficient obviously when it doesn't completely
crash your machine, but could lowmem be given an amount of memory to use
for caching, or maybe use some heuristics based on free RAM so that it's
not so excruciatingly slow?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
> > lowmem repair seems to be going still, but it's been days and -p seems
> > to do absolutely nothing.
> 
> I'm a afraid you hit a bug in lowmem repair code.
> By all means, --repair shouldn't really be used unless you're pretty
> sure the problem is something btrfs check can handle.
> 
> That's also why --repair is still marked as dangerous.
> Especially when it's combined with experimental lowmem mode.

Understood, but btrfs got corrupted (by itself or not, I don't know)
I cannot mount the filesystem read/write
I cannot btrfs check --repair it since that code will kill my machine
What do I have left?

> > My filesystem is "only" 10TB or so, albeit with a lot of files.
> 
> Unless you have tons of snapshots and reflinked (deduped) files, it
> shouldn't take so long.

I may have a fair amount.
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2 
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]
ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
Add one extent data backref [156909494272 55320576]
ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
Add one extent data backref [156909494272 55320576]

The last two ERROR lines took over a day to get generated, so I'm not sure
whether it's still working, just very slowly.
For what it's worth, non lowmem check used to take 12 to 24H on that
filesystem back when it still worked.

> > 2 things that come to mind
> > 1) can lowmem have some progress working so that I know if I'm looking
> > at days, weeks, or even months before it will be done?
> 
> It's hard to estimate, especially when every cross check involves a lot
> of disk IO.
> But at least, we could add such indicator to show we're doing something.

Yes, anything that shows I should keep waiting would be good :)

> > 2) non lowmem is

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:35:06PM +0800, Su Yue wrote:
> > It's hard to estimate, especially when every cross check involves a lot
> > of disk IO.
> > 
> > But at least, we could add such indicator to show we're doing something.
> > Maybe we can account all roots in root tree first, before checking a
> tree, report i/num_roots. So users can see whether the check is doing
> something meaningful or is just dead looping.

Sounds reasonable.
Do you want to submit something to git master for btrfs-progs? I'll pull
it and just run my btrfs check again.

In the meantime, how sane does the output I just posted look?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:48:17PM +0800, Qu Wenruo wrote:
> Just normal btrfs check, and post the output.
> If normal check eats up all your memory, btrfs check --mode=lowmem.
 
Does check without --repair eat less RAM?

> --repair should be considered as the last method.

If --repair doesn't work, check is useless to me sadly. I know that for
FS analysis and bug reporting, you want to have the FS without changing
it to something maybe worse, but for my use, if it can't be mounted and
can't be fixed, then it gets deleted which is even worse than check
doing the wrong thing.

> > The last two ERROR lines took over a day to get generated, so I'm not sure 
> > if it's still working, but just slowly.
> 
> OK, that explains something.
> 
> One extent is referred to hundreds of times, no wonder it will take a long time.
> 
> Just one tip here, there are really too many snapshots/reflinked files.
> It's highly recommended to keep the number of snapshots to a reasonable
> number (lower two digits).
> Although btrfs snapshot is super fast, it puts a lot of pressure on its
> extent tree, so there is no free lunch here.
 
Agreed, I doubt I have over or much over 100 snapshots though (but I
can't check right now).
Sadly I'm not allowed to mount even read only while check is running:
gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

> > I see. Is there any reasonably easy way to check on this running process?
> 
> GDB attach would be good.
> Interrupt and check the inode number if it's checking fs tree.
> Check the extent bytenr number if it's checking extent tree.
> 
> But considering how many snapshots there are, it's really hard to determine.
> 
> In this case, the super large extent tree is causing a lot of problem,
> maybe it's a good idea to allow btrfs check to skip extent tree check?

I only see --init-extent-tree in the man page, which option did you have
in mind?

> > Then again, maybe it already fixed enough that I can mount my filesystem 
> > again.
> 
> This needs the initial btrfs check report and the kernel messages how it
> fails to mount.

mount command hangs, kernel does not show anything special outside of disk 
access hanging.

Jun 23 17:23:26 gargamel kernel: [  341.802696] BTRFS warning (device dm-2): 
'recovery' is deprecated, use 'usebackuproot' instead
Jun 23 17:23:26 gargamel kernel: [  341.828743] BTRFS info (device dm-2): 
trying to use backup root at mount time
Jun 23 17:23:26 gargamel kernel: [  341.850180] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 17:23:26 gargamel kernel: [  341.869014] BTRFS info (device dm-2): has 
skinny extents
Jun 23 17:23:26 gargamel kernel: [  342.206289] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 17:26:26 gargamel kernel: [  521.571392] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 17:55:58 gargamel kernel: [ 2293.914867] perf: interrupt took too long 
(2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Jun 23 17:56:22 gargamel kernel: [ 2317.718406] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 17:56:22 gargamel kernel: [ 2317.737277] BTRFS info (device dm-2): has 
skinny extents
Jun 23 17:56:22 gargamel kernel: [ 2318.069461] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 17:59:22 gargamel kernel: [ 2498.256167] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 18:05:23 gargamel kernel: [ 2859.107057] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 18:05:23 gargamel kernel: [ 2859.125883] BTRFS info (device dm-2): has 
skinny extents
Jun 23 18:05:24 gargamel kernel: [ 2859.448018] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 18:08:23 gargamel kernel: [ 3039.023305] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 18:13:41 gargamel kernel: [ 3356.626037] perf: interrupt took too long 
(3143 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
Jun 23 18:17:23 gargamel kernel: [ 3578.937225] Process accounting resumed
Jun 23 18:33:47 gargamel kernel: [ 4563.356252] JFS: nTxBlock = 8192, nTxLock = 
65536
Jun 23 18:33:48 gargamel kernel: [ 4563.446715] ntfs: driver 2.1.32 [Flags: R/W 
MODULE].
Jun 23 18:42:20 gargamel kernel: [ 5075.995254] INFO: task sync:20253 blocked 
for more than 120 seconds.
Jun 23 18:42:20 gargamel kernel: [ 5076.015729]   Not tainted 
4.17.2-amd64-preempt-sysrq-20180817 #1
Jun 23 18:42:20 gargamel kernel: [ 5076.036141] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 23 18:42:20 gargamel kernel: [ 5076.060637] sync            D    0 20253  15327 0x20020080
Jun 23 18:42:20 gargamel kernel: [ 5076.078032] Call Trace:
Jun 23 18:42:20 gargamel kernel: [ 5076.086366]  ? __schedule+0x53e/0x59b
Jun 23 18:42:20 gargamel kernel: [ 5076.098311]  schedule+0x7f/0x98
Ju

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:02:19PM +0800, Su Yue wrote:
> I have figured out the bug is lowmem check can't deal with shared tree block
> in reloc tree. The fix is simple, you can try the follow repo:
> 
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Not sure if I understand what you meant here.

> Please run lowmem check "without =--repair" first to be sure whether
> your filesystem is fine.
 
The filesystem is not fine, it caused btrfs balance to hang, whether
balance actually broke it further or caused the breakage, I can't say.

Then mount hangs, even with recovery, unless I use ro.

This filesystem is trash to me and will require over a week to rebuild
manually if I can't repair it.
Running check without repair for likely several days just to learn that
my filesystem is not clean (I already know this) isn't useful :)
Or am I missing something?

> Though the bug and phenomenon are clear enough, before sending my patch,
> I have to make a test image. I have spent a week studying btrfs balance
> but it seems a little hard for me.

thanks for having a look, either way.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> > 
> > Not sure if I undertand that you meant, here.
> > 
> Sorry for my unclear words.
> Simply speaking, I suggest you stop the currently running check.
> Then, clone the above branch, compile the binary, and run
> 'btrfs check --mode=lowmem $dev'.
 
I understand, I'll build and try it.
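For anyone following along, the build would be roughly the usual
btrfs-progs dance (exact dependencies vary by distro):

git clone -b tmp1 https://github.com/Damenly/btrfs-progs.git
cd btrfs-progs
./autogen.sh && ./configure --disable-documentation && make
./btrfs check --mode=lowmem /dev/mapper/dshelf2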

> > This filesystem is trash to me and will require over a week to rebuild
> > manually if I can't repair it.
> 
> Understood your anxiety; a log of check without '--repair' will help
> us figure out what's wrong with your filesystem.

Ok, I'll run your new code without repair and report back. It will
likely take over a day though.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:29:10PM +0800, Qu Wenruo wrote:
> > If --repair doesn't work, check is useless to me sadly.
> 
> Not exactly.
> Although it's time consuming, I have manually patched several users fs,
> which normally ends pretty well.
 
Ok I understand now.

> > Agreed, I doubt I have over or much over 100 snapshots though (but I
> > can't check right now).
> > Sadly I'm not allowed to mount even read only while check is running:
> > gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
> > mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

Ok, so I just checked now, 270 snapshots, but not because I'm crazy,
because I use btrfs send a lot :)

> This looks like super block corruption?
> 
> What about "btrfs inspect dump-super -fFa /dev/mapper/dshelf2"?

Sure, there you go: https://pastebin.com/uF1pHTsg

> And what about "skip_balance" mount option?
 
I have this in my fstab :)

> Another problem is, with so many snapshots, balance is also hugely
> slowed, thus I'm not 100% sure if it's really a hang.

I sent another thread about this last week, balance got hung after 2
days of doing nothing and just moving a single chunk.

Ok, I was able to remount the filesystem read only. I was wrong, I have
270 snapshots:
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup/'
74
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup-btrfssend/'
196

It's a backup server, I use btrfs send for many machines and for each btrfs
send, I keep history, maybe 10 or so backups. So it adds up in the end.

Is btrfs unable to deal with this well enough?

> If for that usage, btrfs-restore would fit your use case more,
> Unfortunately it needs extra disk space and isn't good at restoring
> subvolume/snapshots.
> (Although it's much faster than repairing the possible corrupted extent
> tree)

It's a backup server, it only contains data from other machines.
If the filesystem cannot be recovered to a working state, I will need
over a week to restart the many btrfs send commands from many servers.
This is why anything other than --repair is useless to me, I don't need
the data back, it's still on the original machines, I need the
filesystem to work again so that I don't waste a week recreating the
many btrfs send/receive relationships.

> > Is that possible at all?
> 
> At least for file recovery (fs tree repair), we have such behavior.
> 
> However, the problem you hit (and a lot of users hit) is all about
> extent tree repair, which doesn't even goes to file recovery.
> 
> All the hassle are in extent tree, and for extent tree, it's just good
> or bad. Any corruption in extent tree may lead to later bugs.
> The only way to avoid extent tree problems is to mount the fs RO.
> 
> So, I'm afraid it is at least impossible for recent years.

Understood, thanks for answering.

Does the pastebin help and is 270 snapshots ok enough?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> On Thu, 28 Jun 2018 23:59:03 -0700
> Marc MERLIN  wrote:
> 
> > I don't waste a week recreating the many btrfs send/receive relationships.
> 
> Consider not using send/receive, and switching to regular rsync instead.
> Send/receive is very limiting and cumbersome, including because of what you
> described. And it doesn't gain you much over an incremental rsync. As for

Err, sorry but I cannot agree with you here, at all :)

btrfs send/receive is pretty much the only reason I use btrfs. 
rsync takes hours on big filesystems scanning every single inode on both
sides and then seeing what changed, and only then sends the differences
It's super inefficient.
btrfs send knows in seconds what needs to be sent, and works on it right
away.
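
For anyone who hasn't used it, a full incremental cycle is only a few
commands; a minimal sketch (paths and hostname are made up):

btrfs subvolume snapshot -r /mnt/src /mnt/src/snap.new
btrfs send -p /mnt/src/snap.old /mnt/src/snap.new | \
    ssh backuphost btrfs receive /mnt/backup/src
# snap.new is the parent for the next run; snap.old can then be deleted
btrfs subvolume delete /mnt/src/snap.old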

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 03:20:42PM +0800, Qu Wenruo wrote:
> If certain btrfs specific operations are involved, it's definitely not OK:
> 1) Balance
> 2) Quota
> 3) Btrfs check

Ok, I understand. I'll almost never run balance then. My problems did
indeed start because I ran balance and it got stuck for 2 days with 0
progress.
That still seems like a bug though. I'm ok with slow, but stuck for 2
days with only 270 snapshots or so means there is a bug, or the
algorithm is so expensive that 270 snapshots could cause it to take days
or weeks to proceed?

> > It's a backup server, it only contains data from other machines.
> > If the filesystem cannot be recovered to a working state, I will need
> > over a week to restart the many btrfs send commands from many servers.
> > This is why anything other than --repair is useless to me, I don't need
> > the data back, it's still on the original machines, I need the
> > filesystem to work again so that I don't waste a week recreating the
> > many btrfs send/receive relationships.
> 
> Now totally understand why you need to repair the fs.

I also understand that my use case is atypical :)
But I guess this also means that using btrfs for a lot of send/receive
on a backup server is not going to work well unfortunately :-/

Now I'm wondering if I'm the only person even doing this.

> > Does the pastebin help and is 270 snapshots ok enough?
> 
> The super dump doesn't show anything wrong.
> 
> So the problem may be in the super large extent tree.
> 
> In this case, a plain check result with Su's patch would help more,
> rather than the not-so-interesting super dump.

First I tried to mount with skip_balance after the partial repair, and
it hung for a long time:
[445635.716318] BTRFS info (device dm-2): disk space caching is enabled
[445635.736229] BTRFS info (device dm-2): has skinny extents
[445636.101999] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[445825.053205] BTRFS info (device dm-2): enabling ssd optimizations
[446511.006588] BTRFS info (device dm-2): disk space caching is enabled
[446511.026737] BTRFS info (device dm-2): has skinny extents
[446511.325470] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[446699.593501] BTRFS info (device dm-2): enabling ssd optimizations
[446964.077045] INFO: task btrfs-transacti:9211 blocked for more than 120 
seconds.
[446964.099802]   Not tainted 4.17.2-amd64-preempt-sysrq-20180818 #3
[446964.120004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

So, I rebooted, and will now run Su's btrfs check without repair and
report back.

Thanks both for your help.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs send/receive vs rsync

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 10:04:02AM +0200, Lionel Bouton wrote:
> Hi,
> 
> On 29/06/2018 09:22, Marc MERLIN wrote:
> > On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> >> On Thu, 28 Jun 2018 23:59:03 -0700
> >> Marc MERLIN  wrote:
> >>
> >>> I don't waste a week recreating the many btrfs send/receive relationships.
> >> Consider not using send/receive, and switching to regular rsync instead.
> >> Send/receive is very limiting and cumbersome, including because of what you
> >> described. And it doesn't gain you much over an incremental rsync. As for
> > Err, sorry but I cannot agree with you here, at all :)
> >
> > btrfs send/receive is pretty much the only reason I use btrfs. 
> > rsync takes hours on big filesystems scanning every single inode on both
> > sides and then seeing what changed, and only then sends the differences
> > It's super inefficient.
> > btrfs send knows in seconds what needs to be sent, and works on it right
> > away.
> 
> I've not yet tried send/receive but I feel the pain of rsyncing millions
> of files (I had to use lsyncd to limit the problem to the time the
> origin servers reboot which is a relatively rare event) so this thread
> piqued my attention. Looking at the whole thread I wonder if you could
> get a more manageable solution by splitting the filesystem.

So, let's be clear. I did backups with rsync for 10+ years. It was slow
and painful. On my laptop an hourly rsync between 2 drives slowed down
my machine to a crawl while everything was being stat'ed; it took
forever.
Now with btrfs send/receive, it just works, I don't even see it
happening in the background.

Here is a page I wrote about it in 2014:
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive

Here is a talk I gave in 2014 too, scroll to the bottom of the page, and
the bottom of the talk outline:
http://marc.merlins.org/perso/btrfs/2014-05.html#My-Btrfs-Talk-at-Linuxcon-JP-2014
and click on 'Btrfs send/receive'

> If instead of using a single BTRFS filesystem you used LVM volumes
> (maybe with Thin provisioning and monitoring of the volume group free
> space) for each of your servers to backup with one BTRFS filesystem per
> volume you would have less snapshots per filesystem and isolate problems
> in case of corruption. If you eventually decide to start from scratch
> again this might help a lot in your case.

So, I already have problems due to too many block layers:
- raid 5 + ssd
- bcache
- dmcrypt
- btrfs

I get occasional deadlocks due to upper layers sending more data to the
lower layer (bcache) than it can process. I'm a bit wary of adding yet
another layer (LVM), but you're otherwise correct than keeping smaller
btrfs filesystems would help with performance and containing possible
damage.

Has anyone actually done this? :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:28:31AM -0700, Marc MERLIN wrote:
> So, I rebooted, and will now run Su's btrfs check without repair and
> report back.

As expected, it will likely still take days, here's the start:

gargamel:~# btrfs check --mode=lowmem  -p /dev/mapper/dshelf2  
Checking filesystem on /dev/mapper/dshelf2 
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d 
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 180, have: 240
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 301, have: 431
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 160, have: 240
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 169, have: 249
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 347, have: 418
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 1, have: 1452

Mmmh, these look similar (but not identical) to the last run earlier in this 
thread:
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
Well, there goes that. After about 18H:
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 1, have: 1452 
backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0 
btrfs(+0x3a232)[0x56091704f232] 
btrfs(+0x3ab46)[0x56091704fb46] 
btrfs(+0x3b9f5)[0x5609170509f5] 
btrfs(btrfs_find_all_roots+0x9)[0x560917050a45] 
btrfs(+0x572ff)[0x56091706c2ff] 
btrfs(+0x60b13)[0x560917075b13] 
btrfs(cmd_check+0x2634)[0x56091707d431] 
btrfs(main+0x88)[0x560917027260] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93aa508561] 
btrfs(_start+0x2a)[0x560917026dfa] 
Aborted 

That's https://github.com/Damenly/btrfs-progs.git

Whoops, I didn't use the tmp1 branch, let me try again with that and
report back, although the problem above is still going to be there since
I think the only difference will be this, correct?
https://github.com/Damenly/btrfs-progs/commit/b5851513a12237b3e19a3e71f3ad00b966d25b3a

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Marc MERLIN
On Sat, Jun 30, 2018 at 10:49:07PM +0800, Qu Wenruo wrote:
> But the last abort looks quite possibly like the culprit.
> 
> Would you try to dump the extent tree?
> # btrfs inspect dump-tree -t extent  | grep -A50 156909494272

Sure, there you go:

item 25 key (156909494272 EXTENT_ITEM 55320576) itemoff 14943 itemsize 
24
refs 19715 gen 31575 flags DATA
item 26 key (156909494272 EXTENT_DATA_REF 571620086735451015) itemoff 
14915 itemsize 28
extent data backref root 21641 objectid 374857 offset 235175936 
count 1452
item 27 key (156909494272 EXTENT_DATA_REF 1765833482087969671) itemoff 
14887 itemsize 28
extent data backref root 23094 objectid 374857 offset 235175936 
count 1442
item 28 key (156909494272 EXTENT_DATA_REF 1807626434455810951) itemoff 
14859 itemsize 28
extent data backref root 21503 objectid 374857 offset 235175936 
count 1454
item 29 key (156909494272 EXTENT_DATA_REF 1879818091602916231) itemoff 
14831 itemsize 28
extent data backref root 21462 objectid 374857 offset 235175936 
count 1454
item 30 key (156909494272 EXTENT_DATA_REF 3610854505775117191) itemoff 
14803 itemsize 28
extent data backref root 23134 objectid 374857 offset 235175936 
count 1442
item 31 key (156909494272 EXTENT_DATA_REF 3754675454231458695) itemoff 
14775 itemsize 28
extent data backref root 23052 objectid 374857 offset 235175936 
count 1442
item 32 key (156909494272 EXTENT_DATA_REF 5060494667839714183) itemoff 
14747 itemsize 28
extent data backref root 23174 objectid 374857 offset 235175936 
count 1440
item 33 key (156909494272 EXTENT_DATA_REF 5476627808561673095) itemoff 
14719 itemsize 28
extent data backref root 22911 objectid 374857 offset 235175936 
count 1
item 34 key (156909494272 EXTENT_DATA_REF 6378484416458011527) itemoff 
14691 itemsize 28
extent data backref root 23012 objectid 374857 offset 235175936 
count 1442
item 35 key (156909494272 EXTENT_DATA_REF 7338474132555182983) itemoff 
14663 itemsize 28
extent data backref root 21872 objectid 374857 offset 235175936 
count 1
item 36 key (156909494272 EXTENT_DATA_REF 7516565391717970823) itemoff 
14635 itemsize 28
extent data backref root 21826 objectid 374857 offset 235175936 
count 1452
item 37 key (156909494272 SHARED_DATA_REF 14871537025024) itemoff 14631 
itemsize 4
shared data backref count 10
item 38 key (156909494272 SHARED_DATA_REF 14871617568768) itemoff 14627 
itemsize 4
shared data backref count 73
item 39 key (156909494272 SHARED_DATA_REF 14871619846144) itemoff 14623 
itemsize 4
shared data backref count 59
item 40 key (156909494272 SHARED_DATA_REF 14871623270400) itemoff 14619 
itemsize 4
shared data backref count 68
item 41 key (156909494272 SHARED_DATA_REF 14871623532544) itemoff 14615 
itemsize 4
shared data backref count 70
item 42 key (156909494272 SHARED_DATA_REF 14871626383360) itemoff 14611 
itemsize 4
shared data backref count 76
item 43 key (156909494272 SHARED_DATA_REF 14871635132416) itemoff 14607 
itemsize 4
shared data backref count 60
item 44 key (156909494272 SHARED_DATA_REF 14871649533952) itemoff 14603 
itemsize 4
shared data backref count 79
item 45 key (156909494272 SHARED_DATA_REF 14871862378496) itemoff 14599 
itemsize 4
shared data backref count 70
item 46 key (156909494272 SHARED_DATA_REF 14909667098624) itemoff 14595 
itemsize 4
shared data backref count 72
item 47 key (156909494272 SHARED_DATA_REF 14909669720064) itemoff 14591 
itemsize 4
shared data backref count 58
item 48 key (156909494272 SHARED_DATA_REF 14909734567936) itemoff 14587 
itemsize 4
shared data backref count 73
item 49 key (156909494272 SHARED_DATA_REF 14909920477184) itemoff 14583 
itemsize 4
shared data backref count 79
item 50 key (156909494272 SHARED_DATA_REF 14942279335936) itemoff 14579 
itemsize 4
shared data backref count 79
item 51 key (156909494272 SHARED_DATA_REF 14942304862208) itemoff 14575 
itemsize 4
shared data backref count 72
item 52 key (156909494272 SHARED_DATA_REF 14942348378112) itemoff 14571 
itemsize 4
shared data backref count 67
item 53 key (156909494272 SHARED_DATA_REF 14942366138368) itemoff 14567 
itemsize 4
shared data backref count 51
item 54 key (156909494272 SHARED_DATA_REF 14942384799744) itemoff 14563 
itemsize 4
shared data backref count 64
item 55 key (156909494272 SHARED_DATA_REF 14978234613760) it

Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Marc MERLIN
Sorry that I missed the beginning of this discussion, but I think this is
what I documented here after hitting the same problem:
http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html

Marc

On Sun, Jul 01, 2018 at 01:03:37AM +0200, Hannes Schweizer wrote:
> On Sat, Jun 30, 2018 at 10:02 PM Andrei Borzenkov  wrote:
> >
> > 30.06.2018 21:49, Andrei Borzenkov пишет:
> > > 30.06.2018 20:49, Hannes Schweizer пишет:
> > ...
> > >>
> > >> I've tested a few restore methods beforehand, and simply creating a
> > >> writeable clone from the restored snapshot does not work for me, eg:
> > >> # create some source snapshots
> > >> btrfs sub create test_root
> > >> btrfs sub snap -r test_root test_snap1
> > >> btrfs sub snap -r test_root test_snap2
> > >>
> > >> # send a full and incremental backup to external disk
> > >> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external
> > >> btrfs sub snap -r test_root test_snap3
> > >> btrfs send -c test_snap2 test_snap3 | btrfs receive
> > >> /run/media/schweizer/external
> > >>
> > >> # simulate disappearing source
> > >> btrfs sub del test_*
> > >>
> > >> # restore full snapshot from external disk
> > >> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive .
> > >>
> > >> # create writeable clone
> > >> btrfs sub snap test_snap3 test_root
> > >>
> > >> # try to continue with backup scheme from source to external
> > >> btrfs sub snap -r test_root test_snap4
> > >>
> > >> # this fails!!
> > >> btrfs send -c test_snap3 test_snap4 | btrfs receive
> > >> /run/media/schweizer/external
> > >> At subvol test_snap4
> > >> ERROR: parent determination failed for 2047
> > >> ERROR: empty stream is not considered valid
> > >>
> > >
> > > Yes, that's expected. Incremental stream always needs valid parent -
> > > this will be cloned on destination and incremental changes applied to
> > > it. "-c" option is just additional sugar on top of it which might reduce
> > > size of stream, but in this case (i.e. without "-p") it also attempts to
> > > guess parent subvolume for test_snap4 and this fails because test_snap3
> > > and test_snap4 do not have common parent so test_snap3 is rejected as
> > > valid parent snapshot. You can restart incremental-forever chain by
> > > using explicit "-p" instead:
> > >
> > > btrfs send -p test_snap3 test_snap4
> > >
> > > Subsequent snapshots (test_snap5 etc) will all have common parent with
> > > immediate predecessor again so "-c" will work.
> > >
> > > Note that technically "btrfs send" with a single "-c" option is entirely
> > > equivalent to "btrfs send -p". Using "-p" would have avoided this issue. :)
> > > Although this implicit check for common parent may be considered a good
> > > thing in this case.
> > >
> > > P.S. looking at the above, it probably needs to be in the manual page for
> > > btrfs-send. It took me quite some time to actually understand the
> > > meaning of "-p" and "-c" and behavior if they are present.
> > >
> > ...
> > >>
> > >> Is there some way to reset the received_uuid of the following snapshot
> > >> on online?
> > >> ID 258 gen 13742 top level 5 parent_uuid -
> > >>received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid
> > >> 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo
> > >>
> > >
> > > There is no "official" tool but this question came up quite often.
> > > Search this list; I believe a one-liner using python-btrfs was recently
> > > posted. Note also that a patch that removes received_uuid when the "ro"
> > > property is removed was suggested; hopefully it will be merged at some
> > > point. Still, I personally consider the ability to flip the read-only
> > > property a very bad thing that should have never been exposed in the first place.
> > >
> >
> > Note that if you remove received_uuid (explicitly or - in the future -
> > implicitly) you will not be able to restart incremental send anymore.
> > Without received_uuid there will be no way to match source test_snap3
> > with destination test_snap3. So you *must* preserve it and start with
> > writable clone.
> >
> > received_uuid is a misnomer. I wish it would be named "content_uuid" or
> > "snap_uuid" with semantic
> >
> > 1. When read-only snapshot of writable volume is created, content_uuid
> > is initialized
> >
> > 2. Read-only snapshot of read-only snapshot inherits content_uuid
> >
> > 3. destination of "btrfs send" inherits content_uuid
> >
> > 4. writable snapshot of read-only snapshot clears content_uuid
> >
> > 5. clearing read-only property clears content_uuid
> >
> > This would make it more straightforward to cascade and restart
> > replication by having single subvolume property to match against.
> 
> Indeed, the current terminology is a bit confusing, and the patch
> removing the received_uuid when manually switching ro to false should
> definitely be merged. As recommended, I'll simply create a writeable
> clone of the restored snapshot and use -p instead of -c when restoring
> again.

btrfs check of a raid0?

2018-07-01 Thread Marc MERLIN
Howdy,

I have a btrfs filesystem made out of 2 devices:
[   75.141414] BTRFS: device label btrfs_space devid 1 transid 429220 
/dev/bcache3
[   75.164745] BTRFS: device label btrfs_space devid 2 transid 429220 
/dev/bcache2

One of the 2 devices had a hardware error (not btrfs' fault):
[201504.939659] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, 
rd 39, flush 1, corrupt 0, gen 0
[201504.995967] BTRFS warning (device bcache3): bcache3 checksum verify failed 
on 38976 wanted F3019EEA found E6A97DC4 level 0
[201505.032209] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, 
rd 40, flush 1, corrupt 0, gen 0
[201505.062447] BTRFS error (device bcache3): parent transid verify failed on 
38976 wanted 434763 found 434245
[201600.262142] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, 
rd 41, flush 1, corrupt 0, gen 0

I unmounted it, and I'm trying to check the filesystem now.

How is it supposed to work when you have multiple devices for a btrfs
filesystem?

gargamel:~# btrfs check --repair -p /dev/bcache2 
enabling repair mode
ERROR: mount check: cannot open /dev/bcache2: No such device or address
ERROR: could not check mount status: No such device or address
gargamel:~# btrfs check --repair -p /dev/bcache3
enabling repair mode
ERROR: cannot open device '/dev/bcache3': Device or resource busy
ERROR: cannot open file system

[205248.299528] BTRFS info (device bcache3): disk space caching is enabled
[205248.320335] BTRFS error (device bcache3): Remounting read-write after error 
is not allowed

Yes, rebooting should likely get around the problem, but I'd rather not
reboot; I have long-running stuff I would rather not stop.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check of a raid0?

2018-07-01 Thread Marc MERLIN
On Sun, Jul 01, 2018 at 01:15:09PM -0600, Chris Murphy wrote:
> > How is it supposed to work when you have multiple devices for a btrfs
> > filesystem?
> >
> > gargamel:~# btrfs check --repair -p /dev/bcache2
> > enabling repair mode
> > ERROR: mount check: cannot open /dev/bcache2: No such device or address
> > ERROR: could not check mount status: No such device or address
> > gargamel:~# btrfs check --repair -p /dev/bcache3
> > enabling repair mode
> > ERROR: cannot open device '/dev/bcache3': Device or resource busy
> > ERROR: cannot open file system
> >
> > [205248.299528] BTRFS info (device bcache3): disk space caching is enabled
> > [205248.320335] BTRFS error (device bcache3): Remounting read-write after 
> > error is not allowed
> 
> If it's successfully unmounted, I don't understand the error messages
> that it can't be opened. Is umount hung? Sounds to me like btrfs check
> thinks it's still mounted.

I spent more time on this and apparently because the underlying device
had a hardware fault (fell off the bus), its dmcrypt device is still
there but not working.
In turn, I can't dmsetup rm it because it's in use by bcache which
didn't free it, but bcache won't let me free it because it got removed.
So, I'm stuck with a reboot in the end, oh well...

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Thu, Jun 28, 2018 at 11:43:54PM -0700, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> > > 
> > > Not sure I understand what you meant here.
> > > 
> > Sorry for my unclear words.
> > Simply speaking, I suggest you stop the currently running check.
> > Then, clone the above branch, compile the binary, and run
> > 'btrfs check --mode=lowmem $dev'.
>  
> I understand, I'll build and try it.
> 
> > > This filesystem is trash to me and will require over a week to rebuild
> > > manually if I can't repair it.
> > 
> > Understood your anxiety; a log of a check without '--repair' will help
> > us figure out what's wrong with your filesystem.
> 
> Ok, I'll run your new code without repair and report back. It will
> likely take over a day though.

Well, it got stuck for over a day, and then I had to reboot :(

saruman:/var/local/src/btrfs-progs.sy# git remote -v
origin  https://github.com/Damenly/btrfs-progs.git (fetch)
origin  https://github.com/Damenly/btrfs-progs.git (push)
saruman:/var/local/src/btrfs-progs.sy# git branch
  master
* tmp1
saruman:/var/local/src/btrfs-progs.sy# git pull
Already up to date.
saruman:/var/local/src/btrfs-progs.sy# make
Making all in Documentation
make[1]: Nothing to be done for 'all'.

However, it still got stuck here:
gargamel:~# btrfs check --mode=lowmem -p /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 2, have: 3
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 180, have: 181
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 67, have: 68
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 114, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 301, have: 302
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 160, have: 161
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 169, have: 170
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 347, have: 348
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 556

What should I try next?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:02:33AM +0800, Su Yue wrote:
> Could you try the following dumps? They shouldn't take much time.
> 
> #btrfs inspect dump-tree -t 21872  | grep -C 50 "374857 
> EXTENT_DATA "
> 
> #btrfs inspect dump-tree -t 22911  | grep -C 50 "374857 
> EXTENT_DATA "

Ok, that's 29MB, so it doesn't fit on pastebin:
http://marc.merlins.org/tmp/dshelf2_inspect.txt

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote:
> > Ok, that's 29MB, so it doesn't fit on pastebin:
> > http://marc.merlins.org/tmp/dshelf2_inspect.txt
> > 
> Sorry Marc. After offline communication with Qu, both
> of us think the filesystem is hard to repair.
> The filesystem is too large to debug step by step;
> every check-and-debug cycle is too expensive,
> and it has already cost several days.
> 
> Sadly, I am afraid that you will have to recreate the
> filesystem and redo your backups. :(
> 
> Sorry again, and thanks for your reports and patience.

I appreciate your help. Honestly I only wanted to help you find why the
tools aren't working. Fixing filesystems by hand (and remotely via email
on top of that) is way too time consuming, like you said.

Is the btrfs design flawed in a way that repair tools just cannot repair
on their own? 
I understand that data can be lost, but I don't understand how the tools
just either keep crashing for me, go into infinite loops, or otherwise
fail to give me back a stable filesystem, even if some data is missing
after that.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
Hi Qu,

I'll split this part into a new thread:

> 2) Don't keep unrelated snapshots in one btrfs.
>I totally understand that maintain different btrfs would hugely add
>maintenance pressure, but as explains, all snapshots share one
>fragile extent tree.

Yes, I understand that this is what I should do given what you
explained.
My main problem is knowing how to segment things so I don't end up with
filesystems that are full while others are almost empty :)

Am I supposed to put LVM thin volumes underneath so that I can share
the same single 10TB raid5?

If I do this, I would have
software raid 5 < dmcrypt < bcache < lvm < btrfs
That's a lot of layers, and that's also starting to make me nervous :)

Is there any other way that does not involve me creating smaller block
devices for multiple btrfs filesystems and hope that they are the right
size because I won't be able to change it later?
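
To make the thin option concrete, here's a minimal sketch of what I have
in mind (device, volume names and sizes are all made up; the dmcrypt
device is the PV):

pvcreate /dev/mapper/cryptbackup
vgcreate vgback /dev/mapper/cryptbackup
lvcreate --type thin-pool -L 9T -n pool vgback
lvcreate -n host1 -V 2T --thinpool vgback/pool
mkfs.btrfs -L host1 /dev/vgback/host1
# repeat the last two commands per backup target; the -V sizes may
# oversubscribe the pool, which is the whole point of thin provisioning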

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
Hi Qu,

thanks for the detailled and honest answer.
A few comments inline.

On Mon, Jul 02, 2018 at 10:42:40PM +0800, Qu Wenruo wrote:
> For full, it depends. (but for most real world case, it's still flawed)
> We have small and crafted images as test cases, which btrfs check can
> repair without problem at all.
> But such images are *SMALL*, and only have *ONE* type of corruption,
> which can't represent real-world cases at all.
 
right, they're just unittest images, I understand.

> 1) Too large fs (especially too many snapshots)
>The use case (too many snapshots and shared extents, a lot of extents
>get shared over 1000 times) is in fact a super large challenge for
>lowmem mode check/repair.
>It needs O(n^2) or even O(n^3) to check each backref, which hugely
>slow the progress and make us hard to locate the real bug.
 
So, the non lowmem version would work better, but it's a problem if it
doesn't fit in RAM.
I've always considered it a grave bug that btrfs check repair can use so
much kernel memory that it will crash the entire system. This should not
be possible.
While it won't help me here, can btrfs check be improved not to suck all
the kernel memory, and ideally even allow using swap space if the RAM is
not enough?

Is btrfs check regular mode still being maintained? I think it's still
better than lowmem, correct?

> 2) Corruption in extent tree and our objective is to mount RW
>Extent tree is almost useless if we just want to read data.
>But when we do any write, we needs it and if it goes wrong even a
>tiny bit, your fs could be damaged really badly.
> 
>For other corruption, like some fs tree corruption, we could do
>something to discard some corrupted files, but if it's extent tree,
>we either mount RO and grab anything we have, or hopes the
>almost-never-working --init-extent-tree can work (that's mostly
>miracle).
 
I understand that it's the weak point of btrfs, thanks for explaining.

> 1) Don't keep too many snapshots.
>Really, this is the core.
>For send/receive backup, IIRC it only needs the parent subvolume
>exists, there is no need to keep the whole history of all those
>snapshots.

You are correct on history. The reason I keep history is because I may
want to recover a file from last week or 2 weeks ago after I finally
notice that it's gone. 
I have terabytes of space on the backup server, so it's easier to keep
history there than on the client which may not have enough space to keep
a month's worth of history.
As you know, back when we did tape backups, we also kept history of at
least several weeks (usually several months, but that's too much for
btrfs snapshots).

>Keep the number of snapshots to minimal does greatly improve the
>possibility (both manual patch or check repair) of a successful
>repair.
>Normally I would suggest 4 hourly snapshots, 7 daily snapshots, 12
>monthly snapshots.

I actually have fewer snapshots than this per filesystem, but I backup
more than 10 filesystems.
If I used as many snapshots as you recommend, that would already be 230
snapshots for 10 filesystems :)
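
Trimming each trail is easy to script, at least; a sketch, assuming the
snapshots are named so they sort oldest-first and keeping the newest 7
(the layout is made up, and the most recent send parent must survive):

ls -d /mnt/backup/host1/daily.* | head -n -7 | \
    xargs -r -n1 btrfs subvolume delete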

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 12:59:02PM -0400, Austin S. Hemmelgarn wrote:
> > Am I supposed to put LVM thin volumes underneath so that I can share
> > the same single 10TB raid5?
>
> Actually, because of the online resize ability in BTRFS, you don't
> technically _need_ to use thin provisioning here.  It makes the maintenance
> a bit easier, but it also adds a much more complicated layer of indirection
> than just doing regular volumes.

You're right that I can use btrfs resize, but then I still need an LVM
device underneath, correct?
So, if I have 10 backup targets, I need 10 LVM LVs, I give them 10%
each of the full size available (as a guess), and then I'd have to 
- btrfs resize down one that's bigger than I need
- LVM shrink the LV
- LVM grow the other LV
- btrfs resize up the other btrfs

and I think LVM resize and btrfs resize are not linked so I have to do
them separately and hope to type the right numbers each time, correct?
(or is that easier now?)
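
Spelled out, the dance I mean would look like this (sizes made up; the fs
must shrink before its LV, and the LV must grow before its fs):

btrfs filesystem resize -500g /mnt/backup1   # shrink the fs first
lvreduce -L -500G vg/backup1                 # then the LV under it
lvextend -L +500G vg/backup2                 # grow the other LV first
btrfs filesystem resize max /mnt/backup2     # then let btrfs claim the space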

I kind of liked the thin provisioning idea because it's hands off,
which is appealing. Any reason against it?

> You could (in theory) merge the LVM and software RAID5 layers, though that
> may make handling of the RAID5 layer a bit complicated if you choose to use
> thin provisioning (for some reason, LVM is unable to do on-line checks and
> rebuilds of RAID arrays that are acting as thin pool data or metadata).
 
Does LVM do built-in raid5 now? Is it as good/trustworthy as mdadm
raid5?
But yeah, if it's incompatible with thin provisioning, it's not that
useful.

> Alternatively, you could increase your array size, remove the software RAID
> layer, and switch to using BTRFS in raid10 mode so that you could eliminate
> one of the layers, though that would probably reduce the effectiveness of
> bcache (you might want to get a bigger cache device if you do this).

Sadly that won't work. I have more data than will fit on raid10.

Thanks for your suggestions though.
Still need to read up on whether I should do thin provisioning, or not.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:33:09PM +0500, Roman Mamedov wrote:
> On Mon, 2 Jul 2018 08:19:03 -0700
> Marc MERLIN  wrote:
> 
> > I actually have fewer snapshots than this per filesystem, but I backup
> > more than 10 filesystems.
> > If I used as many snapshots as you recommend, that would already be 230
> > snapshots for 10 filesystems :)
> 
> (...once again me with my rsync :)
> 
> If you didn't use send/receive, you wouldn't be required to keep a separate
> snapshot trail per filesystem backed up, one trail of snapshots for the entire
> backup server would be enough. Rsync everything to subdirs within one
> subvolume, then do timed or event-based snapshots of it. You only need more
> than one trail if you want different retention policies for different datasets
> (e.g. in my case I have 91 and 31 days).

This is exactly how I used to do backups before btrfs.
I did 

cp -al backup.olddate backup.newdate
rsync -avSH src/ backup.newdate/

You don't even need snapshots or btrfs anymore.
Also, sorry to say, but I have different data retention needs for
different backups. Some need to rotate more quickly than others, but if
you're using rsync, the method I gave above works fine at any rotation
interval you need.
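
For completeness, the whole rotation with pruning is just (a 30-day
retention picked as an example; GNU date assumed):

cp -al backup.$(date -d yesterday +%F) backup.$(date +%F)
rsync -avSH src/ backup.$(date +%F)/
rm -rf backup.$(date -d '30 days ago' +%F)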

It is almost as efficient as btrfs on space, but as I said, the time
penalty on all those stats for many files was what killed it for me.
If I go back to rsync backups (and I'm really unlikely to), then I'd
also go back to ext4. There would be no point in dealing with the
complexity and fragility of btrfs anymore.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:35:19PM -0400, Austin S. Hemmelgarn wrote:
> >I kind of linked the thin provisioning idea because it's hands off,
> >which is appealing. Any reason against it?
> No, not currently, except that it adds a whole lot more stuff between 
> BTRFS and whatever layer is below it.  That increase in what's being 
> done adds some overhead (it's noticeable on 7200 RPM consumer SATA 
> drives, but not on decent consumer SATA SSD's).
> 
> There used to be issues running BTRFS on top of LVM thin targets which 
> had zero mode turned off, but AFAIK, all of those problems were fixed 
> long ago (before 4.0).

I see, thanks for the heads up.

> >Does LVM do built in raid5 now? Is it as good/trustworthy as mdadm
> >radi5?
> Actually, it uses MD's RAID5 implementation as a back-end.  Same for 
> RAID6, and optionally for RAID0, RAID1, and RAID10.
 
Ok, that makes me feel a bit better :)

> >But yeah, if it's incompatible with thin provisioning, it's not that
> >useful.
> It's technically not incompatible, just a bit of a pain.  Last time I 
> tried to use it, you had to jump through hoops to repair a damaged RAID 
> volume that was serving as an underlying volume in a thin pool, and it 
> required keeping the thin pool offline for the entire duration of the 
> rebuild.

Argh, not good :( / thanks for the heads up.

> If you do go with thin provisioning, I would encourage you to make 
> certain to call fstrim on the BTRFS volumes on a semi regular basis so 
> that the thin pool doesn't get filled up with old unused blocks, 

That's a very good point/reminder, thanks for that. I guess it's like
running on an ssd :)

> preferably when you are 100% certain that there are no ongoing writes on 
> them (trimming blocks on BTRFS gets rid of old root trees, so it's a bit 
> dangerous to do it while writes are happening).
 
Argh, that will be harder, but I'll try.
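
Probably something as simple as a weekly cron job run at a quiet hour
(path and schedule made up):

# /etc/cron.d/fstrim-backup: trim the thin-provisioned btrfs weekly
30 5 * * 0  root  /sbin/fstrim -v /mnt/btrfs_backup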

Given what you said, it sounds like I'll still be best off with separate
layers to avoid the rebuild problem you mentioned.
So it'll be
swraid5 / dmcrypt / bcache / lvm dm thin / btrfs

Hopefully that will work well enough.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 12:51:30AM +, Paul Jones wrote:
> You could combine bcache and lvm if you are happy to use dm-cache instead 
> (which lvm uses).
> I use it myself (but without thin provisioning) and it works well.

Interesting point. So, I used to use lvm and then lvm2 many years ago until
I got tired of its performance, especially as soon as I took even a
single snapshot.
But that was a long time ago now, just saying that I'm a bit rusty on LVM
itself.

That being said, if I have
raid5
dm-cache
dm-crypt
dm-thin

That's still 4 block layers under btrfs.
Am I any better off using dm-cache instead of bcache? My understanding is
that it only replaces one block layer with another one and one codebase with
another.

Mmmh, a bit of reading shows that dm-cache is now used as lvmcache, which
might change things, or not.
I'll admit that setting up and maintaining bcache is a bit of a pain; I only
used it at the time because it seemed more ready then, but we're a few years
later now.

So, what do you recommend nowadays, assuming you've used both?
(given that it's literally going to take days to recreate my array, I'd
rather do it once and the right way the first time :) )

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 09:37:47AM +0800, Qu Wenruo wrote:
> > If I do this, I would have
> > software raid 5 < dmcrypt < bcache < lvm < btrfs
> > That's a lot of layers, and that's also starting to make me nervous :)
> 
> If you could keep the number of snapshots to minimal (less than 10) for
> each btrfs (and the number of send source is less than 5), one big btrfs
> may work in that case.
 
Well, we kind of discussed this already. If btrfs falls over when you reach
100 snapshots or so, and it sure seems to in my case, I won't be much better
off.
Having btrfs check --repair fail because 32GB of RAM is not enough, and it's
unable to use swap, is a big deal in my case. You also confirmed that btrfs
check lowmem does not scale to filesystems like mine, so this translates
into "if regular btrfs check repair can't fit in 32GB, I am completely out
of luck if anything happens to the filesystem"

You're correct that I could tweak my backups and snapshot rotation to get
from 250 or so down to 100, but it seems that I'll just be hoping to avoid
the problem by being just under the limit, until I'm not, again, and it'll
be too late to do anything it next time I'm in trouble again, putting me
back right in the same spot I'm in now.
Is all this fair to say, or did I misunderstand?

> BTW, IMHO bcache is not really helping for a backup system, which is
> more write oriented.

That's a good point. So, what I didn't explain is that I still have some old
filesystems that get backed up with rsync instead of btrfs send (going
into the same filesystem, but not the same subvolume).
Because rsync is so painfully slow when it needs to scan both sides before
it'll even start doing any work, bcache helps there.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote:
> So the idea behind journaled file systems is that journal replay
> enables mount-time "repair" that's faster than an fsck. Already Btrfs
> use cases with big, but not huge, file systems make btrfs check a
> problem. Either running out of memory or it takes too long. So already
> it isn't scaling as well as ext4 or XFS in this regard.
> 
> So what's the future hold? It seems like the goal is that the problems
> must be avoided in the first place rather than to repair them after
> the fact.
> 
> Are the problems Marc is running into understood well enough that
> there can eventually be a fix, maybe even an on-disk format change,
> that prevents such problems from happening in the first place?
> 
> Or does it make sense for him to be running with btrfs debug or some
> subset of btrfs integrity checking mask to try to catch the problems
> in the act of them happening?

Those are all good questions.
To be fair, I cannot claim that btrfs was at fault for whatever filesystem
damage I ended up with. It's very possible that it happened due to a flaky
Sata card that kicked drives off the bus when it shouldn't have.
Sure in theory a journaling filesystem can recover from unexpected power
loss and drives dropping off at bad times, but I'm going to guess that
btrfs' complexity also means that it has data structures (extent tree?) that
need to be updated completely "or else".

I'm obviously ok with a filesystem check being necessary to recover in cases
like this; after all, I still occasionally have to run e2fsck on ext4 too, but
I'm a lot less thrilled with the btrfs situation where basically the repair
tools can either completely crash your kernel, or take days and then either
get stuck in an infinite loop or hit an algorithm that can't scale if you
have too many hardlinks/snapshots.

It sounds like there may not be a fix to this problem with the filesystem's
design, outside of "do not get there, or else".
It would even be useful for btrfs tools to start computing heuristics and
output warnings like "you have more than 100 snapshots on this filesystem,
this is not recommended, please read http://url/";
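
Even something as dumb as this in the tools or a wrapper script would have
warned me long ago (threshold and wording made up):

count=$(btrfs subvolume list /mnt/btrfs_pool | wc -l)
if [ "$count" -gt 100 ]; then
    echo "WARNING: $count subvolumes/snapshots; check/repair may not scale" >&2
fi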

Qu, Su, does that sound both reasonable and doable?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:26:37AM +, Paul Jones wrote:
> I don't have any experience with this, but since it's the internet let me 
> tell you how I'd do it anyway 😝

That's the spirit :)

> raid5
> dm-crypt
> lvm (using thin provisioning + cache)
> btrfs
> 
> The cache mode on lvm requires you to set up all your volumes first, then
> add caching to those volumes last. If you need to modify the volume then
> you have to remove the cache, make your changes, then re-add the cache. It
> sounds like a pain, but having the cache separate from the data is quite
> handy.

I'm ok enough with that.
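
For the record, the attach/detach cycle Paul describes maps onto something
like this (hypothetical VG/LV names, untested sketch):

# set up a cache pool on the SSD and attach it to an existing LV
lvcreate --type cache-pool -L 100G -n cpool vg0 /dev/nvme0n1
lvconvert --type cache --cachepool vg0/cpool vg0/backup
# before changing the LV: detach the cache, modify, then re-attach
lvconvert --splitcache vg0/backup
lvresize -L +1T vg0/backup
lvconvert --type cache --cachepool vg0/cpool vg0/backup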

> Given you are running a backup server I don't think the cache would
> really do much unless you enable writeback mode. If you can split up your
> filesystem a bit to the point that btrfs check doesn't OOM that will
> seriously help performance as well. Rsync might be feasible again.

I'm a bit wary of write caching with the issues I've had. I may do
write-through, but not writeback :)

But caching helps indeed for my older filesystems that are still backed up
via rsync because the source fs is ext4 and not btrfs.

Thanks for the suggestions
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:50:48PM +0800, Qu Wenruo wrote:
> > It sounds like there may not be a fix to this problem with the filesystem's
> > design, outside of "do not get there, or else".
> > It would even be useful for btrfs tools to start computing heuristics and
> > output warnings like "you have more than 100 snapshots on this filesystem,
> > this is not recommended, please read http://url/";
> 
> This looks pretty doable, but maybe it's better to add some warning at
> btrfs progs (both "subvolume snapshot" and "receive").

This is what I meant to say, correct.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:34:45PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:34 AM, Su Yue  wrote:
> 
> > Yes, extent tree is the hardest part for lowmem mode. I'm quite
> > confident the tool can deal well with file trees(which records metadata
> > about file and directory name, relationships).
> > As for extent tree, I have few confidence due to its complexity.
> 
> I have to ask again if there's some metadata integrity mask option Marc
> should use to try to catch the corruption cause in the first place?
> 
> His use case really can't afford either mode of btrfs check. And also
> check is only backward looking, it doesn't show what was happening at
> the time. And for big file systems, check rapidly doesn't scale at all
> anyway.
> 
> And now he's modifying his layout to keep the problem from happening
> again, which makes it less likely to catch the cause and get it fixed.
> I think if he's willing to build a kernel with integrity checker
> enabled, it should be considered but only if it's likely to reveal why
> the problem is happening, even if it can't repair the problem once
> it's happened. He's already in that situation so masked integrity
> checking is no worse, at least it gives a chance to improve Btrfs
> rather than it being a mystery how it got corrupt.

Yeah, I'm fine waiting a few more days with this down and gathering data if
that helps.
But due to the size, a full btrfs image may be a bit larger than we
want, not counting some confidential data in some filenames.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:46:59PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo  wrote:
> >
> >
> > There must be something wrong, however due to the size of the fs, and
> > the complexity of extent tree, I can't tell.
> 
> Right, which is why I'm asking if any of the metadata integrity
> checker mask options might reveal what's going wrong?
> 
> I guess the big issues are:
> a. compile kernel with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y is necessary
> b. it can come with a high resource burden depending on the mask and
> where the log is being written (write system logs to a different file
> system for sure)
> c. the granularity offered in the integrity checker might not be enough.
> d. it might take a while after corruption is introduced before it is
> noticed and flagged.

Back to where I'm at right now. I'm going to delete this filesystem and
start over very soon. Tomorrow or the day after.
I'm happy to get more data off it if someone wants it for posterity, but
I indeed need to recover soon, since sitting with a dead backup server is
not a good place to be in :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote:
>  Ok, this is where I am now:
>  WARNING: debug: end of checking extent item[18457780273152 169 1]
>  type: 176 offset: 2
>  checking extent items [18457780273152/18457780273152]
>  ERROR: errors found in extent allocation tree or chunk allocation
>  checking fs roots
>  ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
>  EXTENT_DATA[25937109 4033]
> 
> The expected end is not even aligned to sectorsize.
> 
> I think there is something wrong.
> Dump tree on this INODE would definitely help in this case.
> 
> Marc, would you please try dump using the following command?
> 
> # btrfs ins dump-tree -t 17592  | grep -C 40 25937109
 
Sure, there you go:
gargamel:~# btrfs ins dump-tree -t 17592 /dev/mapper/dshelf2  | grep -C 40 
25937109
extent data disk byte 3259370151936 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 144 key (2009526 EXTENT_DATA 1179648) itemoff 7931 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370266624 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 145 key (2009526 EXTENT_DATA 1310720) itemoff 7878 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370385408 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 146 key (2009526 EXTENT_DATA 1441792) itemoff 7825 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370504192 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 147 key (2009526 EXTENT_DATA 1572864) itemoff 7772 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370622976 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 148 key (2009526 EXTENT_DATA 1703936) itemoff 7719 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370737664 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 149 key (2009526 EXTENT_DATA 1835008) itemoff 7666 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370856448 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 150 key (2009526 EXTENT_DATA 1966080) itemoff 7613 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370975232 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 151 key (2009526 EXTENT_DATA 2097152) itemoff 7560 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371094016 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 152 key (2009526 EXTENT_DATA 2228224) itemoff 7507 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371208704 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 153 key (2009526 EXTENT_DATA 2359296) itemoff 7454 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371323392 nr 110592
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 154 key (2009526 EXTENT_DATA 2490368) itemoff 7401 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371433984 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 155 key (2009526 EXTENT_DATA 2621440) itemoff 7348 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371548672 nr 110592
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 156 key (2009526 EXTENT_DATA 2752512) itemoff 7295 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371659264 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 157 key (2009526 EXTENT_DATA 2883584) itemoff 7242 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371773952 nr 106496
extent dat

Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
To fill in for the spectators on the list :)
Su gave me a modified version of btrfsck lowmem that was able to clean
most of my filesystem.
It's not a general case solution since it had some hardcoding specific
to my filesystem problems, but still a great success.
Email quoted below, along with responses to Qu

On Tue, Jul 10, 2018 at 09:09:33AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018-07-10 01:48, Marc MERLIN wrote:
> > Success!
> > Well done Su, this is a huge improvement to the lowmem code. It went from 
> > days to less than 3 hours.
> 
> Awesome work!
> 
> > I'll paste the logs below.
> > 
> > Questions:
> > 1) I assume I first need to delete a lot of snapshots. What is the limit in 
> > your opinion?
> > 100? 150? other?
> 
> My personal recommendation is just 20. Not 150, not even 100.
 
I see. Then, I may be forced to recreate multiple filesystems anyway.
I have about 25 btrfs send/receive relationships and I have around 10
historical snapshots for each.

In the future, can't we segment extents/snapshots per subvolume, making
subvolumes mini filesystems within the bigger filesystem?

> But snapshot deletion will take time (and it's delayed, you won't know
> if something wrong happened just after "btrfs subv delete") and even
> require a healthy extent tree.
> If all extent tree errors are just false alert, that should not be a big
> problem at all.
> 
> > 
> > 2) my filesystem is somewhat misbalanced. Which balance options do you 
> > think are safe to use?
> 
> I would recommend manually checking the extent tree for BLOCK_GROUP_ITEM,
> which will tell you how big a block group is and how much space is used,
> and gives you an idea of which block groups can be relocated.
> Then use vrange= to specify the exact block group to relocate.
> 
> One example would be:
> 
> # btrfs ins dump-tree -t extent  | grep -A1 BLOCK_GROUP_ITEM |\
>   tee block_group_dump
> 
> Then the output contains:
>   item 1 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 16206 itemsize 24
>   block group used 262144 chunk_objectid 256 flags DATA
> 
> The "13631488" is the bytenr of the block group.
> The "8388608" is the length of the block group.
> The "262144" is the used bytes of the block group.
> 
> The less used space a block group has, the higher priority it should
> get for relocation (and the faster it is to relocate).
> You could write a small script to do it, or there should be some tool to
> do the calculation for you.
 
I usually use something simpler:
Label: 'btrfs_boot'  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
Total devices 1 FS bytes used 30.19GiB
devid1 size 79.93GiB used 78.01GiB path /dev/mapper/cryptroot

This is bad: I have 30GB of data, but 78 out of 80GB is already allocated.
That's bad news and suggests a balance is needed, correct?
If so, I always struggle as to what value I should give to dusage and
musage...
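
Something incremental seems safest: start with low values so only the
emptiest block groups get moved, and escalate only if not enough space
comes back (values below are arbitrary, untested):

btrfs balance start -dusage=10 -musage=10 /mnt
btrfs balance start -dusage=25 -musage=25 /mnt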

> And only relocate one block group each time, to avoid possible problem.
> 
> Last but not least, it's highly recommended to do the relocation
> only after unused snapshots are completely deleted.
> (Or it would be super slow to relocate.)

Thank you for the advice. Hopefully this helps someone else too, and
maybe someone can write a relocation helper tool if I don't have the
time to do it myself.
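
A first cut of such a helper could be as small as this (untested sketch,
keyed to the dump-tree output format Qu showed above; it prints block
groups sorted by usage, so the emptiest, cheapest-to-relocate ones come
first):

btrfs ins dump-tree -t extent /dev/mapper/dshelf2 |
  grep -A1 BLOCK_GROUP_ITEM |
  awk '/BLOCK_GROUP_ITEM/ { gsub(/[()]/, ""); bytenr = $4; len = $6 }
       /block group used/ { printf "%3d%% bytenr=%s len=%s\n",
                            $4 * 100 / len, bytenr, len }' |
  sort -n

The top entries are then candidates for balance with
-dvrange=bytenr..bytenr+len, one block group at a time as Qu suggests.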

> > 3) Should I start a scrub now (takes about 1 day) or anything else to
> > check that the filesystem is hopefully not damaged anymore?
> 
> I would normally recommend to use btrfs check, but neither mode really
> works here.
> And scrub only checks csum, doesn't check the internal cross reference
> (like content of extent tree).
> 
> Maybe Su could skip the whole extent tree check and let lowmem check
> the fs tree only; with --check-data-csum it should do a better job than
> scrub.

I will wait to hear back from Su, but I think the current situation is
that I still have some problems on my FS, they are just
1) not important enough to block mount rw (now it works again)
2) currently ignored by the modified btrfsck I have, but would cause
problems if I used real btrfsck.

Correct?

> > 
> > 4) should btrfs check reset the corrupt counter?
> > bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > for now, should I reset it manually?
> 
> It could be pretty easy to implement if not already implemented.

Seems like it's not, given that Su's btrfsck --repair ran to completion
and I still have corrupt set to '2' :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
Thanks to Su and Qu, I was able to get my filesystem to a point that
it's mountable.
I then deleted loads of snapshots and I'm down to 26.

It now looks like this:
gargamel:~# btrfs fi show /mnt/mnt
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Total devices 1 FS bytes used 12.30TiB
devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:~# btrfs fi df /mnt/mnt
Data, single: total=13.57TiB, used=12.19TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=124.50GiB, used=115.62GiB
Metadata, single: total=216.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B


Problems
1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
server, despite my deleting lots of snapshots.
Is it because I have too many files then?

2) I tried Su's master git branch for btrfs-progs to try and see how a
normal check would go, and I'm stuck on this:
gargamel:/var/local/src/btrfs-progs.sy# time ./btrfsck --mode=lowmem --repair 
/dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) 
compared to the found root node (139061)
ERROR: failed to repair root items: Invalid argument

real75m8.046s
user0m14.591s
sys 0m52.431s

I understand what the message means, I just need to switch to the newer root
but honestly I'm not quite sure how to do this from the btrfs-check man page.

This didn't work:
time ./btrfsck --mode=lowmem --repair --chunk-root=18446744073709551607  
/dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
WARNING: chunk_root_bytenr 18446744073709551607 is unaligned to 4096, ignore it

How do I address the error above?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > Problems
> > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > server, despite my deleting lots of snapshots.
> > Is it because I have too many files then?
> > 
> Yes. Original check first gathers all information about the extent tree
> and your files in RAM, then processes them one by one.
> But deleting still counts, it does speed lowmem check up.

Understood.

> > 2) I tried Su's master git branch for btrfs-progs to try and see how
> Oh..No... My master branch is still 4.14. The true master branch is
> David's here:
> https://github.com/kdave/btrfs-progs
> But the master branch has a known bug which I fixed yesterday, please see
> the mail.

So, if I git sync it now, it should have your fix, and I can run it,
correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
> 
> 
> On 07/11/2018 08:58 AM, Marc MERLIN wrote:
> > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > > > Problems
> > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > > > server, despite my deleting lots of snapshots.
> > > > Is it because I have too many files then?
> > > > 
> > > Yes. Original check first gathers all information about the extent tree
> > > and your files in RAM, then processes them one by one.
> > > But deleting still counts, it does speed lowmem check up.
> > 
> > Understood.
> > 
> > > > 2) I tried Su's master git branch for btrfs-progs to try and see how
> > > Oh..No... My master branch is still 4.14. The true master branch is
> > > David's here:
> > > https://github.com/kdave/btrfs-progs
> > > But the master branch has a known bug which I fixed yesterday, please see
> > > the mail.
> > 
> > So, if I git sync it now, it should have your fix, and I can run it,
> > correct?
> > 
> Yes, please.

Ok, I am now running
gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
using git master from https://github.com/kdave/btrfs-progs

I will report back how long it takes with the extent tree check and whether
it returns clean or not.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 09:58:36AM +0800, Su Yue wrote:
> 
> 
> On 07/11/2018 09:44 AM, Marc MERLIN wrote:
> > On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
> > > 
> > > 
> > > On 07/11/2018 08:58 AM, Marc MERLIN wrote:
> > > > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > > > > > Problems
> > > > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes 
> > > > > > the
> > > > > > server, despite my deleting lots of snapshots.
> > > > > > Is it because I have too many files then?
> > > > > > 
> > > > > Yes. Original check first gathers all information about the extent tree
> > > > > and your files in RAM, then processes them one by one.
> > > > > But deleting still counts, it does speed lowmem check up.
> > > > 
> > > > Understood.
> > > > 
> > > > > > 2) I tried Su's master git branch for btrfs-progs to try and see how
> > > > > Oh..No... My master branch is still 4.14. The true master branch is
> > > > > David's here:
> > > > > https://github.com/kdave/btrfs-progs
> > > > > But the master branch has a known bug which I fixed yesterday, please 
> > > > > see
> > > > > the mail.
> > > > 
> > > > So, if I git sync it now, it should have your fix, and I can run it,
> > > > correct?
> > > > 
> > > Yes, please.
> > 
> > Ok, I am now running
> > gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
> > using git master from https://github.com/kdave/btrfs-progs
> > 
> Please stop the check, please.
> 
> The branch 'it' which I mean is
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Ok, sorry, I thought you said you had pushed your changes to 
https://github.com/kdave/btrfs-progs
yesterday.

So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
I'm running it without the extra options you added with hardcoded stuff:
gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem --repair 
/dev/mapper/dshelf2

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 12:07:05PM +0800, Su Yue wrote:
> > So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
> > I'm running it without the extra options you added with hardcoded stuff:
> > gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem 
> > --repair /dev/mapper/dshelf2
> > 
> This is okay. Let's wait to see the result.

Sadly, it crashes quickly:

Starting program: /var/local/src/btrfs-progs.sy-test/btrfs check --mode=lowmem 
--repair /dev/mapper/dshelf2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
enabling repair mode
WARNING: low-memory mode repair support is only partial
 Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
checking extents

Program received signal SIGSEGV, Segmentation fault.
check_tree_block_backref (fs_info=fs_info@entry=0x55825e10, 
root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, 
level=level@entry=1)
at check/mode-lowmem.c:3744
3744if (btrfs_header_bytenr(node) != bytenr) {
(gdb)  bt
#0  check_tree_block_backref (fs_info=fs_info@entry=0x55825e10, 
root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, 
level=level@entry=1)
at check/mode-lowmem.c:3744
#1  0x555cb1f9 in check_extent_item 
(fs_info=fs_info@entry=0x55825e10, 
path=path@entry=0x7fffdc60) at check/mode-lowmem.c:4194
#2  0x555d06e9 in check_leaf_items (account_bytes=1, 
nrefs=0x7fffdb80, 
path=0x7fffdc60, root=0x558262f0) at check/mode-lowmem.c:4654
#3  walk_down_tree (check_all=1, nrefs=0x7fffdb80, level=, 
path=0x7fffdc60, root=0x558262f0) at check/mode-lowmem.c:4790
#4  check_btrfs_root (root=root@entry=0x558262f0, 
check_all=check_all@entry=1)
at check/mode-lowmem.c:5114
#5  0x555d144f in check_chunks_and_extents_lowmem 
(fs_info=fs_info@entry=0x55825e10)
at check/mode-lowmem.c:5475
#6  0x555b44b1 in do_check_chunks_and_extents (fs_info=0x55825e10) 
at check/main.c:8369
#7  cmd_check (argc=, argv=) at check/main.c:9899
#8  0x55567510 in main (argc=4, argv=0x7fffe390) at btrfs.c:302


Would you like anything off gdb? (feel free to Email me directly or
point me to an online chat platform you have access to)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check mode normal still hard crash-hanging systems

2018-07-11 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 11:09:56AM -0600, Chris Murphy wrote:
> On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN  wrote:
> > Thanks to Su and Qu, I was able to get my filesystem to a point that
> > it's mountable.
> > I then deleted loads of snapshots and I'm down to 26.
> >
> > IT now looks like this:
> > gargamel:~# btrfs fi show /mnt/mnt
> > Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> > Total devices 1 FS bytes used 12.30TiB
> > devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >
> > gargamel:~# btrfs fi df /mnt/mnt
> > Data, single: total=13.57TiB, used=12.19TiB
> > System, DUP: total=32.00MiB, used=1.55MiB
> > Metadata, DUP: total=124.50GiB, used=115.62GiB
> > Metadata, single: total=216.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> >
> > Problems
> > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > server, despite my deleting lots of snapshots.
> > Is it because I have too many files then?
> 
> I think original mode needs most of the metadata in memory.
> 
> I'm not understanding why btrfs check won't use swap like at least
> xfs_repair does, and I'm pretty sure e2fsck will as well.
> 
> Using 128G swap on nvme with original check is still gonna be faster
> than lowmem mode.

Yeah, that's also been a concern/question of mine all these years, even if
Su isn't working on that code, and likely is the wrong person to ask.
Personally, my take is that if btrfs wants to be taken seriously, at the
very least its fsck tool should not hard crash a system you run it on.
(and it really does the worst kind of hard crash I've ever seen, OOM can't
trigger fast enough, Linux doesn't panic so it can't reboot itself either, 
it just hard dies and hangs)

Maybe David knows?
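
In the meantime, for anyone wanting to test Chris's swap idea, a throwaway
swap file on a separate fast device takes a minute to set up (rough sketch,
assuming /nvme is not on the btrfs being checked):

fallocate -l 128G /nvme/check.swap
chmod 600 /nvme/check.swap
mkswap /nvme/check.swap
swapon /nvme/check.swap

Whether original-mode check then swaps gracefully or just thrashes is of
course another question.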

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
I have a filesystem on top of md raid5 that got a few problems due to the
underlying block layer (bad data cable).
The filesystem mounts fine, but had a few issues
Scrub runs (I didn't let it finish, it takes a _long_ time)
But check --repair won't even run at all:

myth:~# btrfs --version
btrfs-progs v4.7.3
myth:~# uname -r
4.8.5-ia32-20161028

myth:~# btrfs check -p --repair  /dev/mapper/crypt_bcache0  2>&1 | tee
/var/spool/repair
bytenr mismatch, want=13835462344704, have=0
ERROR: cannot read chunk root
Couldn't open file system
enabling repair mode
myth:~#

myth:~# btrfs rescue super-recover -v /dev//mapper/crypt_bcache0 
All Devices:
Device: id = 1, name = /dev//mapper/crypt_bcache0

Before Recovering:
[All good supers]:
device name = /dev//mapper/crypt_bcache0
superblock bytenr = 65536

device name = /dev//mapper/crypt_bcache0
superblock bytenr = 67108864

device name = /dev//mapper/crypt_bcache0
superblock bytenr = 274877906944

[All bad supers]:

All supers are valid, no need to recover


I don't care about the data, it's a backup array, but I'd still like to know
if I can recover from this state and do a repair to see how much data got
damaged

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> Your chunk root is corrupted, and since the chunk tree provides the
> underlying disk layout (even for a single device), if we fail to read
> it, the filesystem will never be mountable.
 
That's the thing though, I can mount the filesystem just fine :)

> You could try to use backup chunk root.
> 
> "btrfs inspect-internal dump-super -f" to find the backup chunk root, 
> and use "btrfs check --chunk-root " to have 
> another try.

Am I doing this right? It doesn't seem to work

myth:~# btrfs check -p --repair --chunk-root 13835462344704 
/dev/mapper/crypt_bcache0  2>&1 | tee /var/spool/repair2
bytenr mismatch, want=13835462344704, have=0
ERROR: cannot read chunk root
Couldn't open file system
enabling repair mode


myth:~# btrfs inspect-internal dump-super -f /dev/mapper/crypt_bcache0 | less
superblock: bytenr=65536, device=/dev/mapper/crypt_bcache0
-
csum_type   0 (crc32c)
csum_size   4
csum0x3814e4a0 [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
fsid6692cf4c-93d9-438c-ac30-5db6381dc4f2
label   DS5
generation  51176
root13845513109504
sys_array_size  129
chunk_root_generation   51135
root_level  1
chunk_root  13835462344704
chunk_root_level1
log_root0
log_root_transid0
log_root_level  0
total_bytes 16002599346176
bytes_used  14584560160768
sectorsize  4096
nodesize16384
leafsize16384
stripesize  4096
root_dir6
num_devices 1
compat_flags0x0
compat_ro_flags 0x0
incompat_flags  0x169
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA )
cache_generation51176
uuid_tree_generation51176
dev_item.uuid   0cf779be-8e16-4982-b7d7-f8241deea0d1
dev_item.fsid   6692cf4c-93d9-438c-ac30-5db6381dc4f2 [match]
dev_item.type   0
dev_item.total_bytes16002599346176
dev_item.bytes_used 14691011133440
dev_item.io_align   4096
dev_item.io_width   4096
dev_item.sector_size4096
dev_item.devid  1
dev_item.dev_group  0
dev_item.seek_speed 0
dev_item.bandwidth  0
dev_item.generation 0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13835461197824)
chunk length 33554432 owner 2 stripe_len 65536
type SYSTEM|DUP num_stripes 2
stripe 0 devid 1 offset 13500327919616
dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
stripe 1 devid 1 offset 13500361474048
dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
backup_roots[4]:
backup 0:
backup_tree_root:   12801101791232  gen: 51174  level: 1
backup_chunk_root:  13835462344704  gen: 51135  level: 1
backup_extent_root: 12801124352000  gen: 51174  level: 3
backup_fs_root: 10548133724160  gen: 51172  level: 0
backup_dev_root:11125467824128  gen: 51172  level: 1
backup_csum_root:   12801133953024  gen: 51174  level: 3
backup_total_bytes: 16002599346176
backup_bytes_used:  14584560160768
backup_num_devices: 1

backup 1:
backup_tree_root:   13842532810752  gen: 51175  level: 1
backup_chunk_root:  13835462344704  gen: 51135  level: 1
backup_extent_root: 13843784695808  gen: 51175  level: 3
backup_fs_root: 10548133724160  gen: 51172  level: 0
backup_dev_root:11125467824128  gen: 51172  level: 1
backup_csum_root:   13842542362624  gen: 51175  level: 3
backup_total_bytes: 16002599346176
backup_bytes_used:  14584560160768
backup_num_devices: 1

backup 2:
backup_tree_root:   13845513109504  gen: 51176  level: 1
backup_chunk_root:  13835462344704  gen: 51135  level: 1
backup_extent_root: 13845513191424  gen: 51176  level: 3
backup_fs_root: 10548133724160  gen: 51172  level: 0
backup_dev_root:11125467824128  gen: 51172  level: 1
backup_csum_root:   13852180938752  gen: 51176  level: 3
backup_total_bytes:   

Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Sun, Oct 30, 2016 at 07:06:16PM -0700, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> > Your chunk root is corrupted, and since chunk tree provides the 
> > underlying disk layout, even for single device, so if we failed to read 
> > it, then it will never be able to be mounted.
>  
> That's the thing though, I can mount the filesystem just fine :)

Actually, has anyone seen a configuration where the kernel can mount a
filesystem read/write (without ro or recovery options) while
btrfs check --repair can't open it?

This kind of sounds like a bug in check --repair IMO.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 01:27:56PM +0800, Qu Wenruo wrote:
> Would you please dump the following bytes?
> That's the chunk root tree block on your disk.
> 
> offset: 13500329066496 length: 16384
> offset: 13500330213376 length: 16384
 
Sorry for asking, am I doing this wrong?
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32
skip=26367830208
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s

> According to your fsck error output, I assume btrfs-progs fails to read 
> the first copy of chunk root, and due to a bug, it doesn't continue to 
> read 2nd copy.
> 
> While kernel continues to read the 2nd copy and everything goes on.
 
Ah, that would make sense.
But from what you're saying, I should be able to do recovery by pointing
to the 2nd copy of the chunk root, but somehow I haven't typed the right
command to do so yet, correct?

Should I try a different command or offset than 
btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0 
?

Or are you saying the btrfs progs bug causes it to fail to even try to read
the 2nd copy of the chunk root even though it was given on the command line?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >Sorry for asking, am I doing this wrong?
> >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32
> >skip=26367830208
> >dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >0+0 records in
> >0+0 records out
> >0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
> 
> So, the underlying MD RAID5 is complaining about some wrong data and
> refusing to read it out.
> 
> It seems that btrfs-progs can't handle read failure?
> Maybe dm-error could emulate it.
> 
> And what about the 2nd range?

They both fail the same way, but I wasn't sure if I typed the wrong dd
command or not.

myth:~# btrfs fi df /mnt/mnt
Data, single: total=13.22TiB, used=13.19TiB
System, DUP: total=32.00MiB, used=1.42MiB
Metadata, DUP: total=74.00GiB, used=72.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
myth:~# btrfs fi show
Label: 'DS5'  uuid: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
Total devices 1 FS bytes used 13.26TiB
devid1 size 14.55TiB used 13.36TiB path /dev/mapper/crypt_bcache0

For now, I mounted the filesystem and I'm running scrub on it to see how
much damage there is. It will take all night:
BTRFS warning (device dm-0): checksum error at logical 27886878720 on dev 
/dev/mapper/crypt_bcache0, sector 56580096, root 9461, inode 45837, offset 
15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 1, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27887009792 on dev /dev/mapper/crypt_bcache0
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 2, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27886878720 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885961216 on dev 
/dev/mapper/crypt_bcache0, sector 56578304, root 9461, inode 45837, offset 
15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS warning (device dm-0): checksum error at logical 27885830144 on dev 
/dev/mapper/crypt_bcache0, sector 56578048, root 9461, inode 45837, offset 
15459041280, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 3, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27885830144 on dev /dev/mapper/crypt_bcache0
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 4, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27885961216 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27887013888 on dev 
/dev/mapper/crypt_bcache0, sector 56580360, root 9461, inode 45837, offset 
15460225024, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 5, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27887013888 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885834240 on dev 
/dev/mapper/crypt_bcache0, sector 56578056, root 9461, inode 45837, offset 
15459045376, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 6, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27885834240 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27887017984 on dev 
/dev/mapper/crypt_bcache0, sector 56580368, root 9461, inode 45837, offset 
15460229120, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 7, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 
27887017984 on dev /dev/mapper/crypt_bcache0

So far, it looks like minor damage limited to one file; I'll see tomorrow
morning after it's done reading the whole array.

> And furthermore, all backup chunk roots are in fact pointing to the
> current chunk root, so --chunk-root doesn't work at all.

Ah, ok, so there is nothing I can do at the moment until I get a new 
btrfs-progs, correct?

Thanks for your answers
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-30 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote:
> 
> 
> At 10/31/2016 02:25 PM, Marc MERLIN wrote:
> >On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >>>Sorry for asking, am I doing this wrong?
> >>>myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32
> >>>skip=26367830208
> >>>dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >>>0+0 records in
> >>>0+0 records out
> >>>0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
> >>
> >>So, the underlying MD RAID5 are complaining about some wrong data, and
> >>refuse to read out.
> >>
> >>It seems that btrfs-progs can't handle read failure?
> >>Maybe dm-error could emulate it.
> >>
> >>And what about the 2nd range?
> >
> >They both fail the same way, but I wasn't sure if I typed the wrong dd
> >command or not.
> 
> Strange, your command seems OK to me.
> 
> Does it have anything to do with your security setup or something like that?
> Or is it related to dm-crypt or bcache?
> 
> 
> But this reminds me, if dd can't read it, maybe btrfs-progs is the same.
> 
> Maybe only the kernel can read the dm-crypt device while user-space
> tools can't access dm-crypt devices directly?

It can; it's just that the offset seems wrong:

myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 
skip=26367830208
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000421662 s, 0.0 kB/s

If I divide by 1000, it works:
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 
skip=26367830
32+0 records in
32+0 records out
16384 bytes (16 kB) copied, 0.139005 s, 118 kB/s

so that's why I was asking you if I counted the offset wrong. I took the
value you asked and divided by 512, but it seems too big

13500329066496 / 512 = 26367830208
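
As an aside, newer GNU dd can take the byte offset directly and skip the
unit juggling entirely (assuming coreutils >= 8.16, where skip_bytes was
added):

dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=16384 count=1 \
   iflag=skip_bytes skip=13500329066496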

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 08:44:12AM +, Hugo Mills wrote:
> > Any idea on special dm setup which can make us fail to read out some
> > data range?
> 
>I've seen both btrfs check and btrfs dump-super give wrong answers
> (particularly, some addresses end up larger than the device, for some
> reason) when run on a mounted filesystem. Worth ruling that one out.

I just finished running my scrub overnight, and it failed around 10%:
[115500.316921] BTRFS error (device dm-0): bad tree block start 
8461247125784585065 17619396231168
[115500.332354] BTRFS error (device dm-0): bad tree block start 
8461247125784585065 17619396231168
[115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: 
errno=-5 IO failure
[115500.332629] BTRFS info (device dm-0): forced readonly
[115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: 
errno=-5 IO failure
[115500.436002] btrfs_printk: 550 callbacks suppressed
[115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted 
transaction.
[115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: 
errno=-5 IO failure


myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
(...)
scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 09:03:11
total bytes scrubbed: 1.15TiB with 512 errors
error details: csum=512
corrected errors: 0, uncorrectable errors: 512, unverified errors: 0

Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it 
means
that btrfs had physical read errors from the underlying block layer?

Do I have some weird mismatch between the size of my md array and the size
of my filesystem (given that dd apparently thinks parts of it are out of
bounds)?
Yet, the sizes seem to match:


myth:~#  mdadm --query --detail /dev/md5
/dev/md5:
Version : 1.2
  Creation Time : Tue Jan 21 10:35:52 2014
 Raid Level : raid5
 Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
  Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
   Raid Devices : 5
  Total Devices : 5
Persistence : Superblock is persistent

  Intent Bitmap : Internal

Update Time : Mon Oct 31 07:56:07 2016
  State : clean 
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 512K

   Name : gargamel.svh.merlins.org:5
   UUID : ec672af7:a66d9557:2f00d76c:38c9f705
 Events : 147992

Number   Major   Minor   RaidDevice State
   0   8   970  active sync   /dev/sdg1
   6   8  1131  active sync   /dev/sdh1
   2   8   812  active sync   /dev/sdf1
   3   8   653  active sync   /dev/sde1
   5   8   494  active sync   /dev/sdd1

myth:~# btrfs fi df /mnt/mnt
Data, single: total=13.22TiB, used=13.19TiB
System, DUP: total=32.00MiB, used=1.42MiB
Metadata, DUP: total=75.00GiB, used=72.82GiB
GlobalReserve, single: total=512.00MiB, used=6.73MiB

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  




Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
So, I'm willing to wait 2 more days before I wipe this filesystem and
start over if I can't get check --repair to work on it.
If you need longer, please let me know if you have an upcoming patch for me
to try and I'll wait.

Thanks,
Marc

On Mon, Oct 31, 2016 at 08:04:22AM -0700, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 08:44:12AM +, Hugo Mills wrote:
> > > Any idea on special dm setup which can make us fail to read out some
> > > data range?
> > 
> >I've seen both btrfs check and btrfs dump-super give wrong answers
> > (particularly, some addresses end up larger than the device, for some
> > reason) when run on a mounted filesystem. Worth ruling that one out.
> 
> I just finished running my scrub overnight, and it failed around 10%:
> [115500.316921] BTRFS error (device dm-0): bad tree block start 
> 8461247125784585065 17619396231168
> [115500.332354] BTRFS error (device dm-0): bad tree block start 
> 8461247125784585065 17619396231168
> [115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: 
> errno=-5 IO failure
> [115500.332629] BTRFS info (device dm-0): forced readonly
> [115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: 
> errno=-5 IO failure
> [115500.436002] btrfs_printk: 550 callbacks suppressed
> [115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted 
> transaction.
> [115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: 
> errno=-5 IO failure
> 
> 
> myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
> (...)
> scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
> scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 
> 09:03:11
> total bytes scrubbed: 1.15TiB with 512 errors
> error details: csum=512
> corrected errors: 0, uncorrectable errors: 512, unverified errors: 0
> 
> Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it 
> means
> that btrfs had physical read errors from the underlying block layer?
> 
> Do I have some weird mismatch between the size of my md array and the size of 
> my filesystem
> (as per dd apparently thinking parts of it are out of bounds?)
> Yet,  the sizes seem to match:
> 
> 
> myth:~#  mdadm --query --detail /dev/md5
> /dev/md5:
> Version : 1.2
>   Creation Time : Tue Jan 21 10:35:52 2014
>  Raid Level : raid5
>  Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>Raid Devices : 5
>   Total Devices : 5
> Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
> Update Time : Mon Oct 31 07:56:07 2016
>   State : clean 
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
> 
>  Layout : left-symmetric
>  Chunk Size : 512K
> 
>Name : gargamel.svh.merlins.org:5
>UUID : ec672af7:a66d9557:2f00d76c:38c9f705
>  Events : 147992
> 
> Number   Major   Minor   RaidDevice State
>0   8   970  active sync   /dev/sdg1
>6   8  1131  active sync   /dev/sdh1
>2   8   812  active sync   /dev/sdf1
>3   8   653  active sync   /dev/sde1
>5   8   494  active sync   /dev/sdd1
> 
> myth:~# btrfs fi df /mnt/mnt
> Data, single: total=13.22TiB, used=13.19TiB
> System, DUP: total=32.00MiB, used=1.42MiB
> Metadata, DUP: total=75.00GiB, used=72.82GiB
> GlobalReserve, single: total=512.00MiB, used=6.73MiB
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet 
> cooking
> Home page: http://marc.merlins.org/  



-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-10-31 Thread Marc MERLIN
On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> Would you try to locate the range where we start to fail to read?
> 
> I still think the root problem is we failed to read the device in user
> space.
 
Understood.

I'll run this then:
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M &
[2] 21108
myth:~# while :; do killall -USR1 dd; sleep 1200; done
275+0 records in
274+0 records out
287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s

This will take a while to run, I'll report back on how far it goes.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-11-04 Thread Marc MERLIN
On Mon, Oct 31, 2016 at 09:21:40PM -0700, Marc MERLIN wrote:
> On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> > Would you try to locate the range where we start to fail to read?
> > 
> > I still think the root problem is we failed to read the device in user
> > space.
>  
> Understood.
> 
> I'll run this then:
> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M &
> [2] 21108
> myth:~# while :; do killall -USR1 dd; sleep 1200; done
> 275+0 records in
> 274+0 records out
> 287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s
> 
> This will take a while to run, I'll report back on how far it goes.

Well, turns out you were right. My array is 14TB and dd was only able to
copy 8.8TB out of it.

I wonder if it's a bug with bcache and source devices that are too big?

8782434271232 bytes (8.8 TB) copied, 214809 s, 40.9 MB/s
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
8388608+0 records in
8388608+0 records out
8796093022208 bytes (8.8 TB) copied, 215197 s, 40.9 MB/s
[2]+  Exit 1  dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M

What's vexing is that absolutely nothing has been logged in the kernel dmesg
buffer about this read error.

Basically I have this:
sde8:64   0   3.7T  0 
└─sde1 8:65   0   3.7T  0 
  └─md59:50  14.6T  0 
└─bcache0252:00  14.6T  0 
  └─crypt_bcache0 (dm-0) 253:00  14.6T  0 

I'll try dd'ing the md5 directly now, but that's going to take another 2 days :(

That said, given that almost half the device is not readable from user space
for some reason, that would explain why btrfs check is failing. Obviously it
can't do its job if it can't read blocks.

I'll report back on what I find out with this problem but if you have
suggestions on what to look for, let me know :)

Thanks.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair: ERROR: cannot read chunk root

2016-11-04 Thread Marc MERLIN
On Fri, Nov 04, 2016 at 02:00:43PM +0500, Roman Mamedov wrote:
> On Fri, 4 Nov 2016 01:01:13 -0700
> Marc MERLIN  wrote:
> 
> > Basically I have this:
> > sde8:64   0   3.7T  0 
> > └─sde1 8:65   0   3.7T  0 
> >   └─md59:50  14.6T  0 
> > └─bcache0252:00  14.6T  0 
> >   └─crypt_bcache0 (dm-0) 253:00  14.6T  0 
> > 
> > I'll try dd'ing the md5 directly now, but that's going to take another 2 
> > days :(
> > 
> > That said, given that almost half the device is not readable from user space
> > for some reason, that would explain why btrfs check is failing. Obviously it
> > can't do its job if it can't read blocks.
> 
> I don't see anything to support the notion that "half is unreadable", maybe
> just a 512-byte sector is unreadable -- but that would be enough to make
> regular dd bail out -- which is why you should be using dd_rescue for this,
> not regular dd. Assuming you just want to copy over as much data as possible,
> and not simply test if dd fails or not (but in any case dd_rescue at least
> would not fail instantly and would tell you precise count of how much is
> unreadable).

Thanks for the plug on ddrescue, I have used it to rescue drives in the
past.
Here, however, everything after the 8.8TB mark is unreadable, so there
is nothing to skip.

Because the underlying drives are fine, I'm not entirely sure where the
issue is although it has to be on the mdadm side and not related to
btrfs.

And of course the mdadm array shows clean, and I have already disabled
the mdadm per-drive bad-block (mis-)feature, which is probably
responsible for all the problems I've had here.
myth:~# mdadm --examine-badblocks /dev/sd[defgh]1
No bad-blocks list configured on /dev/sdd1
No bad-blocks list configured on /dev/sde1
No bad-blocks list configured on /dev/sdf1
No bad-blocks list configured on /dev/sdg1
No bad-blocks list configured on /dev/sdh1

I'm also still perplexed as to why, despite the read errors I'm getting,
absolutely nothing is logged in the kernel :-/

I'll pursue that further and post a summary on the thread here if I find
something interesting.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?

2016-11-06 Thread Marc MERLIN
On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote:
> > Well, turns out you were right. My array is 14TB and dd was only able to
> > copy 8.8TB out of it.
> > 
> > I wonder if it's a bug with bcache and source devices that are too big?
> 
> At least we know it's not a problem of btrfs-progs.
> 
> And for bcache/soft raid/encryption, unfortunately I'm not familiar with any
> of them.
> 
> I would recommend reporting it to the bcache/mdadm/encryption ML after
> locating the layer which returns EINVAL.

So, Neil Brown found the problem.

myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 37.0785 s, 57.9 MB/s
myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190 count=3 
iflag=direct
3+0 records in
3+0 records out


On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
> EINVAL from a read() system call is surprising in this context.
> 
> do_generic_file_read can return it:
>   if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>   return -EINVAL;
> 
> s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
> 
> #define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
> 
> That is 2^(12+31) or 2^43 or 8TB.
> 
> Is this a 32bit system you are using?  Such systems can only support
> buffered IO up to 8TB.  If you use iflag=direct to avoid buffering, you
> should get access to the whole device.

I am indeed using a 32bit system, and now we know why the kernel can
mount and use my filesystem just fine while btrfs check --repair fails to
deal with it.
The filesystem is more than 8TB on a 32bit kernel with 32bit userland.

Since iflag=direct fixes the issue with dd, it sounds like something
similar could be done for btrfs-progs, to support filesystems bigger
than 8TB on 32bit systems.

However, could you confirm that filesystems of more than 8TB are supported
by the kernel code itself on 32bit systems? (I think so, but just
wanting to make sure)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-06 Thread Marc MERLIN
(sorry for the bad subject line from the mdadm list on the previous mail) 

On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
> I'm totally wrong here.
> 
> DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
> aligned.
> 
> While we use a lot of stack memory and normal malloc()/calloc()
> allocated memory, which is seldom aligned to 512 bytes.
> 
> So to *workaround* the problem in btrfs-progs, we may need to change any
> pread() caller to use aligned memory allocation.
> 
> I really don't think David will accept such a huge change for a
> workaround...

Thanks for looking into it.
So basically should we just document that btrfs filesystems past 8TB in
size are not supported on 32bit architectures?
(as in you can mount them and use them, I believe, but you cannot
create or repair them)
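
To make sure I understand the scope, every pread() call site would need
something like this hypothetical sketch (not actual btrfs-progs code;
fd/len/offset stand in for whatever the caller uses) instead of reading
into plain stack or malloc'ed memory:

    /* hypothetical sketch: with the fd opened O_RDONLY | O_DIRECT,
     * the buffer must be sector aligned, e.g. via posix_memalign() */
    void *buf;
    if (posix_memalign(&buf, 512, len))  /* 512 = logical sector size */
            return -ENOMEM;
    ret = pread(fd, buf, len, offset);   /* len and offset must also be
                                            512-byte multiples */
    free(buf);

That is indeed quite invasive for a workaround.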

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-07 Thread Marc MERLIN
On Mon, Nov 07, 2016 at 02:16:37PM +0800, Qu Wenruo wrote:
> 
> 
> At 11/07/2016 01:36 PM, Marc MERLIN wrote:
> > (sorry for the bad subject line from the mdadm list on the previous mail)
> > 
> > On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
> > > I'm totally wrong here.
> > > 
> > > DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
> > > aligned.
> > > 
> > > While we use a lot of stack memory and normal malloc()/calloc()
> > > allocated memory, which is seldom aligned to 512 bytes.
> > > 
> > > So to *workaround* the problem in btrfs-progs, we may need to change any
> > > pread() caller to use aligned memory allocation.
> > > 
> > > I really don't think David will accept such a huge change for a
> > > workaround...
> > 
> > Thanks for looking into it.
> > So basically should we just document that btrfs filesystems past 8TB in
> > size are not supported on 32bit architectures?
> > (as in you can mount them and use them, I believe, but you cannot
> > create or repair them)
> > 
> > Marc
> > 
> Add David to this thread.
> 
> For create, it should be OK. As at create time, we hardly write beyond 3G.
> So it won't be a big problem.
> 
> For repair, we do have a possibility that btrfsck can't handle it.
> 
> Anyway, I'd like to see what David thinks we should do to handle the
> problem.

Understood. One big thing (for me) I forgot to confirm:
1) btrfs receive
2) btrfs scrub
should both be able to work because the IO operations are done directly
inside the kernel and not from user space, correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-07 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 08:35:54AM +0800, Qu Wenruo wrote:
> >Understood. One big thing (for me) I forgot to confirm:
> >1) btrfs receive
> 
> Unfortunately, receive is completely done in userspace.
> Only send works inside the kernel.
 
Right, I've confirmed that btrfs receive fails.
It looks like btrfs balance is also failing, which is more surprising.
Isn't that one in the kernel?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-07 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> That's strange, balance is done completely in kernel space.
> 
> Unless we're calling a vfs_* function, we won't go through the extra check.
> 
> What's the error reported?

See below. Note however that it may be because btrfs receive messed up the
filesystem first.

BTRFS info (device dm-0): use zlib compression
BTRFS info (device dm-0): disk space caching is enabled
BTRFS info (device dm-0): has skinny extents
BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, 
flush 0, corrupt 512, gen 0
BTRFS info (device dm-0): detected SSD devices, enabling SSD mode
BTRFS info (device dm-0): continuing balance
BTRFS info (device dm-0): The free space cache file (1593999097856) is invalid. 
skip it

BTRFS info (device dm-0): The free space cache file (1671308509184) is invalid. 
skip it

BTRFS info (device dm-0): relocating block group 13835461197824 flags 34
[ cut here ]
WARNING: CPU: 0 PID: 22825 at fs/btrfs/disk-io.c:520 
btree_csum_one_bio.isra.39+0xf7/0x100
Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace 
cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev 
snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss 
snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event 
firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq 
bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg 
rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev 
media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse 
lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov 
async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg 
firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd 
usbcore usb_common
CPU: 0 PID: 22825 Comm: kworker/u9:2 Tainted: GW   
4.8.5-ia32-20161028 #2
Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604
07/16/2008
Workqueue: btrfs-worker-high btrfs_worker_helper
 00200286 00200286 d3d81e48 df414827  dfa12da5 d3d81e78 df05677a
 df9ed884  5929 dfa12da5 0208 df2cf067 0208 f7463fa0
 f401a080  d3d81e8c df05684a 0009   d3d81eb4
Call Trace:
 [] dump_stack+0x58/0x81
 [] __warn+0xea/0x110
 [] ? btree_csum_one_bio.isra.39+0xf7/0x100
 [] warn_slowpath_null+0x2a/0x30
 [] btree_csum_one_bio.isra.39+0xf7/0x100
 [] __btree_submit_bio_start+0x15/0x20
 [] run_one_async_start+0x30/0x40
 [] btrfs_scrubparity_helper+0xcd/0x2d0
 [] ? run_one_async_free+0x20/0x20
 [] btrfs_worker_helper+0xd/0x10
 [] process_one_work+0x10b/0x400
 [] worker_thread+0x37/0x4b0
 [] ? process_one_work+0x400/0x400
 [] kthread+0x9b/0xb0
 [] ret_from_kernel_thread+0xe/0x24
 [] ? kthread_stop+0x100/0x100
---[ end trace f461faff989bf258 ]---
BTRFS: error (device dm-0) in btrfs_commit_transaction:2232: errno=-5 IO 
failure (Error while writing out transaction)
BTRFS info (device dm-0): forced readonly
BTRFS warning (device dm-0): Skipping commit of aborted transaction.
[ cut here ]
WARNING: CPU: 0 PID: 22318 at fs/btrfs/transaction.c:1854 
btrfs_commit_transaction+0x2f5/0xcc0
BTRFS: Transaction aborted (error -5)
Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace 
cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev 
snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss 
snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event 
firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq 
bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg 
rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev 
media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse 
lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov 
async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg 
firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd 
usbcore usb_common
CPU: 0 PID: 22318 Comm: btrfs-balance Tainted: GW   
4.8.5-ia32-20161028 #2
Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604
07/16/2008
 0286 0286 d74a3ca4 df414827 d74a3ce8 dfa132ab d74a3cd4 df05677a
 dfa075cc d74a3d04 572e dfa132ab 073e df2d7de5 073e f698dc00
 e9173e70 fffb d74a3cf0 df0567db 0009  d74a3ce8 dfa075cc
Call Trace:
 [] dump_stack+0x58/0x81
 [] __warn+0xea/0x110
 [] ? btrfs_commit_transaction+0x2f5/0xcc0
 [] warn_slowpath_fmt+0x3b/0x40
 [] btrfs_commit_transaction+0x2f5/0xcc0
 [] ? prepare_to_wait_event+0xd0/0xd0
 [] prepare_to_r

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-08 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 09:17:43AM +0800, Qu Wenruo wrote:
> 
> 
> At 11/08/2016 09:06 AM, Marc MERLIN wrote:
> >On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> >>That's strange, balance is done completely in kernel space.
> >>
> >>Unless we're calling a vfs_* function, we won't go through the extra check.
> >>
> >>What's the error reported?
> >
> >See below. Note however that it may be because btrfs receive messed up the
> >filesystem first.
> 
> If receive can easily screw up the fs, then fsstress can also screw up 
> btrfs easily.
> 
> So I don't think that's the case. (Several years ago it was possible.)
 
So now I'm even more confused. I put the array back in my 64bit system and
check --repair comes back clean, but scrub does not. Is that supposed to be 
possible?

gargamel:~# btrfs check -p --repair /dev/mapper/crypt_bcache2 2>&1 | tee 
/mnt/dshelf1/other/btrfs2
enabling repair mode
Checking filesystem on /dev/mapper/crypt_bcache2
UUID: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
checking extents [.]
Fixed 0 roots.
cache and super generation don't match, space cache will be invalidated
checking fs roots [o]
checking csums
checking root refs
found 14622791987200 bytes used err is 0
total csum bytes: 14200176492
total tree bytes: 78239416320
total fs tree bytes: 59524497408
total extent tree bytes: 3236872192
btree space waste bytes: 10068589919
file data blocks allocated: 18101311373312
 referenced 18038641020928

Nov  8 06:55:40 gargamel kernel: [35631.988896] BTRFS error (device dm-6): bdev 
/dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 513, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988897] BTRFS error (device dm-6): bdev 
/dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 514, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988899] BTRFS warning (device dm-6): 
checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache2, sector 
56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 
(path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988900] BTRFS error (device dm-6): bdev 
/dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 515, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988903] BTRFS warning (device dm-6): 
checksum error at logical 27887534080 on dev /dev/mapper/crypt_bcache2, sector 
56581376, root 9461, inode 45837, offset 15460745216, length 4096, links 1 
(path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988904] BTRFS error (device dm-6): 
unable to fixup (regular) error at logical 27887009792 on dev 
/dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988905] BTRFS error (device dm-6): 
unable to fixup (regular) error at logical 27886878720 on dev 
/dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988906] BTRFS error (device dm-6): bdev 
/dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 516, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988907] BTRFS error (device dm-6): 
unable to fixup (regular) error at logical 27887837184 on dev 
/dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988908] BTRFS error (device dm-6): bdev 
/dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 517, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988909] BTRFS error (device dm-6): bdev 
/dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 518, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988910] BTRFS error (device dm-6): 
unable to fixup (regular) error at logical 27885830144 on dev 
/dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988911] BTRFS error (device dm-6): 
unable to fixup (regular) error at logical 27885961216 on dev 
/dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988912] BTRFS error (device dm-6): 
unable to fixup (regular) error at logical 27887534080 on dev 
/dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.92] BTRFS warning (device dm-6): 
checksum error at logical 27887403008 on dev /dev/mapper/crypt_bcache2, sector 
56581120, root 9461, inode 45837, offset 15460614144, length 4096, links 1 
(path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.95] BTRFS warning (device dm-6): 
checksum error at logical 27887009792 on dev /dev/mapper/crypt_bcache2, sector 
56580352, root 9461, inode 45837, offset 15460220928, length 4096, links 1 
(path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.97] BTRFS warning (device dm-6): 
checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache2, sector 
56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 
(path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988890] BTRFS warning (device dm-6): 
checksum error at logical 27887837184 on dev /dev/mapper/crypt_bcache2, sect

Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-08 Thread Marc MERLIN
On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> Yeah, quite possible!
> 
> The truth is, current btrfs check only checks:
> 1) Metadata
>(while --check-data-csum will check data, it still follows
>restriction 3 below)
> 2) Cross-references of metadata (contents of metadata)
> 3) The first good mirror/backup
> 
> So quite a lot of problems can't be detected by btrfs check:
> 1) Data corruption (csum mismatch)
> 2) 2nd mirror corruption (DUP/RAID0/10) or parity errors (RAID5/6)
> 
> For btrfsck to check all mirrors and data, you could try the out-of-tree 
> offline scrub patchset:
> https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
> 
> Which implements the kernel scrub equivalent in btrfs-progs.

I see, thanks for the answer.
Note that this is very confusing to the end user.
If check --repair returns success, the filesystem should be clean.
Hopefully that patchset can be included in btrfs-progs.

But sure enough, I'm seeing a lot of these:
BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev 
/dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, offset 
77824, length 4096, links 5 (path: 
magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)

This is bad because I would expect check --repair to find them all and
either offer to remove all the corrupted files after giving me a list of
what I've lost, or recompute the checksums so the files are known to be
corrupted but "clean", leaving me the option of keeping them as is
(ok-ish for a video file) or restoring them from backup.

The worst part with scrub is that I have to find all these files, then
find all the snapshots they're in (maybe 10 or 20) and delete them all,
and some of those snapshots are read-only because they are btrfs send
sources, so I need to destroy those snapshots, lose my btrfs send
relationship, and am forced to recreate it (maybe 2 to 6 days of syncing
over a slow-ish link).

When data is corrupted, no solution is perfect, but hopefully check --repair
will indeed be able to restore the entire filesystem to a clean state, even
if some data must be lost in the process.
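(If the patchset is usable as is, I'll give that branch a try; a
sketch, assuming the usual btrfs-progs build steps apply to that tree:
    git clone -b fsck_scrub https://github.com/adam900710/btrfs-progs
    cd btrfs-progs && ./autogen.sh && ./configure && make
)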

Thanks for considering.

Marc

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs support for filesystems >8TB on 32bit architectures

2016-11-10 Thread Marc MERLIN
On Tue, Nov 08, 2016 at 06:05:19PM -0800, Marc MERLIN wrote:
> On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> > Yeah, quite possible!
> > 
> > The truth is, current btrfs check only checks:
> > 1) Metadata
> >(while --check-data-csum will check data, it still follows
> >restriction 3 below)
> > 2) Cross-references of metadata (contents of metadata)
> > 3) The first good mirror/backup
> > 
> > So quite a lot of problems can't be detected by btrfs check:
> > 1) Data corruption (csum mismatch)
> > 2) 2nd mirror corruption (DUP/RAID0/10) or parity errors (RAID5/6)
> > 
> > For btrfsck to check all mirrors and data, you could try the out-of-tree 
> > offline scrub patchset:
> > https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
> > 
> > Which implements the kernel scrub equivalent in btrfs-progs.
> 
> I see, thanks for the answer.
> Note that this is very confusing to the end user.
> If check --repair returns success, the filesystem should be clean.
> Hopefully that patchset can be included in btrfs-progs.
> 
> But sure enough, I'm seeing a lot of these:
> BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev 
> /dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, 
> offset 77824, length 4096, links 5 (path: 
> magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)

So, I ran check --repair, then I ran scrub and deleted all the files
that were referenced by pathname and failed scrub.
Now I have this:
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785128960 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1545, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785133056 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1546, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785137152 on dev /dev/mapper/crypt_bcache2
BTRFS warning (device dm-6): checksum error at logical 269784580096 on dev 
/dev/mapper/crypt_bcache2, sector 529036544, root 17564, inode 1225903, offset 
16384: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784584192 on dev 
/dev/mapper/crypt_bcache2, sector 529036552, root 17564, inode 1225903, offset 
20480: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784588288 on dev 
/dev/mapper/crypt_bcache2, sector 529036560, root 17564, inode 1225903, offset 
24576: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784592384 on dev 
/dev/mapper/crypt_bcache2, sector 529036568, root 17564, inode 1225903, offset 
28672: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784596480 on dev 
/dev/mapper/crypt_bcache2, sector 529036576, root 17564, inode 1225903, offset 
32768: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784600576 on dev 
/dev/mapper/crypt_bcache2, sector 529036584, root 17564, inode 1225903, offset 
36864: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784604672 on dev 
/dev/mapper/crypt_bcache2, sector 529036592, root 17564, inode 1225903, offset 
40960: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784608768 on dev 
/dev/mapper/crypt_bcache2, sector 529036600, root 17564, inode 1225903, offset 
45056: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784612864 on dev 
/dev/mapper/crypt_bcache2, sector 529036608, root 17564, inode 1225903, offset 
49152: path resolving failed with ret=-2

How am I supposed to deal with those?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: when btrfs scrub reports errors and btrfs check --repair does not

2016-11-11 Thread Marc MERLIN
On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> It seems to be orphan inodes.
> Btrfs doesn't remove all the contents of an inode at rm time.
> It just unlinks the inode and puts it into a state called an orphan
> inode (it can't be referred to from any directory).

BTRFS warning (device dm-6): checksum error at logical 269783928832 on dev 
/dev/mapper/crypt_bcache2, sector 529035272, root 17564, inode 1225897, offset 
20480: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269783932928 on dev 
/dev/mapper/crypt_bcache2, sector 529035280, root 17564, inode 1225897, offset 
24576: path resolving failed with ret=-2
 
Do you mean I should be using find /mnt/mnt -inum ?
Well, how about that, you're right:
gargamel:/mnt/mnt/DS2/backup# find /mnt/mnt -inum 1225897
/mnt/mnt/DS2/backup/debian64_rw.20160713_03:21:57/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
So basically the breakage in my filesystem is enough that the backlink
from the inode to the pathname is gone? That's not good :-/

> And then their data extents are freed over the next several
> transactions.
> 
> Try to find these inodes using the inode number in the specified
> subvolume. If they are not found, they are orphan inodes and nothing to
> worry about. These wrong data extents will disappear sooner or later.
> 
> Or you can use "btrfs fi sync" to make sure orphan inodes are really
> removed from the tree.
 
So, I ran btrfs fi sync /mnt/mnt, but it returned instantly.

Scrub after that still returns:
btrfs scrub start -Bd /mnt/mnt
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1793, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785628672 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1794, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269784580096 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1795, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785632768 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1796, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785104384 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1797, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269784584192 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1798, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785636864 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1799, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785108480 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1800, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269784588288 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1801, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269784055808 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, 
flush 0, corrupt 1802, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 
269785640960 on dev /dev/mapper/crypt_bcache2

What am I supposed to do about these? I'm not even clear where this
corruption is located or how to clear it.

I understand you're saying that this does not seem to affect any
remaining data, but if scrub is not clean, can't even see what file an
inode is linked to, and that inode doesn't get cleaned up 2 days later,
then my filesystem is in a bad state that check --repair should fix, is
it not?

Yes, I can wipe it and start over, but I'm trying to use this as a
learning experience as well as seeing if the tools are working as they
should.
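
In the meantime, since scrub only gives me logical addresses here, I'll
try mapping them back to whatever still references them; a sketch,
assuming my btrfs-progs is new enough to have inspect-internal (logical
address taken from the scrub output above):
    btrfs inspect-internal logical-resolve 269785628672 /mnt/mnt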

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: when btrfs scrub reports errors and btrfs check --repair does not

2016-11-13 Thread Marc MERLIN
On Fri, Nov 11, 2016 at 07:17:08PM -0800, Marc MERLIN wrote:
> On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> > It seems to be orphan inodes.
> > Btrfs doesn't remove all the contents of an inode at rm time.
> > It just unlinks the inode and puts it into a state called an orphan
> > inode (it can't be referred to from any directory).
> 
> BTRFS warning (device dm-6): checksum error at logical 269783928832 on dev 
> /dev/mapper/crypt_bcache2, sector 529035272, root 17564, inode 1225897, 
> offset 20480: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269783932928 on dev 
> /dev/mapper/crypt_bcache2, sector 529035280, root 17564, inode 1225897, 
> offset 24576: path resolving failed with ret=-2
>  
> Do you mean I should be using find /mnt/mnt -inum ?
> Well, how about that, you're right:
> gargamel:/mnt/mnt/DS2/backup# find /mnt/mnt -inum 1225897
> /mnt/mnt/DS2/backup/debian64_rw.20160713_03:21:57/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
> So basically the breakage in my filesystem is enough that the backlink
> from the inode to the pathname is gone? That's not good :-/

Mmmn, I've been doing find -inum, deleting hits, and re-running scrub,
and scrub still fails with more errors. And now I'm seeing this:

gargamel:~# find /mnt/mnt -inum 1225897

/mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
(...)
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg

And then I see this:
gargamel:~# ls -li 
/mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg
 
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
 
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg
1225897 -rw-r--r-- 5 merlin merlin 13794 Jan  7  2012 
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
1225898 -rw-r--r-- 5 merlin merlin 13048 Jan  7  2012 
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
1225897 -rw-r--r-- 5 merlin merlin 13794 Jan  7  2012 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
1225898 -rw-r--r-- 5 merlin merlin 13048 Jan  7  2012 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
1225913 -rw-r--r-- 5 merlin merlin 15247 Jan  7  2012 
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg
1225897 lrwxrwxrwx 1 merlin merlin35 Aug  1  2010 
/mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg
 -> ../33_Diving/BestOf/Dive5-2_139.jpg

So first:
a) find -inum returns some inodes that don't match
b) but argh, multiple files (very different) have the same inode number,
so finding files by inode number after scrub flagged an inode bad isn't
going to work :( (see the sketch below)
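
Presumably that's because inode numbers are only unique within a
subvolume, so a global find -inum across a tree full of snapshots is
bound to collide. Resolving inside the specific root id that scrub
printed (root 17564 above) should be more precise; a sketch, assuming a
new enough btrfs-progs, with the subvolume path being illustrative:
    btrfs subvolume list /mnt/mnt | grep ' 17564 '
    btrfs inspect-internal inode-resolve 1225897 /mnt/mnt/path/to/that/subvol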

At this point, I'm starting to lose patience (and running out of time),
so I'm going to wipe this filesystem after I hear back from you, but
basically scrub and repair are still not up to what they should be IMO
(as per my previous comment):
One should be able to fully repair an unclean filesystem with check --repair, 
and scrub should
give me things I can either fi

Re: when btrfs scrub reports errors and btrfs check --repair does not

2016-11-13 Thread Marc MERLIN
On Sun, Nov 13, 2016 at 08:13:29PM +0500, Roman Mamedov wrote:
> On Sun, 13 Nov 2016 07:06:30 -0800
> Marc MERLIN  wrote:
> 
> > So first:
> > a) find -inum returns some inodes that don't match
> > b) but argh, multiple files (very different) have the same inode
> > number, so finding files by inode number after scrub flagged an inode
> > bad isn't going to work :(
> 
> I wonder why you even need scrub to verify file readability. Just try
> reading all files using e.g. "cfv -Crr"; the read errors produced will
> point you directly to the files which are unreadable, without the need
> to look them up backwards via inum. Then just restore those from
> backups.

I could read the files, but we're talking about maybe 100 million
files? That would take a while... (and most of them are COW copies of
the same physical data), so scrub is _much_ faster.

Scrub is also reporting issues not related to files but to data
structures, it seems, while repair is not finding them.

As for the data, it's a backup device, so I can just wipe it, but again,
I'm using this as an example of how I would simply bring a drive back to
a clean state, and that's not pretty right now.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
+btrfs mailing list, see below why

On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> On Mon, 27 Nov 2016, Coly Li wrote:
> > 
> > Yes, too many work queues... I guess the locking might be caused by
> > some very obscure reference in the closure code. I can't get any clue
> > unless I can find a stable procedure to reproduce this issue.
> > 
> > Hmm, if there were a tool to clone all the metadata of the backend
> > cache and the whole cached device, there might be a way to replay the
> > oops much more easily.
> > 
> > Eric, do you have any hint?
> 
> Note that the backing device doesn't have any metadata, just a superblock. 
> You can easily dd that off onto some other volume without transferring the 
> data. By default, data starts at 8k, or whatever you used in `make-bcache 
> -w`.

Ok, Linus helped me find a workaround for this problem:
https://lkml.org/lkml/2016/11/29/667
namely:
   echo 2 > /proc/sys/vm/dirty_ratio
   echo 1 > /proc/sys/vm/dirty_background_ratio
(it's a 24GB system, so the defaults of 20 and 10 were creating too many
requests in the buffers)

Note that this is only a workaround, not a fix.
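
(To make the workaround stick across reboots, something like this in
/etc/sysctl.d should do; the file name is illustrative:
    # /etc/sysctl.d/90-dirty.conf
    vm.dirty_ratio = 2
    vm.dirty_background_ratio = 1
)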

When I did this and retried my big copy, I still got 100+ kernel
work queues, but apparently the underlying swraid5 was able to unblock
and satisfy the write requests before too many accumulated and crashed
the kernel.

I'm not a kernel coder, but it seems to me that bcache needs a way to
throttle incoming requests so that it does not end up in a state where
things blow up because too many of them have piled up.

You should be able to reproduce this by taking 5 spinning rust drives,
putting raid5 on top, then dmcrypt, bcache, and hopefully any filesystem
(although I used btrfs), and sending lots of requests.
Actually, to be honest, the problems have mostly been happening when I
do btrfs scrub and btrfs send/receive, which both generate I/O from
within the kernel instead of from user space.
So here, btrfs may be a contributor to the problem too, but while btrfs
still trashes my system if I remove the caching device on bcache (and
with the default dirty ratio values), it doesn't crash the kernel.

I'll start another separate thread with the btrfs folks on how much
pressure is put on the system, but on your side it would be good to help
ensure that bcache doesn't crash the system altogether if too many
requests are allowed to pile up.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs flooding the I/O subsystem and hanging the machine, with bcache cache turned off

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote:
> +btrfs mailing list, see below why
> 
> Ok, Linus helped me find a workaround for this problem:
> https://lkml.org/lkml/2016/11/29/667
> namely:
>echo 2 > /proc/sys/vm/dirty_ratio
>echo 1 > /proc/sys/vm/dirty_background_ratio
> (it's a 24GB system, so the defaults of 20 and 10 were creating too many
> requests in the buffers)

I'll remove the bcache list on this followup since I want to concentrate
here on the fact that btrfs does behave badly with the default
dirty_ratio values.
As a reminder, it's a btrfs send/receive copy between 2 swraid5 arrays
on spinning rust.
swraid5 < bcache < dmcrypt < btrfs

Copying with btrfs send/receive causes massive hangs on the system.
Please see this explanation from Linus on why the workaround was
suggested:
https://lkml.org/lkml/2016/11/29/667

The hangs I'm getting with the bcache cache turned off (i.e.
passthrough) are now very likely due to btrfs alone; they mess up
anything doing file I/O (which then times out), break USB as reads time
out in the middle of USB requests, lose interrupts, and so forth.

All of this mostly went away with Linus' suggestion:
echo 2 > /proc/sys/vm/dirty_ratio
echo 1 > /proc/sys/vm/dirty_background_ratio
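
On a 24GB box, 1% granularity is still ~240MB of dirty data, so the
byte-based knobs may be worth a try for finer control; note that setting
a _bytes value zeroes out the corresponding _ratio, as they are mutually
exclusive (values below are illustrative):
echo $((64*1024*1024))  > /proc/sys/vm/dirty_background_bytes
echo $((256*1024*1024)) > /proc/sys/vm/dirty_bytes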

But that's hiding the symptom, which I think is that btrfs piles up too
many I/O requests during btrfs send/receive and btrfs scrub (probably
balance too) without looking at the resulting impact on system health.
looking at resulting impact to system health.

Is there a way to stop flooding the entire system with I/O and causing
so much strain on it?
(I realize that if there is a caching layer underneath that just takes
requests and says thank you, without giving any clue that bad things are
happening below, this may be hard, but I'm asking anyway :)


[10338.968912] perf: interrupt took too long (3927 > 3917), lowering 
kernel.perf_event_max_sample_rate to 50750

[12971.047705] ftdi_sio ttyUSB15: usb_serial_generic_read_bulk_callback - urb 
stopped: -32

[17761.122238] usb 4-1.4: USB disconnect, device number 39
[17761.141063] usb 4-1.4: usbfs: USBDEVFS_CONTROL failed cmd hub-ctrl rqt 160 
rq 6 len 1024 ret -108
[17761.263252] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
[17761.938575] usb 4-1.4: new SuperSpeed USB device number 40 using xhci_hcd

[24130.574425] hpet1: lost 2306 rtc interrupts
[24156.034950] hpet1: lost 1628 rtc interrupts
[24173.314738] hpet1: lost 1104 rtc interrupts
[24180.129950] hpet1: lost 436 rtc interrupts
[24257.557955] hpet1: lost 4954 rtc interrupts
[24267.522656] hpet1: lost 637 rtc interrupts

[28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds.
[28034.975471]   Tainted: G U  
4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
[28035.000964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[28035.025429] btrfs   D 91154d33fc70 0  5618   5372 0x0080
[28035.047717]  91154d33fc70 00200246 911842f880c0 
9115a4cf01c0
[28035.071020]  91154d33fc58 91154d34 91165493bca0 
9115623773f0
[28035.094252]  1000 0001 91154d33fc88 
b86cf1a6
[28035.117538] Call Trace:
[28035.125791]  [] schedule+0x8b/0xa3
[28035.141550]  [] btrfs_start_ordered_extent+0xce/0x122
[28035.162457]  [] ? wake_up_atomic_t+0x2c/0x2c
[28035.180891]  [] btrfs_wait_ordered_range+0xa9/0x10d
[28035.201723]  [] btrfs_truncate+0x40/0x24b
[28035.219269]  [] btrfs_setattr+0x1da/0x2d7
[28035.237032]  [] notify_change+0x252/0x39c
[28035.254566]  [] do_truncate+0x81/0xb4
[28035.271057]  [] vfs_truncate+0xd9/0xf9
[28035.287782]  [] do_sys_truncate+0x63/0xa7

[28155.781987] INFO: task btrfs:5618 blocked for more than 120 seconds.
[28155.802229]   Tainted: G U  
4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
[28155.827894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[28155.852479] btrfs   D 91154d33fc70 0  5618   5372 0x0080
[28155.874761]  91154d33fc70 00200246 911842f880c0 
9115a4cf01c0
[28155.898059]  91154d33fc58 91154d34 91165493bca0 
9115623773f0
[28155.921464]  1000 0001 91154d33fc88 
b86cf1a6
[28155.944720] Call Trace:
[28155.953176]  [] schedule+0x8b/0xa3
[28155.968945]  [] btrfs_start_ordered_extent+0xce/0x122
[28155.989811]  [] ? wake_up_atomic_t+0x2c/0x2c
[28156.008195]  [] btrfs_wait_ordered_range+0xa9/0x10d
[28156.028498]  [] btrfs_truncate+0x40/0x24b
[28156.046081]  [] btrfs_setattr+0x1da/0x2d7
[28156.063621]  [] notify_change+0x252/0x39c
[28156.081667]  [] do_truncate+0x81/0xb4
[28156.098732]  [] vfs_truncate+0xd9/0xf9
[28156.115489]  [] do_sys_truncate+0x63/0xa7
[28156.133389]  [] SyS_truncate+0xe/0x10
[28156.149831]  [] do_syscall_64+0x61/0x72
[28156.167179]  [] entry_SYSCALL64_slow_path+0x25/0x2
