Re: So, does btrfs check lowmem take days? weeks?

2018-07-10 Thread Su Yue




On 07/10/2018 06:53 PM, Su Yue wrote:



On 07/10/2018 12:10 PM, Marc MERLIN wrote:

On Tue, Jul 10, 2018 at 08:56:15AM +0800, Su Yue wrote:
I'm just not clear if my FS is still damaged and btrfsck was just hacked to
ignore the damage it can't deal with, or whether it was able to repair
things to a consistent state.
The fact that I can mount read/write with no errors seems like a good sign.



Yes, a good sign. Since the extent tree is fixed, the errors left are in
other trees. The worst result I can see is that writes to some files will
report IO errors. This is the cost of RW.


Ok, so we agreed that btrfs scrub won't find this, so ultimately I
should run normal btrfsck --repair without the special block skip code
you added?


Yes. Here is a normal btrfsck which skips the extent tree to save time.
I also fixed a bug mentioned by Qu in another mail, though I had no
time to add a progress indicator for the fs trees check.
https://github.com/Damenly/btrfs-progs/tree/tmp1

It may take a long time to fix the unresolved errors.
# ./btrfsck -e 2 --mode=lowmem --repair $dev
'-e' means to skip the extent tree.
As mentioned in that mail, running the above command should resolve the errors.

If no other errors occur, your FS will be good.

Please do not run the repair from the master branch :(.
It would ruin everything we did in recent days.

Thanks,
Su

Thanks
Su


Since I can mount the filesystem read/write though, I can probably
delete a lot of snapshots to help the next fsck to run.
I assume the number of snapshots also affects the amount of memory taken
by regular fsck, so maybe if I delete enough of them regular fsck
--repair will work again?
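
For illustration only, a hedged sketch of what that snapshot cleanup could
look like (the mount point matches the one used later in the thread; the
snapshot path is a placeholder):

# Sketch: list snapshot subvolumes sorted by origin generation (oldest
# first), then delete the ones that are no longer needed.
btrfs subvolume list -s --sort=ogen /mnt/mnt
btrfs subvolume delete /mnt/mnt/backup/some-old-snapshot   # placeholder path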

Thanks,
Marc






Re: So, does btrfs check lowmem take days? weeks?

2018-07-10 Thread Su Yue




On 07/10/2018 12:55 PM, Qu Wenruo wrote:



On 2018年07月10日 11:50, Marc MERLIN wrote:

On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote:

Ok, this is where I am now:
WARNING: debug: end of checking extent item[18457780273152 169 1]
type: 176 offset: 2
checking extent items [18457780273152/18457780273152]
ERROR: errors found in extent allocation tree or chunk allocation
checking fs roots
ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
EXTENT_DATA[25937109 4033]


The expected end is not even aligned to sectorsize.

I think there is something wrong.
Dump tree on this INODE would definitely help in this case.

Marc, would you please try dump using the following command?

# btrfs ins dump-tree -t 17592  | grep -C 40 25937109
  
Sure, there you go:

gargamel:~# btrfs ins dump-tree -t 17592 /dev/mapper/dshelf2  | grep -C 40 
25937109

[snip]

item 30 key (25937109 INODE_ITEM 0) itemoff 13611 itemsize 160
generation 137680 transid 137680 size 85312 nbytes 85953
block group 0 mode 100644 links 1 uid 500 gid 500 rdev 0
sequence 253 flags 0x0(none)
atime 1529023177.0 (2018-06-14 17:39:37)
ctime 1529023181.625870411 (2018-06-14 17:39:41)
mtime 1528885147.0 (2018-06-13 03:19:07)
otime 1529023159.138139719 (2018-06-14 17:39:19)
item 31 key (25937109 INODE_REF 14354867) itemoff 13559 itemsize 52
index 33627 namelen 42 name: 
thumb1024_112_DiveB-1_Oslob_Whaleshark.jpg
item 32 key (25937109 EXTENT_DATA 0) itemoff 11563 itemsize 1996
generation 137680 type 0 (inline)
inline extent data size 1975 ram_bytes 4033 compression 2 (lzo)
item 33 key (25937109 EXTENT_DATA 4033) itemoff 11510 itemsize 53
generation 143349 type 1 (regular)
extent data disk byte 0 nr 0
extent data offset 0 nr 63 ram 63
extent compression 0 (none)


OK this seems to be caused by btrfs check --repair.
(According to the generation difference).


Yes, this bug is due to old kernel behavior.
I fixed it in the new version.

Thanks,
Su


So at least no data loss is caused in terms of on-disk data.

However, I'm not sure if the kernel can handle it.
Please try to read the file with caution, and see whether the kernel handles it.
(I assume that on the latest kernel, the tree-checker would detect it and refuse
to read it.)

This needs some fix in btrfs check.
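
A minimal way to exercise that read path, for illustration only (the path is
a placeholder for wherever the affected inode 25937109 lives in root 17592):

# Hedged example: read the suspect file and watch dmesg for tree-checker
# or csum complaints.
md5sum /mnt/mnt/SOME/PATH/thumb1024_112_DiveB-1_Oslob_Whaleshark.jpg
dmesg | tail -20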

Thanks,
Qu


item 34 key (25937109 EXTENT_DATA 4096) itemoff 11457 itemsize 53
generation 137680 type 1 (regular)
extent data disk byte 1286516736 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 35 key (25937109 EXTENT_DATA 8192) itemoff 11404 itemsize 53
generation 137680 type 1 (regular)
extent data disk byte 1286520832 nr 8192
extent data offset 0 nr 12288 ram 12288
extent compression 2 (lzo)
item 36 key (25937109 EXTENT_DATA 20480) itemoff 11351 itemsize 53
generation 137680 type 1 (regular)
extent data disk byte 4199424000 nr 65536
extent data offset 0 nr 65536 ram 65536
extent compression 0 (none)








Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Qu Wenruo



On 2018年07月10日 11:50, Marc MERLIN wrote:
> On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote:
>> Ok, this is where I am now:
>> WARNING: debug: end of checking extent item[18457780273152 169 1]
>> type: 176 offset: 2
>> checking extent items [18457780273152/18457780273152]
>> ERROR: errors found in extent allocation tree or chunk allocation
>> checking fs roots
>> ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
>> EXTENT_DATA[25937109 4033]
>>
>> The expected end is not even aligned to sectorsize.
>>
>> I think there is something wrong.
>> Dump tree on this INODE would definitely help in this case.
>>
>> Marc, would you please try dump using the following command?
>>
>> # btrfs ins dump-tree -t 17592  | grep -C 40 25937109
>  
> Sure, there you go:
> gargamel:~# btrfs ins dump-tree -t 17592 /dev/mapper/dshelf2  | grep -C 40 
> 25937109
[snip]
>   item 30 key (25937109 INODE_ITEM 0) itemoff 13611 itemsize 160
>   generation 137680 transid 137680 size 85312 nbytes 85953
>   block group 0 mode 100644 links 1 uid 500 gid 500 rdev 0
>   sequence 253 flags 0x0(none)
>   atime 1529023177.0 (2018-06-14 17:39:37)
>   ctime 1529023181.625870411 (2018-06-14 17:39:41)
>   mtime 1528885147.0 (2018-06-13 03:19:07)
>   otime 1529023159.138139719 (2018-06-14 17:39:19)
>   item 31 key (25937109 INODE_REF 14354867) itemoff 13559 itemsize 52
>   index 33627 namelen 42 name: 
> thumb1024_112_DiveB-1_Oslob_Whaleshark.jpg
>   item 32 key (25937109 EXTENT_DATA 0) itemoff 11563 itemsize 1996
>   generation 137680 type 0 (inline)
>   inline extent data size 1975 ram_bytes 4033 compression 2 (lzo)
>   item 33 key (25937109 EXTENT_DATA 4033) itemoff 11510 itemsize 53
>   generation 143349 type 1 (regular)
>   extent data disk byte 0 nr 0
>   extent data offset 0 nr 63 ram 63
>   extent compression 0 (none)

OK this seems to be caused by btrfs check --repair.
(According to the generation difference).

So at least no data loss is caused in terms of on-disk data.

However, I'm not sure if the kernel can handle it.
Please try to read the file with caution, and see whether the kernel handles it.
(I assume that on the latest kernel, the tree-checker would detect it and refuse
to read it.)

This needs some fix in btrfs check.

Thanks,
Qu

>   item 34 key (25937109 EXTENT_DATA 4096) itemoff 11457 itemsize 53
>   generation 137680 type 1 (regular)
>   extent data disk byte 1286516736 nr 4096
>   extent data offset 0 nr 4096 ram 4096
>   extent compression 0 (none)
>   item 35 key (25937109 EXTENT_DATA 8192) itemoff 11404 itemsize 53
>   generation 137680 type 1 (regular)
>   extent data disk byte 1286520832 nr 8192
>   extent data offset 0 nr 12288 ram 12288
>   extent compression 2 (lzo)
>   item 36 key (25937109 EXTENT_DATA 20480) itemoff 11351 itemsize 53
>   generation 137680 type 1 (regular)
>   extent data disk byte 4199424000 nr 65536
>   extent data offset 0 nr 65536 ram 65536
>   extent compression 0 (none)


Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
To fill in for the spectators on the list :)
Su gave me a modified version of btrfsck lowmem that was able to clean
most of my filesystem.
It's not a general case solution since it had some hardcoding specific
to my filesystem problems, but still a great success.
Email quoted below, along with responses to Qu

On Tue, Jul 10, 2018 at 09:09:33AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月10日 01:48, Marc MERLIN wrote:
> > Success!
> > Well done Su, this is a huge improvement to the lowmem code. It went from 
> > days to less than 3 hours.
> 
> Awesome work!
> 
> > I'll paste the logs below.
> > 
> > Questions:
> > 1) I assume I first need to delete a lot of snapshots. What is the limit in 
> > your opinion?
> > 100? 150? other?
> 
> My personal recommendation is just 20. Not 150, not even 100.
 
I see. Then, I may be forced to recreate multiple filesystems anyway.
I have about 25 btrfs send/receive relationships and I have around 10
historical snapshots for each.

In the future, can't we segment extents/snapshots per subvolume, making
subvolumes mini filesystems within the bigger filesystem?

> But snapshot deletion will take time (and it's delayed, so you won't know
> if something went wrong just after "btrfs subv delete") and it even
> requires a healthy extent tree.
> If all the extent tree errors are just false alerts, that should not be a big
> problem at all.
> 
> > 
> > 2) my filesystem is somewhat misbalanced. Which balance options do you 
> > think are safe to use?
> 
> I would recommend manually checking the extent tree for BLOCK_GROUP_ITEMs,
> which will tell you how big a block group is and how much space is used,
> and give you an idea of which block groups can be relocated.
> Then use vrange= to specify the exact block group to relocate.
> 
> One example would be:
> 
> # btrfs ins dump-tree -t extent  | grep -A1 BLOCK_GROUP_ITEM |\
>   tee block_group_dump
> 
> Then the output contains:
>   item 1 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 16206 itemsize 24
>   block group used 262144 chunk_objectid 256 flags DATA
> 
> The "13631488" is the bytenr of the block group.
> The "8388608" is the length of the block group.
> The "262144" is the used bytes of the block group.
> 
> The less used space, the higher the priority for relocation (and the
> faster it is to relocate).
> You could write a small script to do it, or there should be some tool to
> do the calculation for you.
 
I usually use something simpler:
Label: 'btrfs_boot'  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
Total devices 1 FS bytes used 30.19GiB
devid 1 size 79.93GiB used 78.01GiB path /dev/mapper/cryptroot

This is bad: I have 30GB of data, but 78 out of 80GB is already allocated.
That suggests a balance is needed, correct?
If so, I always struggle as to what value I should give to dusage and
musage...
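
For illustration only, a common conservative starting point (the mount point
is an example) is a low usage filter, raised in steps until enough space is
reclaimed:

# Sketch: relocate nearly-empty block groups first, then raise the threshold.
btrfs balance start -dusage=10 -musage=10 /mnt
btrfs balance start -dusage=25 -musage=25 /mnt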

> And only relocate one block group each time, to avoid possible problems.
> 
> Last but not least, it's highly recommended to do the relocation
> only after unused snapshots are completely deleted.
> (Otherwise the relocation would be super slow.)

Thank you for the advice. Hopefully this helps someone else too, and
maybe someone can write a relocation helper tool if I don't have the
time to do it myself.
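
For reference, a minimal, untested sketch of such a helper, assuming the
two-line dump-tree output format quoted above (the device and mount point
are placeholders):

#!/bin/sh
# Sketch only: list block groups sorted by used bytes, so the emptiest ones
# can be relocated first with a vrange= balance filter.
dev=/dev/mapper/dshelf2    # placeholder device
mnt=/mnt/mnt               # placeholder mount point

btrfs ins dump-tree -t extent "$dev" | grep -A1 BLOCK_GROUP_ITEM |
awk '/BLOCK_GROUP_ITEM/ { gsub(/[()]/, ""); start = $4; len = $6 }
     /block group used/ { print $4, start, len }' |
sort -n | head -20 |
while read -r used start len; do
    end=$((start + len))
    echo "bytenr $start len $len used $used"
    echo "  e.g.: btrfs balance start -dvrange=$start..$end $mnt"
done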

> > 3) Should I start a scrub now (takes about 1 day) or anything else to
> > check that the filesystem is hopefully not damaged anymore?
> 
> I would normally recommend using btrfs check, but neither mode really
> works here.
> And scrub only checks csums; it doesn't check internal cross-references
> (like the content of the extent tree).
> 
> Maybe Su could skip the whole extent tree check and let lowmem check
> the fs trees only; with --check-data-csum it should do a better job than
> scrub.

I will wait to hear back from Su, but I think the current situation is
that I still have some problems on my FS, they are just
1) not important enough to block mount rw (now it works again)
2) currently ignored by the modified btrfsck I have, but would cause
problems if I used real btrfsck.

Correct?

> > 
> > 4) should btrfs check reset the corrupt counter?
> > bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > for now, should I reset it manually?
> 
> It could be pretty easy to implement if not already implemented.

Seems like it's not, given that Su's btrfsck --repair ran to completion
and I still have corrupt set to '2' :)
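
For what it's worth, those per-device counters come from btrfs dev stats and
can be cleared by hand once the filesystem is trusted again (assuming a
btrfs-progs with the -z/--reset option):

# Hedged example: show, then zero, the device error counters
# (wr/rd/flush/corrupt/gen) for the mounted filesystem.
btrfs device stats /mnt/mnt
btrfs device stats -z /mnt/mnt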

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote:
>  Ok, this is where I am now:
>  WARNING: debug: end of checking extent item[18457780273152 169 1]
>  type: 176 offset: 2
>  checking extent items [18457780273152/18457780273152]
>  ERROR: errors found in extent allocation tree or chunk allocation
>  checking fs roots
>  ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
>  EXTENT_DATA[25937109 4033]
> 
> The expected end is not even aligned to sectorsize.
> 
> I think there is something wrong.
> Dump tree on this INODE would definitely help in this case.
> 
> Marc, would you please try dump using the following command?
> 
> # btrfs ins dump-tree -t 17592  | grep -C 40 25937109
 
Sure, there you go:
gargamel:~# btrfs ins dump-tree -t 17592 /dev/mapper/dshelf2  | grep -C 40 
25937109
extent data disk byte 3259370151936 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 144 key (2009526 EXTENT_DATA 1179648) itemoff 7931 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370266624 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 145 key (2009526 EXTENT_DATA 1310720) itemoff 7878 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370385408 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 146 key (2009526 EXTENT_DATA 1441792) itemoff 7825 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370504192 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 147 key (2009526 EXTENT_DATA 1572864) itemoff 7772 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370622976 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 148 key (2009526 EXTENT_DATA 1703936) itemoff 7719 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370737664 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 149 key (2009526 EXTENT_DATA 1835008) itemoff 7666 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370856448 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 150 key (2009526 EXTENT_DATA 1966080) itemoff 7613 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370975232 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 151 key (2009526 EXTENT_DATA 2097152) itemoff 7560 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371094016 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 152 key (2009526 EXTENT_DATA 2228224) itemoff 7507 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371208704 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 153 key (2009526 EXTENT_DATA 2359296) itemoff 7454 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371323392 nr 110592
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 154 key (2009526 EXTENT_DATA 2490368) itemoff 7401 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371433984 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 155 key (2009526 EXTENT_DATA 2621440) itemoff 7348 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371548672 nr 110592
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 156 key (2009526 EXTENT_DATA 2752512) itemoff 7295 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371659264 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 157 key (2009526 EXTENT_DATA 2883584) itemoff 7242 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371773952 nr 106496
extent 

Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Qu Wenruo



On 2018年07月10日 09:37, Su Yue wrote:
> [CC to linux-btrfs]
> 
> Here is the log of wrong extent data.
> 
> On 07/08/2018 01:21 AM, Marc MERLIN wrote:
>> On Fri, Jul 06, 2018 at 10:56:36AM -0700, Marc MERLIN wrote:
>>> On Fri, Jul 06, 2018 at 09:05:23AM -0700, Marc MERLIN wrote:
 Ok, this is where I am now:
 WARNING: debug: end of checking extent item[18457780273152 169 1]
 type: 176 offset: 2
 checking extent items [18457780273152/18457780273152]
 ERROR: errors found in extent allocation tree or chunk allocation
 checking fs roots
 ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
 EXTENT_DATA[25937109 4033]

The expected end is not even aligned to sectorsize.

I think there is something wrong.
Dump tree on this INODE would definitely help in this case.

Marc, would you please try dump using the following command?

# btrfs ins dump-tree -t 17592  | grep -C 40 25937109

Thanks,
Qu

 ERROR: root 17592 EXTENT_DATA[25937109 8192] gap exists, expected:
 EXTENT_DATA[25937109 8129]
 ERROR: root 17592 EXTENT_DATA[25937109 20480] gap exists, expected:
 EXTENT_DATA[25937109 20417]
 ERROR: root 17592 EXTENT_DATA[25937493 4096] gap exists, expected:
 EXTENT_DATA[25937493 3349]
 ERROR: root 17592 EXTENT_DATA[25937493 8192] gap exists, expected:
 EXTENT_DATA[25937493 7445]
 ERROR: root 17592 EXTENT_DATA[25937493 12288] gap exists, expected:
 EXTENT_DATA[25937493 11541]
 ERROR: root 17592 EXTENT_DATA[25941335 4096] gap exists, expected:
 EXTENT_DATA[25941335 4091]
 ERROR: root 17592 EXTENT_DATA[25941335 8192] gap exists, expected:
 EXTENT_DATA[25941335 8187]
 ERROR: root 17592 EXTENT_DATA[25942002 4096] gap exists, expected:
 EXTENT_DATA[25942002 4093]
 ERROR: root 17592 EXTENT_DATA[25942790 4096] gap exists, expected:
 EXTENT_DATA[25942790 4094]
 ERROR: root 17592 EXTENT_DATA[25945819 4096] gap exists, expected:
 EXTENT_DATA[25945819 4093]
 ERROR: root 17592 EXTENT_DATA[26064834 4096] gap exists, expected:
 EXTENT_DATA[26064834 129]
 ERROR: root 17592 EXTENT_DATA[26064834 135168] gap exists, expected:
 EXTENT_DATA[26064834 131201]
 ERROR: root 17592 EXTENT_DATA[26064834 266240] gap exists, expected:
 EXTENT_DATA[26064834 262273]
 ERROR: root 17592 EXTENT_DATA[26064834 397312] gap exists, expected:
 EXTENT_DATA[26064834 393345]
 ERROR: root 17592 EXTENT_DATA[26064834 528384] gap exists, expected:
 EXTENT_DATA[26064834 524417]
 ERROR: root 17592 EXTENT_DATA[26064834 659456] gap exists, expected:
 EXTENT_DATA[26064834 655489]
 ERROR: root 17592 EXTENT_DATA[26064834 790528] gap exists, expected:
 EXTENT_DATA[26064834 786561]
 ERROR: root 17592 EXTENT_DATA[26064834 921600] gap exists, expected:
 EXTENT_DATA[26064834 917633]
 ERROR: root 17592 EXTENT_DATA[26064834 929792] gap exists, expected:
 EXTENT_DATA[26064834 925825]
 ERROR: root 17592 EXTENT_DATA[26064834 1224704] gap exists,
 expected: EXTENT_DATA[26064834 1220737]

 I'm not sure how long it's been stuck on that line. I'll watch it
 today.
>>>
>>> Ok, it's been stuck there for 2H.
>>
>> Well, it's now the next day and it's finished running:
>>
>> checking extent items [18457780273152/18457780273152]
>> ERROR: errors found in extent allocation tree or chunk allocation
>> checking fs roots
>> ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
>> EXTENT_DATA[25937109 4033]
>> ERROR: root 17592 EXTENT_DATA[25937109 8192] gap exists, expected:
>> EXTENT_DATA[25937109 8129]
>> ERROR: root 17592 EXTENT_DATA[25937109 20480] gap exists, expected:
>> EXTENT_DATA[25937109 20417]
>> ERROR: root 17592 EXTENT_DATA[25937493 4096] gap exists, expected:
>> EXTENT_DATA[25937493 3349]
>> ERROR: root 17592 EXTENT_DATA[25937493 8192] gap exists, expected:
>> EXTENT_DATA[25937493 7445]
>> ERROR: root 17592 EXTENT_DATA[25937493 12288] gap exists, expected:
>> EXTENT_DATA[25937493 11541]
>> ERROR: root 17592 EXTENT_DATA[25941335 4096] gap exists, expected:
>> EXTENT_DATA[25941335 4091]
>> ERROR: root 17592 EXTENT_DATA[25941335 8192] gap exists, expected:
>> EXTENT_DATA[25941335 8187]
>> ERROR: root 17592 EXTENT_DATA[25942002 4096] gap exists, expected:
>> EXTENT_DATA[25942002 4093]
>> ERROR: root 17592 EXTENT_DATA[25942790 4096] gap exists, expected:
>> EXTENT_DATA[25942790 4094]
>> ERROR: root 17592 EXTENT_DATA[25945819 4096] gap exists, expected:
>> EXTENT_DATA[25945819 4093]
>> ERROR: root 17592 EXTENT_DATA[26064834 4096] gap exists, expected:
>> EXTENT_DATA[26064834 129]
>> ERROR: root 17592 EXTENT_DATA[26064834 135168] gap exists, expected:
>> EXTENT_DATA[26064834 131201]
>> ERROR: root 17592 EXTENT_DATA[26064834 266240] gap exists, expected:
>> EXTENT_DATA[26064834 262273]
>> ERROR: root 17592 EXTENT_DATA[26064834 397312] gap exists, expected:
>> EXTENT_DATA[26064834 393345]
>> ERROR: root 17592 EXTENT_DATA[26064834 528384] gap exists, expected:
>> 

Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Su Yue

[CC to linux-btrfs]

Here is the log of wrong extent data.

On 07/08/2018 01:21 AM, Marc MERLIN wrote:

On Fri, Jul 06, 2018 at 10:56:36AM -0700, Marc MERLIN wrote:

On Fri, Jul 06, 2018 at 09:05:23AM -0700, Marc MERLIN wrote:

Ok, this is where I am now:
WARNING: debug: end of checking extent item[18457780273152 169 1] type: 176 
offset: 2
checking extent items [18457780273152/18457780273152]
ERROR: errors found in extent allocation tree or chunk allocation
checking fs roots
ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected: 
EXTENT_DATA[25937109 4033]
ERROR: root 17592 EXTENT_DATA[25937109 8192] gap exists, expected: 
EXTENT_DATA[25937109 8129]
ERROR: root 17592 EXTENT_DATA[25937109 20480] gap exists, expected: 
EXTENT_DATA[25937109 20417]
ERROR: root 17592 EXTENT_DATA[25937493 4096] gap exists, expected: 
EXTENT_DATA[25937493 3349]
ERROR: root 17592 EXTENT_DATA[25937493 8192] gap exists, expected: 
EXTENT_DATA[25937493 7445]
ERROR: root 17592 EXTENT_DATA[25937493 12288] gap exists, expected: 
EXTENT_DATA[25937493 11541]
ERROR: root 17592 EXTENT_DATA[25941335 4096] gap exists, expected: 
EXTENT_DATA[25941335 4091]
ERROR: root 17592 EXTENT_DATA[25941335 8192] gap exists, expected: 
EXTENT_DATA[25941335 8187]
ERROR: root 17592 EXTENT_DATA[25942002 4096] gap exists, expected: 
EXTENT_DATA[25942002 4093]
ERROR: root 17592 EXTENT_DATA[25942790 4096] gap exists, expected: 
EXTENT_DATA[25942790 4094]
ERROR: root 17592 EXTENT_DATA[25945819 4096] gap exists, expected: 
EXTENT_DATA[25945819 4093]
ERROR: root 17592 EXTENT_DATA[26064834 4096] gap exists, expected: 
EXTENT_DATA[26064834 129]
ERROR: root 17592 EXTENT_DATA[26064834 135168] gap exists, expected: 
EXTENT_DATA[26064834 131201]
ERROR: root 17592 EXTENT_DATA[26064834 266240] gap exists, expected: 
EXTENT_DATA[26064834 262273]
ERROR: root 17592 EXTENT_DATA[26064834 397312] gap exists, expected: 
EXTENT_DATA[26064834 393345]
ERROR: root 17592 EXTENT_DATA[26064834 528384] gap exists, expected: 
EXTENT_DATA[26064834 524417]
ERROR: root 17592 EXTENT_DATA[26064834 659456] gap exists, expected: 
EXTENT_DATA[26064834 655489]
ERROR: root 17592 EXTENT_DATA[26064834 790528] gap exists, expected: 
EXTENT_DATA[26064834 786561]
ERROR: root 17592 EXTENT_DATA[26064834 921600] gap exists, expected: 
EXTENT_DATA[26064834 917633]
ERROR: root 17592 EXTENT_DATA[26064834 929792] gap exists, expected: 
EXTENT_DATA[26064834 925825]
ERROR: root 17592 EXTENT_DATA[26064834 1224704] gap exists, expected: 
EXTENT_DATA[26064834 1220737]

I'm not sure how long it's been stuck on that line. I'll watch it today.


Ok, it's been stuck there for 2H.


Well, it's now the next day and it's finished running:

checking extent items [18457780273152/18457780273152]
ERROR: errors found in extent allocation tree or chunk allocation
checking fs roots
ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected: 
EXTENT_DATA[25937109 4033]
ERROR: root 17592 EXTENT_DATA[25937109 8192] gap exists, expected: 
EXTENT_DATA[25937109 8129]
ERROR: root 17592 EXTENT_DATA[25937109 20480] gap exists, expected: 
EXTENT_DATA[25937109 20417]
ERROR: root 17592 EXTENT_DATA[25937493 4096] gap exists, expected: 
EXTENT_DATA[25937493 3349]
ERROR: root 17592 EXTENT_DATA[25937493 8192] gap exists, expected: 
EXTENT_DATA[25937493 7445]
ERROR: root 17592 EXTENT_DATA[25937493 12288] gap exists, expected: 
EXTENT_DATA[25937493 11541]
ERROR: root 17592 EXTENT_DATA[25941335 4096] gap exists, expected: 
EXTENT_DATA[25941335 4091]
ERROR: root 17592 EXTENT_DATA[25941335 8192] gap exists, expected: 
EXTENT_DATA[25941335 8187]
ERROR: root 17592 EXTENT_DATA[25942002 4096] gap exists, expected: 
EXTENT_DATA[25942002 4093]
ERROR: root 17592 EXTENT_DATA[25942790 4096] gap exists, expected: 
EXTENT_DATA[25942790 4094]
ERROR: root 17592 EXTENT_DATA[25945819 4096] gap exists, expected: 
EXTENT_DATA[25945819 4093]
ERROR: root 17592 EXTENT_DATA[26064834 4096] gap exists, expected: 
EXTENT_DATA[26064834 129]
ERROR: root 17592 EXTENT_DATA[26064834 135168] gap exists, expected: 
EXTENT_DATA[26064834 131201]
ERROR: root 17592 EXTENT_DATA[26064834 266240] gap exists, expected: 
EXTENT_DATA[26064834 262273]
ERROR: root 17592 EXTENT_DATA[26064834 397312] gap exists, expected: 
EXTENT_DATA[26064834 393345]
ERROR: root 17592 EXTENT_DATA[26064834 528384] gap exists, expected: 
EXTENT_DATA[26064834 524417]
ERROR: root 17592 EXTENT_DATA[26064834 659456] gap exists, expected: 
EXTENT_DATA[26064834 655489]
ERROR: root 17592 EXTENT_DATA[26064834 790528] gap exists, expected: 
EXTENT_DATA[26064834 786561]
ERROR: root 17592 EXTENT_DATA[26064834 921600] gap exists, expected: 
EXTENT_DATA[26064834 917633]
ERROR: root 17592 EXTENT_DATA[26064834 929792] gap exists, expected: 
EXTENT_DATA[26064834 925825]
ERROR: root 17592 EXTENT_DATA[26064834 1224704] gap exists, expected: 
EXTENT_DATA[26064834 1220737]
ERROR: root 21322 EXTENT_DATA[25320803 4096] gap exists, expected: 
EXTENT_DATA[25320803 56]
ERROR: root 21322 EXTENT_DATA[25320803 

Re: Fwd: Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Su Yue

Forgot to CC Marc.

On 07/10/2018 09:33 AM, Su Yue wrote:

[FWD to linux-btrfs]
Thanks to Marc's patience in running and testing btrfsck lowmem mode
in recent days.
The FS has a large extent tree, but luckily only a few items are corrupted,
and they were all fixed by the special version. Reloc trees were cleaned too.
So the FS can be mounted RW.

However, the remaining errors of extent data in the file trees are
unresolved; they are all about holes.
Since I'm not familiar with the kernel code, I'm not sure how serious those
errors are or what could happen when reading/writing those wrong items.

Marc also has some questions in the forwarded part; replies are
always welcome.

Error messages are shown at the end.


 Forwarded Message 
Subject: Re: So, does btrfs check lowmem take days? weeks?
Date: Mon, 9 Jul 2018 10:48:18 -0700
From: Marc MERLIN 
To: Su Yue 
CC: quwenruo.bt...@gmx.com, Su Yue 

Success!
Well done Su, this is a huge improvement to the lowmem code. It went 
from days to less than 3 hours.


I'll paste the logs below.

Questions:
1) I assume I first need to delete a lot of snapshots. What is the limit 
in your opinion?

100? 150? other?

2) my filesystem is somewhat misbalanced. Which balance options do you 
think are safe to use?


3) Should I start a scrub now (takes about 1 day) or anything else to
check that the filesystem is hopefully not damaged anymore?

4) should btrfs check reset the corrupt counter?
bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
for now, should I reset it manually?

Thanks,
Marc


gargamel:/var/local/src/btrfs-progs.sy# ./btrfsck --mode=lowmem -q 
--repair /dev/mapper/dshelf2

enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Created new chunk [18460145811456 1073741824]
Add one extent data backref [84302495744 69632]
Add one extent data backref [84302495744 69632]
Add one extent data backref [125712527360 12214272]
Add one extent data backref [125730848768 5111808]
Add one extent data backref [125730848768 5111808]
Add one extent data backref [125736914944 6037504]
Add one extent data backref [125736914944 6037504]
Add one extent data backref [129952120832 20242432]
Add one extent data backref [129952120832 20242432]
Add one extent data backref [134925357056 11829248]
Add one extent data backref [134925357056 11829248]
Add one extent data backref [147895111680 12345344]
Add one extent data backref [147895111680 12345344]
Add one extent data backref [150850146304 17522688]
Add one extent data backref [156909494272 55320576]
Add one extent data backref [156909494272 55320576]
good luck!
found 0 bytes used, no error found
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
  referenced 0
gargamel:/var/local/src/btrfs-progs.sy# ./btrfsck --mode=lowmem -q 
/dev/mapper/dshelf2

Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
good luck!
found 251650048 bytes used, no error found
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
  referenced 0
gargamel:/var/local/src/btrfs-progs.sy# ./btrfsck -c /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
found 0 bytes used, no error found
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
  referenced 0
gargamel:/var/local/src/btrfs-progs.sy# mount /dev/mapper/dshelf2 /mnt/mnt
[671283.314558] BTRFS info (device dm-2): disk space caching is enabled
[671283.334226] BTRFS info (device dm-2): has skinny extents
[671285.191740] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: 
wr 0, rd 0, flush 0, corrupt 2, gen 0

[671395.371313] BTRFS info (device dm-2): enabling ssd optimizations
[671400.884013] BTRFS info (device dm-2): checking UUID tree
(hung for about 2-3 min but worked eventually)

gargamel:/mnt/mnt# btrfs fi show .
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
     Total devices 1 FS bytes used 12.59TiB
     devid    1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:/mnt/mnt# btrfs fi df .
Data, single: total=13.57TiB, used=12.48TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=124.50GiB, used=116.92GiB
Metadata, single: total=216.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=42.62MiB

gargamel:/mnt/mnt# btrfs subvolume list . | wc -l
270







Fwd: Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Su Yue

[FWD to linux-btrfs]
Thanks to Marc's patience in running and testing btrfsck lowmem mode
in recent days.
The FS has a large extent tree, but luckily only a few items are corrupted,
and they were all fixed by the special version. Reloc trees were cleaned too.
So the FS can be mounted RW.

However, the remaining errors of extent data in the file trees are
unresolved; they are all about holes.
Since I'm not familiar with the kernel code, I'm not sure how serious those
errors are or what could happen when reading/writing those wrong items.

Marc also has some questions in the forwarded part; replies are
always welcome.

Error messages are shown at the end.


 Forwarded Message 
Subject: Re: So, does btrfs check lowmem take days? weeks?
Date: Mon, 9 Jul 2018 10:48:18 -0700
From: Marc MERLIN 
To: Su Yue 
CC: quwenruo.bt...@gmx.com, Su Yue 

Success!
Well done Su, this is a huge improvement to the lowmem code. It went 
from days to less than 3 hours.


I'll paste the logs below.

Questions:
1) I assume I first need to delete a lot of snapshots. What is the limit 
in your opinion?

100? 150? other?

2) my filesystem is somewhat misbalanced. Which balance options do you 
think are safe to use?


3) Should I start a scrub now (takes about 1 day) or anything else to
check that the filesystem is hopefully not damaged anymore?

4) should btrfs check reset the corrupt counter?
bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
for now, should I reset it manually?

Thanks,
Marc


gargamel:/var/local/src/btrfs-progs.sy# ./btrfsck --mode=lowmem -q 
--repair /dev/mapper/dshelf2

enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Created new chunk [18460145811456 1073741824]
Add one extent data backref [84302495744 69632]
Add one extent data backref [84302495744 69632]
Add one extent data backref [125712527360 12214272]
Add one extent data backref [125730848768 5111808]
Add one extent data backref [125730848768 5111808]
Add one extent data backref [125736914944 6037504]
Add one extent data backref [125736914944 6037504]
Add one extent data backref [129952120832 20242432]
Add one extent data backref [129952120832 20242432]
Add one extent data backref [134925357056 11829248]
Add one extent data backref [134925357056 11829248]
Add one extent data backref [147895111680 12345344]
Add one extent data backref [147895111680 12345344]
Add one extent data backref [150850146304 17522688]
Add one extent data backref [156909494272 55320576]
Add one extent data backref [156909494272 55320576]
good luck!
found 0 bytes used, no error found
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
 referenced 0
gargamel:/var/local/src/btrfs-progs.sy# ./btrfsck --mode=lowmem -q 
/dev/mapper/dshelf2

Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
good luck!
found 251650048 bytes used, no error found
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
 referenced 0
gargamel:/var/local/src/btrfs-progs.sy# ./btrfsck -c /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
found 0 bytes used, no error found
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
 referenced 0
gargamel:/var/local/src/btrfs-progs.sy# mount /dev/mapper/dshelf2 /mnt/mnt
[671283.314558] BTRFS info (device dm-2): disk space caching is enabled
[671283.334226] BTRFS info (device dm-2): has skinny extents
[671285.191740] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: 
wr 0, rd 0, flush 0, corrupt 2, gen 0

[671395.371313] BTRFS info (device dm-2): enabling ssd optimizations
[671400.884013] BTRFS info (device dm-2): checking UUID tree
(hung for about 2-3 min but worked eventually)

gargamel:/mnt/mnt# btrfs fi show .
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Total devices 1 FS bytes used 12.59TiB
devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:/mnt/mnt# btrfs fi df .
Data, single: total=13.57TiB, used=12.48TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=124.50GiB, used=116.92GiB
Metadata, single: total=216.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=42.62MiB

gargamel:/mnt/mnt# btrfs subvolume list . | wc -l
270





Error messages below:


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Su Yue




On 07/04/2018 05:40 AM, Marc MERLIN wrote:

On Tue, Jul 03, 2018 at 03:34:45PM -0600, Chris Murphy wrote:

On Tue, Jul 3, 2018 at 2:34 AM, Su Yue  wrote:


Yes, the extent tree is the hardest part for lowmem mode. I'm quite
confident the tool can deal well with file trees (which record metadata
about file and directory names and relationships).
As for the extent tree, I have little confidence due to its complexity.


I have to ask again if there's some metadata integrity mask option Marc
should use to try to catch the corruption cause in the first place?

His use case really can't afford either mode of btrfs check. And also
check is only backward looking, it doesn't show what was happening at
the time. And for big file systems, check rapidly doesn't scale at all
anyway.

And now he's modifying his layout to avoid the problem from happening
again which makes it less likely to catch the cause, and get it fixed.
I think if he's willing to build a kernel with integrity checker
enabled, it should be considered but only if it's likely to reveal why
the problem is happening, even if it can't repair the problem once
it's happened. He's already in that situation so masked integrity
checking is no worse, at least it gives a chance to improve Btrfs
rather than it being a mystery how it got corrupt.


Yeah, I'm fine waiting a few more days with this down and gathering data if
that helps.
Thanks! I will write a special version which skips checking the wrong extent
items and prints a debug log.

It should also run faster, which will help us locate where it gets stuck.

Su

But due to the size, a full btrfs image may be a bit larger than we
want, not counting some confidential data in some filenames.

Marc






Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Qu Wenruo



On 2018年07月04日 06:00, Marc MERLIN wrote:
> On Tue, Jul 03, 2018 at 03:46:59PM -0600, Chris Murphy wrote:
>> On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo  wrote:
>>>
>>>
>>> There must be something wrong, however due to the size of the fs, and
>>> the complexity of extent tree, I can't tell.
>>
>> Right, which is why I'm asking if any of the metadata integrity
>> checker mask options might reveal what's going wrong?
>>
>> I guess the big issues are:
>> a. compile kernel with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y is necessary
>> b. it can come with a high resource burden depending on the mask and
>> where the log is being written (write system logs to a different file
>> system for sure)
>> c. the granularity offered in the integrity checker might not be enough.
>> d. might take a while before corruptions are injected before
>> corruption is noticed and flagged.
> 
> Back to where I'm at right now. I'm going to delete this filesystem and
> start over very soon. Tomorrow or the day after.
> I'm happy to get more data off it if someone wants it for posterity, but
> I indeed need to recover soon since being with a dead backup server is
> not a good place to be in :)

Feel free to recover ASAP, as the extent tree is really too large for
a human to analyse manually.

Thanks,
Qu

> 
> Thanks,
> Marc
> 


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:46:59PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo  wrote:
> >
> >
> > There must be something wrong, however due to the size of the fs, and
> > the complexity of extent tree, I can't tell.
> 
> Right, which is why I'm asking if any of the metadata integrity
> checker mask options might reveal what's going wrong?
> 
> I guess the big issues are:
> a. compile kernel with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y is necessary
> b. it can come with a high resource burden depending on the mask and
> where the log is being written (write system logs to a different file
> system for sure)
> c. the granularity offered in the integrity checker might not be enough.
> d. might take a while before corruptions are injected before
> corruption is noticed and flagged.

Back to where I'm at right now. I'm going to delete this filesystem and
start over very soon. Tomorrow or the day after.
I'm happy to get more data off it if someone wants it for posterity, but
I indeed need to recover soon since being with a dead backup server is
not a good place to be in :)

Thanks,
Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Chris Murphy
On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo  wrote:
>
>
> There must be something wrong, however due to the size of the fs, and
> the complexity of extent tree, I can't tell.

Right, which is why I'm asking if any of the metadata integrity
checker mask options might reveal what's going wrong?

I guess the big issues are:
a. compile kernel with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y is necessary
b. it can come with a high resource burden depending on the mask and
where the log is being written (write system logs to a different file
system for sure)
c. the granularity offered in the integrity checker might not be enough.
d. it might take a while after corruptions are injected before the
corruption is noticed and flagged.

So it might be pointless, no idea.
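
For completeness, the integrity checker mentioned in (a) is driven by mount
options. A sketch, assuming a kernel built with
CONFIG_BTRFS_FS_CHECK_INTEGRITY=y (the mask value is only an example):

# Assumes CONFIG_BTRFS_FS_CHECK_INTEGRITY=y; check_int_print_mask selects
# which events get logged (the value here is arbitrary).
mount -o check_int,check_int_print_mask=0x7 /dev/mapper/dshelf2 /mnt/mnt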


-- 
Chris Murphy


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:34:45PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:34 AM, Su Yue  wrote:
> 
> > Yes, extent tree is the hardest part for lowmem mode. I'm quite
> > confident the tool can deal well with file trees(which records metadata
> > about file and directory name, relationships).
> > As for extent tree, I have few confidence due to its complexity.
> 
> I have to ask again if there's some metadata integrity mask opion Marc
> should use to try to catch the corruption cause in the first place?
> 
> His use case really can't afford either mode of btrfs check. And also
> check is only backward looking, it doesn't show what was happening at
> the time. And for big file systems, check rapidly doesn't scale at all
> anyway.
> 
> And now he's modifying his layout to avoid the problem from happening
> again which makes it less likely to catch the cause, and get it fixed.
> I think if he's willing to build a kernel with integrity checker
> enabled, it should be considered but only if it's likely to reveal why
> the problem is happening, even if it can't repair the problem once
> it's happened. He's already in that situation so masked integrity
> checking is no worse, at least it gives a chance to improve Btrfs
> rather than it being a mystery how it got corrupt.

Yeah, I'm fine waiting a few more days with this down and gathering data if
that helps.
But due to the size, a full btrfs image may be a bit larger than we
want, not counting some confidential data in some filenames.

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Chris Murphy
On Tue, Jul 3, 2018 at 2:34 AM, Su Yue  wrote:

> Yes, extent tree is the hardest part for lowmem mode. I'm quite
> confident the tool can deal well with file trees(which records metadata
> about file and directory name, relationships).
> As for extent tree, I have few confidence due to its complexity.

I have to ask again if there's some metadata integrity mask option Marc
should use to try to catch the corruption cause in the first place?

His use case really can't afford either mode of btrfs check. And also
check is only backward looking, it doesn't show what was happening at
the time. And for big file systems, check rapidly doesn't scale at all
anyway.

And now he's modifying his layout to avoid the problem from happening
again which makes it less likely to catch the cause, and get it fixed.
I think if he's willing to build a kernel with integrity checker
enabled, it should be considered but only if it's likely to reveal why
the problem is happening, even if it can't repair the problem once
it's happened. He's already in that situation so masked integrity
checking is no worse, at least it gives a chance to improve Btrfs
rather than it being a mystery how it got corrupt.

-- 
Chris Murphy


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:50:48PM +0800, Qu Wenruo wrote:
> > It sounds like there may not be a fix to this problem with the filesystem's
> > design, outside of "do not get there, or else".
> > It would even be useful for btrfs tools to start computing heuristics and
> > output warnings like "you have more than 100 snapshots on this filesystem,
> > this is not recommended, please read http://url/;
> 
> This looks pretty doable, but maybe it's better to add some warning at
> btrfs progs (both "subvolume snapshot" and "receive").

This is what I meant to say, correct.

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Qu Wenruo



On 2018年07月03日 12:22, Marc MERLIN wrote:
> On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote:
>> So the idea behind journaled file systems is that journal replay
>> enabled mount time "repair" that's faster than an fsck. Already Btrfs
>> use cases with big, but not huge, file systems makes btrfs check a
>> problem. Either running out of memory or it takes too long. So already
>> it isn't scaling as well as ext4 or XFS in this regard.
>>
>> So what's the future hold? It seems like the goal is that the problems
>> must be avoided in the first place rather than to repair them after
>> the fact.
>>
>> Are the problem's Marc is running into understood well enough that
>> there can eventually be a fix, maybe even an on-disk format change,
>> that prevents such problems from happening in the first place?
>>
>> Or does it make sense for him to be running with btrfs debug or some
>> subset of btrfs integrity checking mask to try to catch the problems
>> in the act of them happening?
> 
> Those are all good questions.
> To be fair, I cannot claim that btrfs was at fault for whatever filesystem
> damage I ended up with. It's very possible that it happened due to a flaky
> Sata card that kicked drives off the bus when it shouldn't have.

However this still doesn't explain the problem you hit.

In theory (well, it's theory by all means), btrfs is fully atomic for
its transactions, even for its data (with csum and CoW).
So even if a power loss/data corruption happens between transactions, we
should get the previous transaction back.

There must be something wrong, however due to the size of the fs, and
the complexity of extent tree, I can't tell.

> Sure in theory a journaling filesystem can recover from unexpected power
> loss and drives dropping off at bad times, but I'm going to guess that
> btrfs' complexity also means that it has data structures (extent tree?) that
> need to be updated completely "or else".

I'm wondering if we have some hidden bug somewhere.
The extent tree is metadata and is protected by mandatory CoW, so it
shouldn't be corrupted unless we have a bug in the already complex
delayed reference code, or some unexpected behavior (flush/FUA failure)
due to so many layers (dmcrypt + mdraid).

Anyway, if we can't reproduce it in a controlled environment (my VM with
pretty small and plain fs), it's really hard to locate the bug.

> 
> I'm obviously ok with a filesystem check being necessary to recover in cases
> like this, afterall I still occasionally have to run e2fsck on ext4 too, but
> I'm a lot less thrilled with the btrfs situation where basically the repair
> tools can either completely crash your kernel, or take days and then either
> get stuck in an infinite loop or hit an algorithm that can't scale if you
> have too many hardlinks/snapshots.

Unfortunately, this is the price paid for super-fast snapshot creation.
The tradeoff cannot be easily avoided.

(Another way to implement snapshots is like LVM thin provisioning: each time
a snapshot is created we need to iterate over all allocated blocks of the
thin LV, which doesn't scale very well as the fs grows, but makes the
mapping management pretty easy. I think the LVM folks have done some
tricks to improve the performance.)

> 
> It sounds like there may not be a fix to this problem with the filesystem's
> design, outside of "do not get there, or else".
> It would even be useful for btrfs tools to start computing heuristics and
> output warnings like "you have more than 100 snapshots on this filesystem,
> this is not recommended, please read http://url/;

This looks pretty doable, but maybe it's better to add such a warning to
btrfs-progs (to both "subvolume snapshot" and "receive").
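
For illustration only, such a heuristic could be as simple as the following
(the threshold and mount point are arbitrary examples):

# Sketch of a snapshot-count warning; 100 is an arbitrary threshold.
mnt=/mnt/mnt
count=$(btrfs subvolume list -s "$mnt" | wc -l)
if [ "$count" -gt 100 ]; then
    echo "warning: $count snapshots on $mnt;" \
         "btrfs check/balance/subvolume delete may become very slow" >&2
fi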

Thanks,
Qu

> 
> Qu, Su, does that sound both reasonable and doable?
> 
> Thanks,
> Marc
> 


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Su Yue




On 07/03/2018 12:22 PM, Marc MERLIN wrote:

On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote:

So the idea behind journaled file systems is that journal replay
enables a mount-time "repair" that's faster than an fsck. Already Btrfs
use cases with big, but not huge, file systems makes btrfs check a
problem. Either running out of memory or it takes too long. So already
it isn't scaling as well as ext4 or XFS in this regard.

So what's the future hold? It seems like the goal is that the problems
must be avoided in the first place rather than to repair them after
the fact.

Are the problems Marc is running into understood well enough that
there can eventually be a fix, maybe even an on-disk format change,
that prevents such problems from happening in the first place?

Or does it make sense for him to be running with btrfs debug or some
subset of btrfs integrity checking mask to try to catch the problems
in the act of them happening?


Those are all good questions.
To be fair, I cannot claim that btrfs was at fault for whatever filesystem
damage I ended up with. It's very possible that it happened due to a flaky
Sata card that kicked drives off the bus when it shouldn't have.
Sure in theory a journaling filesystem can recover from unexpected power
loss and drives dropping off at bad times, but I'm going to guess that
btrfs' complexity also means that it has data structures (extent tree?) that
need to be updated completely "or else".


Yes, the extent tree is the hardest part for lowmem mode. I'm quite
confident the tool can deal well with file trees (which record metadata
about file and directory names and relationships).
As for the extent tree, I have little confidence due to its complexity.


I'm obviously ok with a filesystem check being necessary to recover in cases
like this, afterall I still occasionally have to run e2fsck on ext4 too, but
I'm a lot less thrilled with the btrfs situation where basically the repair
tools can either completely crash your kernel, or take days and then either
get stuck in an infinite loop or hit an algorithm that can't scale if you
have too many hardlinks/snapshots.


It's not surprising that real-world filesystems have many snapshots.
Original-mode repair eats a lot of memory, so lowmem mode was created
to save memory at the cost of time. The latter is just not robust enough
to handle complex situations.


It sounds like there may not be a fix to this problem with the filesystem's
design, outside of "do not get there, or else".
It would even be useful for btrfs tools to start computing heuristics and
output warnings like "you have more than 100 snapshots on this filesystem,
this is not recommended, please read http://url/"

Qu, Su, does that sound both reasonable and doable?

Thanks,
Marc






Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote:
> So the idea behind journaled file systems is that journal replay
> enabled mount time "repair" that's faster than an fsck. Already Btrfs
> use cases with big, but not huge, file systems makes btrfs check a
> problem. Either running out of memory or it takes too long. So already
> it isn't scaling as well as ext4 or XFS in this regard.
> 
> So what's the future hold? It seems like the goal is that the problems
> must be avoided in the first place rather than to repair them after
> the fact.
> 
> Are the problem's Marc is running into understood well enough that
> there can eventually be a fix, maybe even an on-disk format change,
> that prevents such problems from happening in the first place?
> 
> Or does it make sense for him to be running with btrfs debug or some
> subset of btrfs integrity checking mask to try to catch the problems
> in the act of them happening?

Those are all good questions.
To be fair, I cannot claim that btrfs was at fault for whatever filesystem
damage I ended up with. It's very possible that it happened due to a flaky
Sata card that kicked drives off the bus when it shouldn't have.
Sure in theory a journaling filesystem can recover from unexpected power
loss and drives dropping off at bad times, but I'm going to guess that
btrfs' complexity also means that it has data structures (extent tree?) that
need to be updated completely "or else".

I'm obviously ok with a filesystem check being necessary to recover in cases
like this, afterall I still occasionally have to run e2fsck on ext4 too, but
I'm a lot less thrilled with the btrfs situation where basically the repair
tools can either completely crash your kernel, or take days and then either
get stuck in an infinite loop or hit an algorithm that can't scale if you
have too many hardlinks/snapshots.

It sounds like there may not be a fix to this problem with the filesystem's
design, outside of "do not get there, or else".
It would even be useful for btrfs tools to start computing heuristics and
output warnings like "you have more than 100 snapshots on this filesystem,
this is not recommended, please read http://url/"
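
Purely as an illustration of the idea, a rough sketch of such a heuristic
(the mount point, the 100-snapshot threshold and the URL are made up; it
only relies on the existing 'btrfs subvolume list -s' output and is not an
existing btrfs-progs feature):

#!/bin/bash
# warn-snapshots.sh -- hypothetical warning heuristic, not part of btrfs-progs
MNT=${1:-/mnt/btrfs_pool2}    # made-up default mount point
LIMIT=100                     # arbitrary threshold, as discussed above
COUNT=$(btrfs subvolume list -s "$MNT" | wc -l)
if [ "$COUNT" -gt "$LIMIT" ]; then
    echo "warning: $MNT has $COUNT snapshots (more than $LIMIT);" \
         "balance/check/quota may get very slow, see http://url/" >&2
fi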

Qu, Su, does that sound both reasonable and doable?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Chris Murphy
On Mon, Jul 2, 2018 at 8:42 AM, Qu Wenruo  wrote:
>
>
> On 2018年07月02日 22:05, Marc MERLIN wrote:
>> On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote:
 Ok, that's 29MB, so it doesn't fit on pastebin:
 http://marc.merlins.org/tmp/dshelf2_inspect.txt

>>> Sorry Marc. After offline communication with Qu, both
>>> of us think the filesystem is hard to repair.
>>> The filesystem is too large to debug step by step.
>>> Every check and debug round costs too much time,
>>> and it has already taken several days.
>>>
>>> Sadly, I am afraid that you have to recreate the filesystem
>>> and back up your data again. :(
>>>
>>> Sorry again, and thanks for your reports and patience.
>>
>> I appreciate your help. Honestly I only wanted to help you find why the
>> tools aren't working. Fixing filesystems by hand (and remotely via Email
>> on top of that), is way too time consuming like you said.
>>
>> Is the btrfs design flawed in a way that repair tools just cannot repair
>> on their own?
>
> For the short answer, and for your case: yes, you can consider the repair
> tools just garbage and should not use them on any production system.

So the idea behind journaled file systems is that journal replay
enabled mount time "repair" that's faster than an fsck. Already Btrfs
use cases with big, but not huge, file systems makes btrfs check a
problem. Either running out of memory or it takes too long. So already
it isn't scaling as well as ext4 or XFS in this regard.

So what's the future hold? It seems like the goal is that the problems
must be avoided in the first place rather than to repair them after
the fact.

Are the problems Marc is running into understood well enough that
there can eventually be a fix, maybe even an on-disk format change,
that prevents such problems from happening in the first place?

Or does it make sense for him to be running with btrfs debug or some
subset of btrfs integrity checking mask to try to catch the problems
in the act of them happening?



-- 
Chris Murphy


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:33:09PM +0500, Roman Mamedov wrote:
> On Mon, 2 Jul 2018 08:19:03 -0700
> Marc MERLIN  wrote:
> 
> > I actually have fewer snapshots than this per filesystem, but I backup
> > more than 10 filesystems.
> > If I used as many snapshots as you recommend, that would already be 230
> > snapshots for 10 filesystems :)
> 
> (...once again me with my rsync :)
> 
> If you didn't use send/receive, you wouldn't be required to keep a separate
> snapshot trail per filesystem backed up, one trail of snapshots for the entire
> backup server would be enough. Rsync everything to subdirs within one
> subvolume, then do timed or event-based snapshots of it. You only need more
> than one trail if you want different retention policies for different datasets
> (e.g. in my case I have 91 and 31 days).

This is exactly how I used to do backups before btrfs.
I did 

cp -al backup.olddate backup.newdate
rsync -avSH src/ backup.newdate/

You don't even need snapshots or btrfs anymore.
Also, sorry to say, but I have different data retention needs for
different backups. Some need to rotate more quickly than others, but if
you're using rsync, the method I gave above works fine at any rotation
interval you need.
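
For reference, a minimal sketch of that rotation with pruning (paths,
naming and the retention count are placeholders, and --delete is an
addition you may or may not want):

#!/bin/bash
# rotate-rsync-backup.sh -- hypothetical wrapper around the cp -al / rsync scheme above
SRC=src/                 # made-up source
DST=/backup/host1        # made-up destination
KEEP=14                  # how many dated copies to keep for this dataset
NEW=$DST/backup.$(date +%Y%m%d)
LAST=$(ls -d "$DST"/backup.* 2>/dev/null | sort | tail -n1)
[ -n "$LAST" ] && cp -al "$LAST" "$NEW"    # hardlink copy of the previous run
rsync -avSH --delete "$SRC" "$NEW/"        # only changed files break the hardlinks
ls -d "$DST"/backup.* | sort | head -n -"$KEEP" | xargs -r rm -rf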

It is almost as efficient as btrfs on space, but as I said, the time
penalty on all those stats for many files was what killed it for me.
If I go back to rsync backups (and I'm really unlikely to), then I'd
also go back to ext4. There would be no point in dealing with the
complexity and fragility of btrfs anymore.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Roman Mamedov
On Mon, 2 Jul 2018 08:19:03 -0700
Marc MERLIN  wrote:

> I actually have fewer snapshots than this per filesystem, but I backup
> more than 10 filesystems.
> If I used as many snapshots as you recommend, that would already be 230
> snapshots for 10 filesystems :)

(...once again me with my rsync :)

If you didn't use send/receive, you wouldn't be required to keep a separate
snapshot trail per filesystem backed up, one trail of snapshots for the entire
backup server would be enough. Rsync everything to subdirs within one
subvolume, then do timed or event-based snapshots of it. You only need more
than one trail if you want different retention policies for different datasets
(e.g. in my case I have 91 and 31 days).

-- 
With respect,
Roman


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Austin S. Hemmelgarn

On 2018-07-02 11:19, Marc MERLIN wrote:

Hi Qu,

thanks for the detailled and honest answer.
A few comments inline.

On Mon, Jul 02, 2018 at 10:42:40PM +0800, Qu Wenruo wrote:

For full, it depends. (but for most real world case, it's still flawed)
We have small and crafted images as test cases, which btrfs check can
repair without problem at all.
But such images are *SMALL*, and only have *ONE* type of corruption,
which can't represent real world cases at all.
  
right, they're just unittest images, I understand.



1) Too large fs (especially too many snapshots)
The use case (too many snapshots and shared extents, a lot of extents
get shared over 1000 times) is in fact a super large challenge for
lowmem mode check/repair.
It needs O(n^2) or even O(n^3) to check each backref, which hugely
slows the progress and makes it hard for us to locate the real bug.
  
So, the non lowmem version would work better, but it's a problem if it

doesn't fit in RAM.
I've always considered it a grave bug that btrfs check repair can use so
much kernel memory that it will crash the entire system. This should not
be possible.
While it won't help me here, can btrfs check be improved not to suck all
the kernel memory, and ideally even allow using swap space if the RAM is
not enough?

Is btrfs check regular mode still being maintained? I think it's still
better than lowmem, correct?


2) Corruption in extent tree and our objective is to mount RW
Extent tree is almost useless if we just want to read data.
But when we do any write, we need it and if it goes wrong even a
tiny bit, your fs could be damaged really badly.

For other corruption, like some fs tree corruption, we could do
something to discard some corrupted files, but if it's extent tree,
we either mount RO and grab anything we have, or hopes the
almost-never-working --init-extent-tree can work (that's mostly
miracle).
  
I understand that it's the weak point of btrfs, thanks for explaining.



1) Don't keep too many snapshots.
Really, this is the core.
For send/receive backup, IIRC it only needs the parent subvolume
exists, there is no need to keep the whole history of all those
snapshots.


You are correct on history. The reason I keep history is because I may
want to recover a file from last week or 2 weeks ago after I finally
notice that it's gone.
I have terabytes of space on the backup server, so it's easier to keep
history there than on the client which may not have enough space to keep
a month's worth of history.
As you know, back when we did tape backups, we also kept history of at
least several weeks (usually several months, but that's too much for
btrfs snapshots).
Bit of a case-study here, but it may be of interest.  We do something 
kind of similar where I work for our internal file servers.  We've got 
daily snapshots of the whole server kept on the server itself for 7 days 
(we usually see less than 5% of the total amount of data in changes on 
weekdays, and essentially 0 on weekends, so the snapshots rarely take up 
more than about 25% of the size of the live data), and then we
additionally do daily backups which we retain for 6 months.  I've
written up a short (albeit rather system-specific) script for recovering
old versions of a file that first scans the snapshots, and then pulls it 
out of the backups if it's not there.  I've found this works remarkably 
well for our use case (almost all the data on the file server follows a 
WORM access pattern with most of the files being between 100kB and 100MB 
in size).
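
(Not the actual script, but as a rough illustration of the idea, with all
paths and the fallback command invented:)

#!/bin/bash
# recover-file.sh <path-relative-to-share> -- hypothetical sketch, not the real script
REL=$1
SNAPDIR=/srv/share/.snapshots            # made-up snapshot location
for SNAP in $(ls -r "$SNAPDIR"); do      # assumes date-named snapshots, newest first
    if [ -e "$SNAPDIR/$SNAP/$REL" ]; then
        echo "found in snapshot $SNAP"
        cp -a "$SNAPDIR/$SNAP/$REL" . && exit 0
    fi
done
echo "not in any snapshot, falling back to the backup system" >&2
# e.g. an amrecover session or whatever the site's backup tool provides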


We actually did try moving it all over to BTRFS for a while before we 
finally ended up with the setup we currently have, but aside from the 
whole issue with massive numbers of snapshots, we found that for us at 
least, Amanda actually outperforms BTRFS send/receive for everything 
except full backups and uses less storage space (though that last bit is 
largely because we use really aggressive compression).




Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
Hi Qu,

thanks for the detailled and honest answer.
A few comments inline.

On Mon, Jul 02, 2018 at 10:42:40PM +0800, Qu Wenruo wrote:
> For full, it depends. (but for most real world case, it's still flawed)
> We have small and crafted images as test cases, which btrfs check can
> repair without problem at all.
> But such images are *SMALL*, and only have *ONE* type of corruption,
> which can't represent real world cases at all.
 
right, they're just unittest images, I understand.

> 1) Too large fs (especially too many snapshots)
>The use case (too many snapshots and shared extents, a lot of extents
>get shared over 1000 times) is in fact a super large challenge for
>lowmem mode check/repair.
>It needs O(n^2) or even O(n^3) to check each backref, which hugely
>slows the progress and makes it hard for us to locate the real bug.
 
So, the non lowmem version would work better, but it's a problem if it
doesn't fit in RAM.
I've always considered it a grave bug that btrfs check repair can use so
much kernel memory that it will crash the entire system. This should not
be possible.
While it won't help me here, can btrfs check be improved not to suck all
the kernel memory, and ideally even allow using swap space if the RAM is
not enough?

Is btrfs check regular mode still being maintained? I think it's still
better than lowmem, correct?

> 2) Corruption in extent tree and our objective is to mount RW
>Extent tree is almost useless if we just want to read data.
>But when we do any write, we need it and if it goes wrong even a
>tiny bit, your fs could be damaged really badly.
> 
>For other corruption, like some fs tree corruption, we could do
>something to discard some corrupted files, but if it's extent tree,
>we either mount RO and grab anything we have, or hopes the
>almost-never-working --init-extent-tree can work (that's mostly
>miracle).
 
I understand that it's the weak point of btrfs, thanks for explaining.

> 1) Don't keep too many snapshots.
>Really, this is the core.
>For send/receive backup, IIRC it only needs the parent subvolume
>exists, there is no need to keep the whole history of all those
>snapshots.

You are correct on history. The reason I keep history is because I may
want to recover a file from last week or 2 weeks ago after I finally
notice that it's gone. 
I have terabytes of space on the backup server, so it's easier to keep
history there than on the client which may not have enough space to keep
a month's worth of history.
As you know, back when we did tape backups, we also kept history of at
least several weeks (usually several months, but that's too much for
btrfs snapshots).

>Keeping the number of snapshots minimal greatly improves the
>possibility (for both manual patching and check repair) of a successful
>repair.
>Normally I would suggest 4 hourly snapshots, 7 daily snapshots, 12
>monthly snapshots.

I actually have fewer snapshots than this per filesystem, but I backup
more than 10 filesystems.
If I used as many snapshots as you recommend, that would already be 230
snapshots for 10 filesystems :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Qu Wenruo



On 2018年07月02日 22:05, Marc MERLIN wrote:
> On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote:
>>> Ok, that's 29MB, so it doesn't fit on pastebin:
>>> http://marc.merlins.org/tmp/dshelf2_inspect.txt
>>>
>> Sorry Marc. After offline communication with Qu, both
>> of us think the filesystem is hard to repair.
>> The filesystem is too large to debug step by step.
>> Every check and debug round costs too much time,
>> and it has already taken several days.
>>
>> Sadly, I am afraid that you have to recreate the filesystem
>> and back up your data again. :(
>>
>> Sorry again, and thanks for your reports and patience.
> 
> I appreciate your help. Honestly I only wanted to help you find why the
> tools aren't working. Fixing filesystems by hand (and remotely via Email
> on top of that), is way too time consuming like you said.
> 
> Is the btrfs design flawed in a way that repair tools just cannot repair
> on their own? 

For the short answer, and for your case: yes, you can consider the repair
tools just garbage and should not use them on any production system.

For the full answer, it depends (but for most real world cases, it's still flawed).
We have small and crafted images as test cases, which btrfs check can
repair without problem at all.
But such images are *SMALL*, and only have *ONE* type of corruption,
which can't represent real world cases at all.

> I understand that data can be lost, but I don't understand how the tools
> just either keep crashing for me, go in infinite loops, or otherwise
> fail to give me back a stable filesystem, even if some data is missing
> after that.

There are several reasons here that repair tool can't help much:

1) Too large fs (especially too many snapshots)
   The use case (too many snapshots and shared extents, a lot of extents
   get shared over 1000 times) is in fact a super large challenge for
   lowmem mode check/repair.
   It needs O(n^2) or even O(n^3) to check each backref, which hugely
   slows the progress and makes it hard for us to locate the real bug.

2) Corruption in extent tree and our objective is to mount RW
   Extent tree is almost useless if we just want to read data.
   But when we do any write, we need it and if it goes wrong even a
   tiny bit, your fs could be damaged really badly.

   For other corruption, like some fs tree corruption, we could do
   something to discard some corrupted files, but if it's extent tree,
   we either mount RO and grab anything we have (see the sketch after
   this list), or hope the almost-never-working --init-extent-tree can
   work (that's mostly
   miracle).
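
   A minimal example of that read-only route (destination paths are just
   placeholders, the device name is the one from this thread):

   # mount read-only (skip_balance in case a balance was interrupted)
   mount -o ro,skip_balance /dev/mapper/dshelf2 /mnt/recover
   # copy whatever is still readable somewhere else
   rsync -aHAX /mnt/recover/ /mnt/other_disk/rescue/
   # if even a RO mount fails, btrfs restore can scrape data without mounting
   btrfs restore -v /dev/mapper/dshelf2 /mnt/other_disk/rescue/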

So, I feel very sorry that we can't provide enough help for your case.

But still, we hope to provide some tips for your next build if you still
want to choose btrfs.

1) Don't keep too many snapshots.
   Really, this is the core.
   For send/receive backup, IIRC it only needs the parent subvolume
   exists, there is no need to keep the whole history of all those
   snapshots.
   Keeping the number of snapshots minimal greatly improves the
   possibility (for both manual patching and check repair) of a successful
   repair.
   Normally I would suggest 4 hourly snapshots, 7 daily snapshots, 12
   monthly snapshots (a small pruning sketch follows after this list).

2) Don't keep unrelated snapshots in one btrfs.
   I totally understand that maintaining different btrfs filesystems would
   hugely add maintenance pressure, but as explained, all snapshots share
   one fragile extent tree.
   If we isolate each filesystem's fragile extent tree from the others,
   it's less likely that a single extent tree corruption takes down the
   whole fs.
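
A small pruning sketch for 1) (subvolume paths and naming are only
examples, the retention count is whatever fits your policy):

# take today's read-only snapshot, then keep only the last 7 daily ones
cd /mnt/btrfs_pool2/snapshots
btrfs subvolume snapshot -r /mnt/btrfs_pool2/home home.$(date +%Y%m%d)
for OLD in $(ls -d home.* | sort | head -n -7); do
    btrfs subvolume delete "$OLD"
done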

Thanks,
Qu

> 
> Thanks,
> Marc
> 


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote:
> > Ok, that's 29MB, so it doesn't fit on pastebin:
> > http://marc.merlins.org/tmp/dshelf2_inspect.txt
> > 
> Sorry Marc. After offline communication with Qu, both
> of us think the filesystem is hard to repair.
> The filesystem is too large to debug step by step.
> Every check and debug round costs too much time,
> and it has already taken several days.
>
> Sadly, I am afraid that you have to recreate the filesystem
> and back up your data again. :(
>
> Sorry again, and thanks for your reports and patience.

I appreciate your help. Honestly I only wanted to help you find why the
tools aren't working. Fixing filesystems by hand (and remotely via Email
on top of that), is way too time consuming like you said.

Is the btrfs design flawed in a way that repair tools just cannot repair
on their own? 
I understand that data can be lost, but I don't understand how the tools
just either keep crashing for me, go in infinite loops, or otherwise
fail to give me back a stable filesystem, even if some data is missing
after that.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Su Yue




On 07/02/2018 11:22 AM, Marc MERLIN wrote:

On Mon, Jul 02, 2018 at 10:02:33AM +0800, Su Yue wrote:

Could you try follow dumps? They shouldn't cost much time.

#btrfs inspect dump-tree -t 21872  | grep -C 50 "374857
EXTENT_DATA "

#btrfs inspect dump-tree -t 22911  | grep -C 50 "374857
EXTENT_DATA "


Ok, that's 29MB, so it doesn't fit on pastebin:
http://marc.merlins.org/tmp/dshelf2_inspect.txt


Sorry Marc. After offline communication with Qu, both
of us think the filesystem is hard to repair.
The filesystem is too large to debug step by step.
Every check and debug round costs too much time,
and it has already taken several days.

Sadly, I am afraid that you have to recreate the filesystem
and back up your data again. :(

Sorry again, and thanks for your reports and patience.

Su

Marc






Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:02:33AM +0800, Su Yue wrote:
> Could you try follow dumps? They shouldn't cost much time.
> 
> #btrfs inspect dump-tree -t 21872  | grep -C 50 "374857 
> EXTENT_DATA "
> 
> #btrfs inspect dump-tree -t 22911  | grep -C 50 "374857 
> EXTENT_DATA "

Ok, that's 29MB, so it doesn't fit on pastebin:
http://marc.merlins.org/tmp/dshelf2_inspect.txt

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Su Yue




On 07/02/2018 07:22 AM, Marc MERLIN wrote:

On Thu, Jun 28, 2018 at 11:43:54PM -0700, Marc MERLIN wrote:

On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:

https://github.com/Damenly/btrfs-progs/tree/tmp1


Not sure if I understand what you meant here.


Sorry for my unclear words.
Simply speaking, I suggest you to stop current running check.
Then, clone above branch to compile binary then run
'btrfs check --mode=lowmem $dev'.
  
I understand, I'll build and try it.



This filesystem is trash to me and will require over a week to rebuild
manually if I can't repair it.


Understood your anxiety, a log of check without '--repair' will help
us to make clear what's wrong with your filesystem.


Ok, I'll run your new code without repair and report back. It will
likely take over a day though.


Well, it got stuck for over a day, and then I had to reboot :(

saruman:/var/local/src/btrfs-progs.sy# git remote -v
origin  https://github.com/Damenly/btrfs-progs.git (fetch)
origin  https://github.com/Damenly/btrfs-progs.git (push)
saruman:/var/local/src/btrfs-progs.sy# git branch
   master
* tmp1
saruman:/var/local/src/btrfs-progs.sy# git pull
Already up to date.
saruman:/var/local/src/btrfs-progs.sy# make
Making all in Documentation
make[1]: Nothing to be done for 'all'.

However, it still got stuck here:

Thanks, I saw. Some clues found.

Could you try follow dumps? They shouldn't cost much time.

#btrfs inspect dump-tree -t 21872  | grep -C 50 "374857 
EXTENT_DATA "


#btrfs inspect dump-tree -t 22911  | grep -C 50 "374857 
EXTENT_DATA "


Thanks,
Su


gargamel:~# btrfs check --mode=lowmem  -p /dev/mapper/dshelf2
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 2, have: 3
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 180, have: 181
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 67, have: 68
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 114, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 301, have: 302
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 160, have: 161
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 169, have: 170
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 347, have: 348
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 556

What should I try next?

Thanks,
Marc






Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Thu, Jun 28, 2018 at 11:43:54PM -0700, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> > > 
> > > Not sure if I understand what you meant here.
> > > 
> > Sorry for my unclear words.
> > Simply speaking, I suggest you to stop current running check.
> > Then, clone above branch to compile binary then run
> > 'btrfs check --mode=lowmem $dev'.
>  
> I understand, I'll build and try it.
> 
> > > This filesystem is trash to me and will require over a week to rebuild
> > > manually if I can't repair it.
> > 
> > Understood your anxiety, a log of check without '--repair' will help
> > us to make clear what's wrong with your filesystem.
> 
> Ok, I'll run your new code without repair and report back. It will
> likely take over a day though.

Well, it got stuck for over a day, and then I had to reboot :(

saruman:/var/local/src/btrfs-progs.sy# git remote -v
origin  https://github.com/Damenly/btrfs-progs.git (fetch)
origin  https://github.com/Damenly/btrfs-progs.git (push)
saruman:/var/local/src/btrfs-progs.sy# git branch
  master
* tmp1
saruman:/var/local/src/btrfs-progs.sy# git pull
Already up to date.
saruman:/var/local/src/btrfs-progs.sy# make
Making all in Documentation
make[1]: Nothing to be done for 'all'.

However, it still got stuck here:
gargamel:~# btrfs check --mode=lowmem  -p /dev/mapper/dshelf2   
Checking filesystem on /dev/mapper/dshelf2  
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d  
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 2, have: 3
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 180, have: 181
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 67, have: 68
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 114, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 301, have: 302
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 160, have: 161
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 169, have: 170
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 347, have: 348
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 556

What should I try next?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Marc MERLIN
On Sat, Jun 30, 2018 at 10:49:07PM +0800, Qu Wenruo wrote:
> But the last abort looks pretty possible to be the culprit.
> 
> Would you try to dump the extent tree?
> # btrfs inspect dump-tree -t extent  | grep -A50 156909494272

Sure, there you go:

item 25 key (156909494272 EXTENT_ITEM 55320576) itemoff 14943 itemsize 
24
refs 19715 gen 31575 flags DATA
item 26 key (156909494272 EXTENT_DATA_REF 571620086735451015) itemoff 
14915 itemsize 28
extent data backref root 21641 objectid 374857 offset 235175936 
count 1452
item 27 key (156909494272 EXTENT_DATA_REF 1765833482087969671) itemoff 
14887 itemsize 28
extent data backref root 23094 objectid 374857 offset 235175936 
count 1442
item 28 key (156909494272 EXTENT_DATA_REF 1807626434455810951) itemoff 
14859 itemsize 28
extent data backref root 21503 objectid 374857 offset 235175936 
count 1454
item 29 key (156909494272 EXTENT_DATA_REF 1879818091602916231) itemoff 
14831 itemsize 28
extent data backref root 21462 objectid 374857 offset 235175936 
count 1454
item 30 key (156909494272 EXTENT_DATA_REF 3610854505775117191) itemoff 
14803 itemsize 28
extent data backref root 23134 objectid 374857 offset 235175936 
count 1442
item 31 key (156909494272 EXTENT_DATA_REF 3754675454231458695) itemoff 
14775 itemsize 28
extent data backref root 23052 objectid 374857 offset 235175936 
count 1442
item 32 key (156909494272 EXTENT_DATA_REF 5060494667839714183) itemoff 
14747 itemsize 28
extent data backref root 23174 objectid 374857 offset 235175936 
count 1440
item 33 key (156909494272 EXTENT_DATA_REF 5476627808561673095) itemoff 
14719 itemsize 28
extent data backref root 22911 objectid 374857 offset 235175936 
count 1
item 34 key (156909494272 EXTENT_DATA_REF 6378484416458011527) itemoff 
14691 itemsize 28
extent data backref root 23012 objectid 374857 offset 235175936 
count 1442
item 35 key (156909494272 EXTENT_DATA_REF 7338474132555182983) itemoff 
14663 itemsize 28
extent data backref root 21872 objectid 374857 offset 235175936 
count 1
item 36 key (156909494272 EXTENT_DATA_REF 7516565391717970823) itemoff 
14635 itemsize 28
extent data backref root 21826 objectid 374857 offset 235175936 
count 1452
item 37 key (156909494272 SHARED_DATA_REF 14871537025024) itemoff 14631 
itemsize 4
shared data backref count 10
item 38 key (156909494272 SHARED_DATA_REF 14871617568768) itemoff 14627 
itemsize 4
shared data backref count 73
item 39 key (156909494272 SHARED_DATA_REF 14871619846144) itemoff 14623 
itemsize 4
shared data backref count 59
item 40 key (156909494272 SHARED_DATA_REF 14871623270400) itemoff 14619 
itemsize 4
shared data backref count 68
item 41 key (156909494272 SHARED_DATA_REF 14871623532544) itemoff 14615 
itemsize 4
shared data backref count 70
item 42 key (156909494272 SHARED_DATA_REF 14871626383360) itemoff 14611 
itemsize 4
shared data backref count 76
item 43 key (156909494272 SHARED_DATA_REF 14871635132416) itemoff 14607 
itemsize 4
shared data backref count 60
item 44 key (156909494272 SHARED_DATA_REF 14871649533952) itemoff 14603 
itemsize 4
shared data backref count 79
item 45 key (156909494272 SHARED_DATA_REF 14871862378496) itemoff 14599 
itemsize 4
shared data backref count 70
item 46 key (156909494272 SHARED_DATA_REF 14909667098624) itemoff 14595 
itemsize 4
shared data backref count 72
item 47 key (156909494272 SHARED_DATA_REF 14909669720064) itemoff 14591 
itemsize 4
shared data backref count 58
item 48 key (156909494272 SHARED_DATA_REF 14909734567936) itemoff 14587 
itemsize 4
shared data backref count 73
item 49 key (156909494272 SHARED_DATA_REF 14909920477184) itemoff 14583 
itemsize 4
shared data backref count 79
item 50 key (156909494272 SHARED_DATA_REF 14942279335936) itemoff 14579 
itemsize 4
shared data backref count 79
item 51 key (156909494272 SHARED_DATA_REF 14942304862208) itemoff 14575 
itemsize 4
shared data backref count 72
item 52 key (156909494272 SHARED_DATA_REF 14942348378112) itemoff 14571 
itemsize 4
shared data backref count 67
item 53 key (156909494272 SHARED_DATA_REF 14942366138368) itemoff 14567 
itemsize 4
shared data backref count 51
item 54 key (156909494272 SHARED_DATA_REF 14942384799744) itemoff 14563 
itemsize 4
shared data backref count 64
item 55 key (156909494272 SHARED_DATA_REF 14978234613760) 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Qu Wenruo



On 2018年06月30日 10:44, Marc MERLIN wrote:
> Well, there goes that. After about 18H:
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 235175936) wanted: 1, have: 1452 
> backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0 
> btrfs(+0x3a232)[0x56091704f232] 
> btrfs(+0x3ab46)[0x56091704fb46] 
> btrfs(+0x3b9f5)[0x5609170509f5] 
> btrfs(btrfs_find_all_roots+0x9)[0x560917050a45] 
> btrfs(+0x572ff)[0x56091706c2ff] 
> btrfs(+0x60b13)[0x560917075b13] 
> btrfs(cmd_check+0x2634)[0x56091707d431] 
> btrfs(main+0x88)[0x560917027260] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93aa508561] 
> btrfs(_start+0x2a)[0x560917026dfa] 
> Aborted 

I think that's the root cause.
Some invalid extent tree backref or bad tree block blow up backref code.

All previous error messages may be garbage unless you're using Su's
latest branch, as lowmem mode tends to report false alerts on referencer
count mismatch.

But the last abort looks pretty possible to be the culprit.

Would you try to dump the extent tree?
# btrfs inspect dump-tree -t extent  | grep -A50 156909494272

It should help us locate the culprit and hopefully get some chance to
fix it.

Thanks,
Qu

> 
> That's https://github.com/Damenly/btrfs-progs.git
> 
> Whoops, I didn't use the tmp1 branch, let me try again with that and
> report back, although the problem above is still going to be there since
> I think the only difference will be this, correct?
> https://github.com/Damenly/btrfs-progs/commit/b5851513a12237b3e19a3e71f3ad00b966d25b3a
> 
> Marc
> 


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
Well, there goes that. After about 18H:
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 1, have: 1452 
backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0 
btrfs(+0x3a232)[0x56091704f232] 
btrfs(+0x3ab46)[0x56091704fb46] 
btrfs(+0x3b9f5)[0x5609170509f5] 
btrfs(btrfs_find_all_roots+0x9)[0x560917050a45] 
btrfs(+0x572ff)[0x56091706c2ff] 
btrfs(+0x60b13)[0x560917075b13] 
btrfs(cmd_check+0x2634)[0x56091707d431] 
btrfs(main+0x88)[0x560917027260] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93aa508561] 
btrfs(_start+0x2a)[0x560917026dfa] 
Aborted 

That's https://github.com/Damenly/btrfs-progs.git

Whoops, I didn't use the tmp1 branch, let me try again with that and
report back, although the problem above is still going to be there since
I think the only difference will be this, correct?
https://github.com/Damenly/btrfs-progs/commit/b5851513a12237b3e19a3e71f3ad00b966d25b3a

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Chris Murphy
I've got about 1/2 the snapshots and less than 1/10th the data...but
my btrfs check times are much shorter than either: 15 minutes and 65
minutes (lowmem).


[chris@f28s ~]$ sudo btrfs fi us /mnt/first
Overall:
    Device size:               1024.00GiB
    Device allocated:           774.12GiB
    Device unallocated:         249.87GiB
    Device missing:                 0.00B
    Used:                       760.48GiB
    Free (estimated):           256.95GiB  (min: 132.01GiB)
    Data ratio:                      1.00
    Metadata ratio:                  2.00
    Global reserve:             512.00MiB  (used: 0.00B)

Data,single: Size:761.00GiB, Used:753.93GiB
   /dev/mapper/first 761.00GiB

Metadata,DUP: Size:6.50GiB, Used:3.28GiB
   /dev/mapper/first  13.00GiB

System,DUP: Size:64.00MiB, Used:112.00KiB
   /dev/mapper/first 128.00MiB

Unallocated:
   /dev/mapper/first 249.87GiB


146 subvolumes
137 snapshots

total csum bytes: 790549924
total tree bytes: 3519250432
total fs tree bytes: 2546073600
total extent tree bytes: 131350528


Original mode check takes ~15 minutes
Lowmem mode takes ~65 minutes

RAM: 4G
CPU: Intel(R) Pentium(R) CPU  N3700  @ 1.60GHz



Chris Murphy


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:28:31AM -0700, Marc MERLIN wrote:
> So, I rebooted, and will now run Su's btrfs check without repair and
> report back.

As expected, it will likely still take days, here's the start:

gargamel:~# btrfs check --mode=lowmem  -p /dev/mapper/dshelf2  
Checking filesystem on /dev/mapper/dshelf2 
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d 
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 180, have: 240
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 301, have: 431
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 160, have: 240
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 169, have: 249
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 347, have: 418
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 1, have: 1452

Mmmh, these look similar (but not identical) to the last run earlier in this 
thread:
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Lionel Bouton
Hi,

On 29/06/2018 09:22, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
>> On Thu, 28 Jun 2018 23:59:03 -0700
>> Marc MERLIN  wrote:
>>
>>> I don't waste a week recreating the many btrfs send/receive relationships.
>> Consider not using send/receive, and switching to regular rsync instead.
>> Send/receive is very limiting and cumbersome, including because of what you
>> described. And it doesn't gain you much over an incremental rsync. As for
> Err, sorry but I cannot agree with you here, at all :)
>
> btrfs send/receive is pretty much the only reason I use btrfs. 
> rsync takes hours on big filesystems scanning every single inode on both
> sides and then seeing what changed, and only then sends the differences.
> It's super inefficient.
> btrfs send knows in seconds what needs to be sent, and works on it right
> away.

I've not yet tried send/receive but I feel the pain of rsyncing millions
of files (I had to use lsyncd to limit the problem to the times the
origin servers reboot, which is a relatively rare event), so this thread
piqued my attention. Looking at the whole thread I wonder if you could
get a more manageable solution by splitting the filesystem.

If instead of using a single BTRFS filesystem you used LVM volumes
(maybe with Thin provisioning and monitoring of the volume group free
space) for each of your servers to back up, with one BTRFS filesystem per
volume, you would have fewer snapshots per filesystem and isolate problems
in case of corruption. If you eventually decide to start from scratch
again this might help a lot in your case.
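
A rough sketch of that layout with lvm2 thin provisioning (VG name, sizes
and host names are placeholders):

# one over-provisioned thin pool for all backups, so monitor VG free space
lvcreate --type thin-pool -L 10T -n backuppool vg_backup
# one thin volume and one independent btrfs per backed-up server
lvcreate -V 2T -n host1 --thinpool vg_backup/backuppool
mkfs.btrfs -L host1 /dev/vg_backup/host1
mount /dev/vg_backup/host1 /backup/host1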

Lionel


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Roman Mamedov
On Fri, 29 Jun 2018 00:22:10 -0700
Marc MERLIN  wrote:

> On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> > On Thu, 28 Jun 2018 23:59:03 -0700
> > Marc MERLIN  wrote:
> > 
> > > I don't waste a week recreating the many btrfs send/receive relationships.
> > 
> > Consider not using send/receive, and switching to regular rsync instead.
> > Send/receive is very limiting and cumbersome, including because of what you
> > described. And it doesn't gain you much over an incremental rsync. As for
> 
> Err, sorry but I cannot agree with you here, at all :)
> 
> btrfs send/receive is pretty much the only reason I use btrfs. 
> rsync takes hours on big filesystems scanning every single inode on both
> sides and then seeing what changed, and only then sends the differences

I use it for backing up root filesystems of about 20 hosts, and for syncing
large multi-terabyte media collections -- it's fast enough in both.
Admittedly neither of those cases has millions of subdirs or files where
scanning may take a long time. And in the former case it's also all from and
to SSDs. Maybe your use case is different where it doesn't work as well. But
perhaps then general day-to-day performance is not great either, so I'd suggest
looking into SSD-based LVM caching, it really works wonders with Btrfs.
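
A bare-bones example of such a cache setup (device, VG and LV names are
placeholders):

# add the SSD to the existing VG and attach it as a cache to the data LV
pvcreate /dev/nvme0n1
vgextend vg_backup /dev/nvme0n1
lvcreate --type cache-pool -L 200G -n cache0 vg_backup /dev/nvme0n1
lvconvert --type cache --cachepool vg_backup/cache0 vg_backup/data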

-- 
With respect,
Roman


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 03:20:42PM +0800, Qu Wenruo wrote:
> If certain btrfs specific operations are involved, it's definitely not OK:
> 1) Balance
> 2) Quota
> 3) Btrfs check

Ok, I understand. I'll try to balance almost never then. My problems did
indeed start because I ran balance and it got stuck 2 days with 0
progress.
That still seems like a bug though. I'm ok with slow, but stuck for 2
days with only 270 snapshots or so means there is a bug, or the
algorithm is so expensive that 270 snapshots could cause it to take days
or weeks to proceed?

> > It's a backup server, it only contains data from other machines.
> > If the filesystem cannot be recovered to a working state, I will need
> > over a week to restart the many btrfs send commands from many servers.
> This is why anything other than --repair is useless to me, I don't need
> > the data back, it's still on the original machines, I need the
> > filesystem to work again so that I don't waste a week recreating the
> > many btrfs send/receive relationships.
> 
> Now totally understand why you need to repair the fs.

I also understand that my use case is atypical :)
But I guess this also means that using btrfs for a lot of send/receive
on a backup server is not going to work well unfortunately :-/

Now I'm wondering if I'm the only person even doing this.

> > Does the pastebin help and is 270 snapshots ok enough?
> 
> The super dump doesn't show anything wrong.
> 
> So the problem may be in the super large extent tree.
> 
> In this case, plain check result with Su's patch would help more, other
> than the not so interesting super dump.

First I tried to mount with skip balance after the partial repair, and
it hung a long time:
[445635.716318] BTRFS info (device dm-2): disk space caching is enabled
[445635.736229] BTRFS info (device dm-2): has skinny extents
[445636.101999] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[445825.053205] BTRFS info (device dm-2): enabling ssd optimizations
[446511.006588] BTRFS info (device dm-2): disk space caching is enabled
[446511.026737] BTRFS info (device dm-2): has skinny extents
[446511.325470] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[446699.593501] BTRFS info (device dm-2): enabling ssd optimizations
[446964.077045] INFO: task btrfs-transacti:9211 blocked for more than 120 
seconds.
[446964.099802]   Not tainted 4.17.2-amd64-preempt-sysrq-20180818 #3
[446964.120004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

So, I rebooted, and will now run Su's btrfs check without repair and
report back.

Thanks both for your help.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> On Thu, 28 Jun 2018 23:59:03 -0700
> Marc MERLIN  wrote:
> 
> > I don't waste a week recreating the many btrfs send/receive relationships.
> 
> Consider not using send/receive, and switching to regular rsync instead.
> Send/receive is very limiting and cumbersome, including because of what you
> described. And it doesn't gain you much over an incremental rsync. As for

Err, sorry but I cannot agree with you here, at all :)

btrfs send/receive is pretty much the only reason I use btrfs. 
rsync takes hours on big filesystems scanning every single inode on both
sides and then seeing what changed, and only then sends the differences.
It's super inefficient.
btrfs send knows in seconds what needs to be sent, and works on it right
away.
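
For anyone following along, the incremental form in question is just
something like this (host name and paths invented):

# on the client: snapshot, then send only the delta against the previous snapshot
btrfs subvolume snapshot -r /home /home/.snap_new
btrfs send -p /home/.snap_prev /home/.snap_new | \
    ssh backupserver btrfs receive /mnt/btrfs_pool2/backup-btrfssend/host1
# after a successful receive, .snap_new becomes the parent for the next run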

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Qu Wenruo


On 2018年06月29日 14:59, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 02:29:10PM +0800, Qu Wenruo wrote:
>>> If --repair doesn't work, check is useless to me sadly.
>>
>> Not exactly.
>> Although it's time consuming, I have manually patched several users fs,
>> which normally ends pretty well.
>  
> Ok I understand now.
> 
>>> Agreed, I doubt I have over or much over 100 snapshots though (but I
>>> can't check right now).
>>> Sadly I'm not allowed to mount even read only while check is running:
>>> gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
>>> mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy
> 
> Ok, so I just checked now, 270 snapshots, but not because I'm crazy,
> because I use btrfs send a lot :)
> 
>> This looks like super block corruption?
>>
>> What about "btrfs inspect dump-super -fFa /dev/mapper/dshelf2"?
> 
> Sure, there you go: https://pastebin.com/uF1pHTsg
> 
>> And what about "skip_balance" mount option?
>  
> I have this in my fstab :)
> 
>> Another problem is, with so many snapshots, balance is also hugely
>> slowed, thus I'm not 100% sure if it's really a hang.
> 
> I sent another thread about this last week, balance got hung after 2
> days of doing nothing and just moving a single chunk.
> 
> Ok, I was able to remount the filesystem read only. I was wrong, I have
> 270 snapshots:
> gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup/'
> 74
> gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup-btrfssend/'
> 196
> 
> It's a backup server, I use btrfs send for many machines and for each btrfs
> send, I keep history, maybe 10 or so backups. So it adds up in the end.
> 
> Is btrfs unable to deal with this well enough?

It depends.
For certain and rare cases, if the only operations on the filesystem are
non-btrfs-specific operations (POSIX file operations), then you're fine.
(Maybe you can go to thousands of snapshots before any obvious performance
degradation.)

If certain btrfs specific operations are involved, it's definitely not OK:
1) Balance
2) Quota
3) Btrfs check

> 
>> If for that usage, btrfs-restore would fit your use case more,
>> Unfortunately it needs extra disk space and isn't good at restoring
>> subvolume/snapshots.
>> (Although it's much faster than repairing the possible corrupted extent
>> tree)
> 
> It's a backup server, it only contains data from other machines.
> If the filesystem cannot be recovered to a working state, I will need
> over a week to restart the many btrfs send commands from many servers.
This is why anything other than --repair is useless to me, I don't need
> the data back, it's still on the original machines, I need the
> filesystem to work again so that I don't waste a week recreating the
> many btrfs send/receive relationships.

Now totally understand why you need to repair the fs.

> 
>>> Is that possible at all?
>>
>> At least for file recovery (fs tree repair), we have such behavior.
>>
>> However, the problem you hit (and a lot of users hit) is all about
>> extent tree repair, which doesn't even get to file recovery.
>>
>> All the hassle are in extent tree, and for extent tree, it's just good
>> or bad. Any corruption in extent tree may lead to later bugs.
>> The only way to avoid extent tree problems is to mount the fs RO.
>>
>> So, I'm afraid it is at least impossible for recent years.
> 
> Understood, thanks for answering.
> 
> Does the pastebin help and is 270 snapshots ok enough?

The super dump doesn't show anything wrong.

So the problem may be in the super large extent tree.

In this case, plain check result with Su's patch would help more, other
than the not so interesting super dump.

Thanks,
Qu

> 
> Thanks,
> Marc
> 





Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Roman Mamedov
On Thu, 28 Jun 2018 23:59:03 -0700
Marc MERLIN  wrote:

> I don't waste a week recreating the many btrfs send/receive relationships.

Consider not using send/receive, and switching to regular rsync instead.
Send/receive is very limiting and cumbersome, including because of what you
described. And it doesn't gain you much over an incremental rsync. As for
snapshots on the backup server, you can either automate making one as soon as a
backup has finished, or simply make them once/twice a day, during a period
when no backups are ongoing.
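
E.g. something like this at the end of each backup run (paths are
placeholders, and the rsync target must itself live in a subvolume):

rsync -aHAX host1:/ /mnt/backup/current/host1/
btrfs subvolume snapshot -r /mnt/backup/current \
    /mnt/backup/snapshots/$(date +%Y%m%d-%H%M)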

-- 
With respect,
Roman


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:29:10PM +0800, Qu Wenruo wrote:
> > If --repair doesn't work, check is useless to me sadly.
> 
> Not exactly.
> Although it's time consuming, I have manually patched several users fs,
> which normally ends pretty well.
 
Ok I understand now.

> > Agreed, I doubt I have over or much over 100 snapshots though (but I
> > can't check right now).
> > Sadly I'm not allowed to mount even read only while check is running:
> > gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
> > mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

Ok, so I just checked now, 270 snapshots, but not because I'm crazy,
because I use btrfs send a lot :)

> This looks like super block corruption?
> 
> What about "btrfs inspect dump-super -fFa /dev/mapper/dshelf2"?

Sure, there you go: https://pastebin.com/uF1pHTsg

> And what about "skip_balance" mount option?
 
I have this in my fstab :)
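
(For reference, an illustrative fstab line; the mount point and the other
options are made up, only skip_balance matters here:

/dev/mapper/dshelf2  /mnt/pool2  btrfs  noatime,compress=lzo,skip_balance  0  0

so an interrupted balance doesn't automatically resume at the next mount.)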

> Another problem is, with so many snapshots, balance is also hugely
> slowed, thus I'm not 100% sure if it's really a hang.

I sent another thread about this last week, balance got hung after 2
days of doing nothing and just moving a single chunk.

Ok, I was able to remount the filesystem read only. I was wrong, I have
270 snapshots:
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup/'
74
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup-btrfssend/'
196

It's a backup server, I use btrfs send for many machines and for each btrfs
send, I keep history, maybe 10 or so backups. So it adds up in the end.
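
For context, each of those relationships is an incremental send against a
common parent snapshot, roughly like this (subvolume names and the receive
path are hypothetical):

# btrfs send -p /backup/home.20180601 /backup/home.20180614 | \
      ssh gargamel btrfs receive /mnt/pool2/backup-btrfssend/client1

If the receive side is lost, the common parents are gone too, so every
stream has to be re-sent in full; that's where the week of work comes from.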

Is btrfs unable to deal with this well enough?

> For that usage, btrfs-restore might fit your use case better.
> Unfortunately it needs extra disk space and isn't good at restoring
> subvolume/snapshots.
> (Although it's much faster than repairing the possible corrupted extent
> tree)

It's a backup server, it only contains data from other machines.
If the filesystem cannot be recovered to a working state, I will need
over a week to restart the many btrfs send commands from many servers.
This is why anything other than --repair is useless to me, I don't need
the data back, it's still on the original machines, I need the
filesystem to work again so that I don't waste a week recreating the
many btrfs send/receive relationships.

> > Is that possible at all?
> 
> At least for file recovery (fs tree repair), we have such behavior.
> 
> However, the problem you hit (and a lot of users hit) is all about
> extent tree repair, which doesn't even get to file recovery.
> 
> All the hassle is in the extent tree, and for the extent tree, it's just good
> or bad. Any corruption in the extent tree may lead to later bugs.
> The only way to avoid extent tree problems is to mount the fs RO.
> 
> So, I'm afraid it won't be possible for at least the next few years.

Understood, thanks for answering.

Does the pastebin help and is 270 snapshots ok enough?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> > 
> > Not sure if I understand what you meant here.
> > 
> Sorry for my unclear words.
> Simply speaking, I suggest you stop the currently running check.
> Then clone the above branch, compile the binary, and run
> 'btrfs check --mode=lowmem $dev'.
 
I understand, I'll build and try it.
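
For the record, roughly this, assuming the usual btrfs-progs build
dependencies are already installed:

# git clone -b tmp1 https://github.com/Damenly/btrfs-progs.git
# cd btrfs-progs && ./autogen.sh && ./configure --disable-documentation && make
# ./btrfs check --mode=lowmem /dev/mapper/dshelf2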

> > This filesystem is trash to me and will require over a week to rebuild
> > manually if I can't repair it.
> 
> I understand your anxiety; a log of check without '--repair' will help
> us figure out what's wrong with your filesystem.

Ok, I'll run your new code without repair and report back. It will
likely take over a day though.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Qu Wenruo


On 2018年06月29日 14:06, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 01:48:17PM +0800, Qu Wenruo wrote:
>> Just normal btrfs check, and post the output.
>> If normal check eats up all your memory, btrfs check --mode=lowmem.
>  
> Does check without --repair eat less RAM?

Unfortunately, no.

> 
>> --repair should be considered as the last method.
> 
> If --repair doesn't work, check is useless to me sadly.

Not exactly.
Although it's time consuming, I have manually patched several users' filesystems,
which normally ends pretty well.

If it's not a widespread problem but some small fatal one, it may be fixable.

> I know that for
> FS analysis and bug reporting, you want to have the FS without changing
> it to something maybe worse, but for my use, if it can't be mounted and
> can't be fixed, then it gets deleted which is even worse than check
> doing the wrong thing.
> 
>>> The last two ERROR lines took over a day to get generated, so I'm not sure 
>>> if it's still working, but just slowly.
>>
>> OK, that explains something.
>>
>> One extent is referenced hundreds of times, no wonder it will take a long time.
>>
>> Just one tip here, there are really too many snapshots/reflinked files.
>> It's highly recommended to keep the number of snapshots to a reasonable
>> number (lower two digits).
>> Although btrfs snapshot is super fast, it puts a lot of pressure on its
>> extent tree, so there is no free lunch here.
>  
> Agreed, I doubt I have over or much over 100 snapshots though (but I
> can't check right now).
> Sadly I'm not allowed to mount even read only while check is running:
> gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
> mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy
> 
>>> I see. Is there any reasonably easy way to check on this running process?
>>
>> GDB attach would be good.
>> Interrupt and check the inode number if it's checking fs tree.
>> Check the extent bytenr number if it's checking extent tree.
>>
>> But considering how many snapshots there are, it's really hard to determine.
>>
>> In this case, the super large extent tree is causing a lot of problem,
>> maybe it's a good idea to allow btrfs check to skip extent tree check?
> 
> I only see --init-extent-tree in the man page, which option did you have
> in mind?

That feature is just in my mind, not even implemented yet.

> 
>>> Then again, maybe it already fixed enough that I can mount my filesystem 
>>> again.
>>
>> This needs the initial btrfs check report and the kernel messages how it
>> fails to mount.
> 
> mount command hangs, kernel does not show anything special outside of disk 
> access hanging.
> 
> Jun 23 17:23:26 gargamel kernel: [  341.802696] BTRFS warning (device dm-2): 
> 'recovery' is deprecated, use 'usebackuproot' instead
> Jun 23 17:23:26 gargamel kernel: [  341.828743] BTRFS info (device dm-2): 
> trying to use backup root at mount time
> Jun 23 17:23:26 gargamel kernel: [  341.850180] BTRFS info (device dm-2): 
> disk space caching is enabled
> Jun 23 17:23:26 gargamel kernel: [  341.869014] BTRFS info (device dm-2): has 
> skinny extents
> Jun 23 17:23:26 gargamel kernel: [  342.206289] BTRFS info (device dm-2): 
> bdev /dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
> Jun 23 17:26:26 gargamel kernel: [  521.571392] BTRFS info (device dm-2): 
> enabling ssd optimizations
> Jun 23 17:55:58 gargamel kernel: [ 2293.914867] perf: interrupt took too long 
> (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
> Jun 23 17:56:22 gargamel kernel: [ 2317.718406] BTRFS info (device dm-2): 
> disk space caching is enabled
> Jun 23 17:56:22 gargamel kernel: [ 2317.737277] BTRFS info (device dm-2): has 
> skinny extents
> Jun 23 17:56:22 gargamel kernel: [ 2318.069461] BTRFS info (device dm-2): 
> bdev /dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
> Jun 23 17:59:22 gargamel kernel: [ 2498.256167] BTRFS info (device dm-2): 
> enabling ssd optimizations
> Jun 23 18:05:23 gargamel kernel: [ 2859.107057] BTRFS info (device dm-2): 
> disk space caching is enabled
> Jun 23 18:05:23 gargamel kernel: [ 2859.125883] BTRFS info (device dm-2): has 
> skinny extents
> Jun 23 18:05:24 gargamel kernel: [ 2859.448018] BTRFS info (device dm-2): 
> bdev /dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0

This looks like super block corruption?

What about "btrfs inspect dump-super -fFa /dev/mapper/dshelf2"?

And what about "skip_balance" mount option?

Another problem is, with so many snapshots, balance is also hugely
slowed, thus I'm not 100% sure if it's really a hang.
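
(For what it's worth, a quick way to tell, assuming the filesystem is
mounted; the path here is illustrative:

# btrfs balance status -v /mnt/pool2

The status output reports how many chunks are left; if that number doesn't
move for hours, it's more likely a real hang than just slowness.)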

> Jun 23 18:08:23 gargamel kernel: [ 3039.023305] BTRFS info (device dm-2): 
> enabling ssd optimizations
> Jun 23 18:13:41 gargamel kernel: [ 3356.626037] perf: interrupt took too long 
> (3143 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
> Jun 23 18:17:23 gargamel kernel: [ 3578.937225] Process accounting resumed
> Jun 23 18:33:47 gargamel kernel: [ 4563.356252] JFS: nTxBlock = 8192, 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Su Yue




On 06/29/2018 02:10 PM, Marc MERLIN wrote:

On Fri, Jun 29, 2018 at 02:02:19PM +0800, Su Yue wrote:

I have figured out the bug: lowmem check can't deal with shared tree blocks
in a reloc tree. The fix is simple, you can try the following repo:

https://github.com/Damenly/btrfs-progs/tree/tmp1


Not sure if I understand what you meant here.


Sorry for my unclear words.
Simply speaking, I suggest you stop the currently running check.
Then clone the above branch, compile the binary, and run
'btrfs check --mode=lowmem $dev'.


Please run lowmem check without "--repair" first to be sure whether
your filesystem is fine.
  
The filesystem is not fine, it caused btrfs balance to hang, whether
balance actually broke it further or caused the breakage, I can't say.

Then mount hangs, even with recovery, unless I use ro.

This filesystem is trash to me and will require over a week to rebuild
manually if I can't repair it.


I understand your anxiety; a log of check without '--repair' will help
us figure out what's wrong with your filesystem.

Thanks,
Su

Running check without repair for likely several days just to know that
my filesystem is not clean (I already know this) isn't useful :)
Or am I missing something?


Though the bug and phenomenon are clear enough, before sending my patch,
I have to make a test image. I have spent a week to study btrfs balance
but it seems a liitle hard for me.


thanks for having a look, either way.

Marc




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:02:19PM +0800, Su Yue wrote:
> I have figured out the bug: lowmem check can't deal with shared tree blocks
> in a reloc tree. The fix is simple, you can try the following repo:
> 
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Not sure if I understand what you meant here.

> Please run lowmem check without "--repair" first to be sure whether
> your filesystem is fine.
 
The filesystem is not fine, it caused btrfs balance to hang, whether
balance actually broke it further or caused the breakage, I can't say.

Then mount hangs, even with recovery, unless I use ro.

This filesystem is trash to me and will require over a week to rebuild
manually if I can't repair it.
Running check without repair for likely several days just to know that
my filesystem is not clean (I already know this) isn't useful :)
Or am I missing something?

> Though the bug and phenomenon are clear enough, before sending my patch,
> I have to make a test image. I have spent a week studying btrfs balance
> but it seems a little hard for me.

thanks for having a look, either way.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:48:17PM +0800, Qu Wenruo wrote:
> Just normal btrfs check, and post the output.
> If normal check eats up all your memory, btrfs check --mode=lowmem.
 
Does check without --repair eat less RAM?

> --repair should be considered as the last method.

If --repair doesn't work, check is useless to me sadly. I know that for
FS analysis and bug reporting, you want to have the FS without changing
it to something maybe worse, but for my use, if it can't be mounted and
can't be fixed, then it gets deleted which is even worse than check
doing the wrong thing.

> > The last two ERROR lines took over a day to get generated, so I'm not sure 
> > if it's still working, but just slowly.
> 
> OK, that explains something.
> 
> One extent is referenced hundreds of times, no wonder it will take a long time.
> 
> Just one tip here, there are really too many snapshots/reflinked files.
> It's highly recommended to keep the number of snapshots to a reasonable
> number (lower two digits).
> Although btrfs snapshot is super fast, it puts a lot of pressure on its
> extent tree, so there is no free lunch here.
 
Agreed, I doubt I have over or much over 100 snapshots though (but I
can't check right now).
Sadly I'm not allowed to mount even read only while check is running:
gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

> > I see. Is there any reasonably easy way to check on this running process?
> 
> GDB attach would be good.
> Interrupt and check the inode number if it's checking fs tree.
> Check the extent bytenr number if it's checking extent tree.
> 
> But considering how many snapshots there are, it's really hard to determine.
> 
> In this case, the super large extent tree is causing a lot of problem,
> maybe it's a good idea to allow btrfs check to skip extent tree check?

I only see --init-extent-tree in the man page, which option did you have
in mind?

> > Then again, maybe it already fixed enough that I can mount my filesystem 
> > again.
> 
> This needs the initial btrfs check report and the kernel messages how it
> fails to mount.

mount command hangs, kernel does not show anything special outside of disk 
access hanging.

Jun 23 17:23:26 gargamel kernel: [  341.802696] BTRFS warning (device dm-2): 
'recovery' is deprecated, use 'usebackuproot' instead
Jun 23 17:23:26 gargamel kernel: [  341.828743] BTRFS info (device dm-2): 
trying to use backup root at mount time
Jun 23 17:23:26 gargamel kernel: [  341.850180] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 17:23:26 gargamel kernel: [  341.869014] BTRFS info (device dm-2): has 
skinny extents
Jun 23 17:23:26 gargamel kernel: [  342.206289] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 17:26:26 gargamel kernel: [  521.571392] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 17:55:58 gargamel kernel: [ 2293.914867] perf: interrupt took too long 
(2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Jun 23 17:56:22 gargamel kernel: [ 2317.718406] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 17:56:22 gargamel kernel: [ 2317.737277] BTRFS info (device dm-2): has 
skinny extents
Jun 23 17:56:22 gargamel kernel: [ 2318.069461] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 17:59:22 gargamel kernel: [ 2498.256167] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 18:05:23 gargamel kernel: [ 2859.107057] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 18:05:23 gargamel kernel: [ 2859.125883] BTRFS info (device dm-2): has 
skinny extents
Jun 23 18:05:24 gargamel kernel: [ 2859.448018] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 18:08:23 gargamel kernel: [ 3039.023305] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 18:13:41 gargamel kernel: [ 3356.626037] perf: interrupt took too long 
(3143 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
Jun 23 18:17:23 gargamel kernel: [ 3578.937225] Process accounting resumed
Jun 23 18:33:47 gargamel kernel: [ 4563.356252] JFS: nTxBlock = 8192, nTxLock = 
65536
Jun 23 18:33:48 gargamel kernel: [ 4563.446715] ntfs: driver 2.1.32 [Flags: R/W 
MODULE].
Jun 23 18:42:20 gargamel kernel: [ 5075.995254] INFO: task sync:20253 blocked 
for more than 120 seconds.
Jun 23 18:42:20 gargamel kernel: [ 5076.015729]   Not tainted 
4.17.2-amd64-preempt-sysrq-20180817 #1
Jun 23 18:42:20 gargamel kernel: [ 5076.036141] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 23 18:42:20 gargamel kernel: [ 5076.060637] sync    D    0 20253  15327 0x20020080
Jun 23 18:42:20 gargamel kernel: [ 5076.078032] Call Trace:
Jun 23 18:42:20 gargamel kernel: [ 5076.086366]  ? __schedule+0x53e/0x59b
Jun 23 18:42:20 gargamel kernel: [ 5076.098311]  schedule+0x7f/0x98

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Su Yue




On 06/29/2018 01:28 PM, Marc MERLIN wrote:

On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:

lowmem repair seems to be going still, but it's been days and -p seems
to do absolutely nothing.


I'm afraid you hit a bug in lowmem repair code.
By all means, --repair shouldn't really be used unless you're pretty
sure the problem is something btrfs check can handle.

That's also why --repair is still marked as dangerous.
Especially when it's combined with experimental lowmem mode.


Understood, but btrfs got corrupted (by itself or not, I don't know)
I cannot mount the filesystem read/write
I cannot btrfs check --repair it since that code will kill my machine
What do I have left?


My filesystem is "only" 10TB or so, albeit with a lot of files.


Unless you have tons of snapshots and reflinked (deduped) files, it
shouldn't take so long.


I may have a fair amount.
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]
ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
Add one extent data backref [156909494272 55320576]
ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
Add one extent data backref [156909494272 55320576]


My bad.
It's most probably a bug in the extent part of lowmem check, which
was reported by Chris too.
The extent check was wrong, so the repair did the wrong things.

I have figured out the bug: lowmem check can't deal with shared tree
blocks in a reloc tree. The fix is simple, you can try the following repo:


https://github.com/Damenly/btrfs-progs/tree/tmp1

Please run lowmem check without "--repair" first to be sure whether
your filesystem is fine.

Though the bug and phenomenon are clear enough, before sending my patch,
I have to make a test image. I have spent a week studying btrfs balance
but it seems a little 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Qu Wenruo


On 2018年06月29日 13:28, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
>>> lowmem repair seems to be going still, but it's been days and -p seems
>>> to do absolutely nothing.
>>
>> I'm afraid you hit a bug in lowmem repair code.
>> By all means, --repair shouldn't really be used unless you're pretty
>> sure the problem is something btrfs check can handle.
>>
>> That's also why --repair is still marked as dangerous.
>> Especially when it's combined with experimental lowmem mode.
> 
> Understood, but btrfs got corrupted (by itself or not, I don't know)
> I cannot mount the filesystem read/write
> I cannot btrfs check --repair it since that code will kill my machine
> What do I have left?

Just normal btrfs check, and post the output.
If normal check eats up all your memory, btrfs check --mode=lowmem.

--repair should be considered as the last method.
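
(One sketch for keeping the regular mode from taking the whole box down,
if you can live with the check itself being killed instead: cap its
address space before starting it, e.g.

# (ulimit -v $((24 * 1024 * 1024)); btrfs check /dev/mapper/dshelf2)

so allocations fail inside btrfs check instead of triggering a system-wide
OOM situation.)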

> 
>>> My filesystem is "only" 10TB or so, albeit with a lot of files.
>>
>> Unless you have tons of snapshots and reflinked (deduped) files, it
>> shouldn't take so long.
> 
> I may have a fair amount.
> gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2 
> enabling repair mode
> WARNING: low-memory mode repair support is only partial
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Fixed 0 roots.
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 3407872) wanted: 3, have: 4
> Created new chunk [18457780224000 1073741824]
> Delete backref in extent [84302495744 69632]
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 3407872) wanted: 3, have: 4
> Delete backref in extent [84302495744 69632]
> ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 114540544) wanted: 181, have: 240
> Delete backref in extent [125712527360 12214272]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 148234240) wanted: 302, have: 431
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 148234240) wanted: 356, have: 433
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 180371456) wanted: 161, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 180371456) wanted: 162, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 192200704) wanted: 170, have: 249
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 192200704) wanted: 172, have: 251
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 217653248) wanted: 348, have: 418
> Delete backref in extent [150850146304 17522688]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 235175936) wanted: 555, have: 1449
> Deleted root 2 item[156909494272, 178, 5476627808561673095]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 235175936) wanted: 556, have: 1452
> Deleted root 2 item[156909494272, 178, 7338474132555182983]
> ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
> Add one extent data backref [156909494272 55320576]
> ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
> Add one extent data backref [156909494272 55320576]
> 
> The last two ERROR lines took over a day to get generated, so I'm not sure if 
> it's still working, but just slowly.

OK, that explains something.

One extent is referenced hundreds of times, no wonder it will take a long time.

Just one tip here, there are really too many 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:35:06PM +0800, Su Yue wrote:
> > It's hard to estimate, especially when every cross check involves a lot
> > of disk IO.
> > 
> > But at least, we could add such indicator to show we're doing something.
> Maybe we can account all roots in the root tree first, then before checking a
> tree, report i/num_roots. So users can see whether the check is doing
> something meaningful or is stuck in a dead loop.

Sounds reasonable.
Do you want to submit something to git master for btrfs-progs, so I can pull
it and just run my btrfs check again?

In the meantime, how sane does the output I just posted look?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Su Yue




On 06/29/2018 01:07 PM, Qu Wenruo wrote:



On 2018年06月29日 12:27, Marc MERLIN wrote:

Regular btrfs check --repair has a nice progress option. It wasn't
perfect, but it showed something.

But then it also takes all your memory quicker than the linux kernel can
defend itself and reliably completely kills my 32GB server quicker than
it can OOM anything.

lowmem repair seems to be going still, but it's been days and -p seems
to do absolutely nothing.


I'm afraid you hit a bug in lowmem repair code.
By all means, --repair shouldn't really be used unless you're pretty
sure the problem is something btrfs check can handle.

That's also why --repair is still marked as dangerous.
Especially when it's combined with experimental lowmem mode.



My filesystem is "only" 10TB or so, albeit with a lot of files.


Unless you have tons of snapshots and reflinked (deduped) files, it
shouldn't take so long.



2 things that come to mind
1) can lowmem have some progress indicator working so that I know if I'm looking
at days, weeks, or even months before it will be done?


It's hard to estimate, especially when every cross check involves a lot
of disk IO.

But at least, we could add such indicator to show we're doing something.
Maybe we can account all roots in the root tree first, then before checking a
tree, report i/num_roots. So users can see whether the check is doing
something meaningful or is stuck in a dead loop.

Thanks,
Su



2) non lowmem is more efficient obviously when it doesn't completely
crash your machine, but could lowmem be given an amount of memory to use
for caching, or maybe use some heuristics based on RAM free so that it's
not so excruciatingly slow?


IIRC a recent commit has added that ability.
a5ce5d219822 ("btrfs-progs: extent-cache: actually cache extent buffers")

That's already included in btrfs-progs v4.13.2.
So it should be a dead loop which lowmem repair code can't handle.

Thanks,
Qu



Thanks,
Marc






--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
> > lowmem repair seems to be going still, but it's been days and -p seems
> > to do absolutely nothing.
> 
> I'm afraid you hit a bug in lowmem repair code.
> By all means, --repair shouldn't really be used unless you're pretty
> sure the problem is something btrfs check can handle.
> 
> That's also why --repair is still marked as dangerous.
> Especially when it's combined with experimental lowmem mode.

Understood, but btrfs got corrupted (by itself or not, I don't know)
I cannot mount the filesystem read/write
I cannot btrfs check --repair it since that code will kill my machine
What do I have left?

> > My filesystem is "only" 10TB or so, albeit with a lot of files.
> 
> Unless you have tons of snapshots and reflinked (deduped) files, it
> shouldn't take so long.

I may have a fair amount.
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2 
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]
ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
Add one extent data backref [156909494272 55320576]
ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
Add one extent data backref [156909494272 55320576]

The last two ERROR lines took over a day to get generated, so I'm not sure if 
it's still working, but just slowly.
For what it's worth non lowmem check used to take 12 to 24H on that filesystem 
back when it still worked.

> > 2 things that come to mind
> > 1) can lowmem have some progress indicator working so that I know if I'm looking
> > at days, weeks, or even months before it will be done?
> 
> It's hard to estimate, especially when every cross check involves a lot
> of disk IO.
> But at least, we could add such indicator to show we're doing something.

Yes, anything to show that I should still wait is still good :)

> > 2) non lowmem 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Qu Wenruo


On 2018年06月29日 12:27, Marc MERLIN wrote:
> Regular btrfs check --repair has a nice progress option. It wasn't
> perfect, but it showed something.
> 
> But then it also takes all your memory quicker than the linux kernel can
> defend itself and reliably completely kills my 32GB server quicker than
> it can OOM anything.
> 
> lowmem repair seems to be going still, but it's been days and -p seems
> to do absolutely nothing.

I'm afraid you hit a bug in lowmem repair code.
By all means, --repair shouldn't really be used unless you're pretty
sure the problem is something btrfs check can handle.

That's also why --repair is still marked as dangerous.
Especially when it's combined with experimental lowmem mode.

> 
> My filesystem is "only" 10TB or so, albeit with a lot of files.

Unless you have tons of snapshots and reflinked (deduped) files, it
shouldn't take so long.

> 
> 2 things that come to mind
> 1) can lowmem have some progress indicator working so that I know if I'm looking
> at days, weeks, or even months before it will be done?

It's hard to estimate, especially when every cross check involves a lot
of disk IO.

But at least, we could add such indicator to show we're doing something.
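
(Until such an indicator exists, a crude external check is to watch whether
the process is still doing IO at all, assuming it's the only btrfs process
running:

# pid=$(pidof btrfs)
# while sleep 60; do echo "$(date +%T) $(grep read_bytes /proc/$pid/io)"; done

A read_bytes counter that keeps growing at least rules out a purely
CPU-bound dead loop.)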

> 
> 2) non lowmem is more efficient obviously when it doesn't completely
> crash your machine, but could lowmem be given an amount of memory to use
> for caching, or maybe use some heuristics based on RAM free so that it's
> not so excruciatingly slow?

IIRC a recent commit has added that ability.
a5ce5d219822 ("btrfs-progs: extent-cache: actually cache extent buffers")

That's already included in btrfs-progs v4.13.2.
So it should be a dead loop which lowmem repair code can't handle.

Thanks,
Qu

> 
> Thanks,
> Marc
> 





So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
Regular btrfs check --repair has a nice progress option. It wasn't
perfect, but it showed something.

But then it also takes all your memory quicker than the linux kernel can
defend itself and reliably completely kills my 32GB server quicker than
it can OOM anything.

lowmem repair seems to be going still, but it's been days and -p seems
to do absolutely nothing.

My filesystem is "only" 10TB or so, albeit with a lot of files.

2 things that come to mind
1) can lowmem have some progress indicator working so that I know if I'm looking
at days, weeks, or even months before it will be done?

2) non lowmem is more efficient obviously when it doesn't completely
crash your machine, but could lowmem be given an amount of memory to use
for caching, or maybe use some heuristics based on RAM free so that it's
not so excruciatingly slow?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html