How to repair errors only found with check --mode=lowmem

2017-12-16 Thread Tom Hale
The output below shows errors that `btrfs check --mode=lowmem` reports,
but that `btrfs check --repair` in the default mode neither reports nor fixes.


How would I go about fixing errors only reported by lowmem?



[manjaro manjaro]# btrfs check --mode=lowmem --progress /dev/mapper/vg_svelte-home

Checking filesystem on /dev/mapper/vg_svelte-home
UUID: 93722fa7-7e8f-418a-a7ca-080aca8db94b
ERROR: extent[691815358464, 11042816] referencer count mismatch (root: 
257, owner: 1869679, offset: 613974016) wanted: 1, have: 2
ERROR: extent[720156536832, 99430400] referencer count mismatch (root: 
257, owner: 758215, offset: 1610616832) wanted: 8, have: 379
ERROR: extent[720669147136, 268435456] referencer count mismatch (root: 
257, owner: 758215, offset: 4096) wanted: 86, have: 1021
ERROR: extent[720669147136, 268435456] referencer count mismatch (root: 
257, owner: 1767807, offset: 4096) wanted: 87, have: 1021
ERROR: extent[726724722688, 64069632] referencer count mismatch (root: 
257, owner: 1480823, offset: 99090432) wanted: 1, have: 5
ERROR: extent[737910194176, 134217728] referencer count mismatch (root: 
257, owner: 1480726, offset: 268435456) wanted: 1, have: 8
ERROR: extent[738077896704, 134217728] referencer count mismatch (root: 
257, owner: 1869696, offset: 402653184) wanted: 5, have: 8
ERROR: extent[744334426112, 268435456] referencer count mismatch (root: 
257, owner: 1767802, offset: 0) wanted: 111, have: 294
ERROR: extent[824948670464, 1671168] referencer count mismatch (root: 
257, owner: 2000876, offset: 247861248) wanted: 16, have: 26

ERROR: data extent[681550843904 8192] backref lost
ERROR: errors found in extent allocation tree or chunk allocation
cache and super generation don't match, space cache will be invalidated
ERROR: errors found in fs roots
found 172094545920 bytes used, error(s) found
total csum bytes: 165679768
total tree bytes: 3066789888
total fs tree bytes: 2751315968
total extent tree bytes: 112295936
btree space waste bytes: 568660274
file data blocks allocated: 8158426562560
 referenced 597269540864
[manjaro manjaro]# btrfs check --repair --progress /dev/mapper/vg_svelte-home

enabling repair mode
Checking filesystem on /dev/mapper/vg_svelte-home
UUID: 93722fa7-7e8f-418a-a7ca-080aca8db94b
checking extents [.]
Fixed 0 roots.
cache and super generation don't match, space cache will be invalidated
checking fs roots [o]
checking csums
checking root refs
found 172091723777 bytes used, no error found
total csum bytes: 165679768
total tree bytes: 2003206144
total fs tree bytes: 1687732224
total extent tree bytes: 112295936
btree space waste bytes: 346177995
file data blocks allocated: 5952940838912
 referenced 474676633600
[manjaro manjaro]#
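
To compare lowmem runs before and after a repair attempt, the mismatch records can be pulled out of the log mechanically. The following is a minimal Python sketch and not part of btrfs-progs: the regular expression is derived only from the lines printed above, and the name `parse_mismatches` and the dictionary keys are made up for illustration. It tolerates the line wrapping introduced by the mail client.

```python
import re

# One "referencer count mismatch" record as printed by lowmem mode above.
# \s is used between fields because each record is wrapped over two lines.
MISMATCH = re.compile(
    r"extent\[(\d+),\s*(\d+)\]\s+referencer count mismatch\s+"
    r"\(root:\s*(\d+),\s*owner:\s*(\d+),\s*offset:\s*(\d+)\)\s+"
    r"wanted:\s*(\d+),\s*have:\s*(\d+)"
)

# Hypothetical field names, chosen to mirror the labels in the log.
KEYS = ("bytenr", "length", "root", "owner", "offset", "wanted", "have")


def parse_mismatches(text):
    """Return one dict of integers per mismatch record found in text."""
    return [dict(zip(KEYS, map(int, m.groups()))) for m in MISMATCH.finditer(text)]


# Three of the records from the log above, copied with their line wrapping.
sample = """\
ERROR: extent[691815358464, 11042816] referencer count mismatch (root:
257, owner: 1869679, offset: 613974016) wanted: 1, have: 2
ERROR: extent[720669147136, 268435456] referencer count mismatch (root:
257, owner: 758215, offset: 4096) wanted: 86, have: 1021
ERROR: extent[720669147136, 268435456] referencer count mismatch (root:
257, owner: 1767807, offset: 4096) wanted: 87, have: 1021
"""

records = parse_mismatches(sample)
owners = sorted({r["owner"] for r in records})
extents = sorted({r["bytenr"] for r in records})

for r in records:
    print(r["bytenr"], r["owner"], "wanted", r["wanted"], "have", r["have"])
```

Run on the full output above, this would yield nine records, all in root 257. Sets of `(bytenr, owner)` pairs from two runs can then be compared to see whether a repair attempt changed anything.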



--
Tom Hale
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: check: "warning line 4144"

2017-10-25 Thread Tom Hale


On 19/10/17 15:58, Qu Wenruo wrote:
> On 2017年10月19日 16:53, Tom Hale wrote:
>> In running btrfs check, I got the following message:
>>
>> warning line 4144
>>
>> Could this be a little more descriptive?
>>
>> * Does it mean I should rebuild my FS from scratch?
>> * Is there anything I can do to remove this warning?
>>
> 
> --repair is dangerous, don't use it unless you're sure the problem can be
> fixed by it.
> 
> What's the output of "btrfs check" and "btrfs check --mode=lowmem" after
> doing the repair?

Plain `btrfs check` gives me:

checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/mapper/vg_svelte-home
UUID: 93722fa7-7e8f-418a-a7ca-080aca8db94b
found 192318169092 bytes used, no error found
total csum bytes: 185482112
total tree bytes: 1924104192
total fs tree bytes: 1597767680
total extent tree bytes: 102514688
btree space waste bytes: 325265468
file data blocks allocated: 6169163599872
 referenced 575636733952

Could somebody please answer my initial questions about this obscure
warning?

Thanks,

-- 
Tom Hale


check: "warning line 4144"

2017-10-19 Thread Tom Hale
While running btrfs check, I got the following message:

warning line 4144

Could this be a little more descriptive?

* Does it mean I should rebuild my FS from scratch?
* Is there anything I can do to remove this warning?

Complete output below:

==
$ sudo btrfs check --repair -p /dev/mapper/fix-backup
enabling repair mode
Checking filesystem on /dev/mapper/fix-backup
UUID: 0f5b7713-929d-41e7-b214-32500b5c77fc
ref mismatch on [195215851520 16384] extent item 0, found 1
Backref 195215851520 parent 1463 root 1463 not found in extent tree
backpointer mismatch on [195215851520 16384]
owner ref check failed [195215851520 16384]
repair deleting extent record: key 195215851520 169 1
adding new tree backref on start 195215851520 len 16384 parent 0 root 1463
Repaired extent references for 195215851520

Fixed 0 roots.
cache and super generation don't match, space cache will be invalidated
warning line 4144 [o]

checking csums
checking root refs
found 294913892352 bytes used, no error found
total csum bytes: 282955440
total tree bytes: 5166727168
total fs tree bytes: 4489232384
total extent tree bytes: 353075200
btree space waste bytes: 886405127
file data blocks allocated: 7930446508032
 referenced 1126675288064
==

-- 
Tom Hale


Re: "Corrected" errors persist after scrubbing

2017-05-16 Thread Tom Hale
nt/tmp
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# losetup -d /dev/loop[01]
#

-- 

Tom Hale
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 340852736 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 366804992 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 392626176 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 314896384 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 418050048 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 434962432 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 486998016 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 512688128 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 132972544 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 107016192 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 340852736 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 366804992 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 392626176 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 314896384 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 418050048 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 434962432 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 486998016 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 512688128 on dev /dev/md127
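
The log excerpt above can be checked mechanically for addresses that were "fixed up" more than once. This is a small Python sketch, assuming every repair is logged in the format shown; `repeated_fixups` is a hypothetical helper name, and the sample is a subset of the lines above, copied with their wrapping.

```python
import re
from collections import Counter

# Matches the kernel message format seen in the log; \s+ absorbs the line wrap.
FIXED_UP = re.compile(r"fixed up error at\s+logical (\d+) on dev (\S+)")


def repeated_fixups(log_text):
    """Return {(device, logical): count} for addresses fixed up more than once."""
    counts = Counter(
        (dev, int(logical)) for logical, dev in FIXED_UP.findall(log_text)
    )
    return {key: n for key, n in counts.items() if n > 1}


# Subset of the log: three addresses from the first scrub, two from the second.
sample = """\
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at
logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at
logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at
logical 340852736 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at
logical 132972544 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at
logical 107016192 on dev /dev/md127
"""

repeated = repeated_fixups(sample)
for (dev, logical), n in sorted(repeated.items()):
    print(f"{dev} logical {logical}: fixed up {n} times")
```

An address that shows up here after consecutive scrubs is one whose earlier "fix" did not stick, or one that failed again in the meantime; the log alone cannot distinguish the two.
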
May 16 13:04:30 svelte kernel: BTRFS info (device md127): disk space caching is 
enabled
May 16 13:04:30 svelte kernel: BTRFS info (device md127): has skinny extents
May 16 13:07:57 svelte kernel: BTRFS warning (device md127): i/o error at 
logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 
305774592, length 4096, links 1 (path: rand)
May 16 13:07:58 svelte kernel: BTRFS error (device md127): bdev /dev/md127 
errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
May 16 13:07:58 svelte kernel: BTRFS warning (device md127): i/o error at 
logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 
278917120, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 
errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at 
logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 
196018176, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 
errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at 
logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 
221970432, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 
errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 340852736 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at 
logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 
247791616, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 
errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 366804992 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at 
logical 392626176 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at 
logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 
170061824, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (de

"Corrected" errors persist after scrubbing

2017-05-06 Thread Tom Hale
Below (and also attached because of formatting) is an example of `btrfs
scrub` incorrectly reporting that errors have been corrected.

In this example, /dev/md127 is the device created by running:
mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0

The filesystem is RAID1.

# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
total bytes scrubbed: 200.47MiB with 8 errors
error details: read=8
corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
scrub started at Fri May  5 19:23:54 2017 and finished after 00:00:01
total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
# ### But the errors haven't really been corrected, they're still there:
# mdadm --grow /dev/md0 --layout=clear # Stop producing additional errors
layout for /dev/md0 set to 31
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/md127 (id 1) done
scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
total bytes scrubbed: 200.47MiB with 8 errors
error details: read=8
corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
scrub started at Fri May  5 19:24:24 2017 and finished after 00:00:00
total bytes scrubbed: 200.47MiB with 0 errors
WARNING: errors detected during scrubbing, corrected
#
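
The comparison between the two scrub runs can also be made by script rather than by eye. The following Python sketch assumes the `btrfs scrub start -Bd` output format shown in the transcript; `parse_scrub` is a hypothetical helper, and devices that print no error-count line are assumed to have zero errors.

```python
import re

DEVICE = re.compile(r"scrub device (\S+) \(id (\d+)\) done")
COUNTS = re.compile(
    r"corrected errors:\s*(\d+),\s*uncorrectable errors:\s*(\d+),"
    r"\s*unverified errors:\s*(\d+)"
)


def parse_scrub(text):
    """Map device path -> (corrected, uncorrectable, unverified)."""
    result = {}
    current = None
    for line in text.splitlines():
        m = DEVICE.search(line)
        if m:
            current = m.group(1)
            result[current] = (0, 0, 0)  # assumed default when no count line follows
            continue
        m = COUNTS.search(line)
        if m and current is not None:
            result[current] = tuple(int(g) for g in m.groups())
    return result


# Both runs from the transcript, reduced to the lines the parser looks at.
first_run = """\
scrub device /dev/md127 (id 1) done
total bytes scrubbed: 200.47MiB with 8 errors
error details: read=8
corrected errors: 8, uncorrectable errors: 0, unverified errors: 248
scrub device /dev/loop1 (id 2) done
total bytes scrubbed: 200.47MiB with 0 errors
"""
second_run = first_run  # the second scrub printed the same counts

first = parse_scrub(first_run)
second = parse_scrub(second_run)

# Devices that claimed corrections in both consecutive runs.
suspicious = sorted(
    dev for dev in first
    if first[dev][0] > 0 and second.get(dev, (0, 0, 0))[0] > 0
)
print(suspicious)
```

A device listed in `suspicious` is only a hint: equal counts in two runs do not prove the same blocks were involved. In the transcript, fault injection had been cleared before the second run, which is what makes the repeated count of 8 telling.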

Since scrub is checking for read issues, I would expect it to read back any
corrections before asserting that they have indeed been corrected.
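
The expectation can be illustrated with a toy model. The Python sketch below does not describe how btrfs scrub works internally; `FlakyDevice` and `scrub_block` are invented for illustration, CRC32 stands in for the filesystem checksum, and the faulty device is loosely modelled on a persistent read fault like the one injected in the transcript (writes are accepted, reads keep failing).

```python
import zlib


class FlakyDevice:
    """Toy block device: reads of blocks listed in `bad` always fail."""

    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)
        self.bad = set(bad)

    def read(self, n):
        if n in self.bad:
            raise OSError(f"read error at block {n}")
        return self.blocks[n]

    def write(self, n, data):
        # The write is accepted, but a bad block stays unreadable.
        self.blocks[n] = data


def scrub_block(n, primary, mirror, csum, verify=True):
    """Return 'ok', 'corrected' or 'uncorrectable' for block n on primary."""

    def good(dev):
        try:
            return zlib.crc32(dev.read(n)) == csum
        except OSError:
            return False

    if good(primary):
        return "ok"
    if not good(mirror):
        return "uncorrectable"
    primary.write(n, mirror.read(n))
    if not verify:
        return "corrected"  # trusts the write without checking it
    # Read the block back before claiming the repair succeeded.
    return "corrected" if good(primary) else "uncorrectable"


data = b"x" * 4096
csum = zlib.crc32(data)
mirror = FlakyDevice({0: data, 1: data})

primary = FlakyDevice({0: data, 1: data}, bad={1})
trusting = [scrub_block(n, primary, mirror, csum, verify=False) for n in (0, 1)]

primary = FlakyDevice({0: data, 1: data}, bad={1})
verified = [scrub_block(n, primary, mirror, csum, verify=True) for n in (0, 1)]

print(trusting)  # ['ok', 'corrected']
print(verified)  # ['ok', 'uncorrectable']
```

Without the read-back, the persistently failing block is reported as corrected on every pass, which matches the behaviour seen in the two scrub runs above.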

I understand that HDDs have a pool of non-LBA-addressable sectors set
aside to mask bad physical sectors, but this pool size is fixed by the
manufacturer (who makes money from sales of new drives).

However, given that scrub is meant to verify and fix data, I don't believe
it is sufficient to blindly trust that the underlying HDD still has spare
reallocatable sectors, or that the hardware will always write data
correctly.

At a minimum, shouldn't these 8 "corrected errors" be listed as
"uncorrectable errors" to inform the sysadmin that data integrity has
degraded (e.g. in this RAID1 example the data is no longer duplicated)?

Ideally, I would hope that the blocks with uncorrectable errors are
marked as bad and fresh blocks are used to maintain integrity.

-- 
Regards,

Tom Hale