Re: How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)

2018-10-22 Thread Qu Wenruo


On 2018/10/22 2:29 PM, Otto Kekäläinen wrote:
> I never got a reply to this thread, 

I replied to you but got no reply:

https://lore.kernel.org/linux-btrfs/eba5de6f-535a-0f5d-e415-9cd622d71...@gmx.com/

And your steps are just what I suggested.

Thanks,
Qu

> but I am now replying to myself in
> case somebody has the same issue and is reading the archive:
> 
> The problem went away after:
> - deleted all snapshots as they seemed to slow down btrfs I/O so much
> that simple commands like rm and rsync were unusable
> - replaced the disk that had the corrupted file (just in case -
> smartctl did not indicate any disk failures) with btrfs replace
> - rsynced files from another location to this filesystem so that the
> corrupted files got overwritten
> 
> Now btrfs scrub does not find any corruption anymore and the
> filesystem I/O speed is usable, though still slower than what it used
> to be in the past.
> 
> On Mon, 15 Oct 2018 at 10:50, Otto Kekäläinen (o...@seravo.fi) wrote:
>>
>> Hello!
>>
>> I am trying to figure out how to recover from errors detected by btrfs scrub.
>>
>> Scrub status reports:
>>
>> scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
>> scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
>> total bytes scrubbed: 791.15GiB with 18 errors
>> error details: csum=18
>> corrected errors: 0, uncorrectable errors: 18, unverified errors: 0
>>
>> Kernel log contains lines like
>>
>>   BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
>>   /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
>> offset 483328:
>>   path resolving failed with ret=-2
>>
>> I've tried so far:
>> - deleting the files (when path is visible)
>> - overwriting the files with new data
>> - changed disk (with btrfs replace)
>>
>> The checksum errors however persist.
>> How do I get rid of them?
>>
>>
>> The files are logs and other non-vital information. I am fine with
>> deleting the corrupted files. It is OK to recover so that I lose a
>> few gigabytes of data, but not the entire filesystem.
>>
>> Setup is a multi-disk btrfs filesystem, data single, metadata RAID-1
>> Mounted with:
>>
>> /dev/mapper/wdc3td on /data type btrfs
>> (rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/)
>>
>> I've read lots of online sources on the topic but none of these help
>> me on how to recover from the current state:
>>
>> https://btrfs.wiki.kernel.org/index.php/Btrfsck
>> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
>> https://wiki.archlinux.org/index.php/Identify_damaged_files#Find_damaged_files
> 
> 
> 





Re: How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)

2018-10-22 Thread Otto Kekäläinen
I never got a reply to this thread, but I am now replying to myself in
case somebody has the same issue and is reading the archive:

The problem went away after I:
- deleted all snapshots, as they seemed to slow down btrfs I/O so much
that simple commands like rm and rsync were unusable
- replaced the disk that had the corrupted file with btrfs replace (just
in case; smartctl did not indicate any disk failures)
- rsynced files from another location to this filesystem so that the
corrupted files got overwritten

Now btrfs scrub does not find any corruption anymore and the
filesystem I/O speed is usable, though still slower than what it used
to be in the past.

On Mon, 15 Oct 2018 at 10:50, Otto Kekäläinen (o...@seravo.fi) wrote:
>
> Hello!
>
> I am trying to figure out how to recover from errors detected by btrfs scrub.
>
> Scrub status reports:
>
> scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
> scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
> total bytes scrubbed: 791.15GiB with 18 errors
> error details: csum=18
> corrected errors: 0, uncorrectable errors: 18, unverified errors: 0
>
> Kernel log contains lines like
>
>   BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
>   /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
> offset 483328:
>   path resolving failed with ret=-2
>
> I've tried so far:
> - deleting the files (when path is visible)
> - overwriting the files with new data
> - changed disk (with btrfs replace)
>
> The checksum errors however persist.
> How do I get rid of them?
>
>
> The files are logs and other non-vital information. I am fine with
> deleting the corrupted files. It is OK to recover so that I lose a
> few gigabytes of data, but not the entire filesystem.
>
> Setup is a multi-disk btrfs filesystem, data single, metadata RAID-1
> Mounted with:
>
> /dev/mapper/wdc3td on /data type btrfs
> (rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/)
>
> I've read lots of online sources on the topic but none of these help
> me on how to recover from the current state:
>
> https://btrfs.wiki.kernel.org/index.php/Btrfsck
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
> https://wiki.archlinux.org/index.php/Identify_damaged_files#Find_damaged_files



-- 
Otto Kekäläinen
CEO
Seravo
+358 44 566 2204

Follow me at @ottokekalainen


Re: How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)

2018-10-15 Thread Qu Wenruo


On 2018/10/15 3:50 PM, Otto Kekäläinen wrote:
> Hello!
> 
> I am trying to figure out how to recover from errors detected by btrfs scrub.
> 
> Scrub status reports:
> 
> scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
> scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
> total bytes scrubbed: 791.15GiB with 18 errors
> error details: csum=18
> corrected errors: 0, uncorrectable errors: 18, unverified errors: 0
> 
> Kernel log contains lines like
> 
>   BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
>   /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
> offset 483328:
>   path resolving failed with ret=-2
> 
> I've tried so far:
> - deleting the files (when path is visible)

Please ensure that no other subvolume or snapshot contains the same
file, and that nothing reflinks to it.

If the path is not visible, please use the root and inode numbers to
locate the culprit file.
The "find" command supports searching by inode number (-inum),
and the "btrfs subvolume list" command shows each subvolume's ID, which
is the root number from the kernel log.
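
As an illustration of the inode lookup described above, here is a minimal
Python sketch that does roughly what "find -inum" does. The function name
find_by_inode is hypothetical. It assumes you pass the mounted path of the
subvolume whose ID matches the "root" number, and it relies on each btrfs
subvolume reporting its own st_dev to keep the walk inside that one
subvolume, where inode numbers are unique.

```python
import os

def find_by_inode(top, inode):
    """Return paths of non-directory entries under `top` with inode number `inode`.

    Only entries on the same st_dev as `top` are considered, so the walk
    stays inside a single subvolume (assumption: nested subvolumes report
    a different st_dev).
    """
    top_dev = os.lstat(top).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune directories that live on another device, e.g. nested subvolumes.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev == top_dev and st.st_ino == inode:
                matches.append(path)
    return matches
```

A hard-linked file yields several paths, and all of them would need to be
removed or rewritten. The same lookup has to be repeated in every snapshot
that may still hold the file.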

Also, it's recommended to sync the fs before scrubbing, in case the
culprit inode has only been orphaned but not yet deleted from disk.
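
A rough sketch of "sync first, then scrub", assuming the btrfs command-line
tool is installed and the caller has the privileges to run a scrub. The
helper name sync_then_scrub and its "run" parameter are hypothetical; the
parameter exists only so the command can be inspected without starting a
real scrub.

```python
import os
import subprocess

def sync_then_scrub(mountpoint, run=subprocess.run):
    """Flush pending writes, then run a foreground scrub on `mountpoint`."""
    # sync(2): ask the kernel to write out dirty data on all filesystems.
    os.sync()
    # -B keeps the scrub in the foreground, so the call returns when it is done.
    cmd = ["btrfs", "scrub", "start", "-B", mountpoint]
    return run(cmd, check=True)
```

For the filesystem in this thread the call would be sync_then_scrub("/data").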

> - overwriting the files with new data

If you only overwrite the culprit sector, the write could get CoWed and
the original data extent would still be there.

You need to ensure the old data is not referenced by any other root/inode.
Please ensure there is no reflink/snapshot first.

Then delete the file or overwrite the whole culprit file.
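
One way to overwrite the whole file rather than a part of it is to write a
fresh copy next to it and rename that copy over the original. The sketch
below is only an illustration under that assumption; rewrite_whole_file is
a hypothetical helper, and it does not check for snapshots or reflinks,
which still have to be removed separately as described above.

```python
import os
import tempfile

def rewrite_whole_file(path, data):
    """Replace `path` with a brand-new file containing `data` (bytes).

    The old inode is dropped by the rename, so its extents are no longer
    referenced by this path. Note: the new file gets mkstemp's default
    permissions (0600), not those of the old file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the new data reaches the disk
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

In the thread, the equivalent step was done with rsync from a good copy at
another location.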

Thanks,
Qu

> - changed disk (with btrfs replace)
> 
> The checksum errors however persist.
> How do I get rid of them?
> 
> 
> The files are logs and other non-vital information. I am fine with
> deleting the corrupted files. It is OK to recover so that I lose a
> few gigabytes of data, but not the entire filesystem.
> 
> Setup is a multi-disk btrfs filesystem, data single, metadata RAID-1
> Mounted with:
> 
> /dev/mapper/wdc3td on /data type btrfs
> (rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/)
> 
> I've read lots of online sources on the topic but none of these help
> me on how to recover from the current state:
> 
> https://btrfs.wiki.kernel.org/index.php/Btrfsck
> http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
> https://wiki.archlinux.org/index.php/Identify_damaged_files#Find_damaged_files
> 





How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)

2018-10-15 Thread Otto Kekäläinen
Hello!

I am trying to figure out how to recover from errors detected by btrfs scrub.

Scrub status reports:

scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
total bytes scrubbed: 791.15GiB with 18 errors
error details: csum=18
corrected errors: 0, uncorrectable errors: 18, unverified errors: 0

Kernel log contains lines like

  BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
  /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
  offset 483328: path resolving failed with ret=-2
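
For anyone who needs to pull the root, inode and offset out of many such
lines, here is a small Python sketch. It assumes the message has exactly the
wording shown above; other kernel versions may word it differently. The
function name parse_csum_warning is hypothetical.

```python
import re

# Field layout taken from the warning quoted in this thread.
_CSUM_WARNING = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+)"
)

def parse_csum_warning(text):
    """Return the fields of a btrfs checksum warning as a dict, or None."""
    # Mail clients and terminals may wrap the message, so collapse whitespace first.
    flat = " ".join(text.split())
    match = _CSUM_WARNING.search(flat)
    if match is None:
        return None
    return {key: (value if key == "dev" else int(value))
            for key, value in match.groupdict().items()}
```

Applied to the warning above, this yields root 12725 and inode 152358265,
the two numbers needed to look for the affected file.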

So far I have tried:
- deleting the files (when the path is visible)
- overwriting the files with new data
- changing the disk (with btrfs replace)

The checksum errors, however, persist.
How do I get rid of them?


The files are logs and other non-vital information. I am fine with
deleting the corrupted files. It is OK to recover so that I lose a
few gigabytes of data, but not the entire filesystem.

Setup is a multi-disk btrfs filesystem, data single, metadata RAID-1
Mounted with:

/dev/mapper/wdc3td on /data type btrfs
(rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/)

I've read lots of online sources on the topic, but none of them explain
how to recover from the current state:

https://btrfs.wiki.kernel.org/index.php/Btrfsck
http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
https://wiki.archlinux.org/index.php/Identify_damaged_files#Find_damaged_files