Re: Please help. Repair probably bitflip damage and suspected bug

2017-06-20 Thread Chris Murphy
> [Sun Jun 18 04:02:43 2017] BTRFS critical (device sdb2): corrupt node,
> bad key order: block=5123372711936, root=1, slot=82


From the archives, this is most likely bad RAM. I see this system also
uses an XFS v4 file system. If it had been made as XFS v5, which uses
metadata csums, you'd probably eventually run into a similar problem,
caught there as metadata checksum errors. It fails faster with Btrfs
because Btrfs checksums everything.
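
To illustrate why a single flipped bit can surface as "bad key order":
Btrfs keeps the keys in a tree block sorted by (objectid, type, offset),
so one changed bit in a key can break that ordering. The sketch below is
only an illustration in Python, not the kernel's check. The key values,
the chosen bit, and the helper names first_bad_slot and flip_bit are all
made up for the example, and the slot the kernel reports for a real
block may be counted differently.

```python
from typing import List, Optional, Tuple

# A key modelled as (objectid, type, offset); the values used below are
# made up and do not come from the damaged filesystem.
Key = Tuple[int, int, int]


def first_bad_slot(keys: List[Key]) -> Optional[int]:
    """Return the first slot whose key is not strictly greater than the
    key in the previous slot, or None if the keys are in order."""
    for slot in range(1, len(keys)):
        if keys[slot - 1] >= keys[slot]:
            return slot
    return None


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-bit memory error in an integer field."""
    return value ^ (1 << bit)


keys = [(256, 1, 0), (256, 12, 256), (257, 1, 0), (258, 1, 0)]
print(first_bad_slot(keys))  # prints None: keys are sorted

# Flip bit 12 of the objectid in slot 1: 256 becomes 4352.
damaged = list(keys)
objectid, key_type, offset = damaged[1]
damaged[1] = (flip_bit(objectid, 12), key_type, offset)
print(first_bad_slot(damaged))  # prints 2: slot 2 now sorts before slot 1
```

If the flip happened in RAM before the block was checksummed and
written, the checksum covers the already damaged contents, which is
consistent with the error being caught by a structural check like this
one rather than by a csum mismatch.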


Chris Murphy


Re: Please help. Repair probably bitflip damage and suspected bug

2017-06-19 Thread Jesse
I just noticed a series of seemingly btrfs-related call traces that,
for the first time, did not lock up the system.

I have uploaded dmesg to https://paste.ee/p/An8Qy

Anyone able to help advise on these?

Thanks

Jesse



Re: Please help. Repair probably bitflip damage and suspected bug

2017-06-19 Thread Jesse
Further to the above message reporting problems, I have been able to
capture a call trace under the main system rather than live media.

Note that this occurred during an rsync from btrfs to a separate local
drive running xfs (both are SATA drives). So I presume that btrfs was
only being read at the time of the crash, unless rsync is also doing
some sort of disk caching of the files to btrfs, as it is the OS
filesystem.

The destination directories were empty in this case; I was making a
copy of the data from the btrfs drive (because of the btrfs tree errors
and problems reported in the post I am replying to).

I suspect a direct correlation: running rsync while (or after) touching
corrupted areas of the btrfs tree results in a complete system
lockup/crash.

I have also noticed that when these crashes occur while running rsync,
the last x files before the crash (e.g. 10 files) show in the rsync log
as synced, but appear on the destination drive with a file size of zero.
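
One way to list those zero-size files after a crash is sketched below.
This is only a rough Python helper, not something that was run on this
system; the function name zero_length_copies and the two directory
arguments are placeholders. It compares sizes with stat only and never
opens the files, so it does not read file data from the btrfs side,
although it still has to read btrfs metadata.

```python
import os
from typing import List


def zero_length_copies(src_root: str, dst_root: str) -> List[str]:
    """Return relative paths of files that are empty under dst_root but
    non-empty under src_root. Sizes come from stat; no file is opened."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(dst_root):
        for name in filenames:
            dst_path = os.path.join(dirpath, name)
            rel = os.path.relpath(dst_path, dst_root)
            src_path = os.path.join(src_root, rel)
            try:
                dst_empty = os.path.getsize(dst_path) == 0
                src_has_data = os.path.getsize(src_path) > 0
            except OSError:
                # Missing or unreadable on one side: skip, do not guess.
                continue
            if dst_empty and src_has_data:
                suspects.append(rel)
    return sorted(suspects)
```

Files that are legitimately empty on the source are not reported, so
whatever the list contains should be re-copied or checked by hand.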

I have uploaded the trace (grep btrfs /var/log/messages) to
https://paste.ee/p/nRcj0

The important part of which is:

Jun 18 23:43:24 Orion vmunix: [38084.183174] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.183195] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.183209] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.183222] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217552] BTRFS info (device sda2):
csum failed ino 12497 extent 1700305813504 csum 1405070872 wanted 0
mirror 0
Jun 18 23:43:24 Orion vmunix: [38084.217626] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217643] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217657] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217669] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
drm_kms_helper drm r8169 ahci mii libahci wmi
Jun 18 23:43:24 Orion vmunix: [38084.220604] Workqueue: btrfs-endio
btrfs_endio_helper [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.220812] RIP:
0010:[]  []
__btrfs_map_block+0x32a/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222459]  [] ?
__btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222632]  []
btrfs_map_bio+0x7d/0x2b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222781]  []
btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222948]  []
btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223198]  [] ?
btrfs_create_repair_bio+0xf0/0x110 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223360]  []
bio_readpage_error+0x117/0x180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223514]  [] ?
clean_io_failure+0x1b0/0x1b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223667]  []
end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223996]  []
end_workqueue_fn+0x48/0x60 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.224145]  []
normal_work_helper+0x82/0x210 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.224297]  []
btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
drm_kms_helper drm r8169 ahci mii libahci wmi
Jun 18 23:43:24 Orion vmunix: [38084.330053]  [] ?
__btrfs_map_block+0x32a/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330106]  [] ?
__btrfs_map_block+0x2cc/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330154]  [] ?
__btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330205]  []
btrfs_map_bio+0x7d/0x2b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330257]  []
btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330304]  []
btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330361]  [] ?
btrfs_create_repair_bio+0xf0/0x110 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330412]  []
bio_readpage_error+0x117/0x180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330462]  [] ?
clean_io_failure+0x1b0/0x1b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330513]  []
end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330568]  []
end_workqueue_fn+0x48/0x60 [btrfs]
Jun