Re: Disk failed while doing scrub

2015-07-14 Thread Duncan
Dāvis Mosāns posted on Tue, 14 Jul 2015 04:54:27 +0300 as excerpted:

 2015-07-13 11:12 GMT+03:00 Duncan 1i5t5.dun...@cox.net:
 You say five disks, but nowhere in your post do you mention what raid
 mode you were using, nor do you post btrfs filesystem show and btrfs
 filesystem df output, as suggested on the wiki, which would list that
 information.
 
 Sorry, I forgot. I'm running Arch Linux with kernel 4.0.7 and btrfs-progs v4.1
 Using RAID1 for metadata and single for data, with features
 big_metadata, extended_iref, mixed_backref, no_holes, skinny_metadata
 and mounted with noatime,compress=zlib,space_cache,autodefrag

Thanks.  FWIW, pretty similar here, but running gentoo, now with btrfs-
progs v4.1.1 and the mainline 4.2-rc1+ kernel.

BTW, note that space_cache has been the default for quite some time 
now.  I've never actually manually mounted with space_cache on any of my 
filesystems over several years, yet they all report it when I check 
/proc/mounts, etc.  So if you're adding it manually, you can kill that 
option and save the commandline/fstab space. =:^)
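
If you want to double-check what's actually in effect, something like
this does it (the btrfs type filter is the only assumption here):

findmnt -t btrfs -o TARGET,OPTIONS

... or simply grep btrfs /proc/mounts.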

 Label: 'Data'  uuid: 1ec5b839-acc6-4f70-be9d-6f9e6118c71c
    Total devices 5 FS bytes used 7.16TiB
    devid 1 size 2.73TiB used 2.35TiB path /dev/sdc
    devid 2 size 1.82TiB used 1.44TiB path /dev/sdd
    devid 3 size 1.82TiB used 1.44TiB path /dev/sde
    devid 4 size 1.82TiB used 1.44TiB path /dev/sdg
    devid 5 size 931.51GiB used 539.01GiB path /dev/sdh
 
 Data, single: total=7.15TiB, used=7.15TiB
 System, RAID1: total=8.00MiB, used=784.00KiB
 System, single: total=4.00MiB, used=0.00B
 Metadata, RAID1: total=16.00GiB, used=14.37GiB
 Metadata, single: total=8.00MiB, used=0.00B
 GlobalReserve, single: total=512.00MiB, used=0.00B

And note that you can easily and quickly remove those empty single-mode 
system and metadata chunks, which are an artifact of the way mkfs.btrfs 
works, using balance filters.

btrfs balance start -mprofile=single /mntpoint

... should do it.  (If the empty single system chunk survives that 
pass, it can be targeted explicitly with -sprofile=single, which balance 
only accepts together with -f.)  They're actually working on mkfs.btrfs 
patches right now so it stops creating those chunks in the first place; 
there are active patch and testing threads discussing it.  Hopefully 
that lands for btrfs-progs v4.2.  (4.1.1 has the patches for the 
single-device case and prep work for multi-device, according to the 
changelog.)
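
To confirm they're gone afterward, the same report you already posted
works:

btrfs filesystem df /mntpoint

Once the balance finishes, the System, single and Metadata, single
lines should simply no longer appear.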

 Because the filesystem still mounts, I assume I should do btrfs device
 delete /dev/sdd /mntpoint and then restore damaged files from backup.

 You can try a replace, but with a failing drive still connected, people
 report mixed results.  It's likely to fail as it can't read certain
 blocks to transfer them to the new device.
 
 As I understand it, device delete will copy data from that disk and
 distribute it across the rest of the disks, while btrfs replace will
 copy to a new disk, which must be at least the size of the disk I'm
 replacing.

Sorry.  You wrote delete, I read replace.  How'd I do that? =:^(

You are absolutely correct.  Delete would be better here.

I guess I had just been reading a thread discussing the problems I 
mentioned with replace, and saw what I expected to see, not what you 
actually wrote.

 There's no such partial-file-with-null-fill tool shipped just yet.

 From the journal I have only 14 files mentioned where errors occurred.
 Now 13 of those files don't throw any errors and their SHAs match my
 backups, so they're fine.

Good.  I was going on the assumption that the questionable device was in 
much worse shape than that.

 And actually btrfs does allow copying/reading that one damaged file;
 I only get an I/O error when trying to read data from those broken
 sectors

Good, and good to know.  Thanks. =:^)

 the best and correct way to recover the file is using ddrescue

I was just going to mention ddrescue. =:^)
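
For anyone reading along later, the general shape of the invocation is
something like this (paths are placeholders, and the third argument is
ddrescue's log/map file so an interrupted run can be resumed):

ddrescue -r3 ./damaged_file /tmp/damaged_file /tmp/damaged_file.log

Unlike plain dd with noerror, ddrescue keeps the output offsets aligned
and leaves the unreadable spots zero-filled.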

 $ du -m /tmp/damaged_file
 6251 /tmp/damaged_file
 
 so basically only about 8 KiB are unrecoverable from this file.
 Probably some tool could be created that recovers even more data by
 knowing about btrfs.
 
 There /is/, however, a command that can be used to either regenerate or
 zero-out the checksum tree.  See btrfs check --init-csum-tree.

 Seems you can't specify a path/file for it, and it's quite a
 destructive action if you only want to recover data for one specific
 file.

Yes.  It's whole-filesystem-all-or-nothing, unfortunately. =:^(

 I did a scrub a second time and this time there aren't that many
 uncorrectable errors, and there are also no csum_errors, so
 --init-csum-tree is useless here I think.

Agreed.

 Most likely the previous scrub got that many errors because it still
 continued for a bit even though the disk didn't respond.

Yes.

 scrub status [...]
read_errors: 2
csum_errors: 0
verify_errors: 0
no_csum: 89600
csum_discards: 656214
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 2
unverified_errors: 0
corrected_errors: 0
last_physical: 2590041112576

OK, that matches up with 8 KiB bad, since blocks are 4 KiB and there 
are two uncorrectable errors.  With the scrub 

Re: Disk failed while doing scrub

2015-07-13 Thread Dāvis Mosāns
2015-07-13 11:12 GMT+03:00 Duncan 1i5t5.dun...@cox.net:
 You say five disks, but nowhere in your post do you mention what raid
 mode you were using, nor do you post btrfs filesystem show and btrfs
 filesystem df output, as suggested on the wiki, which would list that
 information.

Sorry, I forgot. I'm running Arch Linux with kernel 4.0.7 and btrfs-progs v4.1
Using RAID1 for metadata and single for data, with features
big_metadata, extended_iref, mixed_backref, no_holes, skinny_metadata
and mounted with noatime,compress=zlib,space_cache,autodefrag

Label: 'Data'  uuid: 1ec5b839-acc6-4f70-be9d-6f9e6118c71c
   Total devices 5 FS bytes used 7.16TiB
   devid 1 size 2.73TiB used 2.35TiB path /dev/sdc
   devid 2 size 1.82TiB used 1.44TiB path /dev/sdd
   devid 3 size 1.82TiB used 1.44TiB path /dev/sde
   devid 4 size 1.82TiB used 1.44TiB path /dev/sdg
   devid 5 size 931.51GiB used 539.01GiB path /dev/sdh

Data, single: total=7.15TiB, used=7.15TiB
System, RAID1: total=8.00MiB, used=784.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=16.00GiB, used=14.37GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B


 Because the filesystem still mounts, I assume I should do btrfs device
 delete /dev/sdd /mntpoint and then restore damaged files from backup.

 You can try a replace, but with a failing drive still connected, people
 report mixed results.  It's likely to fail as it can't read certain
 blocks to transfer them to the new device.

As I understand it, device delete will copy data from that disk and
distribute it across the rest of the disks, while btrfs replace will
copy to a new disk, which must be at least the size of the disk I'm
replacing.
Assuming the other existing disks are good, why would replace be
preferable over delete?  Because delete could fail, but replace
wouldn't?


 There's no such partial-file-with-null-fill tool shipped just yet.
 Those files normally simply trigger errors trying to read them, because
 btrfs won't let you at them if the checksum doesn't verify.

From the journal I have only 14 files mentioned where errors occurred.
Now 13 of those files don't throw any errors and their SHAs match my
backups, so they're fine.
And actually btrfs does allow copying/reading that one damaged file;
I only get an I/O error when trying to read data from those broken
sectors:

kernel: drivers/scsi/mvsas/mv_sas.c 1863:Release slot [0] tag[0], task
[88011c8c9900]:
kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 0001,  slot [0].
kernel: sas: sas_ata_task_done: SAS error 8a
kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
kernel: sas: ata9: end_device-7:2: cmd error handler
kernel: sas: ata7: end_device-7:0: dev error handler
kernel: sas: ata14: end_device-7:7: dev error handler
kernel: ata9.00: exception Emask 0x0 SAct 0x4000 SErr 0x0 action 0x0
kernel: ata9.00: failed command: READ FPDMA QUEUED
kernel: ata9.00: cmd 60/00:00:00:33:a1/0f:00:ab:00:00/40 tag 14 ncq 1966080 in
                 res 41/40:00:48:40:a1/00:0f:ab:00:00/00 Emask 0x409 (media error) F
kernel: ata9.00: status: { DRDY ERR }
kernel: ata9.00: error: { UNC }
kernel: ata9.00: configured for UDMA/133
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00
driverbyte=0x08
kernel: sd 7:0:2:0: [sdd] tag#0 Sense Key : 0x3 [current] [descriptor]
kernel: sd 7:0:2:0: [sdd] tag#0 ASC=0x11 ASCQ=0x4
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 33 00 00 0f 00 00
kernel: blk_update_request: I/O error, dev sdd, sector 2879471688
kernel: ata9: EH complete
kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1


but all other sectors can be copied fine

$ du -m ./damaged_file
6250 ./damaged_file

$ cp ./damaged_file /tmp/
cp: error reading ‘damaged_file’: Input/output error

$ du -m /tmp/damaged_file
4335 /tmp/damaged_file

cp copies the first part of the file correctly, and I verified that the
SHAs of both the start of the file (first 4336M) and the end of the
file (last 1890M) match the backup

$ head -c 4336M ./damaged_file | sha256sum
e81b20bfa7358c9f5a0ed165bffe43185abc59e35246e52a7be1d43e6b7e040d  -
$ head -c 4337M ./damaged_file | sha256sum
head: error reading ‘./damaged_file’: Input/output error

$ tail -c 1890M ./damaged_file | sha256sum
941568f4b614077858cb8c8dd262bb431bf4c45eca936af728ecffc95619cb60  -
$ tail -c 1891M ./damaged_file  | sha256sum
tail: error reading ‘./damaged_file’: Input/output error

with dd I can also copy almost all of the file, but with only the
noerror option it excludes those regions from the target file rather
than filling them with nulls, so this isn't good for recovery (see the
note after the dd output below)

$ dd conv=noerror if=damaged_file of=/tmp/damaged_file
dd: error reading ‘damaged_file’: Input/output error
8880328+0 records in
8880328+0 records out
4546727936 bytes (4,5 GB) copied, 69,7282 s, 65,2 MB/s
dd: error reading ‘damaged_file’: Input/output error
8930824+0 records in
8930824+0 records out
4572581888 bytes (4,6 GB) copied, 113,648 s, 40,2 MB/s
12801720+0 records in
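
(That said, adding sync to the conv option should make dd pad the
unreadable blocks with NULs instead of dropping them, which would keep
the rest of the file at the right offsets; something like this, where
bs=4096 is just my guess at matching the 4 KiB block size:

dd if=damaged_file of=/tmp/damaged_file bs=4096 conv=noerror,sync

I haven't verified how it behaves on these particular sectors though.)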

Disk failed while doing scrub

2015-07-13 Thread Dāvis Mosāns
Hello,

Short version: while doing a scrub on a 5-disk btrfs filesystem,
/dev/sdd failed, and there were also some errors on another disk
(/dev/sdh)

Because the filesystem still mounts, I assume I should do btrfs device
delete /dev/sdd /mntpoint and then restore damaged files from backup.
Are all affected files listed in the journal?  There are messages about
x callbacks suppressed, so I'm not sure; if they aren't all listed, how
do I get a full list of damaged files?
Also, I wonder if there are any tools to recover partial file fragments
and reconstruct the file (with the missing fragments filled with
nulls)?
I assume that there's no point in running btrfs check
--check-data-csum because scrub already checks that?

From the journal:

kernel: drivers/scsi/mvsas/mv_sas.c 1863:Release slot [1] tag[1], task
[88007efb8800]:
kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 0002,  slot [1].
kernel: sas: sas_ata_task_done: SAS error 8a
kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
kernel: sas: ata9: end_device-7:2: cmd error handler
kernel: sas: ata7: end_device-7:0: dev error handler
kernel: sas: ata14: end_device-7:7: dev error handler
kernel: ata9.00: exception Emask 0x0 SAct 0x800 SErr 0x0 action 0x0
kernel: ata9.00: failed command: READ FPDMA QUEUED
kernel: ata9.00: cmd 60/00:00:00:3d:a1/04:00:ab:00:00/40 tag 11 ncq 524288 in
                 res 41/40:00:48:40:a1/00:04:ab:00:00/00 Emask 0x409 (media error) F
kernel: ata9.00: status: { DRDY ERR }
kernel: ata9.00: error: { UNC }
kernel: ata9.00: configured for UDMA/133
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00
driverbyte=0x08
kernel: sd 7:0:2:0: [sdd] tag#0 Sense Key : 0x3 [current] [descriptor]
kernel: sd 7:0:2:0: [sdd] tag#0 ASC=0x11 ASCQ=0x4
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 3d 00 00 04 00 00
kernel: blk_update_request: I/O error, dev sdd, sector 2879471688
kernel: ata9: EH complete
kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
kernel: drivers/scsi/mvsas/mv_sas.c 1863:Release slot [1] tag[1], task
[88007efb9a00]:
kernel: drivers/scsi/mvsas/mv_94xx.c 625:command active 0003,  slot [1].
kernel: sas: sas_ata_task_done: SAS error 8a
kernel: sas: Enter sas_scsi_recover_host busy: 2 failed: 2
kernel: sas: trying to find task 0x8801e0cadb00
kernel: sas: sas_scsi_find_task: aborting task 0x8801e0cadb00
kernel: sas: sas_scsi_find_task: task 0x8801e0cadb00 is aborted
kernel: sas: sas_eh_handle_sas_errors: task 0x8801e0cadb00 is aborted
kernel: sas: ata9: end_device-7:2: cmd error handler
kernel: sas: ata8: end_device-7:1: cmd error handler
kernel: sas: ata7: end_device-7:0: dev error handler
kernel: sas: ata8: end_device-7:1: dev error handler
kernel: ata8.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x6 frozen
kernel: ata8.00: failed command: READ FPDMA QUEUED
kernel: ata8.00: cmd 60/00:00:00:1b:36/04:00:bf:00:00/40 tag 18 ncq 524288 in
                 res 40/00:08:00:58:11/00:00:a6:00:00/40 Emask 0x4 (timeout)
kernel: ata8.00: status: { DRDY }
kernel: ata8: hard resetting link
kernel: sas: ata9: end_device-7:2: dev error handler
kernel: sas: ata14: end_device-7:7: dev error handler
kernel: ata9: log page 10h reported inactive tag 26
kernel: ata9.00: exception Emask 0x1 SAct 0x40 SErr 0x0 action 0x6
kernel: ata9.00: failed command: READ FPDMA QUEUED
kernel: ata9.00: cmd 60/08:00:48:40:a1/00:00:ab:00:00/40 tag 22 ncq 4096 in
                 res 01/04:a8:40:40:a1/00:00:ab:00:00/40 Emask 0x3 (HSM violation)
kernel: ata9.00: status: { ERR }
kernel: ata9.00: error: { ABRT }
kernel: ata9: hard resetting link
kernel: sas: sas_form_port: phy1 belongs to port1 already(1)!
kernel: ata9.00: both IDENTIFYs aborted, assuming NODEV
kernel: ata9.00: revalidation failed (errno=-2)
kernel: drivers/scsi/mvsas/mv_sas.c 1428:mvs_I_T_nexus_reset for device[1]:rc= 0
kernel: ata8.00: configured for UDMA/133
kernel: ata8.00: device reported invalid CHS sector 0
kernel: ata8: EH complete
kernel: ata9: hard resetting link
kernel: ata9.00: both IDENTIFYs aborted, assuming NODEV
kernel: ata9.00: revalidation failed (errno=-2)
kernel: ata9: hard resetting link
kernel: ata9.00: both IDENTIFYs aborted, assuming NODEV
kernel: ata9.00: revalidation failed (errno=-2)
kernel: ata9.00: disabled
kernel: ata9: EH complete
kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 40 48 00 00 08 00
kernel: blk_update_request: I/O error, dev sdd, sector 2879471688
kernel: sd 7:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04
driverbyte=0x00
kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ab a1 45 00 00 06 00 00
kernel: BTRFS: unable to fixup (regular) error at logical
7390602616832 on dev /dev/sdd
kernel: BTRFS: unable to fixup (regular) error at 

Re: Disk failed while doing scrub

2015-07-13 Thread Duncan
Dāvis Mosāns posted on Mon, 13 Jul 2015 09:26:05 +0300 as excerpted:

 Short version: while doing a scrub on a 5-disk btrfs filesystem,
 /dev/sdd failed, and there were also some errors on another disk
 (/dev/sdh)

You say five disks, but nowhere in your post do you mention what raid 
mode you were using, nor do you post btrfs filesystem show and btrfs 
filesystem df output, as suggested on the wiki, which would list that 
information.

FWIW, btrfs defaults for a multi-device filesystem are raid1 metadata, 
raid0 data.  If you didn't specify a raid level at mkfs time, it's very 
likely that's what you're using.  The scrub results seem to support 
this: if the data had been raid1 or raid10, nearly all the errors should 
have been correctable by pulling from the second copy.  And raid5/6 
should have been able to recover from parity, tho that mode is new 
enough that it's still not recommended, as the chances of bugs and thus 
failure to work properly are much higher.

So you really should have been using raid1/10 if you wanted device 
failure tolerance.  But you didn't say, and if you're using the 
defaults, as seems reasonably likely, your data was raid0, and thus it's 
likely many/most files are either gone or damaged beyond repair.
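
FWIW, converting an existing filesystem to raid1 later is just a
balance with convert filters; a minimal sketch, with the mountpoint a
placeholder and assuming enough unallocated space on at least two
devices:

btrfs balance start -dconvert=raid1 -mconvert=raid1 /mntpoint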

(As it happens I have a number of btrfs raid1 data/metadata on a pair of 
partitioned ssds, with each btrfs on a corresponding partition on both of 
them, with one of the ssds developing bad sectors and basically slowly 
failing.  But the other member of the raid1 pair is solid and I have 
backups, as well as a spare I can replace the failing one with when I 
decide it's time, so I've been letting the bad one stick around due as 
much as anything to morbid curiosity, watching it slowly fail. So I know 
exactly how scrub on btrfs raid1 behaves in a bad-sector case, pulling 
the copy from the good device to overwrite the bad copy with, triggering 
the device's sector remapping in the process.  Despite all the read 
errors, they've all been correctable, because I'm using raid1 for both 
data and metadata.)

 Because the filesystem still mounts, I assume I should do btrfs device
 delete /dev/sdd /mntpoint and then restore damaged files from backup.

You can try a replace, but with a failing drive still connected, people 
report mixed results.  It's likely to fail as it can't read certain 
blocks to transfer them to the new device.
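
If you do try it anyway, the general shape is something like this (the
new device name is a placeholder; -r tells replace to avoid reading the
failing device when another good copy exists, which mostly helps with
raid1/10):

btrfs replace start -r /dev/sdd /dev/sdX /mntpoint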

With raid1 or better, physically disconnecting the failing device and 
doing a device delete missing can work (or replace missing, but AFAIK 
that doesn't work with released versions and I'm not sure it's even in 
integration yet, tho there are patches on-list that should make it 
work).  With raid0/single, you can still mount with a missing device if 
you use degraded,ro, but obviously that'll only let you try to copy 
files off, and you'll likely not have a lot of luck with raid0, with 
files missing, but a bit more luck with single.
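
The degraded mount itself is nothing special, for example (device and
mountpoint are placeholders, and any surviving member device can be
named):

mount -o degraded,ro /dev/sdc /mntpoint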

In the likely raid0/single case, your best bet is probably to try 
copying off what you can, and/or restoring from backups.  See the 
discussion below.

 Are all affected files listed in the journal?  There are messages
 about x callbacks suppressed, so I'm not sure; if they aren't all
 listed, how do I get a full list of damaged files?

 Also, I wonder if there are any tools to recover partial file
 fragments and reconstruct the file (with the missing fragments filled
 with nulls)?
 I assume that there's no point in running btrfs check
 --check-data-csum because scrub already checks that?

There's no such partial-file-with-null-fill tool shipped just yet.  
Those files normally simply trigger errors trying to read them, because 
btrfs won't let you at them if the checksum doesn't verify.

There /is/, however, a command that can be used to either regenerate or 
zero-out the checksum tree.  See btrfs check --init-csum-tree.  Current 
versions recalculate the csums; older versions (btrfsck, as it was 
before btrfs check) simply zeroed it out.  Then you can read the file 
despite bad checksums, tho you'll still get errors if a block physically 
cannot be read.
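
For the record, it's run against the unmounted filesystem, naming any
member device (sdc here just as an example), and it rewrites csums for
the whole filesystem, so treat it as a last resort:

btrfs check --init-csum-tree /dev/sdc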

There's also btrfs restore, which works on the unmounted filesystem 
without actually writing to it, copying the files it can read to a new 
location, which of course has to be a filesystem with enough room to 
restore the files to, altho it's possible to tell restore to do only 
specific subdirs, for instance.
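
The basic form is just the source device plus a destination with enough
space (the destination path here is a placeholder); --path-regex is the
option for limiting it to particular paths, tho its regex syntax takes
a bit of care:

btrfs restore /dev/sdc /mnt/recovery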

What I'd recommend depends on how complete and how recent your backup 
is.  If it's complete and recent enough, probably the easiest thing is to 
simply blow away the bad filesystem and start over, recovering from the 
backup to a new filesystem.

If there are files you'd like to get back that weren't backed up or 
where the backup is old, then since the filesystem is mountable, I'd 
probably copy everything off it that I could.  Then I'd try restore, 
letting it restore to the same location I had copied to, but NOT using 
the --overwrite option, so it only wrote any files it could restore that 
the copy wasn't able to get