Re: Removing bad hdd from btrfs volume

2015-08-07 Thread Patrik Lundquist
On 7 August 2015 at 00:17, Peter Foley pefol...@pefoley.com wrote:
> Hi,
>
> I have a btrfs volume that spans multiple disks (no raid, just
> single), and earlier this morning I hit some hardware problems with
> one of the disks.
> I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
> 1gb that appears to be causing the read errors.
> See http://sprunge.us/aeZC

You might want to try to save as much as possible from the failing
disk with the help of GNU ddrescue. Either by copying sda to a
replacement disk or by copying sda1 to a file for loopback mounting.

Unmount the filesystem before copying, and remove sda before you mount
with the copy.
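
A minimal sketch of the copy-to-file variant, assuming GNU ddrescue and util-linux losetup are installed. The image and mapfile paths, the function names, and the DRY_RUN switch are placeholders for illustration, not anything from this thread. With DRY_RUN left at 1, the functions only print what they would run.

```sh
#!/bin/sh
# Sketch only: paths and names are placeholders, not taken from the thread.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy a failing partition to an image file. The mapfile records progress,
# so ddrescue can be interrupted and resumed later.
rescue_to_image() {
    src="$1"; img="$2"; map="$3"
    # First pass: -n skips the slow scraping phase and grabs the easy areas.
    run ddrescue -n "$src" "$img" "$map"
    # Second pass: -d reads with direct disc access, -r3 retries bad areas
    # up to three times.
    run ddrescue -d -r3 "$src" "$img" "$map"
}

# Attach the image to the first free loop device and print that device name.
attach_image() {
    run losetup --find --show "$1"
}
```

For example, `rescue_to_image /dev/sda1 /mnt/spare/sda1.img /mnt/spare/sda1.map` followed by `attach_image /mnt/spare/sda1.img`, where /mnt/spare is a hypothetical location on a healthy disk that is not part of the btrfs volume.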
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Removing bad hdd from btrfs volume

2015-08-07 Thread Peter Foley
On Fri, Aug 7, 2015 at 2:30 AM, Patrik Lundquist
patrik.lundqu...@gmail.com wrote:
> On 7 August 2015 at 00:17, Peter Foley pefol...@pefoley.com wrote:
>> Hi,
>>
>> I have a btrfs volume that spans multiple disks (no raid, just
>> single), and earlier this morning I hit some hardware problems with
>> one of the disks.
>> I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
>> 1gb that appears to be causing the read errors.
>> See http://sprunge.us/aeZC
>
> You might want to try to save as much as possible from the failing
> disk with the help of GNU ddrescue. Either by copying sda to a
> replacement disk or by copying sda1 to a file for loopback mounting.
>
> Unmount filesystem before copying and remove sda before you mount with the
> copy.

So, it turns out that the only corrupted file was some random header.
I was able to rm the file, and then the btrfs dev del succeeded.

Thanks,

Peter


Removing bad hdd from btrfs volume

2015-08-06 Thread Peter Foley
Hi,

I have a btrfs volume that spans multiple disks (no raid, just
single), and earlier this morning I hit some hardware problems with
one of the disks.
I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
1gb that appears to be causing the read errors.
See http://sprunge.us/aeZC
Is there some way to figure out which file(s) are affected, and if
they are stuff I don't care about, is there some way to force btrfs to
lose the 1gb it can't copy off of the failing hdd?

Thanks,

Peter Foley


Re: Removing bad hdd from btrfs volume

2015-08-06 Thread Duncan
Peter Foley posted on Thu, 06 Aug 2015 15:17:04 -0700 as excerpted:

> I have a btrfs volume that spans multiple disks (no raid, just single),
> and earlier this morning I hit some hardware problems with one of the
> disks.
> I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the
> 1gb that appears to be causing the read errors.
> See http://sprunge.us/aeZC Is there some way to figure out which file(s)
> are affected, and if they are stuff I don't care about, is there some
> way to force btrfs to lose the 1gb it can't copy off of the failing
> hdd?

Of course that's the classic raid0 trap (with btrfs multi-device single 
being effectively a raid0 with really big stripes).  Raid0 should 
(ideally) only ever hold throw-away data: either data you literally 
don't care about, or data with appropriately updated backups.  That's 
because a raid0 is generally considered as good as dead the moment one 
device fails or even really starts to go bad.

So ideally, with one device starting to go bad, you scrap the entire 
filesystem, remove the bad device (or trigger sector remap and reuse, but 
that's dangerous as once sectors start to go, generally the badness 
spreads so the entire device can't be considered trustworthy again), and 
mkfs a new filesystem on the remaining devices, with a replacement device 
thrown in as well if desired.

But sometimes the world isn't ideal; on the arguably more practical 
side... Most of my btrfs are raid1, both data/metadata, with the 
remainder being mixed-bg dup, so I've never tried this on single, 
personally, but...

First, you didn't mention versions, so be sure you're current.  On the 
userspace side, btrfs-progs v4.1.2 is current.  On the kernel side, 
current is 4.1.x (which you appear to have, based on the dmesg; BTW, 
gentoo here too =:^), or 4.2-rc5+, since 4.2 is close to release now.

Try btrfs scrub.  Assuming a current btrfs-progs, that should correct 
errors in the metadata, which should be raid1 and thus have a second 
hopefully valid copy to read from.  It should detect but not be able to 
correct errors in the single mode data, but should tell you what files 
the errors are in (I believe very old btrfs-progs scrub did not).
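
A rough sketch of that first step. It assumes the kernel logs each uncorrectable data error with the affected file in a trailing "(path: ...)" field. The exact wording differs between kernel versions, so treat the pattern as an assumption and check your own dmesg first. The function names are made up for illustration.

```sh
#!/bin/sh
# Sketch only. The log pattern matched below is an assumption about the
# kernel's scrub messages and may need adjusting for your kernel.

# Run a scrub in the foreground (-B) and print per-device statistics (-d).
scrub_foreground() {
    btrfs scrub start -Bd "$1"
}

# Read kernel log text on stdin and print each affected file path once,
# assuming the relevant lines end in "(path: some/file)".
paths_from_log() {
    sed -n 's/.*(path: \(.*\))$/\1/p' | sort -u
}
```

Typical use would be `scrub_foreground /` followed by `dmesg | paths_from_log`. The paths printed are expected to be relative to the subvolume they live in, not absolute.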

Armed with a list of the files with errors, you should be able to delete 
them.  Once all such files are deleted, the 1 GiB chunk that they were in 
should be empty, and a btrfs balance start -dusage=0 on the mountpoint 
should eliminate it.

At that point a btrfs dev del should work.

That's the theory, anyway.  As I said, I've not tried it myself.  But 
it's what I'd try if I did have single-mode data on anything and found 
myself in that situation.
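
Spelled out as commands, the delete / balance / dev del sequence might look like the sketch below. It is as untested as the theory itself. The list file (one path per line, already prefixed with the mountpoint), the function names, and the DRY_RUN switch are hypothetical. With DRY_RUN left at 1, nothing is changed and the commands are only printed.

```sh
#!/bin/sh
# Sketch only: the list file and all names here are hypothetical.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

remove_failing_device() {
    mnt="$1"; dev="$2"; list="$3"
    # Delete every damaged file named in the list, skipping blank lines.
    while IFS= read -r f; do
        if [ -n "$f" ]; then
            run rm -- "$f"
        fi
    done < "$list"
    # Drop data chunks that are now completely empty.
    run btrfs balance start -dusage=0 "$mnt"
    # Migrate the rest off the failing device and remove it from the volume.
    run btrfs device delete "$dev" "$mnt"
}
```

For the volume in this thread, that would be `remove_failing_device / /dev/sda1 bad-files.txt`. Review the printed commands before rerunning with DRY_RUN=0.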

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
