Re: Rebuild after disk fail

2020-01-19 Thread Craig Sanders via luv-main
On Sun, Jan 19, 2020 at 05:38:23PM +1100, russ...@coker.com.au wrote:
> Generally I recommend using BTRFS for workstations and servers that have 2
> disks.  Use ZFS for big storage.

Unless you need to make regular backups from workstations or small servers to
a "big storage" ZFS backup server. In that case, use zfs so you can use 'zfs
send'.  Backups will be completed in a very small fraction of the time they'd
take with rsync; the time difference is huge - minutes vs hours.  That's
fast enough to do them hourly or more frequently if needed, instead of daily.
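
For example, a minimal send/receive sketch (the pool/dataset names tank/home
and backup/home and the host "backuphost" are just placeholders, not anything
from this thread):

    # take a snapshot of the dataset you want to back up
    zfs snapshot tank/home@2020-01-19

    # first backup: send the whole snapshot to the backup server
    zfs send tank/home@2020-01-19 | ssh backuphost zfs receive backup/home

    # later backups: send only the blocks changed since the previous snapshot
    zfs send -i tank/home@2020-01-12 tank/home@2020-01-19 | \
        ssh backuphost zfs receive backup/home

The incremental send only has to read and transfer the changed blocks, which
is where the minutes-vs-hours difference comes from.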

craig

--
craig sanders 
___
luv-main mailing list
luv-main@luv.asn.au
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main


Re: Rebuild after disk fail

2020-01-19 Thread Craig Sanders via luv-main
On Sun, Jan 19, 2020 at 05:34:46PM +1100, russ...@coker.com.au wrote:
> I generally agree that RAID-1 is the way to go.  But if you can't do that
> then BTRFS "dup" and ZFS "copies=2" are good options, especially with SSD.

I don't see how that's the case, or how it can help much (if at all). Making a
second copy of the data on the same drive that's failing doesn't add much
redundancy, but it does add significantly to the drive's workload (increasing
the risk of failure).

It might be ok on a drive with only a few bad sectors or in conjunction with
some kind of RAID, but it's not a substitute for RAID.
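
For reference, this is roughly how you'd turn those options on (the dataset,
device and mount-point names below are only placeholders):

    # ZFS: keep two copies of every block in this dataset
    # (only applies to data written after the property is set)
    zfs set copies=2 tank/home

    # btrfs: use the "dup" profile for data as well as metadata
    mkfs.btrfs -d dup -m dup /dev/sdX1                       # at creation time
    btrfs balance start -dconvert=dup -mconvert=dup /mnt     # or convert later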


> So far I have not seen a SSD entirely die, the worst I've seen is a SSD stop

I haven't either, but I've heard & read of it.  Andrew's rootfs SSD seems to
have died (or possibly it's just so badly corrupted that it can't be mounted;
I'm not sure).

I've seen LOTS of HDDs die.  Even at home I've had dozens die on me over the
years - I've got multiple stacks of dead drives of various ages and sizes
cluttering up shelves (mostly waiting for me to need another fridge magnet or
shiny coffee-cup coaster :)

> I've also seen SSDs return corrupt data while claiming it to be good, but
> not in huge quantities.

That's one of the things that btrfs and zfs can detect...and correct if
there's any redundancy in the storage.
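
The usual way to make them actually check (and repair, where redundancy
exists) is a scrub, e.g. (pool name and mount point here are placeholders):

    zpool scrub tank            # ZFS: verify every block against its checksum
    zpool status tank           # report any checksum errors found/repaired

    btrfs scrub start /data     # btrfs: same idea, for a mounted filesystem
    btrfs scrub status /data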

> For hard drives also I haven't seen a total failure (like stiction) for many
> years.  The worst hard drive problem I've seen was about 12,000 read errors,
> that sounds like a lot but is a very small portion of a 3TB disk and "dup"
> or "copies=2" should get most of your data back in that situation.

If a drive is failing, all the read or write re-tries kill performance on a
zpool, and that drive will eventually be evicted from the pool. Lose enough
drives, and your pool goes from "DEGRADED" to "FAILED", and your data goes
with it.
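
You can watch that happening with zpool status, and replace the failing drive
before it gets that far (pool and device names here are placeholders):

    zpool status -x              # only shows pools that aren't healthy
    zpool replace tank sdX sdY   # swap the failing disk for a new one;
                                 # the pool resilvers onto it automatically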

craig

--
craig sanders 


Re: Rebuild after disk fail

2020-01-19 Thread Craig Sanders via luv-main
On Sun, Jan 19, 2020 at 04:48:30PM +1100, Andrew Greig wrote:
> here is the output of blkid
>
> /dev/sdb1: LABEL="Data" UUID="73f55e83-2038-4a0d-9c05-8f7e2e741517" UUID_SUB="77fdea4e-3157-45af-bba4-7db8eb04ff08" TYPE="btrfs" PARTUUID="d5d96658-01"
> /dev/sdc1: LABEL="Data" UUID="73f55e83-2038-4a0d-9c05-8f7e2e741517" UUID_SUB="8ad739f7-675e-4aeb-ab27-299b34f6ace5" TYPE="btrfs" PARTUUID="a1948e65-01"
>
> I tried the first UUID for sdc1 and the machine hung but gave me an
> opportunity to edit the fstab and reboot.

That should work. Are you sure you typed or copy-pasted the UUID correctly?
The fstab entry should look something like this:

UUID="73f55e83-2038-4a0d-9c05-8f7e2e741517"  /data  btrfs  defaults  0  0

edit /etc/fstab so that it looks like that and then (as root) run "mount
/data".  If that works manually on the command line, it will work when the
machine reboots.
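
For example, a quick sketch of the verification steps (/data is the mount
point used in the fstab line above):

    mount /data                      # mount via the new fstab entry
    findmnt /data                    # show what got mounted, and from which device
    btrfs filesystem show /data      # confirm both sdb1 and sdc1 are in the filesystem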

> When checking the UUID I discovered that the first entry for both drives
> were identical.

Yes, that's normal: they're both members of the same btrfs array, so they
share the same filesystem UUID.

> Should I be using the SUB UUID for sdc1 for the entry in fstab?

No, you should use the UUID.



Alternatively, you could use ONE of the PARTUUID values, e.g. one of:

PARTUUID="d5d96658-01"  /data   btrfs   defaults0   0
PARTUUID="a1948e65-01"  /data   btrfs   defaults0   0

craig

PS: I just tested several variations on this on my btrfs testing VM.  UUID
works.  PARTUUID works. /etc/fstab does not support UUID_SUB (and it isn't
mentioned in `man fstab`).

--
craig sanders 