Re: Rebuild after disk fail
On Saturday, 18 January 2020 6:44:51 PM AEDT Craig Sanders via luv-main wrote:
> I personally would never use anything less than RAID-1 (or equivalent, such
> as a mirrored pair on zfs) for any storage. Which means, of course, that I'm
> used to paying double for my storage capacity - i can't just buy one, I
> have to buy a pair. Not as a substitute for regular backups, but for
> convenience when only one drive of a pair has died.
>
> Drives die, and the time & inconvenience of dealing with that (and the lost
> data) cost far more than the price of a second drive for raid-1/mirror.

I generally agree that RAID-1 is the way to go. But if you can't do that then BTRFS "dup" and ZFS "copies=2" are good options, especially with SSDs. So far I have not seen an SSD die entirely; the worst I've seen is an SSD that stopped accepting writes (which causes an immediate kernel panic with a filesystem like BTRFS). I've also seen SSDs return corrupt data while claiming it to be good, but not in huge quantities.

I also haven't seen a total hard drive failure (such as stiction) for many years. The worst hard drive problem I've seen was about 12,000 read errors. That sounds like a lot, but it is a very small portion of a 3TB disk, and "dup" or "copies=2" should get most of your data back in that situation.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

___
luv-main mailing list
luv-main@luv.asn.au
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main
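The "dup" and "copies=2" options mentioned above can be set up roughly as follows. This is a sketch only: the device names, mount points, and dataset names are placeholders, and note that ZFS "copies=2" only applies to data written after the property is set.

```shell
# BTRFS: keep two copies of data and metadata on a single device ("dup").
# At filesystem creation time:
mkfs.btrfs -d dup -m dup /dev/sdb1
# Or convert an existing single-device filesystem in place:
btrfs balance start -dconvert=dup -mconvert=dup /mnt

# ZFS: keep two copies of every block in a dataset.
# Only affects data written after the property is set:
zfs set copies=2 tank/important
```

Neither option protects against a whole-device failure the way RAID-1 does; they only let checksum errors be repaired from the second on-disk copy.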
Re: Rebuild after disk fail
On Saturday, 18 January 2020 2:34:52 PM AEDT Andrew McGlashan via luv-main wrote:
> Hi,
>
> On 18/1/20 2:14 pm, Andrew McGlashan via luv-main wrote:
> > btrfs -- I never, ever considered that to be real production ready
> > and I believe that even Red Hat has moved away from it somewhat
> > (not sure to what extent).
>
> Some links, none of which are new as this occurred some time ago now.
>
> https://news.ycombinator.com/item?id=14907771

I think this link is the most useful. BTRFS has worked quite solidly for me for years. The main deficiency of BTRFS is that RAID-5 and RAID-6 are not usable as of the last reports I read. For a home server RAID-1 is all you need (2 or 3 largish SATA disks in a RAID-1 give plenty of storage). The way BTRFS allows you to extend a RAID-1 filesystem by adding a new disk of any size and rebalancing is really handy for home use. The ZFS requirement that all disks be the same size and be upgraded in lock step is no problem for corporate use.

Generally I recommend using BTRFS for workstations and servers that have 2 disks. Use ZFS for big storage.
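The mixed-size RAID-1 expansion described above boils down to a couple of commands. A sketch only: /data and /dev/sdd are hypothetical names for the mount point and the new disk.

```shell
# Add a new disk (of any size) to an existing btrfs filesystem:
btrfs device add /dev/sdd /data
# Re-balance so data and metadata are mirrored across the members:
btrfs balance start -dconvert=raid1 -mconvert=raid1 /data
# Check how space is now distributed over the devices:
btrfs filesystem usage /data
```

Usable RAID-1 capacity is roughly half the total across all members, so unlike ZFS a 2TB + 3TB + 4TB mix still puts most of the space to work.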
Re: Rebuild after disk fail
On Sunday, 19 January 2020 3:47:00 PM AEDT Craig Sanders via luv-main wrote:
> NVME SSDs are **much** faster than SATA SSDs. SATA 3 is 6 Gbps (600 MBps),
> so taking protocol overhead into account SATA drives max out at around 550
> MBps.
>
> NVME drives run at **up to** PCI-e bus speeds - with 4 lanes, that's a
> little under 32 Gbps for PCIe v3 (approx 4000 MBps minus protocol
> overhead), double that for PCIe v4. That's the theoretical maximum speed,
> anyway. In practice, most NVME SSDs run quite a bit slower than that, about
> 2 GBps - that's still almost 4 times as fast as a SATA SSD.
>
> Some brands and models (e.g. those from samsung and crucial) run at around
> 3200 to 3500 MBps, but they cost more (e.g. a 1TB Samsung 970 EVO PLUS
> (MZ-V7S1T0BW) costs around $300, while the 1TB Kingston A2000
> (SA2000M8/1000G) costs around $160 but is only around 1800 MBps).

Until recently I had a work Thinkpad with NVMe. That could sustain almost 5GB/s until the CPU overheated and throttled it (there was an ACPI bug that caused it to falsely regard 60C as a thermal throttle point instead of 80C). But when it came to random writes the speed was much lower, particularly with sustained writes. Things like upgrading a Linux distribution in a VM image cause sustained write rates to go well below 1GB/s. The NVMe interface is good, but having a CPU and storage that can sustain it is another issue.
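The bandwidth figures being thrown around can be sanity-checked with shell arithmetic. A back-of-envelope sketch using the encoding overheads of the two interfaces (8b/10b for SATA 3, 128b/130b for PCIe v3):

```shell
# SATA 3: 6 Gbps line rate; 8b/10b encoding leaves 80% for payload.
sata_mb_s=$(( 6000 * 8 / 10 / 8 ))            # Mbit/s -> MB/s
# PCIe v3: 8 GT/s per lane, 128b/130b encoding; NVMe SSDs use 4 lanes.
pcie3_lane_mb_s=$(( 8000 * 128 / 130 / 8 ))
pcie3_x4_mb_s=$(( pcie3_lane_mb_s * 4 ))
echo "SATA 3 payload: ${sata_mb_s} MB/s"      # 600 MB/s
echo "PCIe v3 x4:     ${pcie3_x4_mb_s} MB/s"  # just under 4000 MB/s
```

Command and framing overheads on top of the link encoding account for real drives topping out around 550 MB/s on SATA and around 3500 MB/s on PCIe v3 x4.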
Re: Rebuild after disk fail
Hi Craig

On 19/1/20 3:47 pm, Craig Sanders via luv-main wrote:
> That would be a very good idea. Most modern motherboards will have more
> than enough NVME and SATA slots for that (e.g. most Ryzen x570
> motherboards have 2 or 3 NVME slots for extremely fast SSDs, plus 6 or 8
> SATA ports for SATA HDDs and SSDs. They also have enough RAM slots for
> 64GB DDR-4 RAM, and have at least 2 or 3 PCI-e v4 slots - you'll use one
> for your graphics card).
>
> 2 SSDs for the rootfs including your home dir, and 2 HDDs for your /data
> bulk storage filesystem. And more than enough drive ports for future
> expansion if you ever need it.
>
> ---
>
> some info on nvme vs sata:
>
> NVME SSDs are **much** faster than SATA SSDs. SATA 3 is 6 Gbps (600
> MBps), so taking protocol overhead into account SATA drives max out at
> around 550 MBps.
>
> NVME drives run at **up to** PCI-e bus speeds - with 4 lanes, that's a
> little under 32 Gbps for PCIe v3 (approx 4000 MBps minus protocol
> overhead), double that for PCIe v4. That's the theoretical maximum speed,
> anyway. In practice, most NVME SSDs run quite a bit slower than that,
> about 2 GBps - that's still almost 4 times as fast as a SATA SSD.
>
> Some brands and models (e.g. those from samsung and crucial) run at
> around 3200 to 3500 MBps, but they cost more (e.g. a 1TB Samsung 970 EVO
> PLUS (MZ-V7S1T0BW) costs around $300, while the 1TB Kingston A2000
> (SA2000M8/1000G) costs around $160 but is only around 1800 MBps).
>
> AFAIK there are no NVME drives that run at full PCI-e v4 speed (~8 GBps
> with 4 lanes) yet, it's still too new. That's not a problem, PCI-e is
> designed to be backwards-compatible with earlier versions, so any current
> NVME drive will work in pcie v4 slots.
>
> NVME SSDs cost about the same as SATA SSDs of the same capacity so
> there's no reason not to get them if your motherboard has NVME slots
> (which are pretty much standard these days).
>
> BTW, the socket that NVME drives plug into is called "M.2". M.2 supports
> both SATA & NVME protocols. SATA M.2 runs at 6 Gbps. NVME runs at PCI-e
> bus speed. So you have to be careful when you buy to make sure you get an
> NVME M.2 drive and not a SATA drive in M.2 form-factor... some retailers
> will try to exploit the confusion over this.
>
> craig

Hi Craig, here is the output of blkid:

/dev/sdb1: LABEL="Data" UUID="73f55e83-2038-4a0d-9c05-8f7e2e741517" UUID_SUB="77fdea4e-3157-45af-bba4-7db8eb04ff08" TYPE="btrfs" PARTUUID="d5d96658-01"
/dev/sdc1: LABEL="Data" UUID="73f55e83-2038-4a0d-9c05-8f7e2e741517" UUID_SUB="8ad739f7-675e-4aeb-ab27-299b34f6ace5" TYPE="btrfs" PARTUUID="a1948e65-01"

I tried the first UUID for sdc1 and the machine hung, but it gave me an opportunity to edit the fstab and reboot. When checking the UUIDs I discovered that the first entry for both drives was identical. Should I be using the SUB UUID of sdc1 for the entry in fstab?

Kind regards
Andrew
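On the UUID question above: for a multi-device btrfs filesystem, all members report the same filesystem UUID (UUID_SUB is the per-device identifier), so fstab normally gets a single entry for the whole filesystem rather than one per drive. A sketch, reusing the UUID from the blkid output and assuming the /data mount point from earlier in the thread:

```shell
# /etc/fstab: one line mounts the whole two-device btrfs array:
#   UUID=73f55e83-2038-4a0d-9c05-8f7e2e741517  /data  btrfs  defaults  0  0

# The kernel must know about all member devices before the mount;
# udev usually handles this, or it can be done manually:
btrfs device scan
mount /data
```

Mounting by UUID_SUB or by only one member's device node is what tends to cause hangs at boot, because the kernel waits for the rest of the array to appear.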
Re: Rebuild after disk fail
Thanks Craig,

As they say in the Medibank commercial, "I feel better now!"

Andrew

On 19/1/20 3:47 pm, Craig Sanders via luv-main wrote:
> On Sat, Jan 18, 2020 at 11:06:50PM +1100, Andrew Greig wrote:
> > Yes, the problem was my Motherboard would not handle enough disks, and
> > we did format sdc with btrfs and left sdb alone so that btrfs could
> > arrange things between them.
> >
> > I was hoping to get an understanding of how the RAID drives remembered
> > the "Balance" command when the whole of the root filesystem was
> > replaced on a new SSD.
>
> Your rootfs and your /data filesystem(*) are entirely separate. Don't
> confuse them.
>
> The /data filesystem needed to be re-balanced when you added the second
> drive (making it into a raid-1 array). 'btrfs balance' reads and rewrites
> all the existing data on a btrfs filesystem so that it is distributed
> equally over all drives in the array. For RAID-1, that means mirroring
> all the data on the first drive onto the second, so that there's a
> redundant copy of everything.
>
> Your rootfs is only a single partition, it doesn't have a raid-1 mirror,
> so re-balancing isn't necessary (and would do nothing).
>
> BTW, there's nothing being "remembered". 'btrfs balance' just re-balances
> the existing data over all drives in the array. It's a once-off operation
> that runs to completion and then exits. All **NEW** data will be
> automatically distributed across the array.
>
> If you ever add another drive to the array, or convert it to raid-0
> (definitely NOT recommended), you'll need to re-balance it again. Until
> and unless that happens you don't need to even think about re-balancing,
> it's no longer relevant.
>
> (*) I think you had your btrfs raid array mounted at /data, but I may be
> mis-remembering that. To the best of my knowledge, you have two entirely
> separate btrfs filesystems - one is the root filesystem, mounted as /
> (it also has /home on it, which IIRC you have made a separate btrfs
> sub-volume for). Anyway, it's a single-partition btrfs fs with no raid.
> The other is a 2 drive btrfs fs using raid-1, which I think is mounted
> as /data.
>
> > I thought that control would have rested with /etc/fstab. How do the
> > drives know to balance themselves, is there a command resident in sdc1?
>
> /etc/fstab tells the system which filesystems to mount. It gets read at
> boot time by the system start up scripts.
>
> > My plan is to have auto backups, and given that my activity has seen an
> > SSD go down in 12 months, maybe at 10 months I should build a new box,
> > something which will handle 64GB RAM and have a decent Open Source
> > Graphics driver. And put the / on a pair of 1TB SSDs.
>
> That would be a very good idea. Most modern motherboards will have more
> than enough NVME and SATA slots for that (e.g. most Ryzen x570
> motherboards have 2 or 3 NVME slots for extremely fast SSDs, plus 6 or 8
> SATA ports for SATA HDDs and SSDs. They also have enough RAM slots for
> 64GB DDR-4 RAM, and have at least 2 or 3 PCI-e v4 slots - you'll use one
> for your graphics card).
>
> 2 SSDs for the rootfs including your home dir, and 2 HDDs for your /data
> bulk storage filesystem. And more than enough drive ports for future
> expansion if you ever need it.
>
> ---
>
> some info on nvme vs sata:
>
> NVME SSDs are **much** faster than SATA SSDs. SATA 3 is 6 Gbps (600
> MBps), so taking protocol overhead into account SATA drives max out at
> around 550 MBps.
>
> NVME drives run at **up to** PCI-e bus speeds - with 4 lanes, that's a
> little under 32 Gbps for PCIe v3 (approx 4000 MBps minus protocol
> overhead), double that for PCIe v4. That's the theoretical maximum speed,
> anyway. In practice, most NVME SSDs run quite a bit slower than that,
> about 2 GBps - that's still almost 4 times as fast as a SATA SSD.
>
> Some brands and models (e.g. those from samsung and crucial) run at
> around 3200 to 3500 MBps, but they cost more (e.g. a 1TB Samsung 970 EVO
> PLUS (MZ-V7S1T0BW) costs around $300, while the 1TB Kingston A2000
> (SA2000M8/1000G) costs around $160 but is only around 1800 MBps).
>
> AFAIK there are no NVME drives that run at full PCI-e v4 speed (~8 GBps
> with 4 lanes) yet, it's still too new. That's not a problem, PCI-e is
> designed to be backwards-compatible with earlier versions, so any current
> NVME drive will work in pcie v4 slots.
>
> NVME SSDs cost about the same as SATA SSDs of the same capacity so
> there's no reason not to get them if your motherboard has NVME slots
> (which are pretty much standard these days).
>
> BTW, the socket that NVME drives plug into is called "M.2". M.2 supports
> both SATA & NVME protocols. SATA M.2 runs at 6 Gbps. NVME runs at PCI-e
> bus speed. So you have to be careful when you buy to make sure you get an
> NVME M.2 drive and not a SATA drive in M.2 form-factor... some retailers
> will try to exploit the confusion over this.
>
> craig
Re: Rebuild after disk fail
On Sat, Jan 18, 2020 at 11:06:50PM +1100, Andrew Greig wrote:
> Yes, the problem was my Motherboard would not handle enough disks, and we
> did format sdc with btrfs and left sdb alone so that btrfs could arrange
> things between them.
>
> I was hoping to get an understanding of how the RAID drives remembered the
> "Balance" command when the whole of the root filesystem was replaced on
> a new SSD.

Your rootfs and your /data filesystem(*) are entirely separate. Don't confuse them.

The /data filesystem needed to be re-balanced when you added the second drive (making it into a raid-1 array). 'btrfs balance' reads and rewrites all the existing data on a btrfs filesystem so that it is distributed equally over all drives in the array. For RAID-1, that means mirroring all the data on the first drive onto the second, so that there's a redundant copy of everything.

Your rootfs is only a single partition, it doesn't have a raid-1 mirror, so re-balancing isn't necessary (and would do nothing).

BTW, there's nothing being "remembered". 'btrfs balance' just re-balances the existing data over all drives in the array. It's a once-off operation that runs to completion and then exits. All **NEW** data will be automatically distributed across the array.

If you ever add another drive to the array, or convert it to raid-0 (definitely NOT recommended), you'll need to re-balance it again. Until and unless that happens you don't need to even think about re-balancing, it's no longer relevant.

(*) I think you had your btrfs raid array mounted at /data, but I may be mis-remembering that. To the best of my knowledge, you have two entirely separate btrfs filesystems - one is the root filesystem, mounted as / (it also has /home on it, which IIRC you have made a separate btrfs sub-volume for). Anyway, it's a single-partition btrfs fs with no raid. The other is a 2 drive btrfs fs using raid-1, which I think is mounted as /data.

> I thought that control would have rested with /etc/fstab. How do the
> drives know to balance themselves, is there a command resident in sdc1?

/etc/fstab tells the system which filesystems to mount. It gets read at boot time by the system start up scripts.

> My plan is to have auto backups, and given that my activity has seen an SSD
> go down in 12 months, maybe at 10 months I should build a new box, something
> which will handle 64GB RAM and have a decent Open Source Graphics driver.
> And put the / on a pair of 1TB SSDs.

That would be a very good idea. Most modern motherboards will have more than enough NVME and SATA slots for that (e.g. most Ryzen x570 motherboards have 2 or 3 NVME slots for extremely fast SSDs, plus 6 or 8 SATA ports for SATA HDDs and SSDs. They also have enough RAM slots for 64GB DDR-4 RAM, and have at least 2 or 3 PCI-e v4 slots - you'll use one for your graphics card).

2 SSDs for the rootfs including your home dir, and 2 HDDs for your /data bulk storage filesystem. And more than enough drive ports for future expansion if you ever need it.

---

some info on nvme vs sata:

NVME SSDs are **much** faster than SATA SSDs. SATA 3 is 6 Gbps (600 MBps), so taking protocol overhead into account SATA drives max out at around 550 MBps.

NVME drives run at **up to** PCI-e bus speeds - with 4 lanes, that's a little under 32 Gbps for PCIe v3 (approx 4000 MBps minus protocol overhead), double that for PCIe v4. That's the theoretical maximum speed, anyway. In practice, most NVME SSDs run quite a bit slower than that, about 2 GBps - that's still almost 4 times as fast as a SATA SSD.

Some brands and models (e.g. those from samsung and crucial) run at around 3200 to 3500 MBps, but they cost more (e.g. a 1TB Samsung 970 EVO PLUS (MZ-V7S1T0BW) costs around $300, while the 1TB Kingston A2000 (SA2000M8/1000G) costs around $160 but is only around 1800 MBps).

AFAIK there are no NVME drives that run at full PCI-e v4 speed (~8 GBps with 4 lanes) yet, it's still too new. That's not a problem, PCI-e is designed to be backwards-compatible with earlier versions, so any current NVME drive will work in pcie v4 slots.

NVME SSDs cost about the same as SATA SSDs of the same capacity so there's no reason not to get them if your motherboard has NVME slots (which are pretty much standard these days).

BTW, the socket that NVME drives plug into is called "M.2". M.2 supports both SATA & NVME protocols. SATA M.2 runs at 6 Gbps. NVME runs at PCI-e bus speed. So you have to be careful when you buy to make sure you get an NVME M.2 drive and not a SATA drive in M.2 form-factor... some retailers will try to exploit the confusion over this.

craig

--
craig sanders
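The "once-off" nature of a balance described above can be confirmed after the fact. A sketch; /data is a placeholder path:

```shell
# A finished balance leaves no state behind:
btrfs balance status /data    # reports "No balance found" once complete
# The allocation profiles show whether data is actually mirrored:
btrfs filesystem df /data     # Data and Metadata should both show RAID1
```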
Re: Rebuild after disk fail
Hi Craig,

Yes, the problem was my Motherboard would not handle enough disks, and we did format sdc with btrfs and left sdb alone so that btrfs could arrange things between them.

I was hoping to get an understanding of how the RAID drives remembered the "Balance" command when the whole of the root filesystem was replaced on a new SSD. I thought that control would have rested with /etc/fstab. How do the drives know to balance themselves, is there a command resident in sdc1?

My plan is to have auto backups, and given that my activity has seen an SSD go down in 12 months, maybe at 10 months I should build a new box, something which will handle 64GB RAM and have a decent Open Source Graphics driver. And put the / on a pair of 1TB SSDs.

Many thanks
Andrew

On 18/1/20 6:44 pm, Craig Sanders via luv-main wrote:
> On Sat, Jan 18, 2020 at 02:14:46PM +1100, Andrew McGlashan wrote:
> > Just some thoughts
> >
> > Way back, SSDs were expensive and less reliable than today. Given the
> > cost of SSDs today, I would consider even RAIDING the SSDs.
>
> If it's physically possible to install a second SSD of the same storage
> capacity or larger then he absolutely should do so. I vaguely recall
> suggesting he should get a second SSD for the rootfs ages ago, but my
> understanding / assumption was that there was only physical space and
> connectors for one SSD in the machine.
>
> The 'btrfs snapshot' + 'btrfs send' suggestion was just a way of
> regularly backing up a single-drive btrfs filesystem onto his raid-1
> btrfs array so that little or nothing was lost in case of another drive
> failure. It's less than ideal, but a LOT better than nothing.
>
> I personally would never use anything less than RAID-1 (or equivalent,
> such as a mirrored pair on zfs) for any storage. Which means, of course,
> that I'm used to paying double for my storage capacity - i can't just buy
> one, I have to buy a pair. Not as a substitute for regular backups, but
> for convenience when only one drive of a pair has died.
>
> Drives die, and the time & inconvenience of dealing with that (and the
> lost data) cost far more than the price of a second drive for
> raid-1/mirror.
>
> craig
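The 'btrfs snapshot' + 'btrfs send' backup scheme Craig refers to can be sketched like this. All paths are hypothetical; the snapshot must be read-only (-r) for 'btrfs send' to accept it:

```shell
# Take a read-only snapshot of the single-drive rootfs:
btrfs subvolume snapshot -r / /snapshots/root-$(date +%F)
# Full send of the first snapshot onto the raid-1 array:
btrfs send /snapshots/root-2020-01-18 | btrfs receive /data/backups
# Later snapshots can be sent incrementally against the previous one:
btrfs send -p /snapshots/root-2020-01-18 /snapshots/root-2020-01-19 \
  | btrfs receive /data/backups
```

Run from cron, the incremental form keeps the copy on the array nearly current at little I/O cost, which is why it is "a LOT better than nothing" for a machine that can only hold one rootfs SSD.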