Re: Data and hardware protection measures
Michael Kjörling composed on 2024-01-28 19:23 (UTC):
> On 28 Jan 2024 19:19 +0100, from h...@adminart.net (hw):
>> On Fri, 2024-01-26 at 15:56 +, Michael Kjörling wrote:
>>> It's also worth talking to your local electrician about installing an
>>> incoming-mains overvoltage protection for lightning protection.
>> Hm, I thought it was expensive.
> So did I until I actually asked someone who could give me a quote for
> actually installing it.

Old construction used meter "boxes" that don't support accessories like those. My utility won't touch mine unless I pay an electrician for an expensive full-service upgrade.
-- 
Evolution as taught in public schools is, like religion, based on faith, not based on science.

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata
Re: Data and hardware protection measures; was: rsync --delete vs rsync --delete-after
On Sun, 28 Jan 2024 19:19:55 +0100
hw wrote:

Hello hw,

>How do you know in advance when the battery will have failed?

Even my very basic UPS (APC Backup 1400) has a light on the front labelled "Replace Battery". That, combined with a very annoying high-pitched scream, is a pretty good motivator to do the job.

I know the Backup 1400 was mentioned in this thread as "probably avoid" (or something similar), but it's served me well thus far. I've had to replace the battery pack only once, and that was after ten years, not the three to five that people have been talking about. APC no longer sell that model, but battery packs are still available.

Just as an FYI, the battery packs are sealed lead-acid. Where I live (UK), it's possible to sell lead-acid batteries to scrap merchants. The amount paid is variable and subject to massive market forces that are best described as 'volatile'.

Like others have mentioned with some of the more basic APC devices, this particular model isn't designed with user-replaceable batteries in mind, but replacing them isn't an overly difficult task. It can't easily (if at all) be done with the connected devices left powered up, though.

-- 
Regards  _      "Valid sig separator is {dash}{dash}{space}"
        / )     "The blindingly obvious is never immediately apparent"
       / _)rad  "Is it only me that has a working delete key?"
They take away our freedom in the name of liberty
Suspect Device - Stiff Little Fingers
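On the software side, many UPSes can report the same "replace battery" condition that the front-panel light shows. The sketch below assumes Network UPS Tools (NUT) is installed and can talk to the UPS (not verified for the Backup 1400 specifically); it parses the output of NUT's `upsc` command and checks the `ups.status` variable for the `RB` flag. The UPS name `myups@localhost` and the helper function names are hypothetical.

```python
# Sketch: read a UPS's "replace battery" flag through Network UPS Tools (NUT).
# Assumes NUT is installed and configured; "myups@localhost" is a hypothetical
# UPS name, adjust it to match your own ups.conf.
import subprocess


def parse_upsc(text: str) -> dict[str, str]:
    """Turn `upsc` output ("variable: value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            values[key.strip()] = value.strip()
    return values


def needs_new_battery(values: dict[str, str]) -> bool:
    """True if ups.status carries NUT's RB ("replace battery") flag."""
    return "RB" in values.get("ups.status", "").split()


def query_ups(ups: str = "myups@localhost") -> dict[str, str]:
    """Run `upsc` for one UPS and parse its variables (raises on failure)."""
    result = subprocess.run(["upsc", ups], capture_output=True, text=True, check=True)
    return parse_upsc(result.stdout)
```

Calling `needs_new_battery(query_ups())` from a periodic job would be one way to use it; NUT's `upsmon` can also raise a `REPLBATT` notification by itself, which may be simpler than polling.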
Re: Data and hardware protection measures
On 28 Jan 2024 19:19 +0100, from h...@adminart.net (hw):
> On Fri, 2024-01-26 at 15:56 +, Michael Kjörling wrote:
>> On 26 Jan 2024 16:11 +0100, from h...@adminart.net (hw):
>>> I'd rather spend the money on new batteries (EUR 40 last time after 5
>>> years) every couple of years [...]
>
> To correct myself, I think it was 3 years, not 5, sorry.
>
>>> The hardware is usually extremely difficult --- and may be impossible
>>> --- to replace.
>>
>> And let's not forget that you can _plan_ to perform the battery
>> replacement for whenever that is convenient.
>
> How do you know in advance when the battery will have failed?

You replace the battery before it fails completely. Most batteries don't go from perfectly fine to completely dead within one charge cycle.

If the battery drains completely during a power outage before the UPS has had a chance to respond to the battery's loss of capacity, that becomes a (hopefully clean) power cut, which _still_ is _a lot_ better than equipment that isn't designed to deal with a significant overvoltage condition taking the brunt of a lightning strike.

I'm assuming, of course, that you replace the battery with one of the same chemistry. The UPS will probably assume some discharge characteristic depending on what battery type the OEM uses (lead-acid, NiCd, NiMH, Li-ion, ...); if you give the UPS a battery using some other chemistry, that'll immediately wreak havoc with lots of things.

>> Which is quite the contrast to a lightning strike blowing out even
>> _just_ the PSU and it needing replacement before you can even use
>> the computer again (and you _hope_ that nothing more took a hit,
>> which it probably did even if the computer _seems_ to be working
>> fine).
>
> It would also hit the display(s), the switches and through that
> everything that's connected to the network, the server(s) ... That
> adds up to a lot of money.

Which is why I said "even _just_ the PSU", emphasis original.
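The "replace it before it fails completely" approach can be made a little more concrete if you record how much runtime the UPS reports at full charge over time. This is a minimal sketch under assumptions not taken from the thread: a linear least-squares trend is fitted to dated runtime readings and projected forward to the day it crosses a chosen fraction of the runtime the battery had when new. The 60% threshold and the function names are hypothetical, and real battery ageing is not necessarily linear, so treat the result as a rough planning hint rather than a prediction.

```python
# Sketch: project when a UPS battery's runtime will fall below a threshold,
# so the replacement can be planned before the battery fails outright.
# Assumptions: you log the runtime (seconds) the UPS reports at full charge,
# e.g. after each self-test; the 60% threshold is a hypothetical choice.
from datetime import date, timedelta
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_replacement_date(
    samples: list[tuple[date, float]],
    new_runtime: float,
    threshold: float = 0.6,
) -> Optional[date]:
    """Date at which the linear trend crosses threshold * new_runtime.

    Returns None when runtime is not declining (nothing to project).
    A date in the past means the threshold has already been reached.
    """
    start = samples[0][0]
    xs = [float((day - start).days) for day, _ in samples]
    ys = [runtime for _, runtime in samples]
    slope, intercept = fit_line(xs, ys)
    if slope >= 0:
        return None
    days = (threshold * new_runtime - intercept) / slope
    return start + timedelta(days=round(days))
```

With a battery that delivered 1200 s when new and loses 100 s every 100 days, the trend reaches 720 s (60%) about 480 days after the first reading, which leaves plenty of time to order a replacement.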
>> It's also worth talking to your local electrician about installing an
>> incoming-mains overvoltage protection for lightning protection.
>
> Hm, I thought it was expensive.

So did I until I actually asked someone who could give me a quote for actually installing it.

> That doesn't exactly help when the failed disk has disappeared
> altogether, as if it had been removed ;)

If that happens, I'd get output along the lines of:

    # zpool status
      pool: tank
     state: DEGRADED
      scan: scrub repaired B in with errors on
    config:

            NAME                     STATE     READ WRITE CKSUM
            tank                     ONLINE       0     0     0
              raidz2-0               ONLINE       0     0     0
                wwn-0x0001-crypt     ONLINE       0     0     0
                8446744073709551616  UNAVAIL      0     0     0  was /dev/mapper/wwn-0x1113-crypt
                wwn-0x2225-crypt     ONLINE       0     0     0
                wwn-0x3337-crypt     ONLINE       0     0     0
                wwn-0x4449-crypt     ONLINE       0     0     0
                wwn-0x555b-crypt     ONLINE       0     0     0

clearly identifying the problem. There would also most likely be a lot of event notifications telling me that wwn-0x1113-crypt is having issues within the "tank" pool, plus any applicable kernel logs for the device disconnection and perhaps lower-level I/O errors.

Similarly, if a storage device suddenly starts returning garbage, that will likely show up as CKSUM errors, and the device will eventually get kicked out of the pool, showing as state FAULTED with large error counter values.

(zpool status would also provide some more explanatory details, in the example above including that "applications are unaffected" because sufficient redundancy would still exist; but I'm eliding those here because I don't have them handy and don't feel like creating such a situation just to get example output. The important part is that the disk that dropped off the bus will likely show as UNAVAIL with its internal identifier and, because of my naming scheme, a reference to its WWN, instead of being completely missing. The solution is to get a replacement disk, plug it in, execute "sudo zpool replace tank $numeric_id $new_device_path", and wait a while, all the while I can still use the system normally.)
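Because the `zpool status` device table is regular, a monitoring script can pick out the rows that need attention. A best-effort sketch, assuming the column layout shown in the output above (`NAME STATE READ WRITE CKSUM`, optionally followed by a note); the function names are hypothetical, and for a quick yes/no check `zpool status -x` already prints only pools that have problems. `replace_command` only builds the argument list for the `zpool replace` step described above; it does not run anything.

```python
# Sketch: pick out the devices that need attention from `zpool status`.
# It parses the human-readable table (NAME STATE READ WRITE CKSUM), which
# is meant for people rather than scripts, so treat it as best-effort.
import subprocess
from dataclasses import dataclass


@dataclass
class DeviceProblem:
    name: str                     # vdev name, or a numeric GUID if the disk is gone
    state: str                    # e.g. DEGRADED, FAULTED, UNAVAIL
    errors: tuple[str, str, str]  # READ, WRITE, CKSUM counters as printed
    note: str                     # trailing text such as "was /dev/mapper/..."


def problem_devices(status_text: str) -> list[DeviceProblem]:
    """Rows that are not ONLINE or that show any non-zero error counter."""
    problems = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # the device table starts after this header
            continue
        if not in_table:
            continue
        if not fields:
            break  # a blank line ends the device table
        if len(fields) < 5:
            continue  # section labels such as "logs" or "cache"
        name, state, read, write, cksum = fields[:5]
        counters = (read, write, cksum)
        if state != "ONLINE" or any(c != "0" for c in counters):
            problems.append(DeviceProblem(name, state, counters, " ".join(fields[5:])))
    return problems


def read_status(pool: str) -> str:
    """Return `zpool status POOL` output (raises if the command fails)."""
    result = subprocess.run(["zpool", "status", pool],
                            capture_output=True, text=True, check=True)
    return result.stdout


def replace_command(pool: str, old: str, new_device: str) -> list[str]:
    """argv for `zpool replace POOL OLD NEW`; run it yourself (e.g. with sudo)."""
    return ["zpool", "replace", pool, old, new_device]
```

For the example above, `problem_devices(read_status("tank"))` would report the UNAVAIL row with its `was /dev/mapper/wwn-0x1113-crypt` note, which is exactly the piece of information needed to find the physical disk.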
No matter what kind of storage solution you're using - hardware RAID, software RAID, no redundancy, whichever - or how you're doing backups (assuming that you are, for some value of "you"), you can't just ignore issues with it. That way lies data loss.
-- 
Michael Kjörling https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
Re: Data and hardware protection measures; was: rsync --delete vs rsync --delete-after
On Fri, 2024-01-26 at 15:56 +, Michael Kjörling wrote:
> On 26 Jan 2024 16:11 +0100, from h...@adminart.net (hw):
> > I'd rather spend the money on new batteries (EUR 40 last time after 5
> > years) every couple of years [...]

To correct myself, I think it was 3 years, not 5, sorry.

> > The hardware is usually extremely difficult --- and may be impossible
> > --- to replace.
>
> And let's not forget that you can _plan_ to perform the battery
> replacement for whenever that is convenient.

How do you know in advance when the battery will have failed?

> Which is quite the contrast to a lightning strike blowing out even
> _just_ the PSU and it needing replacement before you can even use
> the computer again (and you _hope_ that nothing more took a hit,
> which it probably did even if the computer _seems_ to be working
> fine).

It would also hit the display(s), the switches and through that
everything that's connected to the network, the server(s) ... That
adds up to a lot of money.

> [...]
> It's also worth talking to your local electrician about installing an
> incoming-mains overvoltage protection for lightning protection. I
> won't quote prices because I had mine installed a good while ago and
> also did it together with some other electrical work, but I was
> surprised at how low the cost for that was, and I _know_ that it has
> saved me on at least one occasion.

Hm, I thought it was expensive. I'll ask when I get a chance.

> [...]
> > You can always tell with a good hardware RAID because it
> > will indicate on the trays which disk has failed and the controller
> > tells you.
>
> Or you can label the physical disks. Whenever I replace a disk, I
> print a label with the WWN of the new disk and place it so that it is
> readable without removing any disks or cabling;

That doesn't exactly help when the failed disk has disappeared
altogether, as if it had been removed ;)

But then, you can go by the numbers of the disks you can still see.
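Going "by the numbers of the disks you can still see" amounts to a set difference between the WWNs on your labels and the WWNs the system currently reports. A minimal sketch, assuming udev's `/dev/disk/by-id/wwn-*` symlinks and a hand-maintained list of labelled WWNs; the WWN values, `KNOWN_WWNS`, and the function names are hypothetical placeholders. Disks that do not report a WWN will not appear in that directory under a `wwn-` name at all.

```python
# Sketch: find a labelled disk that has vanished, by elimination.
# Assumptions: KNOWN_WWNS holds the WWNs printed on your disk labels (the
# values below are hypothetical placeholders), and udev exposes whole
# disks as /dev/disk/by-id/wwn-* symlinks (partitions end in "-partN").
import os

KNOWN_WWNS = {
    "wwn-0x5000000000000001",
    "wwn-0x5000000000000002",
    "wwn-0x5000000000000003",
}


def visible_wwns(by_id: str = "/dev/disk/by-id") -> set[str]:
    """WWN names of the whole disks the system can currently see."""
    return {
        name for name in os.listdir(by_id)
        if name.startswith("wwn-") and "-part" not in name
    }


def missing_wwns(known: set[str], visible: set[str]) -> list[str]:
    """Labelled disks that no longer show up, sorted for stable output."""
    return sorted(known - visible)
```

`missing_wwns(KNOWN_WWNS, visible_wwns())` would then name the label to look for in the case.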
And beware of SSDs: when they fail, they're usually entirely inaccessible, whereas you may still be able to rescue (some) data from a spinning disk after it has failed. It's probably really bad with mainboards that use M.2 storage since they apparently support only one M.2 drive (of the same type, at least) rather than two. So you can't put those in a RAID at all. What's the point of that? ZFS cache maybe?
Re: Data and hardware protection measures; was: rsync --delete vs rsync --delete-after
On 26 Jan 2024 16:11 +0100, from h...@adminart.net (hw):
> I'd rather spend the money on new batteries (EUR 40 last time after 5
> years) every couple of years [...]
>
> The hardware is usually extremely difficult --- and may be impossible
> --- to replace.

And let's not forget that you can _plan_ to perform the battery replacement for whenever that is convenient. Which is quite the contrast to a lightning strike blowing out even _just_ the PSU and it needing replacement before you can even use the computer again (and you _hope_ that nothing more took a hit, which it probably did even if the computer _seems_ to be working fine).

>> I've had no external power outage in the last 5 or 10 years, but a UPS
>> often needs at least one battery replacement during that time.
>
> Outages are (still) rare here, but it suffices to trigger a fuse or
> the main switch when some device shorts out, or someone working on the
> solar power systems some of the neighbours have, causing crazy voltage
> fluctuations, or a lightning strike somewhere in the vicinity or
> whatever reason for a UPS to be required.

It's also worth talking to your local electrician about installing an incoming-mains overvoltage protection for lightning protection. I won't quote prices because I had mine installed a good while ago and also did it together with some other electrical work, but I was surprised at how low the cost for that was, and I _know_ that it has saved me on at least one occasion.

It won't do power conditioning or power loss protection, of course, but it _does_ greatly increase the odds that your home wiring survives a lightning-related voltage surge. (Nothing will realistically protect you against a _direct_ lightning strike; in that case the very best you can hope for is damage containment.)

> More importantly, the hassle involved in trying to recover from a
> failed disk is ridiculously enormous without RAID and can get
> expensive when hours of work were lost.
> With RAID, you don't even
> notice unless you keep an eye on it, and when a disk has failed, you
> simply order a replacement and plug it in.

Indeed; the point of RAID is uptime.

> You can always tell with a good hardware RAID because it
> will indicate on the trays which disk has failed and the controller
> tells you.

Or you can label the physical disks. Whenever I replace a disk, I print a label with the WWN of the new disk and place it so that it is readable without removing any disks or cabling; then I use the WWN to identify the disk in software. In both cases, that's because the WWN is a stable identifier that I can fully expect never to change throughout the disk's lifetime. So when the system tells me that wwn-0x123456789abcdef0 is having issues, I can quickly and accurately identify the exact physical device that needs replacement once I have a replacement on hand. And if the kernel logs are telling me that, say, sdg is having issues, I can map that back to whatever WWN happens to correspond to that identifier at that particular time. (In practice, I'm more likely to get useful error details through ZFS status monitoring tools, where I already use the WWN, so I likely won't need to go that somewhat circuitous route.)

> Yes, my setup is far from ideal when it comes to backups in that I
> should make backups more frequently. That doesn't mean I shouldn't
> have good backups, or that UPSs and RAID aren't required.

Or that, again, they solve different problems.
-- 
Michael Kjörling https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
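Mapping a kernel name such as `sdg` back to its WWN can be done by resolving the `/dev/disk/by-id/wwn-*` symlinks that udev creates for disks that report a WWN. A minimal sketch under that assumption (the function name is hypothetical); `ls -l /dev/disk/by-id/` shows the same information by hand.

```python
# Sketch: map kernel device names (sda, sdg, ...) to their WWN symlink names.
# Assumes udev's /dev/disk/by-id/wwn-* symlinks exist; partition links
# (ending in "-partN") are skipped so only whole disks are mapped.
import os


def kernel_name_to_wwn(by_id: str = "/dev/disk/by-id") -> dict[str, str]:
    """Return {kernel name: "wwn-..." symlink name} for whole disks."""
    mapping = {}
    for entry in os.scandir(by_id):
        if entry.name.startswith("wwn-") and "-part" not in entry.name:
            target = os.path.realpath(entry.path)  # e.g. /dev/sdg
            mapping[os.path.basename(target)] = entry.name
    return mapping
```

Since `sdX` names can change between boots, the result is only valid for the running system, which matches the "at that particular time" caveat above; the WWN side of the mapping is the part that stays stable.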