Re: Data and hardware protection measures; was: rsync --delete vs rsync --delete-after

2024-01-28 Thread Brad Rogers
On Sun, 28 Jan 2024 19:19:55 +0100
hw  wrote:

Hello hw,

>How do you know in advance when the battery will have failed?

Even my very basic UPS (APC Backup 1400) has a light on the front
labelled "Replace Battery".  That, combined with a very annoying
high-pitched scream, is a pretty good motivator to do the job.

I know the Backup 1400 was mentioned in this thread as "probably avoid"
(or something similar), but it's served me well thus far.  Had to replace
the battery pack only once.  That was after ten years, not the three to
five that people have been talking about.  APC no longer sell that
model, but battery packs are still available.  Just as an FYI, the
battery packs are sealed Lead-Acid.

Where I live (UK), it's possible to sell lead-acid batteries to scrap
merchants.  Amount paid is variable and subject to massive market forces
that are best described as 'volatile'.

As others have mentioned with some of the more basic APC devices, this
particular model isn't designed with user-replaceable batteries in mind,
but it's not an overly difficult task.  It can't easily (if at all) be
done while leaving connected devices powered up, though.

-- 
 Regards  _   "Valid sig separator is {dash}{dash}{space}"
 / )  "The blindingly obvious is never immediately apparent"
/ _)rad   "Is it only me that has a working delete key?"
They take away our freedom in the name of liberty
Suspect Device - Stiff Little Fingers




Re: Data and hardware protection measures; was: rsync --delete vs rsync --delete-after

2024-01-28 Thread hw
On Fri, 2024-01-26 at 15:56 +, Michael Kjörling wrote:
> On 26 Jan 2024 16:11 +0100, from h...@adminart.net (hw):
> > I rather spend the money on new batteries (EUR 40 last time after 5
> > years) every couple years [...]

To correct myself, I think it was 3 years, not 5, sorry.

> > The hardware is usually extremely difficult --- and may be impossible
> > --- to replace.
> 
> And let's not forget that you can _plan_ to perform the battery
> replacement for whenever that is convenient.

How do you know in advance when the battery will have failed?

> Which is quite the contrast to a lightning strike blowing out even
> _just_ the PSU and it needing replacement before you can even use
> the computer again (and you _hope_ that nothing more took a hit,
> which it probably did even if the computer _seems_ to be working
> fine).

It would also hit the display(s), the switches and, through those,
everything that's connected to the network, the server(s) ...  That
adds up to a lot of money.

> [...]
> It's also worth talking to your local electrician about installing an
> incoming-mains overvoltage protection for lightning protection. I
> won't quote prices because I had mine installed a good while ago and
> also did it together with some other electrical work, but I was
> surprised at how low the cost for that was, and I _know_ that it has
> saved me on at least one occasion.

Hm, I thought it would be expensive.  I'll ask when I get a chance.

> [...]
> > You can always tell with a good hardware RAID because it
> > will indicate on the trays which disk has failed and the controller
> > tells you.
> 
> Or you can label the physical disks. Whenever I replace a disk, I
> print a label with the WWN of the new disk and place it so that it is
> readable without removing any disks or cabling;

That doesn't exactly help when the failed disk has disappeared
altogether, as if it had been removed ;)

But then, you can go by the numbers of the disks you can still see.
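
(A rough way to see which disks the kernel still sees, together with
their stable identifiers, is something along these lines; lsblk column
support varies a bit between versions, so treat it as a sketch:)

  # Whole disks only, with the identifiers that match the tray labels.
  lsblk -d -o NAME,WWN,SERIAL,MODEL,SIZE

  # Or the persistent symlinks udev keeps; whichever wwn-0x... label on
  # a tray has no matching entry here is the disk that vanished.
  ls -l /dev/disk/by-id/ | grep wwn-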

And beware of SSDs; when they fail, they're usually entirely
inaccessible, whereas you may still be able to rescue (some) data from
a spinning disk after it has failed.

It's probably really bad with mainboards that use M.2 storage since
they apparently support only one slot (of the same type, at least)
rather than two.  So you can't use those for RAID at all.  What's the
point of that?  ZFS cache maybe?



Re: Data and hardware protection measures; was: rsync --delete vs rsync --delete-after

2024-01-26 Thread Michael Kjörling
On 26 Jan 2024 16:11 +0100, from h...@adminart.net (hw):
> I rather spend the money on new batteries (EUR 40 last time after 5
> years) every couple years [...]
> 
> The hardware is usually extremely difficult --- and may be impossible
> --- to replace.

And let's not forget that you can _plan_ to perform the battery
replacement for whenever that is convenient. Which is quite the
contrast to a lightning strike blowing out even _just_ the PSU and it
needing replacement before you can even use the computer again (and
you _hope_ that nothing more took a hit, which it probably did even if
the computer _seems_ to be working fine).


>> I've had no external power outage in the last 5 or 10 years, but a UPS
>> often needs at least one battery replacement during that time.
> 
> Outages are (still) rare here, but it suffices to trigger a fuse or
> the main switch when some device shorts out, or someone working on the
> solar power systems some of the neighbours have, causing crazy voltage
> fluctuations, or a lightning strike somewhere in the vicinity or
> whatever reason for a UPS to be required.

It's also worth talking to your local electrician about installing an
incoming-mains overvoltage protection for lightning protection. I
won't quote prices because I had mine installed a good while ago and
also did it together with some other electrical work, but I was
surprised at how low the cost for that was, and I _know_ that it has
saved me on at least one occasion. It won't do power conditioning or
power loss protection of course, but it _does_ greatly increase the
odds that your home wiring survives a lightning-related voltage surge.
(Nothing will realistically protect you against a _direct_ lightning
strike; in that case the very best you can hope for is damage
containment.)


> More importantly, the hassle involved in trying to recover from a
> failed disk is ridiculously enormous without RAID and can get
> expensive when hours of work were lost.  With RAID, you don't even
> notice unless you keep an eye on it, and when a disk has failed, you
> simply order a replacement and plug it in.

Indeed; the point of RAID is uptime.
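
(Keeping an eye on it can be as simple as a periodic check along these
lines, depending on what sits underneath the filesystem; both commands
are standard, the scheduling around them is up to you:)

  # md RAID: look for [UU] rather than [U_] next to each array.
  cat /proc/mdstat

  # ZFS: prints "all pools are healthy" unless something needs attention.
  zpool status -x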


> You can always tell with a good hardware RAID because it
> will indicate on the trays which disk has failed and the controller
> tells you.

Or you can label the physical disks. Whenever I replace a disk, I
print a label with the WWN of the new disk and place it so that it is
readable without removing any disks or cabling; then I use the WWN to
identify the disk in software; in both cases because the WWN is a
stable identifier that I can fully expect will never change throughout
the disk's lifetime. So when the system tells me that
wwn-0x123456789abcdef0 is having issues, I can quickly and accurately
identify the exact physical device that needs replacement once I have
a replacement on hand. And if the kernel logs are telling me that,
say, sdg is having issues, I can map that back to whatever WWN happens
to map to that identifier at that particular time. (In practice, I'm
more likely to get useful error details through ZFS status monitoring
tools, where I already use the WWN, so I likely won't need to go that
somewhat circuitous route.)
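
(For the sdg-to-WWN mapping, something along these lines works on a
typical udev-based system; the exact symlink names will of course
differ:)

  # Which WWN symlink currently points at sdg?
  ls -l /dev/disk/by-id/wwn-* | grep 'sdg$'

  # Or ask the device itself.
  lsblk -d -n -o WWN /dev/sdg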


> Yes, my setup is far from ideal when it comes to backups in that I
> should make backups more frequently.  That doesn't mean I shouldn't
> have good backups and that UPSs and RAID were not required.

Or that, again, they solve different problems.
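
(For the "more frequently" part, even a single cron entry goes a long
way.  The paths below are made up, and whether --delete or
--delete-after is the better fit is the other half of this thread:)

  # Hypothetical crontab entry: nightly backup at 03:15, removing files
  # on the destination only after the transfer has finished.
  15 3 * * *  rsync -a --delete-after /home/ /mnt/backup/home/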

-- 
Michael Kjörling  https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”