Re: Build servers offline due to failed SSD

2021-03-15 Thread Ryan Schmidt



On Mar 11, 2021, at 20:15, Bjarne D Mathiesen wrote:

> Have you looked at something like this for fast storage/cache :
> 
> https://eshop.macsales.com/item/OWC/SSDACL4M20GB/
> https://www.amazon.de/ASRock-Ultra-Quad-Controller-Karte-PCI-Express/dp/B079TQ9C6Q/ref=sr_1_6
> 
> Setting these up in RAID-0 (with proper backup) ought to be the fastest
> storage solution

I did find and look into some of those quad-m.2 PCIe adapters and agree that 
storage access would probably be quicker but did not buy them due to their 
cost. The OWC model you linked to is listed at $349 without any SSDs, and then 
I would have needed to buy four SSDs to fully make use of it. The single-m.2 
PCIe adapters I'm using cost $20.


> With OpenCore it might also be possible to get to 384 MB RAM

Assuming that OpenCore is a replacement for the firmware and assuming that you 
meant 384 GB, then I have not looked into that because three of my servers only 
have 1/3 of their RAM slots populated as is, so there's still plenty of room to 
grow. I have also heard that replacing the Xserve firmware with the MacPro5,1 
firmware could enable the use of 192 GB RAM, among other benefits, so if I'm 
going to replace the firmware that's the route I plan to investigate first.



Re: Using RAM instead of disk for build servers (was: Re: Build servers offline due to failed SSD)

2021-03-15 Thread Ryan Schmidt
On Mar 15, 2021, at 09:50, Steven Smith wrote:

> SSD RAID offers speed and fault tolerance.

Sure. That either wasn't available or was not within what I was willing to 
spend in 2016 when I set up this system.


> Simple options that are tolerant to a single disk failure are:
> 
>   • Free/one extra SSD: Use macOS Disk Utility to RAID 1 together two 
> smaller, inexpensive SSD drives for 100% redundancy.
>   • OWC ThunderBay 4 Mini, $279: Use macOS Disk Utility to RAID 1 
> together four smaller, inexpensive SSD drives for 100% redundancy and larger 
> capacity.
>   • OWC ThunderBay 4 Mini with SoftRAID, $379: Use SoftRAID to RAID 4 
> together four smaller, inexpensive SSD drives for 100% redundancy and even 
> larger capacity. (Caveats: no encryption, no boot volumes.)

Software RAID is not possible with VMware ESXi, which is what we are using and 
which is where the storage needs to be addressable from*.

*with the exception of the hard disk RAID that holds the master copy of the 
rsync server, including packages and distfiles, which is the built-in hardware 
RAID of one of the Xserves which VMware ESXi itself cannot use but which is 
mapped directly into the single VM that needs to use it.



Re: Using RAM instead of disk for build servers (was: Re: Build servers offline due to failed SSD)

2021-03-15 Thread Dave Horsfall

On Mon, 15 Mar 2021, Daniel J. Luke wrote:

> Thanks for including this information - it's similar to experience I've 
> had with SSDs for $work. I'd be really surprised if we care about builds 
> on the xserves in 8-10 years (given our previous experience with the ppc 
> to x86 transition).


Somewhat related, I record TV shows to USB sticks (mini-SSDs) and I find 
that they tend to fail after a number of cycles (well, they are cheap).


I can usually recover by a complete reformat, but of course that burns up 
more spare blocks; time to break out the spinning rust that I used to 
use...


-- Dave


Re: Using RAM instead of disk for build servers (was: Re: Build servers offline due to failed SSD)

2021-03-15 Thread Daniel J. Luke
On Mar 14, 2021, at 6:38 PM, Ryan Schmidt  wrote:
> As far as longevity, the previous set of 3 500 GB SSDs I bought for these 
> servers in 2016 lasted 4-5 years. They were rated for 150 TBW (terabytes 
> written) and actually endured around 450 TBW by the time they failed, or 3 
> times as long as they were expected to last. The new SSDs are rated for 300 
> TBW, and if they also last 3 times longer than that, then they might last 
> 8-10 years, by which time we might have completely abandoned Intel-based Macs 
> and be totally switched over to Apple Silicon hardware and will have no use 
> for the Xserves anymore.

Thanks for including this information - it's similar to experience I've had 
with SSDs for $work. I'd be really surprised if we care about builds on the 
xserves in 8-10 years (given our previous experience with the ppc to x86 
transition).

I haven't looked recently, but I recall xserves being somewhat picky about 
their internal drives - have you found that specific SSDs work well (vs others 
that don't)? I'm assuming you've installed them on the internal trays - but 
maybe that's a bad assumption.

-- 
Daniel J. Luke




Re: Using RAM instead of disk for build servers (was: Re: Build servers offline due to failed SSD)

2021-03-15 Thread Steven Smith
SSD RAID offers speed and fault tolerance.

Simple options that are tolerant to a single disk failure are:

Free/one extra SSD: Use macOS Disk Utility to RAID 1 together two smaller, 
inexpensive SSD drives for 100% redundancy.
OWC ThunderBay 4 Mini, $279: Use macOS Disk Utility to RAID 1 together four 
smaller, inexpensive SSD drives for 100% redundancy and larger capacity.
OWC ThunderBay 4 Mini with SoftRAID, $379: Use SoftRAID to RAID 4 together four 
smaller, inexpensive SSD drives for 100% redundancy and even larger capacity. 
(Caveats: no encryption, no boot volumes.)
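The Disk Utility RAID 1 option can also be driven from the command line via diskutil's AppleRAID support. A rough sketch (the set name and the disk2/disk3 identifiers are placeholders -- check `diskutil list` first; the command is printed here rather than executed):

```shell
# Mirror (RAID 1) two whole disks into one JHFS+ AppleRAID set.
# 'BuildMirror', disk2, and disk3 are hypothetical examples.
echo "diskutil appleRAID create mirror BuildMirror JHFS+ disk2 disk3"
```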


> On Mar 14, 2021, at 6:38 PM, Ryan Schmidt  wrote:
> 
> On Mar 14, 2021, at 06:11, Balthasar Indermuehle wrote:
> 
>> I used to run mac servers in what now can only be described as the days of 
>> yore... when a 32GB RAM bank cost a lot more than a (spinning) disk - and 
>> those were expensive then too. SSDs were not here yet. I haven't checked 
>> pricing lately, but I'd think you could put 256GB of RAM into a server for 
>> probably about the same as a 1TB SSD, and that would offer plenty of build 
>> space when used as a RAM drive. And that space will not degrade over time 
>> (unlike the SSD). In terms of longevity, for a machine with such a 
>> singularly targeted use case, I'd seriously consider taking the expense now, 
>> and have a server that lives for another decade.
> 
> Some pricing details:
> 
> OWC sells 96 GB Xserve RAM for $270 and 48 GB for $160. 96 + 96 + 48 would be 
> 240 GB for $700.
> 
> Meanwhile, the 500 GB SSDs I've been putting in cost about $65 each. I've 
> already put in three and still need one more to get rid of the hard drives in 
> the fourth server, though I may get a 1 TB SSD for $120 to have some extra 
> room.
> 
> Note that the way our build system (using our mpbb build script) works is 
> that all (or most) ports that exist are kept installed but deactivated. When 
> a build request comes in, we first activate all of that port's dependencies, 
> then we build and install and activate that port, then we deactivate all 
> ports. "Activate" means extract the tbz2 file to disk and note it in the 
> registry. "Deactivate" means delete the extracted files from disk and note it 
> in the registry. So even if we move all port building to a RAM disk, which 
> will undeniably be faster and reduce wear on the disk, it will not eliminate 
> it completely, not by a long shot. Some ports have hundreds of dependencies. 
> Activating and deactivating ports is a considerable portion of what our build 
> machines spend their day doing.
> 
> If we wanted to move that from SSD to RAM disk as well, that would mean 
> putting MacPorts itself onto the RAM disk. We wouldn't have room on the RAM 
> disk to keep all ports installed, so it would mean not keeping any ports 
> installed, and instead installing them on demand and then uninstalling them 
> (and maybe we would need to budget even more RAM for the RAM disk to 
> accommodate both space needed for MacPorts and for the dependencies and for 
> the build). That means downloading and verifying each port's tbz2 file from 
> the packages web server for every port build. Even though we do have a local 
> packages server, so that traffic would not have to go over the Internet, the 
> server uses a hard disk based RAID which is not the fastest in the world, so 
> this would cause additional delays, not to mention additional wear and tear 
> on the RAID's disks.
> 
> As far as longevity, the previous set of 3 500 GB SSDs I bought for these 
> servers in 2016 lasted 4-5 years. They were rated for 150 TBW (terabytes 
> written) and actually endured around 450 TBW by the time they failed, or 3 
> times as long as they were expected to last. The new SSDs are rated for 300 
> TBW, and if they also last 3 times longer than that, then they might last 
> 8-10 years, by which time we might have completely abandoned Intel-based Macs 
> and be totally switched over to Apple Silicon hardware and will have no use 
> for the Xserves anymore.
> 
> 





Re: Using RAM instead of disk for build servers (was: Re: Build servers offline due to failed SSD)

2021-03-14 Thread Balthasar Indermuehle
Hi Ryan,

thanks for your detailed response. I hadn't thought of some of the build
intricacies you mention. Let alone the upcoming silicon change and phasing
out of x86. Sounds like your approach is a good balance for longevity,
performance, and cost.

Cheers

Balthasar

On Mon, 15 Mar 2021 at 09:38, Ryan Schmidt  wrote:

> On Mar 14, 2021, at 06:11, Balthasar Indermuehle wrote:
>
> > I used to run mac servers in what now can only be described as the days
> of yore... when a 32GB RAM bank cost a lot more than a (spinning) disk -
> and those were expensive then too. SSDs were not here yet. I haven't
> checked pricing lately, but I'd think you could put 256GB of RAM into a
> server for probably about the same as a 1TB SSD, and that would offer
> plenty of build space when used as a RAM drive. And that space will not
> degrade over time (unlike the SSD). In terms of longevity, for a machine
> with such a singularly targeted use case, I'd seriously consider taking the
> expense now, and have a server that lives for another decade.
>
> Some pricing details:
>
> OWC sells 96 GB Xserve RAM for $270 and 48 GB for $160. 96 + 96 + 48 would
> be 240 GB for $700.
>
> Meanwhile, the 500 GB SSDs I've been putting in cost about $65 each. I've
> already put in three and still need one more to get rid of the hard drives
> in the fourth server, though I may get a 1 TB SSD for $120 to have some
> extra room.
>
> Note that the way our build system (using our mpbb build script) works is
> that all (or most) ports that exist are kept installed but deactivated.
> When a build request comes in, we first activate all of that port's
> dependencies, then we build and install and activate that port, then we
> deactivate all ports. "Activate" means extract the tbz2 file to disk and
> note it in the registry. "Deactivate" means delete the extracted files from
> disk and note it in the registry. So even if we move all port building to a
> RAM disk, which will undeniably be faster and reduce wear on the disk, it
> will not eliminate it completely, not by a long shot. Some ports have
> hundreds of dependencies. Activating and deactivating ports is a
> considerable portion of what our build machines spend their day doing.
>
> If we wanted to move that from SSD to RAM disk as well, that would mean
> putting MacPorts itself onto the RAM disk. We wouldn't have room on the RAM
> disk to keep all ports installed, so it would mean not keeping any ports
> installed, and instead installing them on demand and then uninstalling them
> (and maybe we would need to budget even more RAM for the RAM disk to
> accommodate both space needed for MacPorts and for the dependencies and for
> the build). That means downloading and verifying each port's tbz2 file from
> the packages web server for every port build. Even though we do have a
> local packages server, so that traffic would not have to go over the
> Internet, the server uses a hard disk based RAID which is not the fastest
> in the world, so this would cause additional delays, not to mention
> additional wear and tear on the RAID's disks.
>
> As far as longevity, the previous set of 3 500 GB SSDs I bought for these
> servers in 2016 lasted 4-5 years. They were rated for 150 TBW (terabytes
> written) and actually endured around 450 TBW by the time they failed, or 3
> times as long as they were expected to last. The new SSDs are rated for 300
> TBW, and if they also last 3 times longer than that, then they might last
> 8-10 years, by which time we might have completely abandoned Intel-based
> Macs and be totally switched over to Apple Silicon hardware and will have
> no use for the Xserves anymore.
>
>
>


Re: Using RAM instead of disk for build servers (was: Re: Build servers offline due to failed SSD)

2021-03-14 Thread Ryan Schmidt
On Mar 14, 2021, at 06:11, Balthasar Indermuehle wrote:

> I used to run mac servers in what now can only be described as the days of 
> yore... when a 32GB RAM bank cost a lot more than a (spinning) disk - and 
> those were expensive then too. SSDs were not here yet. I haven't checked 
> pricing lately, but I'd think you could put 256GB of RAM into a server for 
> probably about the same as a 1TB SSD, and that would offer plenty of build 
> space when used as a RAM drive. And that space will not degrade over time 
> (unlike the SSD). In terms of longevity, for a machine with such a singularly 
> targeted use case, I'd seriously consider taking the expense now, and have a 
> server that lives for another decade.

Some pricing details:

OWC sells 96 GB Xserve RAM for $270 and 48 GB for $160. 96 + 96 + 48 would be 
240 GB for $700.

Meanwhile, the 500 GB SSDs I've been putting in cost about $65 each. I've 
already put in three and still need one more to get rid of the hard drives in 
the fourth server, though I may get a 1 TB SSD for $120 to have some extra room.

Note that the way our build system (using our mpbb build script) works is that 
all (or most) ports that exist are kept installed but deactivated. When a build 
request comes in, we first activate all of that port's dependencies, then we 
build and install and activate that port, then we deactivate all ports. 
"Activate" means extract the tbz2 file to disk and note it in the registry. 
"Deactivate" means delete the extracted files from disk and note it in the 
registry. So even if we move all port building to a RAM disk, which will 
undeniably be faster and reduce wear on the disk, it will not eliminate it 
completely, not by a long shot. Some ports have hundreds of dependencies. 
Activating and deactivating ports is a considerable portion of what our build 
machines spend their day doing.
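For readers unfamiliar with MacPorts, the cycle described above can be sketched with ordinary `port` commands (the port names are placeholders and the commands are only printed, not run; this is a sketch, not the actual mpbb code):

```shell
# Hypothetical build cycle for a request to build 'libfoo'.
deps="zlib openssl"            # this port's dependencies (example names)
for d in $deps; do
  echo "port activate $d"      # extract the dep's tbz2 archive to disk
done
echo "port install libfoo"     # build, install, and activate the port itself
echo "port deactivate active"  # delete all extracted files from disk again
```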

If we wanted to move that from SSD to RAM disk as well, that would mean putting 
MacPorts itself onto the RAM disk. We wouldn't have room on the RAM disk to 
keep all ports installed, so it would mean not keeping any ports installed, and 
instead installing them on demand and then uninstalling them (and maybe we 
would need to budget even more RAM for the RAM disk to accommodate both space 
needed for MacPorts and for the dependencies and for the build). That means 
downloading and verifying each port's tbz2 file from the packages web server 
for every port build. Even though we do have a local packages server, so that 
traffic would not have to go over the Internet, the server uses a hard disk 
based RAID which is not the fastest in the world, so this would cause 
additional delays, not to mention additional wear and tear on the RAID's disks.

As far as longevity, the previous set of 3 500 GB SSDs I bought for these 
servers in 2016 lasted 4-5 years. They were rated for 150 TBW (terabytes 
written) and actually endured around 450 TBW by the time they failed, or 3 
times as long as they were expected to last. The new SSDs are rated for 300 
TBW, and if they also last 3 times longer than that, then they might last 8-10 
years, by which time we might have completely abandoned Intel-based Macs and be 
totally switched over to Apple Silicon hardware and will have no use for the 
Xserves anymore.
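As a back-of-envelope check on those numbers (a sketch that assumes the write volume stays roughly the same as with the 2016 drives):

```shell
# Old set: rated 150 TBW, endured ~450 TBW over roughly 5 years of service.
ENDURED_TBW=450
YEARS=5
TB_PER_YEAR=$((ENDURED_TBW / YEARS))   # ~90 TB written per year
# New set: rated 300 TBW; if it also endures ~3x its rating:
NEW_ENDURANCE=$((300 * 3))             # ~900 TBW
echo "expected life: ~$((NEW_ENDURANCE / TB_PER_YEAR)) years"
```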




Re: Build servers offline due to failed SSD

2021-03-14 Thread Balthasar Indermuehle
I used to run mac servers in what now can only be described as the days of
yore... when a 32GB RAM bank cost a lot more than a (spinning) disk - and
those were expensive then too. SSDs were not here yet. I haven't checked
pricing lately, but I'd think you could put 256GB of RAM into a server for
probably about the same as a 1TB SSD, and that would offer plenty of build
space when used as a RAM drive. And that space will not degrade over time
(unlike the SSD). In terms of longevity, for a machine with such a
singularly targeted use case, I'd seriously consider taking the expense
now, and have a server that lives for another decade.


Dr Balthasar Indermühle
Inside Systems Pty Ltd
5007/101 Bathurst Street
Sydney NSW 2000, Australia
t: +61 (0)405 988 500



On Sun, 14 Mar 2021 at 21:04, Ryan Schmidt  wrote:

> There was some additional downtime in the last few days but the
> buildmaster now has a permanent home on a new SSD and is faster than ever.
> Builds that could not be scheduled during recent downtime have been
> rescheduled and are in progress.
>
>
> On Mar 14, 2021, at 04:02, Vincent Habchi wrote:
>
> > Wouldn’t it make sense to use some sort of RAM caching to speed up
> builds instead of SSD? What’s the point of using a permanent storage device
> for something that is bound to be erased in a very short time?
>
> RAM would be faster than SSD but also a lot more expensive. Certainly I
> know or can figure out how to create a RAM disk, and certainly we could
> tell MacPorts to store the build directory there. But if we ran out of
> space on the RAM disk during a build, the build would fail. Some builds
> need a lot of disk space -- I've seen ports that use 20GB of disk space to
> build. Instead of buying 20GB or more of additional RAM per VM, I've chosen
> to buy 90GB of SSD per VM.
>
> If you're suggesting that we should just set aside 1-2GB of RAM for build
> files and use the SSD if we need more space than that, then I don't know
> how to set that up.
>
> Note that macOS already caches disk files in RAM if there is any free RAM.
>
>


Re: Build servers offline due to failed SSD

2021-03-14 Thread Ryan Schmidt
There was some additional downtime in the last few days but the buildmaster now 
has a permanent home on a new SSD and is faster than ever. Builds that could 
not be scheduled during recent downtime have been rescheduled and are in 
progress.


On Mar 14, 2021, at 04:02, Vincent Habchi wrote:

> Wouldn’t it make sense to use some sort of RAM caching to speed up builds 
> instead of SSD? What’s the point of using a permanent storage device for 
> something that is bound to be erased in a very short time?

RAM would be faster than SSD but also a lot more expensive. Certainly I know or 
can figure out how to create a RAM disk, and certainly we could tell MacPorts 
to store the build directory there. But if we ran out of space on the RAM disk 
during a build, the build would fail. Some builds need a lot of disk space -- 
I've seen ports that use 20GB of disk space to build. Instead of buying 20GB or 
more of additional RAM per VM, I've chosen to buy 90GB of SSD per VM.

If you're suggesting that we should just set aside 1-2GB of RAM for build files 
and use the SSD if we need more space than that, then I don't know how to set 
that up.

Note that macOS already caches disk files in RAM if there is any free RAM.
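For reference, a RAM disk on macOS can be created along these lines (a sketch; the 2 GB size and the volume name are arbitrary examples, and the command is printed here rather than executed):

```shell
# ram:// sizes are given in 512-byte sectors, so a 2 GB disk needs
# 2 * 1024 * 1024 * 2 = 4194304 sectors.
SIZE_GB=2
SECTORS=$((SIZE_GB * 1024 * 1024 * 2))
# On a Mac one would then run:
echo "diskutil erasevolume HFS+ 'RAMDisk' \$(hdiutil attach -nomount ram://$SECTORS)"
```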



Re: Build servers offline due to failed SSD

2021-03-14 Thread Vincent Habchi
Hi,

Wouldn’t it make sense to use some sort of RAM caching to speed up builds 
instead of SSD? What’s the point of using a permanent storage device for 
something that is bound to be erased in a very short time?

Maybe I’m way off base, though.

V.



Re: Build servers offline due to failed SSD

2021-03-11 Thread Bjarne D Mathiesen
Have you looked at something like this for fast storage/cache :

https://eshop.macsales.com/item/OWC/SSDACL4M20GB/
https://www.amazon.de/ASRock-Ultra-Quad-Controller-Karte-PCI-Express/dp/B079TQ9C6Q/ref=sr_1_6

Setting these up in RAID-0 (with proper backup) ought to be the fastest
storage solution

With OpenCore it might also be possible to get to 384 MB RAM

-- 
Bjarne D Mathiesen
Korsør ; Danmark ; Europa
--
this message was written in a totally M$-free environment
OpenCore + macOS 10.15.7 Catalina
MacPro 2010 ; 2 x 3,46 GHz 6-Core Intel Xeon ; 128 GB 1333 MHz DDR3 ECC
ATI Radeon RX 590 8 GB


Re: Build servers offline due to failed SSD

2021-03-11 Thread Bjarne D Mathiesen
Further discussions :

https://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/

https://arstechnica.com/gadgets/2015/03/consumer-ssds-benchmarked-to-death-and-last-far-longer-than-rated/

https://arstechnica.com/gadgets/2020/05/zfs-versus-raid-eight-ironwolf-disks-two-filesystems-one-winner/

Lothar Haeger wrote:
> Here‘s an in depth discussion on SSD reliability, a little more detailed
> than „(not) recommended“ from someone with a lot of first hand
> experience, it seems: https://www.backblaze.com/blog/how-reliable-are-ssds/
> 
> 

-- 
Bjarne D Mathiesen
Korsør ; Danmark ; Europa
--
this message was written in a totally M$-free environment
macOS 10.15.7 Catalina
2 x 3,46 GHz 6-Core Intel Xeon ; 128 GB 1333 MHz DDR3 ECC
ATI Radeon RX 590 8 GB


Re: Build servers offline due to failed SSD

2021-03-09 Thread Dave C via macports-users
I’m curious. I know the ToH, but “tail”?

Dave

- - - 

> ... but I’m going to reconfigure it to get the longer backup “tail” provided 
> by the ToH approach.
> 
> Jim



Re: Build servers offline due to failed SSD

2021-03-09 Thread James Secan
James,

Thanks for the Tower of Hanoi reminder.  I used that many (many) years ago with 
9-track tapes on a Big Iron machine but had forgotten the technique.  I’ve been 
using a FIFO seven-day rotation backup of my main user directory (using CCC), 
but I’m going to reconfigure it to get the longer backup “tail” provided by the 
ToH approach.

Jim
3222 NE 89th St
Seattle, WA 98115
(206) 430-0109

> On Mar 8, 2021, at 6:55 PM, James Linder  wrote:
> 
> All considered, I’d take SSD for work disks and HD for long-term backup.
> Heck, in my day (ouch) we’d teach the 'tower of hanoi' backup strategy using 
> tape; why not do likewise with HDs? Time Machine certainly makes that easy.
> 
> James



Re: Build servers offline due to failed SSD

2021-03-08 Thread James Linder



> On 9 Mar 2021, at 5:53 am, Dave C via macports-users 
>  wrote:
> 
> Old technology drives use magnetism to hold bits. This works for decades, or 
> so I’ve read. Usually the motor or bearings die before the magnetic medium 
> fails.
> 
> Solid State Drives use memory chips to hold bits. These “bit holders” can 
> wear out after a few trillion transitions (changing from 1 to 0 and 0 to 1). 
> If you’re using it in your laptop or PC, you’ll likely have no problems for 
> many years. In an internet-connected server, you may exceed those maximum 
> write cycles sooner rather than later.
> 

Dave I was just reading up, interesting info …

SSDs work (as do EPROMs) by having an isolated ‘chamber’. You get electrons in 
or out of the chamber using quantum tunneling [it disappears here and teleports 
there] based on probability, higher with an electric field.

Repeated use breaks down the insulation of the isolated ‘chamber’.

SLC (the lowest capacity and most expensive) stores 0 or 1 (volts or whatever 
unit) and is most tolerant of damage: a 0 is 0, 1, 2, or 3 units and a 1 is 
6, 7, or 8 units, with 4 more likely a 0 and 5 more likely a 1 (say).

MLC stores 0, 1, 2, 3 units (2 bits per cell) and is less tolerant of an 
extra, or a missing, unit.
TLC stores 3 bits per cell (0 … 7 units).
QLC stores 0, 1, 2, 3 … 13, 14, 15 units (4 bits per cell). These are the 
cheapest but most fragile, i.e. a 13 can leak away to 12, or gain to 14.
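Using the standard naming (SLC/MLC/TLC/QLC for 1-4 bits per cell), the progression is exponential in the number of voltage levels a cell must distinguish:

```shell
# 2^bits voltage levels must be distinguished per cell; more levels means
# less margin between adjacent levels, hence more fragility.
for cell in "SLC 1" "MLC 2" "TLC 3" "QLC 4"; do
  set -- $cell
  echo "$1: $2 bit(s)/cell -> $((1 << $2)) levels"
done
```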

But the discussion linked earlier showed drives rated at 300 TBW going well 
past that, to 1 or 2 PBW (peta is 1000 times tera).

In use, a HD (especially with lots of them) is more likely to fail than an 
SSD. Seagate's old paper ‘ATA more than an interface’ says this:
Drive 1 seeks and the bump knocks others off track. They seek back, knocking 
others off track. The process continues until a disk fails.
Their 10-year-old assessment of the reasons is even more relevant on today's 
drives.

SSDs generally give more warning that ‘the end is nigh’.

All considered, I’d take SSD for work disks and HD for long-term backup.
Heck, in my day (ouch) we’d teach the 'tower of hanoi' backup strategy using 
tape; why not do likewise with HDs? Time Machine certainly makes that easy.

James

> Dave
> 
> - - - 
> 
>>> On Sun, 7 Mar 2021, Michael A. Leonetti via macports-users wrote:
>>> 
>>> I’d really love to know more about what you’re saying here. Up until I just 
>>> read what you wrote, I thought SSDs were the savior of HDDs.
>> 
>> Real disk drives [tm] have their N/S magnetic poles lined up pretty much 
>> forever; SSDs rely upon capacitors storing their charge forever (hah!).
>> 
>> You need to have an electronics background to understand...
>> 
>> -- Dave (VK2KFU)
> 



Re: Build servers offline due to failed SSD

2021-03-08 Thread Dave C via macports-users
I think most people who talk about servers and HDs/SSDs are referring to 
commercial internet-connected servers.

Yes, a private server will likely see a lesser degree of service/use and 
storage drives can be uprated (the opposite of derated) for greater lifetime.

Dave

> I’ve been looking at VPS providers, and most of them offer SSD-based VPSs, 
> so they seem to be increasingly popular. I suspect that most VPSs do not get 
> consistently hammered, though.
> 
> Peter



Re: Build servers offline due to failed SSD

2021-03-08 Thread Dave C via macports-users
Old technology drives use magnetism to hold bits. This works for decades, or 
so I’ve read. Usually the motor or bearings die before the magnetic medium 
fails.

Solid State Drives use memory chips to hold bits. These “bit holders” can wear 
out after a few trillion transitions (changing from 1 to 0 and 0 to 1). If 
you’re using it in your laptop or PC, you’ll likely have no problems for many 
years. In an internet-connected server, you may exceed those maximum write 
cycles sooner rather than later.

Dave

- - - 

>> On Sun, 7 Mar 2021, Michael A. Leonetti via macports-users wrote:
>> 
>> I’d really love to know more about what you’re saying here. Up until I just 
>> read what you wrote, I thought SSDs were the savior of HDDs.
> 
> Real disk drives [tm] have their N/S magnetic poles lined up pretty much 
> forever; SSDs rely upon capacitors storing their charge forever (hah!).
> 
> You need to have an electronics background to understand...
> 
> -- Dave (VK2KFU)



Re: Build servers offline due to failed SSD

2021-03-08 Thread Dave Horsfall

On Sun, 7 Mar 2021, Todd Doucet wrote:


> HDs fail also, obviously, but tend not to be so predictable about it. 


That of course depends upon the HD and the OS; my (FreeBSD) server's drive 
is around 20 years old, and is still going strong.


There's also software that monitors the health of the disk.

-- Dave

Re: Build servers offline due to failed SSD

2021-03-08 Thread Dave Horsfall

On Sun, 7 Mar 2021, Michael A. Leonetti via macports-users wrote:

> I’d really love to know more about what you’re saying here. Up until I 
> just read what you wrote, I thought SSDs were the savior of HDDs.


Real disk drives [tm] have their N/S magnetic poles lined up pretty much 
forever; SSDs rely upon capacitors storing their charge forever (hah!).


You need to have an electronics background to understand...

-- Dave (VK2KFU)

Re: Build servers offline due to failed SSD

2021-03-08 Thread James Linder



> On 7 Mar 2021, at 3:26 pm, Dave C via macports-users 
>  wrote:
> 
> This applies to affordable SSDs. As you say, the ones that are on par (re. 
> reliability) with HDDs are $pendy.
> 
> It’s something to do with an SSD’s limited number of write cycles, if I 
> remember...
> 
> Dave
> 
> - - - 
> 
>> Isn’t SSD a bad choice for server duty? No server farms use them, apparently 
>> due to short lifespan.

The reality needs to be carefully weighed up

SSDs are rated in TBW, that is, Terabytes Written.
The cheaper SSDs may be 300 or 600 TBW; the more expensive may be 1200 TBW or 
even 2500 TBW.

The TBW rating depends on size.

I’ve put a 2 TB SSD (600 TBW) in my iMac and after a year I see an expected 
life of 65 years. So no, an SSD for a build farm is not a bad idea. The 
performance benefits far outweigh the hassle of replacing it after 50+ years.

The MTBF of spinning rust is 10-odd years; SSD is many times that. But 
remembering my uni stats: the expected time to the first failure of a light 
globe, when you have a few dozen bulbs each with a life of 1000 hours (in my 
test question), was 20 minutes!

Enterprise disks have a longer life but, as I said, it is complicated.

James

Re: Build servers offline due to failed SSD

2021-03-08 Thread Lothar Haeger
Here‘s an in depth discussion on SSD reliability, a little more detailed than 
„(not) recommended“ from someone with a lot of first hand experience, it seems: 
https://www.backblaze.com/blog/how-reliable-are-ssds/




Re: Build servers offline due to failed SSD

2021-03-07 Thread Daniel J. Luke
On Mar 7, 2021, at 8:30 PM, Todd Doucet  wrote:
> I think one can only get so far with purely qualitative analysis of the 
> characteristics of SSDs and HDs and then the end of that analysis will be 
> one-size-fits all advice, for example "recommended" or "not recommended" for 
> servers.

this +1000

> Surely the answer might vary depending on the particular server usage 
> pattern, the need for performance, the cost of routine maintenance (swapping 
> out aging drives or SSDs), the cost of the devices themselves, etc.

exactly

There's a reason you don't really see 15k RPM enterprise drives anymore.

> It seems to me that a given server operator can tell how long a particular 
> SSD is likely to last.  They do not fail randomly, at least not very much.  
> The fail when they are "used up" and you can figure out well in advance, 
> usually, when you will need to swap the old ones out of service.

Back in 2015 - there's this article 
https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/
 where someone actually bothered to test and report some results.

> HDs fail also, obviously, but tend not to be so predictable about it.  
> Whether it makes sense for a given server to use an SSD really does depend on 
> the numbers.  All drives will fail.  All drives will need to be rotated out 
> of service.  It is a matter of cost, convenience, and performance.
> 
> The only caveat I can think of is that there might be an issue of malicious 
> use--a server with SSDs might be vulnerable to a wear attack, depending on 
> the server services offered, I suppose.

I'm sure there are worst-case scenarios for spinning disks that (in theory) 
could be exploited to wear their mechanisms out as well.

I've personally used both enterprise and consumer SSDs in high-write 
environments where the cost of replacing the SSDs was worthwhile for the 
performance benefits (or otherwise didn't change the overall cost of the 
solution) - and I've been pleasantly surprised with how much more use I've 
gotten from them than I originally calculated (based on the drive specs + the 
planned utilization + over-provisioning). 

YMMV of course - but the blanket "you shouldn't use SSDs for servers" or "no 
one uses SSDs for servers" is wrong. For those who are interested in more 
details, there are a bunch of good USENIX and ACM papers where people have 
actually gone and collected data on real-world failure rates.

-- 
Daniel J. Luke



Re: Build servers offline due to failed SSD

2021-03-07 Thread Peter West
I’ve been looking at VPS providers, and most of them offer SSD-based VPSs, so 
they seem to be increasingly popular. I suspect that most VPSs do not get 
consistently hammered, though.

Peter
—
p...@ehealth.id.au
“Destroy this temple, and in three days I will raise it up.”

> On 8 Mar 2021, at 11:30 am, Todd Doucet  wrote:
> 
> I think one can only get so far with purely qualitative analysis of the 
> characteristics of SSDs and HDs and then the end of that analysis will be 
> one-size-fits-all advice, for example "recommended" or "not recommended" for 
> servers.
> 
> Surely the answer might vary depending on the particular server usage 
> pattern, the need for performance, the cost of routine maintenance (swapping 
> out aging drives or SSDs), the cost of the devices themselves, etc.
> 
> It seems to me that a given server operator can tell how long a particular 
> SSD is likely to last.  They do not fail randomly, at least not very much.  
> They fail when they are "used up" and you can figure out well in advance, 
> usually, when you will need to swap the old ones out of service.
> 
> HDs fail also, obviously, but tend not to be so predictable about it.  
> Whether it makes sense for a given server to use an SSD really does depend on 
> the numbers.  All drives will fail.  All drives will need to be rotated out 
> of service.  It is a matter of cost, convenience, and performance.
> 
> The only caveat I can think of is that there might be an issue of malicious 
> use--a server with SSDs might be vulnerable to a wear attack, depending on 
> the server services offered, I suppose.
> 
> 
> 
> 
>> To emphasize again, the reason SSDs aren’t recommended for servers is 
>> because servers—by definition—see much heavier service, and these read/write 
>> cycles are used up more quickly.
>> 
>> For personal use in a PC, or such, SSDs are proving to be the dream they 
>> were promised to be.
>> 
>> As mentioned, given time, the technology will overcome this limitation for 
>> use in servers and these comments will be just so much past history.
>> 
>> Dave C.
>> 
>> - - - 
>> 
>> > The “on/off” switches in SSDs are fragile and essentially break after 
>> > too many read/write cycles.  As pointed out, it’s a get-what-you-pay-for 
>> > world, and cheap SSDs are just that… cheap.  The expensive ones are more 
>> > reliable because they actually make available only a portion of their 
>> > total capacity, reserving the rest as replacements for such failures.  
>> > Intelligent software within the firmware manages this so that the end user 
>> > experiences a much longer device lifespan.
>> > 
>> > There’s lots of technical documentation for such.  Google knows.
>> > 
>> > Regards,
>> > 
>> > 
>> >>> On Mar 7, 2021, at 18:15, Michael A. Leonetti via macports-users wrote:
>> >> I’d really love to know more about what you’re saying here. Up until I 
>> >> just read what you wrote, I thought SSDs were the savior of HDDs.
>> >> Michael A. Leonetti
>> >> As warm as green tea
>> >>> On 3/7/21 at 5:26 PM, Dave Horsfall wrote:
>> >>> On Sat, 6 Mar 2021, Dave C via macports-users wrote:
>>  Isn’t SSD a bad choice for server duty? No server farms use them, 
>>  apparently due to short lifespan.
>> >>> If you knew how SSDs worked then you wouldn't use them at all without 
>> >>> many backups.  Give me spinning rust any day...
>> >>> -- Dave



Re: Build servers offline due to failed SSD

2021-03-07 Thread Todd Doucet
I think one can only get so far with purely qualitative analysis of the 
characteristics of SSDs and HDs and then the end of that analysis will be 
one-size-fits-all advice, for example "recommended" or "not recommended" for 
servers.

Surely the answer might vary depending on the particular server usage pattern, 
the need for performance, the cost of routine maintenance (swapping out aging 
drives or SSDs), the cost of the devices themselves, etc.

It seems to me that a given server operator can tell how long a particular SSD 
is likely to last.  They do not fail randomly, at least not very much.  They 
fail when they are "used up" and you can figure out well in advance, usually, 
when you will need to swap the old ones out of service.
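
This predictability can be made concrete with a back-of-the-envelope projection. A hypothetical sketch (the function name and every figure here are illustrative, not from the thread; in practice the lifetime-writes number would come from the drive's SMART data, e.g. as reported by smartctl):

```python
# Hypothetical sketch: project when an SSD will be "used up" from its
# rated endurance (total terabytes written, TBW) and the observed
# write rate. All numbers are illustrative.
def days_until_worn_out(rated_endurance_tb: float,
                        host_writes_so_far_tb: float,
                        writes_tb_per_day: float) -> float:
    """Days of service left before the rated endurance is exhausted."""
    remaining_tb = rated_endurance_tb - host_writes_so_far_tb
    return remaining_tb / writes_tb_per_day

# A drive rated for 600 TBW, with 150 TB already written, averaging
# 0.5 TB of host writes per day:
print(days_until_worn_out(600, 150, 0.5))  # -> 900.0 days, about 2.5 years
```

With a projection like this, drives can be rotated out of service on a schedule rather than after a surprise failure, which is exactly the cost/convenience trade-off described above.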

HDs fail also, obviously, but tend not to be so predictable about it.  Whether 
it makes sense for a given server to use an SSD really does depend on the 
numbers.  All drives will fail.  All drives will need to be rotated out of 
service.  It is a matter of cost, convenience, and performance.

The only caveat I can think of is that there might be an issue of malicious 
use--a server with SSDs might be vulnerable to a wear attack, depending on the 
server services offered, I suppose.




> To emphasize again, the reason SSDs aren’t recommended for servers is because 
> servers—by definition—see much heavier service, and these read/write cycles 
> are used up more quickly.
> 
> For personal use in a PC, or such, SSDs are proving to be the dream they were 
> promised to be.
> 
> As mentioned, given time, the technology will overcome this limitation for 
> use in servers and these comments will be just so much past history.
> 
> Dave C.
> 
> - - - 
> 
> > The “on/off” switches in SSDs are fragile and essentially break after too 
> > many read/write cycles.  As pointed out, it’s a get-what-you-pay-for world, 
> > and cheap SSDs are just that… cheap.  The expensive ones are more 
> > reliable because they actually make available only a portion of their total 
> > capacity, reserving the rest as replacements for such failures.  
> > Intelligent software within the firmware manages this so that the end user 
> > experiences a much longer device lifespan.
> > 
> > There’s lots of technical documentation for such.  Google knows.
> > 
> > Regards,
> > 
> > 
> >>> On Mar 7, 2021, at 18:15, Michael A. Leonetti via macports-users 
> >>>  wrote:
> >> I’d really love to know more about what you’re saying here. Up until I 
> >> just read what you wrote, I thought SSDs were the savior of HDDs.
> >> Michael A. Leonetti
> >> As warm as green tea
> >>> On 3/7/21 at 5:26 PM, Dave Horsfall wrote:
> >>> On Sat, 6 Mar 2021, Dave C via macports-users wrote:
>  Isn’t SSD a bad choice for server duty? No server farms use them, 
>  apparently due to short lifespan.
> >>> If you knew how SSDs worked then you wouldn't use them at all without 
> >>> many backups.  Give me spinning rust any day...
> >>> -- Dave
> 
> 


Re: Build servers offline due to failed SSD

2021-03-07 Thread Dave C via macports-users
To emphasize again, the reason SSDs aren’t recommended for servers is because 
servers—by definition—see much heavier service, and these read/write cycles are 
used up more quickly.

For personal use in a PC, or such, SSDs are proving to be the dream they were 
promised to be.

As mentioned, given time, the technology will overcome this limitation for use 
in servers and these comments will be just so much past history.

Dave C.

- - - 

> The “on/off” switches in SSDs are fragile and essentially break after too 
> many read/write cycles.  As pointed out, it’s a get-what-you-pay-for world, 
> and cheap SSDs are just that… cheap.  The expensive ones are more reliable 
> because they actually make available only a portion of their total capacity, 
> reserving the rest as replacements for such failures.  Intelligent software 
> within the firmware manages this so that the end user experiences a much 
> longer device lifespan.
> 
> There’s lots of technical documentation for such.  Google knows.
> 
> Regards,
> 
> 
>>> On Mar 7, 2021, at 18:15, Michael A. Leonetti via macports-users 
>>>  wrote:
>> I’d really love to know more about what you’re saying here. Up until I just 
>> read what you wrote, I thought SSDs were the savior of HDDs.
>> Michael A. Leonetti
>> As warm as green tea
>>> On 3/7/21 at 5:26 PM, Dave Horsfall wrote:
>>> On Sat, 6 Mar 2021, Dave C via macports-users wrote:
 Isn’t SSD a bad choice for server duty? No server farms use them, 
 apparently due to short lifespan.
>>> If you knew how SSDs worked then you wouldn't use them at all without many 
>>> backups.  Give me spinning rust any day...
>>> -- Dave



Re: Build servers offline due to failed SSD

2021-03-07 Thread John Chivian
The “on/off” switches in SSDs are fragile and essentially break after too many 
read/write cycles.  As pointed out, it’s a get-what-you-pay-for world, and cheap 
SSDs are just that… cheap.  The expensive ones are more reliable because they 
actually make available only a portion of their total capacity, reserving the 
rest as replacements for such failures.  Intelligent software within the 
firmware manages this so that the end user experiences a much longer device 
lifespan.
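
The over-provisioning effect described above can be sketched with a simple endurance model. This is a deliberate simplification, not a vendor formula, and the write-amplification figures are assumed for illustration: reserving spare NAND lowers write amplification, so the same flash absorbs more host writes before wearing out.

```python
# Simplified endurance model (assumed figures, not a vendor formula):
# total host writes = raw NAND capacity * P/E cycles / write amplification.
# Heavier over-provisioning tends to lower write amplification, because
# the firmware has more spare blocks to shuffle data into.
def rated_host_writes_tb(raw_capacity_tb: float,
                         pe_cycles: int,
                         write_amplification: float) -> float:
    return raw_capacity_tb * pe_cycles / write_amplification

# Same 1 TB of NAND rated for 3000 P/E cycles:
print(rated_host_writes_tb(1.0, 3000, 4.0))  # minimal spare area      -> 750.0 TB
print(rated_host_writes_tb(1.0, 3000, 1.5))  # heavy over-provisioning -> 2000.0 TB
```

Under these assumptions, the "expensive" drive's advantage is not better flash so much as more of it held in reserve, managed by the firmware exactly as described above.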

There’s lots of technical documentation for such.  Google knows.

Regards,


> On Mar 7, 2021, at 18:15, Michael A. Leonetti via macports-users 
>  wrote:
> 
> I’d really love to know more about what you’re saying here. Up until I just 
> read what you wrote, I thought SSDs were the savior of HDDs.
> 
> Michael A. Leonetti
> As warm as green tea
> 
>> On 3/7/21 at 5:26 PM, Dave Horsfall wrote:
>> 
>> On Sat, 6 Mar 2021, Dave C via macports-users wrote:
>> 
>>> Isn’t SSD a bad choice for server duty? No server farms use them, 
>>> apparently due to short lifespan.
>> 
>> If you knew how SSDs worked then you wouldn't use them at all without many 
>> backups.  Give me spinning rust any day...
>> 
>> -- Dave
> 



Re: Build servers offline due to failed SSD

2021-03-07 Thread Michael A. Leonetti via macports-users
I’d really love to know more about what you’re saying here. Up until I just 
read what you wrote, I thought SSDs were the savior of HDDs.

Michael A. Leonetti
As warm as green tea

> On 3/7/21 at 5:26 PM, Dave Horsfall wrote:
> 
> On Sat, 6 Mar 2021, Dave C via macports-users wrote:
> 
>> Isn’t SSD a bad choice for server duty? No server farms use them, apparently 
>> due to short lifespan.
> 
> If you knew how SSDs worked then you wouldn't use them at all without many 
> backups.  Give me spinning rust any day...
> 
> -- Dave



Re: Build servers offline due to failed SSD

2021-03-07 Thread Dave Horsfall

On Sat, 6 Mar 2021, Dave C via macports-users wrote:

Isn’t SSD a bad choice for server duty? No server farms use them, 
apparently due to short lifespan.


If you knew how SSDs worked then you wouldn't use them at all without many 
backups.  Give me spinning rust any day...


-- Dave

Re: Build servers offline due to failed SSD

2021-03-07 Thread Ryan Schmidt



On Mar 7, 2021, at 00:20, Dave C wrote:

> Isn’t SSD a bad choice for server duty?

My opinion is that it is a good choice in terms of performance. When I first set up 
this incarnation of our buildbot system in 2016, I had the workers running on 
SSDs so that builds would be fast (our previous buildbot setup at Apple's macOS 
forge used a very expensive, very-many-hard-disk RAID; we were in no position to 
purchase any equivalent hardware once we left macOS forge), and I had 
the master and distfiles/packages storage on a hard disk RAID for reliability. 
The specific RAID that I have turned out to be too slow. Web requests could 
take many seconds to respond. GitHub Web Hooks being delivered to the server 
could be marked as failed because GitHub didn't always wait long enough for our 
server to respond. This was unsatisfactory, so I moved the buildmaster to an old 
SSD while keeping the large files on the RAID. This was much faster, though not 
as fast as if I had used a new SSD, which is what I will ultimately be using. 
For now, the buildmaster is temporarily running off a USB hard drive and is 
slow as molasses. This is a terrible choice but all drive bays are already 
occupied by the RAID.

All of the SSDs we used for the workers have failed as well: two last year and 
the last one last month. In response to these failures, someone else also 
suggested that we should not use SSDs. I've run one of the workers off of three 
independent hard disks for the past year, and my opinion is that the 
performance and power consumption of SSDs are much better, so I will switch the 
hard disk-based worker back to an SSD in the future. You can read this 
discussion here:

https://trac.macports.org/ticket/60178



Re: Build servers offline due to failed SSD

2021-03-06 Thread Dave C via macports-users
This applies to affordable SSDs. As you say, the ones that are on par (re. 
reliability) with HDDs are $pendy.

It’s something to do with an SSD’s limited number of write cycles, if I 
remember...

Dave

- - - 

> Isn’t SSD a bad choice for server duty? No server farms use them, apparently 
> due to short lifespan.
> 
> Dave



Re: Build servers offline due to failed SSD

2021-03-06 Thread Andrew Udvare


> On 2021-03-07, at 01:20, Dave C via macports-users 
>  wrote:
> 
> Isn’t SSD a bad choice for server duty? No server farms use them, apparently 
> due to short lifespan.
> 
> Dave

Plenty of servers use SSDs now, usually with HDDs to lower cost. The default 
option on AWS EC2 is to use an SSD.

There are enterprise grade SSDs that basically have the same characteristics as 
enterprise HDDs. Usually they are not as cheap as HDDs but sometimes are due to 
the underlying technology. Spinning discs will remain useful for long term 
storage that can tolerate large delays, but I don't see them being used for 
much else soon.

If SSDs become as cheap as HDDs with the same expected enterprise-level 
tolerances, there will be no reason to keep HDDs. That would mean you get the 
same benefits as an HDD, but with huge performance increases, and huge 
decreases in power usage.

Andrew

Re: Build servers offline due to failed SSD

2021-03-06 Thread Dave C via macports-users
Isn’t SSD a bad choice for server duty? No server farms use them, apparently 
due to short lifespan.

Dave


Re: Build servers offline due to failed SSD

2021-03-06 Thread Ryan Schmidt



On Mar 2, 2021, at 09:03, Ryan Schmidt wrote:

> On Feb 21, 2021, at 10:08, Ryan Schmidt wrote:
> 
>> We got through the winter storms but now there's a new problem. The SSD that 
>> the buildmaster VM is stored on and that boots up VMware ESXi is failing. 
>> I'm currently setting up a new ESXi startup disk and trying to find a 
>> temporary disk I can move that VM to, to get us back up and running.
> 
> Builds are resuming though the buildmaster web interface is not yet available.

The buildbot web interface has been available read-only for a few days and is 
now back to normal functionality. The backlog of builds has been mostly 
completed, except on 10.15. The buildmaster is still running on a slow 
temporary disk so it's probably best not to access it unless you have to.

Re: Build servers offline due to failed SSD

2021-03-02 Thread Ryan Schmidt



On Feb 21, 2021, at 10:08, Ryan Schmidt wrote:

> We got through the winter storms but now there's a new problem. The SSD that 
> the buildmaster VM is stored on and that boots up VMware ESXi is failing. I'm 
> currently setting up a new ESXi startup disk and trying to find a temporary 
> disk I can move that VM to, to get us back up and running.

Builds are resuming though the buildmaster web interface is not yet available.



Build servers offline due to failed SSD

2021-02-21 Thread Ryan Schmidt
We got through the winter storms but now there's a new problem. The SSD that 
the buildmaster VM is stored on and that boots up VMware ESXi is failing. I'm 
currently setting up a new ESXi startup disk and trying to find a temporary 
disk I can move that VM to, to get us back up and running.

The long-term plan is to rewrite the build system to run under buildbot version 
2. I already created a buildbot 2 VM on a new SSD last year but I haven't 
finished rewriting the configuration so we can't just switch to that yet.