This sounds reasonable. I approve of this plan.

> On Jul 13, 2019, at 05:56, A. Wilcox <awil...@adelielinux.org> wrote:
> 
> == Table of Contents ==
> 
> * Executive Summary
> 
> * How Did We Get Here?
> 
> * Reliability Is Not Availability
> 
> * Enter Scaleway
> 
> * Hard Numbers
> 
> * Conclusion
> 
> * References
> 
> 
> 
> == Executive Summary ==
> 
> This is a formal proposal to retire the dedicated server we have with
> Integricloud and replace it with a set of virtual servers from Scaleway.
> 
> We originally chose Integricloud's dedicated server offering primarily
> for reliability and security.  While it has proven secure, and the
> hardware itself is reliable, its availability leaves something to be
> desired.
> 
> Scaleway offers a similar level of reliability, and has a higher level
> of availability based on our current account with them.  They
> additionally offer servers that are not based on the x86 architecture,
> so we are still protected from the numerous issues that plague x86.
> 
> This will also reduce our hosting costs by almost 90%, and should reduce
> downtime by nearly 100%.
> 
> 
> == How Did We Get Here? ==
> 
> In early January 2019, we were notified that both of our dedicated
> servers at Rack911 were being retired, with very little notice.  For
> some additional information, see the adelie-devel@ post with message
> ID <ba35ebd3-54b4-f18f-b65f-d327e9d0a...@adelielinux.org>
> (archived at [1]).
> 
> After our sponsorship was pulled in October 2018, we did a bit of
> investigation into replacement hosting providers in case this
> happened.  Our requirements at the time were:
> 
> * non-x86 based (due to the plethora of x86 bugs being discovered)
> 
> * at least 8 GB RAM
> 
> * dedicated hardware preferred
> 
> * at least 3 IPv4 addresses
> 
> We evaluated Packet.net for ARM64 based systems[2] and Integricloud for
> PPC64 based systems[3].  We found Integricloud to be approximately 60%
> of the cost of Packet.net[4].  Additionally, we had a professional
> working relationship with their parent company, Raptor Engineering, who
> make the Talos and Blackbird family of computers.  In fact, the
> Integricloud system we were offered was to be a rack-mounted Talos II.
> Since we already had a Talos II in use as a build server, we felt this
> would be close to ideal, as any hardware oddities had already been
> worked out.
> 
> We chose their 4-core (16-thread) PowerPC system with 8 GB RAM and 2 x 1
> TB NVMe disk storage.  One 1 TB NVMe disk is dedicated to
> mirrormaster.adelielinux.org.  The other 1 TB NVMe disk is an LVM group,
> shared between the various KVM-based virtual servers run on it.
> 
> 
> == Reliability Is Not Availability ==
> 
> The Integricloud dedicated server, chloe.adelielinux.org, has had no
> hardware issues in over eight months of service.  The hardware itself
> has been fast, stable, and very reliable.  However, there have been
> multiple issues regarding availability.
> 
> Integricloud has a single-homed fibre infrastructure; per a public
> looking glass, it is run via Mediacom[5].  This has caused an unforeseen
> and consistent issue regarding availability.
> 
> 2019-04-16 13:17  down
> 2019-04-16 22:24  9 hours, 7 minutes
> 
> 2019-04-17 00:10  down
> 2019-04-17 12:29  12 hours, 19 minutes
> 
> 2019-07-09 06:25  down
> 2019-07-09 20:01  13 hours, 37 minutes
> 
> 2019-07-10 15:14  down
> 2019-07-10 15:39  25 minutes
> 
> 2019-07-12 16:35  down
> 2019-07-12 16:43  8 minutes
> 
> This has resulted in a 97% uptime for April, and a 98% uptime for July -
> and we are only 13 days into July, so this number could go down further.
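
The uptime figures above can be rechecked from the outage log. The following is a minimal sketch, assuming that uptime is measured against the whole calendar month (30 days for April, 31 for July); that denominator is an assumption, although it does reproduce the quoted 97% and 98%. The function names are illustrative, and because the timestamps have minute resolution, a derived duration can differ from a logged one by a minute.

```python
from datetime import datetime, timedelta

# Outage windows copied from the log above: (went down, came back up).
OUTAGES = [
    ("2019-04-16 13:17", "2019-04-16 22:24"),
    ("2019-04-17 00:10", "2019-04-17 12:29"),
    ("2019-07-09 06:25", "2019-07-09 20:01"),
    ("2019-07-10 15:14", "2019-07-10 15:39"),
    ("2019-07-12 16:35", "2019-07-12 16:43"),
]
FMT = "%Y-%m-%d %H:%M"


def downtime_for_month(year, month):
    """Sum the length of every outage that started in the given month."""
    total = timedelta()
    for start, end in OUTAGES:
        went_down = datetime.strptime(start, FMT)
        came_back = datetime.strptime(end, FMT)
        if (went_down.year, went_down.month) == (year, month):
            total += came_back - went_down
    return total


def uptime_percent(year, month, days_in_period):
    """Uptime as a percentage of a period of the given length in days.

    Assumption: the caller chooses the period (a full month or only the
    days elapsed so far); the proposal does not say which one it used.
    """
    down = downtime_for_month(year, month)
    period = timedelta(days=days_in_period)
    return 100.0 * (1 - down / period)


if __name__ == "__main__":
    print(f"April 2019: {uptime_percent(2019, 4, 30):.1f}% uptime")
    print(f"July 2019:  {uptime_percent(2019, 7, 31):.1f}% uptime")
```

Measured against only the 13 days of July that have elapsed, the same downtime works out to about 95.5%, so the choice of denominator matters when comparing months.
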
> 
> Additionally, many ISPs are not accepting Mediacom's IPv6 route
> announcements.  This has caused mirrormaster to be inaccessible to many
> of our users, and even one of the members of our own Infra Team[6].
> 
> Finally, while yours truly was trying to show an Adélie Web page to
> someone while on public Wi-Fi at a well-known place in Broken Arrow, OK,
> I was greeted with an error page[7]:
> 
> 
> Sonicwall Network Security Appliance
> 
> This site has been blocked by the network administrator.
> 
> Block reason: Gateway GEO-IP Filter Alert
> 
> IP address: 23.155.224.64
> 
> Connection initiated towards country: Unknown
> 
> 
> If a car dealership's firewall is blocking us, who knows what other
> firewalls are blocking us.  How many people are unable to discover us, and
> how many corporate sponsors are we missing out on, because they can't
> even connect to our Web site?  And why can they not connect to our Web
> site?  It could be the IPv6 peering issue, or a firewall blocking our
> IPv4 space, or because Mediacom has suffered another "fibre cut".
> 
> 
> == Enter Scaleway ==
> 
> We have had a working relationship with Scaleway for almost a year and a
> half.  We launched our 32-bit ARM builder on the Scaleway ARM cloud in
> March 2018, and have had no downtime in that time:
> 
> awilcox on erin [pts/0 Sat 13 9:33] ~: uptime
> 09:33:02 up 489 days,  5:59,  load average: 0.00, 0.00, 0.00
> 
> The network has never suffered any outages, either.  Since the Scaleway
> cloud features ARM servers, we would additionally still be able to avoid
> the x86 architecture and all of its failings.
> 
> We have continually been limited by our lack of IPv4 space at
> Integricloud.  Currently, we "proxy" every server via athdheise, a
> virtual server on our Integricloud dedicated system that has both an
> IPv4 and IPv6 address.  All of our main systems are IPv6-only (wiki,
> bts, next, etc.), and when an IPv4-only client attempts to connect to
> any of these services, the connection has to be proxied via athdheise.
> 
> If we use Scaleway virtual servers, every system gets its own dedicated
> IPv4 address, which drastically simplifies our administration.
> 
> Additionally, we would receive a lot more RAM per virtual server.
> Currently, athdheise - the aforementioned Web server and proxy - has 256
> MB RAM.  It has 34 MB of available RAM.  When documentation changes are
> made and the Git hook runs to cause athdheise to rebuild the
> documentation site (at help.adelielinux.org), sometimes the process runs
> out of memory.  This means one of us has to log in, stop the web server,
> run the make process, and then restart the web server.  The minimum RAM
> at Scaleway is 2 GB per virtual server.  This is an extreme amount of
> headroom, and would even allow us to play with memcached (or other
> caching solutions) to reduce latency across our infrastructure.
> 
> Finally, we would save a dramatic amount of money.  We currently pay
> 225$/mo pre-tax for Integricloud.
> 
> 
> == Hard Numbers ==
> 
> The current systems we run on Integricloud are:
> 
> enfys (postgresql)             768 MB RAM    30 GB disk
> 
> rarity (these mailing lists)  1536 MB RAM    30 GB disk
> 
> mirrormaster                   256 MB RAM     1 TB disk
> 
> bts (Bugzilla issue tracking)  512 MB RAM     8 GB disk
> 
> athdheise (Web server/proxy)   256 MB RAM     4 GB disk
> 
> wiki                           512 MB RAM     8 GB disk
> 
> annwyn (Nextcloud)             512 MB RAM   100 GB disk
> 
> chatterbox (Quassel IRC)       512 MB RAM    40 GB disk
> 
> 
> 
> Since Scaleway tops out at 500 GB disk, we will need to consider
> alternate hosting for mirrormaster.  I believe we can run this on the
> Hetzner dedicated server that is being sponsored by Alyx at Leuhta Labs.
> 
> 
> 
> And this is what we could pay per virtual system on Scaleway:
> 
> 4 ARM CPUs, 2 GB RAM, 50 GB disk - 2.99€/mo
> 
> 6 ARM CPUs, 4 GB RAM, 100 GB disk - 5.99€/mo
> 
> 8 ARM CPUs, 8 GB RAM, 200 GB disk - 11.99€/mo
> 
> 
> 
> By my approximation, we would be able to put every single system except
> annwyn on the smallest server, and annwyn on the second-smallest.
> 
> 6× 2.99€ = 17.94€ per month
> 
> 1× 4.99€ + 17.94€ = 22.93€ per month total cost, or approximately
> 25.81$.  This is a savings of nearly 90% after tax.
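
The cost arithmetic above can be rechecked with a short sketch. Note that the price list gives 5.99€ for the second-smallest server, while the sum is worked with 4.99€, so the sketch computes both. The exchange rate is an assumption: it is simply the rate implied by the two quoted totals (25.81$ for 22.93€), not an official figure. The function names are illustrative, and tax is ignored.

```python
from decimal import Decimal

CURRENT_USD = Decimal("225")    # Integricloud per month, pre-tax, as quoted
QUOTED_EUR = Decimal("22.93")   # Scaleway total quoted in the proposal
QUOTED_USD = Decimal("25.81")   # its dollar conversion, as quoted

# Assumption: the exchange rate implied by the two quoted totals.
USD_PER_EUR = QUOTED_USD / QUOTED_EUR


def monthly_total_eur(small_count, small_price, medium_price):
    """Total for `small_count` smallest servers plus one second-smallest."""
    return small_count * Decimal(small_price) + Decimal(medium_price)


def savings_percent(new_total_eur):
    """Percentage saved compared with the current monthly dollar cost."""
    new_total_usd = new_total_eur * USD_PER_EUR
    return (1 - new_total_usd / CURRENT_USD) * 100


if __name__ == "__main__":
    # 4.99 is the figure used in the sum; 5.99 is the figure in the price list.
    for medium_price in ("4.99", "5.99"):
        total = monthly_total_eur(6, "2.99", medium_price)
        print(f"second-smallest at {medium_price} EUR: "
              f"{total} EUR/mo, savings {savings_percent(total):.1f}%")
```

With either price for the second-smallest server, the savings come out at roughly 88%, so the "nearly 90%" conclusion does not depend on which figure is correct.
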
> 
> 
> == Conclusion ==
> 
> I believe that retiring our Integricloud dedicated server and replacing
> it with Scaleway virtual ARM servers makes business sense.  It will
> allow us to spend less time down, dramatically improve the architecture
> of our infrastructure, and reach more people.  That greater reach will
> in turn allow us to grow into a larger, healthier Linux distribution
> that can genuinely improve the world.
> 
> 
> I do not want to leave this proposal without a separate smaller proposal
> for how this could be effected easily.  I believe that we can simply
> start by migrating the wiki server, since it is the least used service.
> We can feel out Scaleway's ARM offering for a while, and make sure that
> it will genuinely work for our needs.  After we are satisfied, we can
> change the DNS for the wiki and begin work on another server.  Assuming
> all goes well, we will eventually be able to quietly power off the
> Integricloud dedicated system with zero further downtime.
> 
> 
> Thank you so much for reading this proposal.  I welcome any comments or
> questions you may have.  You may respond here or poke me on IRC.  I'll
> post a summary email in response with any important notes from IRC.
> 
> Best,
> --arw
> 
> 
> == References ==
> 
> [1]:
> https://lists.adelielinux.org/hyperkitty/list/adelie-de...@lists.adelielinux.org/thread/5QZCLXCVL7H2DOCDUOURWRVTZ52CMRPS/
> 
> [2]: https://www.packet.com/cloud/servers/c1-large-arm/
> 
> [3]: https://www.integricloud.com/
> 
> [4]: The Packet.net ARM box runs at 360$/mo.  Integricloud is 220$/mo.
> 
> [5]: https://bgp.he.net/AS46246
> 
> [6]:
> 
> <aranea> awilfox: Looks like my routing issues are Mediacom's (that's
> Raptor's only upstream) fault. I doubt I'll have any success contacting
> them; this needs to come from a customer. I'll try contacting tpearson
> again with more details; if he doesn't respond, I may have to ask you to
> file an outage report or sth.
> <aranea> Short version: Mediacom doesn't follow some standard industry
> practices, and thus many of their peers aren't accepting the routes they
> announce on behalf of their customers (and guess what, Raport is their
> only IPv6 customer.)
> 
> [7]: https://i.imgur.com/khmebJ5.png
> 
> -- 
> A. Wilcox (awilfox)
> Project Lead, Adélie Linux
> https://www.adelielinux.org
> 
> _______________________________________________
> Adélie Open Governance mailing list -- adelie-project@lists.adelielinux.org
> To unsubscribe send an email to adelie-project-le...@lists.adelielinux.org