On 29 December 2015 at 00:34, a b <[email protected]> wrote:
> What if the hypervisor mirrored each memory write to one or more
> nodes through a kernel driver, something akin to, but not quite
> like, the Solaris remote shared memory feature?
>
> The memory mirroring technique is the one employed by VMware ESX;
> I think VMware calls it "HA", or some such.
That sounds extremely complicated, and I would imagine impossible to
do in a way that performs adequately without some kind of specialised
hardware platform.
>> It is cleaner, simpler, and more robust to do HA (whatever that means
>> for you) in your application.
> This will forever be a point of vehement and visceral
> disagreement: apart from a select few people on this mailing
> list, most people in the industry are decidedly *NOT* capable of
> developing highly available applications. I have been working
> professionally in the information technology industry for over
> 20 years now, and most people do not know the difference between
> high availability and failover, let alone being capable of
> designing an application for either. I have yet to meet anyone
> face-to-face, apart from myself, who has actually done so.
I think this is a pretty defeatist attitude, honestly. It also
assumes that we're trying to build a platform on which every legacy
application that has ever existed can be run in some fault tolerant
configuration without any work.
We're not! We are, instead, trying to build a platform that is
tailor-made for hosting OS-virtualised containers (aka zones) as close
to the metal as possible, on relatively isolated commodity systems
with local storage. The goal is to build systems that scale
horizontally, where the application (in the broadest possible sense)
understands that it will be deployed across multiple hosts and can be
configured accordingly.
> Be that as it may, for the sake of illustrating the point, I
> will assume that your premise is correct: if we take your
> philosophy, anybody doing this would have to re-invent and
> re-implement high availability for *every* application in their
> software stack; and while this might be acceptable for a single
> web application in a single company, think of all the
> infrastructure that people normally need to run just to be able
> to run the aforementioned application. In those terms, are you
> still convinced that custom high availability for each and every
> application in the infrastructure is a feasible approach?
Yes, I am, because people aren't building every new application from
whole cloth!
Databases like PostgreSQL support streaming replication from one host
to another in a synchronous mode -- failover can be automated with a
system like Joyent Manatee. There are also many other database
systems, of various classes, with their own styles of replicated
cluster operation; these generally function without any shared
storage or OS-level HA (e.g., RethinkDB, MongoDB, etc.).
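To make that concrete, here is a rough Python sketch (not Manatee
itself; the host names and credentials are placeholders I've made up)
of the kind of check a failover system performs to work out which
node is the current PostgreSQL primary:

    # Rough sketch: find the current primary among a set of
    # PostgreSQL nodes.  Assumes psycopg2; hosts and credentials are
    # placeholders, not anything Manatee-specific.
    import psycopg2

    CANDIDATE_HOSTS = ["pg0.example.com", "pg1.example.com",
                       "pg2.example.com"]

    def find_primary():
        for host in CANDIDATE_HOSTS:
            try:
                conn = psycopg2.connect(host=host, dbname="postgres",
                                        user="app", connect_timeout=3)
            except psycopg2.OperationalError:
                continue  # node is down; try the next one
            try:
                with conn.cursor() as cur:
                    # pg_is_in_recovery() is true on a streaming
                    # replica and false on the primary.
                    cur.execute("SELECT pg_is_in_recovery()")
                    (in_recovery,) = cur.fetchone()
                    if not in_recovery:
                        return host
            finally:
                conn.close()
        return None

    if __name__ == "__main__":
        print("current primary: %s" % find_primary())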
Storing files on an NFS share (or other shared filesystem) is not a
great way to build scale-out file storage in 2015. Instead, people
are starting to use object storage systems that are themselves
highly available and durable in the face of component failure.
Consider Joyent Manta, which is itself built on top of SDC. The
application can PUT and GET from one URL without needing to be
directly responsible for protecting the data. If you _truly_ need NFS
compatibility, there is a rudimentary caching NFS proxy that can sit
in front of Manta. There are other systems in this space as well.
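For illustration, this is a minimal Python sketch of that PUT/GET
pattern against an HTTP object store. The URL is a placeholder, and a
real Manta request would also carry a signed (http-signature)
Authorization header, which I have left out for brevity:

    # Minimal sketch of the PUT/GET pattern against an HTTP object
    # store.  The base URL is a placeholder; real Manta requests go
    # to /:login/stor/... paths with a signed Authorization header
    # (omitted here).
    import requests

    BASE = "https://manta.example.com/myaccount/stor"

    def put_object(name, data):
        r = requests.put("%s/%s" % (BASE, name), data=data,
                         headers={"Content-Type":
                                  "application/octet-stream"})
        r.raise_for_status()

    def get_object(name):
        r = requests.get("%s/%s" % (BASE, name))
        r.raise_for_status()
        return r.content

    if __name__ == "__main__":
        put_object("hello.txt", b"hello, world\n")
        print(get_object("hello.txt"))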
If you need leader election and/or service discovery, many software
systems already exist that provide these sorts of facilities. We use
Zookeeper within Manta, but it's not my favourite
piece of software in the world. I hear positive things about Consul
and etcd from various sources, and they're almost certainly worth a
look. Your web application servers, of which you will presumably have
several spread across multiple containers on multiple hosts, can then
be found by your front end load balancers. Unhealthy, or manually
disabled, instances can be removed from the active rotation.
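To give a flavour of the leader election piece, this is roughly what
using the Zookeeper election recipe looks like from Python via the
kazoo client (the ensemble addresses, path, and identifier are
placeholders):

    # Rough leader-election sketch using kazoo's Election recipe.
    # Ensemble addresses, the election path, and the identifier are
    # all placeholders.
    from kazoo.client import KazooClient

    def do_leader_work():
        # Only the instance currently holding leadership runs this.
        print("I am the leader; doing the work now")

    zk = KazooClient(hosts="zk1.example.com:2181,zk2.example.com:2181")
    zk.start()

    # Blocks until this instance wins the election, then calls the
    # function; leadership is released when the function returns or
    # the session is lost.
    election = zk.Election("/myapp/election", "web-03")
    election.run(do_leader_work)

    zk.stop()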
Casey and Tim (at Joyent) have also been working on tools like
Containerbuddy[1]. This tool is a sort of wrapper around an existing
application to make it easier to configure in a containerised,
multi-host world. You don't have to rewrite this tool for each new
application, just _configure_ it and include it in your container
images along with the application itself. Alex Wilson has spent
considerable effort recently working on Triton CNS[2], an upcoming
service for exposing live container topology into DNS.
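The consuming side of DNS-based discovery is deliberately simple; a
rough Python sketch (the service name below is made up, not the real
CNS naming scheme) is little more than a DNS lookup:

    # Rough sketch of consuming DNS-based service discovery, the kind
    # of record a system like Triton CNS publishes.  The service name
    # is a made-up placeholder.
    import socket

    SERVICE_NAME = "api.svc.example-account.dc1.cns.example.com"

    def healthy_backends(port=8080):
        # Each A record corresponds to a live container instance.
        infos = socket.getaddrinfo(SERVICE_NAME, port, socket.AF_INET,
                                   socket.SOCK_STREAM)
        return sorted({addr[0] for _, _, _, _, addr in infos})

    if __name__ == "__main__":
        print("backends: %s" % healthy_backends())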
If you're looking for a system that promises to be a magically fault
tolerant computer, that system is almost certainly not SmartOS. We're
not trying to provide all of the features of VMware VMotion, or HP
NonStop, or Veritas Clustering. In fact, we generally reject the
immense complexity of these proprietary hardware and software systems.
> SmartOS has much bigger fish to fry: I constantly argue that we
> should at least do a proof of concept of SmartOS at ${JOB}. Then
> a colleague comes along and asks me a simple question: "we are
> not capable of getting even the infrastructure right without
> throwing hardware and hundreds of millions at it, how would we
> do clustering with SmartOS?"
If you're looking for a larger scale multi-host deployment of SmartOS,
I would seriously consider evaluating SmartDataCenter (SDC) instead.
SDC is aimed at operators of big and small cloud deployments, and
provides things like an operator web portal and a set of remote
provisioning APIs for users with access control, etc. It also
automates common operator tasks like installing new compute servers
and updating the operating system image.
We're investigating new ways to provide a better experience to users
seeking to deploy multi-container applications, but a lot of this
tooling is (or will be) built into the orchestration/service stack of
SDC, rather than bare SmartOS. But we're not about to fold shared
storage or live migration into the SmartOS design centre.
Cheers.
[1]: http://containerbuddy.io/
[2]: https://github.com/joyent/rfd/blob/master/rfd/0001/README.md
--
Joshua M. Clulow
UNIX Admin/Developer
http://blog.sysmgr.org