Not replying to anyone in particular; just jotting down some notes on what
HA gives and doesn't give, and why it's worthwhile to explore
"application-level HA". Perhaps some can use this for internal discussions
when asked "Why use SmartOS when it doesn't give VM/Zone HA?", to explore
what exactly it is that Hypervisor-level HA gives.

While commercial Hypervisors advertise High Availability, what they are
really advertising is monitoring whether a Hypervisor is still up and, if
not, starting its VMs on another host. If the VM is detected as no longer
being up, or if the Hypervisor itself is down, the clustering solution
boots that VM on a different node in the cluster. From an Infrastructure
team's point of view, the VM was highly available, and they will be able
to quote SLAs for detecting the failure and booting the VM back up. Yet
I've been in situations where the guest OS's kernel halted (e.g. because
the auditd buffers filled up) or panicked, and VMware did not restart the
VM on another host.

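However, if the application or its client doesn't have transactionality
built in, or some form of load balancing, then in-flight transactions will
be lost: the restarted VM boots from scratch, so whatever it was
processing when it failed is gone.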

It is therefore better to build transactionality in at the application
level, make the application as stateless as possible, and distribute the
Zones/VMs across different hosts. That way, whether or not Hypervisor-level
HA works, things still continue at the application level. On my current
project, we've got a mix of Oracle 11g R2 on Solaris 11.2 and RHEL VMs on
VMware ESXi. Every time a guest OS has hung, VMware has failed to detect
the issue. But the other apps and load balancers that connect to the VM
detect that it is unavailable and re-send the same requests to other VMs.
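To make the "re-send the same requests" idea concrete, here is a minimal
Python sketch of client-side failover. It assumes every request carries an
ID and that all VMs record results in a shared transactional store (a plain
dict stands in for the common database). Backend, BackendDown and
send_with_failover are hypothetical names for illustration, not any real
load balancer's API.

```python
import uuid
from typing import Dict, List, Optional


class BackendDown(Exception):
    """Raised when a (simulated) VM is hung or unreachable."""


class Backend:
    """Hypothetical stand-in for one VM/zone running the same stateless service."""

    def __init__(self, name: str, store: Dict[str, str], up: bool = True) -> None:
        self.name = name
        self.store = store  # assumed shared transactional store, e.g. the common DB
        self.up = up

    def handle(self, request_id: str, payload: str) -> str:
        if not self.up:
            raise BackendDown(self.name)
        # Apply each request ID at most once; a re-sent request gets the
        # stored result back instead of being processed a second time.
        if request_id not in self.store:
            self.store[request_id] = f"{self.name} processed {payload}"
        return self.store[request_id]


def send_with_failover(backends: List[Backend], payload: str,
                       request_id: Optional[str] = None) -> str:
    """Send one request, re-sending it (same ID) to the next backend on failure."""
    request_id = request_id or str(uuid.uuid4())
    failed = []
    for backend in backends:
        try:
            return backend.handle(request_id, payload)
        except BackendDown as exc:
            failed.append(str(exc))  # this VM is unavailable; try the next one
    raise RuntimeError(f"all backends failed: {failed}")


# vm1's guest OS has hung; the client fails over to vm2 without any
# help from the hypervisor.
store: Dict[str, str] = {}
vms = [Backend("vm1", store, up=False), Backend("vm2", store), Backend("vm3", store)]
result = send_with_failover(vms, "order-42", request_id="req-1")
print(result)  # vm2 processed order-42

# If vm2 dies before the client sees the reply, re-sending the same ID to
# vm3 returns the already-recorded result rather than processing it twice.
vms[1].up = False
print(send_with_failover(vms, "order-42", request_id="req-1"))  # vm2 processed order-42
```

The request ID is what makes re-sending safe: without it, a request that
vm2 committed just before dying would be applied a second time on vm3.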

Sure, not everyone can rewrite all their apps. But it is useful to check
exactly what Hypervisor HA provides and whether expectations are actually
being met.

SmartOS has some nice goals, but it may not be for those who absolutely
need Hypervisor-level HA (e.g. when a host dies at 3 am and the other hosts
must boot up its VMs).

-- Ram

On Tue, Dec 29, 2015 at 4:34 PM, a b <[email protected]> wrote:

> > Even though the zone (instead of the process) could potentially make
> > for a crisper boundary along which to detach a set of processes (and
> > associated resources) and move them to another machine, it is almost
> > certainly not worth the trouble.  Such an architectural shift would
> > forever complicate the implementation of every other operating system
> > feature implemented afterwards.  Each new feature would need to be
> > built with a view to being paused, serialised, deserialised and
> > resumed; there is little practical difference between this kind of
> > migration and a checkpoint/restart style facility.
>
> What if the hypervisor mirrored each memory write to one or more
> nodes through a kernel driver, something akin to, but not quite like,
> the Solaris Remote Shared Memory (RSM) feature?
>
>
> http://docs.oracle.com/cd/E19120-01/open.solaris/817-4415/rsmapi-1/index.html
>
> The memory mirroring technique is the one employed by VMware ESX;
> I think VMware calls it "Fault Tolerance" (its "HA" restarts VMs instead).
>
> > It is cleaner, simpler, and more robust to do HA (whatever that means
> > for you) in your application.
>
> This will forever be a point of vehement and visceral disagreement:
> apart from a select few people on this mailing list, most people in the
> industry are decidedly *NOT* capable of developing highly available
> applications. I have been working professionally in the information
> technology industry for over 20 years now, and most people do not know
> the difference between high availability and failover, let alone how
> to design an application for it. Apart from myself, I have yet to meet
> anyone face-to-face who has actually done so.
>
> Be that as it may, for the sake of illustrating the point, I will
> assume that your premise is correct: if we take your philosophy,
> anybody doing this would have to reinvent and re-implement high
> availability for *every* application in their software stack. That
> might be acceptable for a single web application in a single company,
> but think of all the infrastructure that people normally need to run
> just to be able to run that application. In those terms, are you still
> convinced that custom high availability for each and every application
> in the infrastructure is a feasible approach?
>
> > There, you can make different and
> > nuanced decisions about consistency and availability for individual
> > abstract pieces of your application, even down to the level of
> > particular tables in a database, rather than trying to create a UNIX
> > host that magically spans the data centre.
> 
> SmartOS has a much bigger fish to fry: I constantly argue that we
> should at least do a proof of concept of SmartOS at ${JOB}. Then a
> colleague comes along and asks me a simple question: "We are not
> capable of getting even the infrastructure right without throwing
> hardware and hundreds of millions at it; how would we do clustering
> with SmartOS?"
> 



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now