Not replying to anyone in particular, just jotting down some notes on what HA gives and doesn't give, and why it's worthwhile to explore "application-level HA". Perhaps some can use this for internal discussions when asked "Why use SmartOS when it doesn't give VM/Zone HA?", to explore what exactly it is that Hypervisor-level HA gives.

While commercial Hypervisors advertise High Availability, what they are really advertising is watching whether a Hypervisor is still up and, if not, starting its VMs on another host. I've been in situations where the guest OS kernel had halted (e.g. due to auditd buffers becoming full) or had panicked, but VMware didn't launch the VM on another host. If the VM is detected as no longer being up, or if the Hypervisor itself is not up, then the clustering solution boots that VM on a different node in the cluster.

From an Infrastructure team's point of view, the VM was highly available. They will also be able to quote some SLAs for detecting the failure and booting up the VM. However, if the application or its client doesn't have transactionality built in, or some form of Load Balancing, then transactions will be lost.

It is therefore better to build in transactionality at the application level, build some statelessness into the application, and distribute the Zones/VMs across different hosts. That way, whether the Hypervisor-level HA works or not, things still continue at the application level.

On my current project, we've got a mix of Oracle 11g R2 on Solaris 11.2, and RHEL VMs on VMware ESXi. Whenever the guest OS hangs, VMware doesn't detect the issue. But other apps and load balancers that connect to the VM detect that it is not available, and re-send the same requests to other VMs.

Sure, not everyone can rewrite all their apps. But it is useful to check just what Hypervisor HA gives and whether the expectations are actually being met. SmartOS has some nice goals, and may not be for those who absolutely need Hypervisor-level HA (e.g. when a host dies at 3 am, and other hosts boot up those VMs).
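
To make the "re-send the same requests to other VMs" behaviour concrete, here is a minimal sketch in Python. It is not what our load balancers actually run. The names `send_with_failover`, `AllBackendsFailed`, `fake_send`, the `idempotency_key` field and the backend names are all made up for illustration, and the sketch assumes the transport raises `ConnectionError` or `TimeoutError` when a backend is hung or down.

```python
import uuid
from typing import Callable, Iterable


class AllBackendsFailed(Exception):
    """Raised when no backend accepted the request (hypothetical name)."""


def send_with_failover(
    backends: Iterable[str],
    payload: dict,
    send: Callable[[str, dict], dict],
) -> dict:
    """Try each backend in turn until one answers.

    `send` stands in for the real transport (HTTP call, RPC, ...). It is
    assumed to raise ConnectionError or TimeoutError on a dead backend.
    """
    # The same key goes out on every attempt, so a server that already
    # applied the request can recognise a re-send and not apply it twice.
    request = dict(payload, idempotency_key=str(uuid.uuid4()))
    errors = []
    for backend in backends:
        try:
            return send(backend, request)
        except (ConnectionError, TimeoutError) as exc:
            # A hung guest kernel just looks like a timeout from here,
            # whether or not the Hypervisor has noticed anything.
            errors.append((backend, exc))
    raise AllBackendsFailed(errors)


def fake_send(backend: str, request: dict) -> dict:
    """Stand-in transport: 'vm-a' is hung, every other backend answers."""
    if backend == "vm-a":
        raise TimeoutError("guest kernel hung")
    return {"served_by": backend, "key": request["idempotency_key"]}


result = send_with_failover(["vm-a", "vm-b"], {"op": "debit"}, fake_send)
```

Here `result["served_by"]` ends up as `"vm-b"`: the client moved on without waiting for any Hypervisor-level detection. The idempotency key is the "transactionality" part; the server side still has to remember the keys it has seen, which this sketch does not show.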

-- Ram

On Tue, Dec 29, 2015 at 4:34 PM, a b <[email protected]> wrote:
> > Even though the zone (instead of the process) could potentially make
> > for a crisper boundary along which to detach a set of processes (and
> > associated resources) and move them to another machine, it is almost
> > certainly not worth the trouble. Such an architectural shift would
> > forever complicate the implementation of every other operating system
> > feature implemented afterwards. Each new feature would need to be
> > built with a view to being paused, serialised, deserialised and
> > resumed; there is little practical difference between this kind of
> > migration and a checkpoint/restart style facility.
>
> What if the hypervisor mirrored each memory write to one or more
> nodes through a kernel driver, something akin, but not quite like
> the Solaris remote shared memory feature?
>
> http://docs.oracle.com/cd/E19120-01/open.solaris/817-4415/rsmapi-1/index.html
>
> The memory mirroring technique is the one employed by VMware ESX,
> I think VMware calls it "HA", or some such.
>
> > It is cleaner, simpler, and more robust to do HA (whatever that means
> > for you) in your application.
>
> This will forever be a point of vehement and visceral disagreement:
> apart from a select few people on this mailing list, most
> people in the industry are decidedly *NOT* capable of developing
> applications which are high availability capable. I have been
> professionally working in the information technology industry for
> over 20 years now, and most people do not know the difference
> between high availability and failover, let alone being capable
> of designing an application for that. I have yet to meet someone
> face-to-face who apart from myself has actually done so.
>
> Be that as it might, for the sake of illustrating the point, I
> will assume that your premise is correct: if we take your philosophy,
> anybody doing this would have to re-invent and re-implement
> high availability for *every* application in their
> software stack; and while this might be acceptable for a single
> web application in a single company, think of all the infrastructure
> that people normally need to run, just in order to be able
> to run the aforementioned application. In those terms, are you
> still convinced custom high availability for each and every
> application in the infrastructure is a feasible approach?
>
> > There, you can make different and
> > nuanced decisions about consistency and availability for individual
> > abstract pieces of your application, even down to the level of
> > particular tables in a database, rather than trying to create a UNIX
> > host that magically spans the data centre.
>
> SmartOS has a much bigger fish to fry: I constantly argue we
> should at least do a proof of concept of SmartOS at ${JOB}. Then
> comes a colleague and asks me a simple question: "we are not
> capable of getting even the infrastructure right without throwing
> hardware and hundreds of millions on it, how would we do clustering
> with SmartOS?
