Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Marcus Sorensen
I don't know much about HA in regards to management server/agent connectivity, but it seems to me like this is perilous ground. If a host loses connection with the management server, it seems to me that the management server doesn't have the resources to determine whether it should start

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Marcus Sorensen
By the way, I'm aware that KVM has a heartbeat function in the agent, but that only works for NFS primary storage. Maybe the secondary storage could have a similar function that keeps track of running guests per host... Would still rely on the agent to not have died if the host is still up,

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Chiradeep Vittal
Indeed HA is very tricky as you note. In the generic case where the MS cannot communicate with the agent, nothing can be concluded and the MS does nothing. I dug this up and posted it to the wiki: https://cwiki.apache.org/confluence/x/dwn8AQ On 7/15/13 1:20 PM, Marcus Sorensen shadow...@gmail.com

RE: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Paul Angus
: Chiradeep Vittal [mailto:chiradeep.vit...@citrix.com] Sent: 15 July 2013 11:21 To: dev@cloudstack.apache.org Subject: Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status) Indeed HA is very tricky as you note. In the generic case where the MS cannot communicate with the agent, nothing can

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Chiradeep Vittal
A robust solution would probably involve Apache ZooKeeper (using Curator perhaps) to perform robust distributed locking and/or leader election. On 7/15/13 3:51 PM, Chiradeep Vittal chiradeep.vit...@citrix.com wrote: Indeed HA is very tricky as you note. In the generic case where the MS cannot

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Shanker Balan
On 15-Jul-2013, at 12:03 PM, Chiradeep Vittal chiradeep.vit...@citrix.com wrote: A robust solution would probably involve Apache Zookeeper (using Curator perhaps) to perform robust distributed locking and/or leader election. Just curious - Any idea as to how

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Joe Brockmeier
Hi Paul, What's the bug ID for this so we can track it properly? Thanks! Joe On Mon, Jul 15, 2013, at 02:31 AM, Paul Angus wrote: I bumped this from the user list as we've just come across the same issue. CloudStack does not react or even change host status when contact is lost with a

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Marcus Sorensen
For OpenStack, look to the current state of evacuate. http://www.mirantis.com/blog/cloud-prizefight-vmware-vs-openstack/ there is no official support for VM-level HA in OpenStack; it was initially planned for the Folsom release but was later dropped/postponed. There is currently an incubation

Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Marcus Sorensen
My strong preference would be to avoid any cluster locking libraries or similar on the agent side, if possible. I've just seen too many clustering products that are brittle and easily deadlock-able, where you end up having to reboot *everything* if something goes wrong on one host. It should be

RE: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

2013-07-15 Thread Paul Angus
@cloudstack.apache.org Subject: Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status) Hi Paul, What's the bug ID for this so we can track it properly? Thanks! Joe On Mon, Jul 15, 2013, at 02:31 AM, Paul Angus wrote: I bumped this from the user list as we've just come across the same issue