KVM HA fails continuously | Dell R440
Hi,

I am facing the same issue mentioned here: Re: KVM Host HA and power lost to host. <http://mail-archives.apache.org/mod_mbox/cloudstack-users/201903.mbox/%3ccwlp123mb2497e9a14efb930067be397df8...@cwlp123mb2497.gbrp123.prod.outlook.com%3E>

I have configured everything needed for HA and am using KVM as the hypervisor. We are using Dell PowerEdge R440 servers with redundant PSUs. The issue described below occurs only on Dell R440 servers; on other server models (PowerEdge R730 & R230) with the same configuration, KVM HA works fine.

Testing done:
1. Powered off the host that was running the VM.
2. The host state stays in Disconnected and the VM never migrates. (Ideally the host should go into maintenance mode and its VMs should migrate to an available host, as per ACS.)
3. The management server logs show the error: "OOBM is not configured or enabled for host".
4. When I clicked Power ON in the iDRAC dashboard, the host went into maintenance mode and VM fencing took place, but the server was still not pingable and the iDRAC power state showed OFF. After clicking Power ON a second time, the power state showed ON.
5. We tested the server with one PSU at a time on different power sources and still see the issue above.
6. The Dell team has isolated the issue to ACS, since they were able to execute the IPMI commands successfully via the CLI.
Management server logs:

2020-02-25 20:25:11,173 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl] (pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) Out-of-band Management action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with error: Using best available cipher suite 3 Invalid completion code received: Invalid command Set Chassis Power Control to reset failed: Command not supported in present state
2020-02-25 20:25:11,178 WARN [o.a.c.k.h.KVMHAProvider] (pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) OOBM service is not configured or enabled for this host host3.hyclon3.com error is Out-of-band Management action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with error: Using best available cipher suite 3 Invalid completion code received: Invalid command Set Chassis Power Control to reset failed: Command not supported in present state
2020-02-25 20:25:11,178 WARN [o.a.c.h.t.BaseHATask] (pool-4-thread-5:null) (logid:5b57b3b3) Exception occurred while running RecoveryTask on a resource: org.apache.cloudstack.ha.provider.HARecoveryException: OOBM service is not configured or enabled for this host host3.hyclon3.com
org.apache.cloudstack.ha.provider.HARecoveryException: OOBM service is not configured or enabled for this host host3.hyclon3.com
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with error: Using best available cipher suite 3 Invalid completion code received: Invalid command Set Chassis Power Control to reset failed: Command not supported in present state
2020-02-25 20:25:11,182 WARN [c.c.a.AlertManagerImpl] (pool-4-thread-5:null) (logid:5b57b3b3) AlertType:: 30 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: HA Recovery of host id=13, in dc id=1 performed
2020-02-25 20:25:15,056 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl] (pool-5-thread-24:ctx-08e29baf) (logid:8e05151d) Out-of-band Management action (OFF) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with error: Using best available cipher suite 3 Invalid completion code received: Invalid command Set Chassis Power Control to down/off failed: Command not supported in present state

It will be of great help if anyone can help us resolve this issue.

Regards,
Mark
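One way to take ACS out of the picture is to issue the same IPMI chassis commands its IPMITOOL out-of-band driver effectively runs, and compare "status" (which works on these R440s) against "reset"/"off" (which fail). A minimal Python sketch; the BMC address and the default iDRAC credentials are placeholders, not values from this thread:

```python
import shlex
import subprocess

def ipmi_power_cmd(host, user, password, action):
    """Build the ipmitool invocation CloudStack's IPMITOOL OOBM driver
    effectively issues over the lanplus interface."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-P", password, "chassis", "power", action]

def run(cmd):
    """Run the command and return (exit code, combined output).
    Requires ipmitool installed and a reachable BMC (iDRAC)."""
    p = subprocess.run(cmd, capture_output=True, text=True)
    return p.returncode, p.stdout + p.stderr

# On the R440s above, "status" reportedly succeeds while "reset" fails
# with "Command not supported in present state"; running both from the
# management server shows whether the BMC or ACS is at fault.
for action in ("status", "reset"):
    print(" ".join(shlex.quote(c)
                   for c in ipmi_power_cmd("192.0.2.10", "root", "calvin", action)))
```

If "reset" fails the same way from a shell, the problem is in the iDRAC's IPMI-over-LAN handling rather than in CloudStack.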
kvm ha test in suspect degraded loop
Hi,

We are trying to test host HA; we have correctly configured OOBM and the KVMHAProvider (together with the NFS pool). Unfortunately, during testing the host cannot leave the suspect-degraded state loop. All values in the global settings are at their defaults.

My question is: how have you organized HA for KVM? How should the HA options be set?

The test we did was very simple: we disconnected the network (except IPMI), and then the host was additionally powered off.

Regards,
Piotr
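For anyone else debugging this: the loop described above can be pictured with a much-simplified model of the host-HA state machine. The state names follow CloudStack's Host HA feature, but the transition logic below is illustrative only, not the actual implementation:

```python
def next_state(state, health_ok, activity_ok, recovery_ok):
    """Toy model of the host-HA FSM: health checks demote a host to
    Suspect, activity checks (VM disk activity on the NFS pool) decide
    between Degraded and recovery, and failed recovery leads to fencing."""
    if state == "Available":
        return "Available" if health_ok else "Suspect"
    if state == "Suspect":
        return "Checking"
    if state == "Checking":
        if health_ok:
            return "Available"
        # Activity still visible: host is treated as Degraded, not dead.
        return "Degraded" if activity_ok else "Recovering"
    if state == "Degraded":
        # HA waits and re-checks -- this is the suspect/degraded loop.
        return "Suspect"
    if state == "Recovering":
        return "Recovered" if recovery_ok else "Fencing"
    return state

state, trace = "Available", ["Available"]
for _ in range(6):
    state = next_state(state, health_ok=False, activity_ok=True, recovery_ok=False)
    trace.append(state)
print(" -> ".join(trace))
```

In this model the host only escapes the loop once the activity checks stop passing, which is why the activity-check and recovery timeout settings matter for this test.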
Re: KVM HA fails under multiple management services
Li,

please test with indirect.agent.lb.check.interval=60 or similar, not 0 (zero), since zero means the agent won't reconnect; this should solve your concern. As for what is in which rack, it is your responsibility to disperse infrastructure components appropriately, i.e. across racks and such. We can't handle every case in that regard; hope you understand.

Andrija

On Mon, 24 Jun 2019 at 02:24, li jerry wrote:
> Thank you Nicolas and Andrija.
>
> Even if indirect.agent.lb.algorithm is configured as roundrobin, the
> probability of failure can only be reduced. But it does not solve 100% of
> the failure of KVM HA;
>
> Because in extreme cases, the management server and the kvm host may fail
> at the same time (for example, the management server and the KVM HOST are
> placed in the same rack, and the RACK will fail at the same time after the
> power failure)
>
> E.g;
>
> H1 is assigned and connected to M2
> H2 is assigned and connected to M3
> H3 is assigned and connected to M1
>
> When H1 and M2 fail simultaneously, HOST HA of H1 will be invalid;
>
> Should we have other protection mechanisms to avoid this?
>
> From: Nicolas Vazquez<mailto:nicolas.vazq...@shapeblue.com>
> Sent: 23 June 2019 23:31
> To: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>;
> users<mailto:users@cloudstack.apache.org>
> Cc: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>
> Subject: Re: KVM HA fails under multiple management services
>
> As Andrija mentioned that is expected behavior as the global setting is
> 'static'. It is also expected that your agents connect to the next
> management server on the 'host' list once the management server they are
> connected to is down.
> You can find more information of this feature on this link: > https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/ > > Please note this is a different feature than host HA, in which CloudStack > will try to recover hosts which are off via ipmi > > Obtener Outlook para Android<https://aka.ms/ghei36> > > > > De: Andrija Panic > Enviado: domingo, 23 de junio 11:03 > Asunto: Re: KVM HA fails under multiple management services > Para: users > Cc: d...@cloudstack.apache.org > > > Li, > > based on the Global Setting description for those 2, I would say that is > the expected behaviour. > i.e. change Indirect.agent.lb.check.interval to some other value, since 0 > means "don't check, don't reconnect" per what I read. > > Also, you might want to change from Indirect.agent.lb.algorithm=static to > some other value, since static means all your KVM agents will always > connect to that one mgmt host that is the first one in the in the "host" > list. > > Regards, > Andrija > > > nicolas.vazq...@shapeblue.com > www.shapeblue.com<http://www.shapeblue.com> > Amadeus House, Floral Street, London WC2E 9DPUK > @shapeblue > > > > On Sat, 22 Jun 2019 at 06:19, li jerry wrote: > > > > > Hello everyone > > I recently tested the multiple management services, based on agent lb > HOST > > HA (KVM). 
It was found that in extreme cases, HA would fail; the details > > are as follows: > > > > > > Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an > > external database cluster > > Three KVM nodes, H1, H2, H3 > > An external NFS primary storage > > > > > > CLOUDSTACK parameter configuration > > Indirect.agent.lb.algorithm=static > > Indirect.agent.lb.check.interval=0 > > host=172.17.1.141,172.17.1.142 > > > > > > Through the agent.log analysis, all kvm agents are connected to the first > > selection management node M1 (172.17.1.141): > > > > INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4) > > Processed new management server list: 172.17.1.141,172.17.1.142@static > > > > > > > > In extreme cases: > > KVM HOST and the preferred management server fail at the same time, KVM > > HOST will not trigger HA detection > > > > E.g: > > > > M1+H1, power off at the same time; the state of H1 remains Disconnected, > > and all VMs on H1 will not restart on other KVM nodes; > > M1+H2, power off at the same time; the state of H1 remains Disconnected, > > and all VMs on H2 will not restart on other KVM nodes; > > M1+H3, power off at the same time; the state of H1 remains Disconnected, > > and all VMs on H3 will not restart on other KVM nodes; > > > > > -- > > Andrija Panić > > > -- Andrija Panić
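In global settings, the changes suggested above would amount to something like the following (the interval value is an example, not a recommendation from this thread):

```properties
# Spread agents across management servers instead of pinning all of them
# to the first entry in the list
indirect.agent.lb.algorithm=roundrobin
# Re-check the preferred management server list every 60s; 0 disables
# checking and reconnecting entirely
indirect.agent.lb.check.interval=60
host=172.17.1.141,172.17.1.142
```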
KVM HA fails under multiple management services
Thank you Nicolas and Andrija.

Even if indirect.agent.lb.algorithm is configured as roundrobin, it can only reduce the probability of failure; it does not prevent 100% of KVM HA failures, because in extreme cases the management server and the KVM host may fail at the same time (for example, when the management server and the KVM host are placed in the same rack and the rack loses power, both fail together).

E.g.:

H1 is assigned and connected to M2
H2 is assigned and connected to M3
H3 is assigned and connected to M1

When H1 and M2 fail simultaneously, host HA for H1 will not work.

Should we have other protection mechanisms to avoid this?

From: Nicolas Vazquez<mailto:nicolas.vazq...@shapeblue.com>
Sent: 23 June 2019 23:31
To: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>; users<mailto:users@cloudstack.apache.org>
Cc: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>
Subject: Re: KVM HA fails under multiple management services

As Andrija mentioned, that is expected behavior as the global setting is 'static'. It is also expected that your agents connect to the next management server on the 'host' list once the management server they are connected to is down.
Also, you might want to change from Indirect.agent.lb.algorithm=static to some other value, since static means all your KVM agents will always connect to that one mgmt host that is the first one in the in the "host" list. Regards, Andrija nicolas.vazq...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> Amadeus House, Floral Street, London WC2E 9DPUK @shapeblue On Sat, 22 Jun 2019 at 06:19, li jerry wrote: > > Hello everyone > I recently tested the multiple management services, based on agent lb HOST > HA (KVM). It was found that in extreme cases, HA would fail; the details > are as follows: > > > Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an > external database cluster > Three KVM nodes, H1, H2, H3 > An external NFS primary storage > > > CLOUDSTACK parameter configuration > Indirect.agent.lb.algorithm=static > Indirect.agent.lb.check.interval=0 > host=172.17.1.141,172.17.1.142 > > > Through the agent.log analysis, all kvm agents are connected to the first > selection management node M1 (172.17.1.141): > > INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4) > Processed new management server list: 172.17.1.141,172.17.1.142@static > > > > In extreme cases: > KVM HOST and the preferred management server fail at the same time, KVM > HOST will not trigger HA detection > > E.g: > > M1+H1, power off at the same time; the state of H1 remains Disconnected, > and all VMs on H1 will not restart on other KVM nodes; > M1+H2, power off at the same time; the state of H1 remains Disconnected, > and all VMs on H2 will not restart on other KVM nodes; > M1+H3, power off at the same time; the state of H1 remains Disconnected, > and all VMs on H3 will not restart on other KVM nodes; > -- Andrija Panić
Re: KVM HA fails under multiple management services
As Andrija mentioned, that is expected behavior as the global setting is 'static'. It is also expected that your agents connect to the next management server on the 'host' list once the management server they are connected to is down.

You can find more information about this feature at this link: https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/

Please note this is a different feature than host HA, in which CloudStack will try to recover hosts which are off via IPMI.

Get Outlook for Android<https://aka.ms/ghei36>

From: Andrija Panic
Sent: Sunday, 23 June 11:03
Subject: Re: KVM HA fails under multiple management services
To: users
Cc: d...@cloudstack.apache.org

Li,

based on the Global Setting description for those 2, I would say that is the expected behaviour, i.e. change indirect.agent.lb.check.interval to some other value, since 0 means "don't check, don't reconnect" per what I read.

Also, you might want to change from indirect.agent.lb.algorithm=static to some other value, since static means all your KVM agents will always connect to the one mgmt host that is the first one in the "host" list.

Regards,
Andrija

nicolas.vazq...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DPUK
@shapeblue

On Sat, 22 Jun 2019 at 06:19, li jerry wrote:
>
> Hello everyone
> I recently tested the multiple management services, based on agent lb HOST
> HA (KVM).
It was found that in extreme cases, HA would fail; the details > are as follows: > > > Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an > external database cluster > Three KVM nodes, H1, H2, H3 > An external NFS primary storage > > > CLOUDSTACK parameter configuration > Indirect.agent.lb.algorithm=static > Indirect.agent.lb.check.interval=0 > host=172.17.1.141,172.17.1.142 > > > Through the agent.log analysis, all kvm agents are connected to the first > selection management node M1 (172.17.1.141): > > INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4) > Processed new management server list: 172.17.1.141,172.17.1.142@static > > > > In extreme cases: > KVM HOST and the preferred management server fail at the same time, KVM > HOST will not trigger HA detection > > E.g: > > M1+H1, power off at the same time; the state of H1 remains Disconnected, > and all VMs on H1 will not restart on other KVM nodes; > M1+H2, power off at the same time; the state of H1 remains Disconnected, > and all VMs on H2 will not restart on other KVM nodes; > M1+H3, power off at the same time; the state of H1 remains Disconnected, > and all VMs on H3 will not restart on other KVM nodes; > -- Andrija Panić
Re: KVM HA fails under multiple management services
Li,

based on the Global Setting description for those 2, I would say that is the expected behaviour, i.e. change indirect.agent.lb.check.interval to some other value, since 0 means "don't check, don't reconnect" per what I read.

Also, you might want to change from indirect.agent.lb.algorithm=static to some other value, since static means all your KVM agents will always connect to the one mgmt host that is the first one in the "host" list.

Regards,
Andrija

On Sat, 22 Jun 2019 at 06:19, li jerry wrote:
>
> Hello everyone
> I recently tested the multiple management services, based on agent lb HOST
> HA (KVM). It was found that in extreme cases, HA would fail; the details
> are as follows:
>
> Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an
> external database cluster
> Three KVM nodes, H1, H2, H3
> An external NFS primary storage
>
> CLOUDSTACK parameter configuration
> Indirect.agent.lb.algorithm=static
> Indirect.agent.lb.check.interval=0
> host=172.17.1.141,172.17.1.142
>
> Through the agent.log analysis, all kvm agents are connected to the first
> selection management node M1 (172.17.1.141):
>
> INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4)
> Processed new management server list: 172.17.1.141,172.17.1.142@static
>
> In extreme cases:
> KVM HOST and the preferred management server fail at the same time, KVM
> HOST will not trigger HA detection
>
> E.g:
>
> M1+H1, power off at the same time; the state of H1 remains Disconnected,
> and all VMs on H1 will not restart on other KVM nodes;
> M1+H2, power off at the same time; the state of H1 remains Disconnected,
> and all VMs on H2 will not restart on other KVM nodes;
> M1+H3, power off at the same time; the state of H1 remains Disconnected,
> and all VMs on H3 will not restart on other KVM nodes;

--
Andrija Panić
KVM HA fails under multiple management services
Hello everyone,

I recently tested multiple management services with agent LB and host HA (KVM). I found that in extreme cases HA fails; the details are as follows:

Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), sharing an external database cluster
Three KVM nodes: H1, H2, H3
An external NFS primary storage

CloudStack parameter configuration:
indirect.agent.lb.algorithm=static
indirect.agent.lb.check.interval=0
host=172.17.1.141,172.17.1.142

From the agent.log analysis, all KVM agents connect to the first management node in the list, M1 (172.17.1.141):

INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4) Processed new management server list: 172.17.1.141,172.17.1.142@static

In extreme cases, when a KVM host and its preferred management server fail at the same time, HA detection is never triggered for that host. E.g.:

M1+H1 powered off at the same time: the state of H1 remains Disconnected and the VMs on H1 do not restart on other KVM nodes;
M1+H2 powered off at the same time: the state of H2 remains Disconnected and the VMs on H2 do not restart on other KVM nodes;
M1+H3 powered off at the same time: the state of H3 remains Disconnected and the VMs on H3 do not restart on other KVM nodes;
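The single point of failure described above follows directly from how the two load-balancer algorithms order the management-server list. A simplified sketch, not the agent's actual code (the per-agent rotation rule for roundrobin is an assumption for illustration):

```python
def server_order(algorithm, servers, agent_index):
    """Return the order in which one agent tries management servers.
    'static': every agent keeps the list as given, so all agents
    prefer servers[0]; 'roundrobin': the list is rotated per agent."""
    if algorithm == "static":
        return list(servers)
    if algorithm == "roundrobin":
        k = agent_index % len(servers)
        return servers[k:] + servers[:k]
    raise ValueError(algorithm)

servers = ["172.17.1.141", "172.17.1.142"]
# With 'static', H1..H3 all prefer M1; losing M1 together with any one
# host also removes the server that would have run HA checks for it.
for i, host in enumerate(["H1", "H2", "H3"]):
    print(host, "->", server_order("static", servers, i))
```

With roundrobin each host prefers a different server, so a single rack failure is less likely to take out both a host and the only server watching it, though (as noted in the thread) it cannot eliminate the coincidence entirely.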
Re: Problems with KVM HA & STONITH
Hi Victor,

If I may interject: I read your email and understand you're running KVM with Ceph storage. As far as I know, ACS only supports HA on NFS or iSCSI primary storage. http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/4.11/reliability.html

However, if you wanted to use Ceph, you could create an RBD block device and export it over NFS. Here is an article I referenced in the past: https://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ You could then add that NFS storage into ACS and utilize HA. I hope I'm understanding you correctly.

Best Regards,
James

On Thu, Apr 5, 2018 at 12:53 PM, victor <vic...@ihnetworks.com> wrote:
> Hello Boris,
>
> I am able to create VM with nfs+Ha and nfs without HA. The issue is with
> creating VM with Ceph storage.
>
> Regards
> Victor
>
> On 04/05/2018 01:18 PM, Boris Stoyanov wrote:
>
>> Hi Victor,
>> Host HA is working only with KVM + NFS. Ceph is not supported at this
>> stage. Obviously RAW volumes are not supported on your pool, but I’m not
>> sure if that’s because of Ceph or HA in general. Are you able to deploy a
>> non-ha VM?
>>
>> Boris Stoyanov
>>
>> boris.stoya...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
>> @shapeblue
>>
>>> On 5 Apr 2018, at 4:19, victor <vic...@ihnetworks.com> wrote:
>>>
>>> Hello Rohit,
>>>
>>> Is the Host HA provider start working with Ceph. The reason I am asking
>>> is because, I am not able to create a VM with Ceph storage in a kvm host
>>> with HA enabled and I am getting the following error while creating VM.
>>> >>> >>> .cloud.exception.StorageUnavailableException: Resource [StoragePool:2] >>> is unreachable: Unable to create Vol[9|vm=6|DATADISK]:com.cloud >>> .utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: >>> unsupported configuration: only RAW volumes are supported by this storage >>> pool >>> >>> >>> Regards >>> Victor >>> >>> On 11/04/2017 09:53 PM, Rohit Yadav wrote: >>> >>>> Hi James, (/cc Simon and others), >>>> >>>> >>>> A new feature exists in upcoming ACS 4.11, Host HA: >>>> >>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA >>>> >>>> You can read more about it here as well: http://www.shapeblue.com/host- >>>> ha-for-kvm-hosts-in-cloudstack/ >>>> >>>> This feature can use a custom HA provider, with default HA provider >>>> implemented for KVM and NFS, and uses ipmi based fencing (STONITH) of the >>>> host. The current HA mechanism provides no such method of fencing (powering >>>> off) a host and it depends under what circumstances the VM HA is failing >>>> (environment issues, ACS version etc). >>>> >>>> As Simon mentioned, we have a (host) HA provider that works with Ceph >>>> in near future. >>>> >>>> Regards. >>>> >>>> >>>> From: Simon Weller <swel...@ena.com.INVALID> >>>> Sent: Thursday, November 2, 2017 7:27:22 PM >>>> To: users@cloudstack.apache.org >>>> Subject: Re: Problems with KVM HA & STONITH >>>> >>>> James, >>>> >>>> >>>> Ceph is a great solution and we run all of our ACS storage on Ceph. >>>> Note that it adds another layer of complexity to your installation, so >>>> you're going need to develop some expertise with that platform to get >>>> comfortable with how it works. Typically you don't want to mix Ceph with >>>> your ACS hosts. We in fact deploy 3 separate Ceph Monitors, and then scale >>>> OSDs as required on a per cluster basis in order to add additional >>>> resiliency (So every KVM ACS cluster has it's own Ceph "POD"). 
We also use >>>> Ceph for S3 storage (on completely separate Ceph clusters) for some other >>>> services. >>>> >>>> >>>> NFS is much simpler to maintain for smaller installations in my >>>> opinion. If the IO load you're looking at isn't going to be insanely high, >>>> you could look at building a 2 node NFS cluster using pacemaker and DRDB >>>> for data replication between nodes. That would reduce your storage >>>> requirement to 2 fairly low power servers (NFS is not very cpu intensive). >>>>
Re: Problems with KVM HA & STONITH
Hello Boris, I am able to create VM with nfs+Ha and nfs without HA. The issue is with creating VM with Ceph storage. Regards Victor On 04/05/2018 01:18 PM, Boris Stoyanov wrote: Hi Victor, Host HA is working only with KVM + NFS. Ceph is not supported at this stage. Obviously RAW volumes are not supported on your pool, but I’m not sure if that’s because of Ceph or HA in general. Are you able to deploy a non-ha VM? Boris Stoyanov boris.stoya...@shapeblue.com www.shapeblue.com 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue On 5 Apr 2018, at 4:19, victor <vic...@ihnetworks.com> wrote: Hello Rohit, Is the Host HA provider start working with Ceph. The reason I am asking is because, I am not able to create a VM with Ceph storage in a kvm host with HA enabled and I am getting the following error while creating VM. .cloud.exception.StorageUnavailableException: Resource [StoragePool:2] is unreachable: Unable to create Vol[9|vm=6|DATADISK]:com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: unsupported configuration: only RAW volumes are supported by this storage pool Regards Victor On 11/04/2017 09:53 PM, Rohit Yadav wrote: Hi James, (/cc Simon and others), A new feature exists in upcoming ACS 4.11, Host HA: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA You can read more about it here as well: http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/ This feature can use a custom HA provider, with default HA provider implemented for KVM and NFS, and uses ipmi based fencing (STONITH) of the host. The current HA mechanism provides no such method of fencing (powering off) a host and it depends under what circumstances the VM HA is failing (environment issues, ACS version etc). As Simon mentioned, we have a (host) HA provider that works with Ceph in near future. Regards. 
From: Simon Weller <swel...@ena.com.INVALID> Sent: Thursday, November 2, 2017 7:27:22 PM To: users@cloudstack.apache.org Subject: Re: Problems with KVM HA & STONITH James, Ceph is a great solution and we run all of our ACS storage on Ceph. Note that it adds another layer of complexity to your installation, so you're going need to develop some expertise with that platform to get comfortable with how it works. Typically you don't want to mix Ceph with your ACS hosts. We in fact deploy 3 separate Ceph Monitors, and then scale OSDs as required on a per cluster basis in order to add additional resiliency (So every KVM ACS cluster has it's own Ceph "POD"). We also use Ceph for S3 storage (on completely separate Ceph clusters) for some other services. NFS is much simpler to maintain for smaller installations in my opinion. If the IO load you're looking at isn't going to be insanely high, you could look at building a 2 node NFS cluster using pacemaker and DRDB for data replication between nodes. That would reduce your storage requirement to 2 fairly low power servers (NFS is not very cpu intensive). Currently on a host failure when using a storage other than NFS on KVM, you will not see HA occur until you take the failed host out of the ACS cluster. This is a historical limitation because ACS could not confirm the host had been fenced correctly, so to avoid potential data corruption (due to 2 hosts mounting the same storage), it doesn't do anything until the operator intervenes. As of ACS 4.10, IPMI based fencing is now supported on NFS and we're planning on developing similar support for Ceph. Since you're an school district, I'm more than happy to jump on the phone with you to talk you through these options if you'd like. - Si From: McClune, James <mcclu...@norwalktruckers.net> Sent: Thursday, November 2, 2017 8:28 AM To: users@cloudstack.apache.org Subject: Re: Problems with KVM HA & STONITH Hi Simon, Thanks for getting back to me. 
I created one single NFS share and added it as primary storage. I think I better understand how the storage works, with ACS. I was able to get HA working with one NFS storage, which is good. However, is there a way to incorporate multiple NFS storage pools and still have the HA functionality? I think something like GlusterFS or Ceph (like Ivan and Dag described) will work better. Thank you Simon, Ivan, and Dag for your assistance! James On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid> wrote: James, Try just configuring a single NFS server and see if your setup works. If you have 3 NFS shares, across all 3 hosts, i'm wondering whether ACS is picking the one you rebooted as the storage for your VMs and when that storage goes away (when you bounce the host), all storage for your VMs vanishes and ACS tries to reboot your other hosts. Normally in a simple ACS setup, you would have a separate storage server that can serve up N
Re: Problems with KVM HA & STONITH
Hi Victor, Host HA is working only with KVM + NFS. Ceph is not supported at this stage. Obviously RAW volumes are not supported on your pool, but I’m not sure if that’s because of Ceph or HA in general. Are you able to deploy a non-ha VM? Boris Stoyanov boris.stoya...@shapeblue.com www.shapeblue.com 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue > On 5 Apr 2018, at 4:19, victor <vic...@ihnetworks.com> wrote: > > Hello Rohit, > > Is the Host HA provider start working with Ceph. The reason I am asking is > because, I am not able to create a VM with Ceph storage in a kvm host with HA > enabled and I am getting the following error while creating VM. > > > .cloud.exception.StorageUnavailableException: Resource [StoragePool:2] is > unreachable: Unable to create > Vol[9|vm=6|DATADISK]:com.cloud.utils.exception.CloudRuntimeException: > org.libvirt.LibvirtException: unsupported configuration: only RAW volumes are > supported by this storage pool > > > Regards > Victor > > On 11/04/2017 09:53 PM, Rohit Yadav wrote: >> Hi James, (/cc Simon and others), >> >> >> A new feature exists in upcoming ACS 4.11, Host HA: >> >> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA >> >> You can read more about it here as well: >> http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/ >> >> This feature can use a custom HA provider, with default HA provider >> implemented for KVM and NFS, and uses ipmi based fencing (STONITH) of the >> host. The current HA mechanism provides no such method of fencing (powering >> off) a host and it depends under what circumstances the VM HA is failing >> (environment issues, ACS version etc). >> >> As Simon mentioned, we have a (host) HA provider that works with Ceph in >> near future. >> >> Regards. 
>> >> ____ >> From: Simon Weller <swel...@ena.com.INVALID> >> Sent: Thursday, November 2, 2017 7:27:22 PM >> To: users@cloudstack.apache.org >> Subject: Re: Problems with KVM HA & STONITH >> >> James, >> >> >> Ceph is a great solution and we run all of our ACS storage on Ceph. Note >> that it adds another layer of complexity to your installation, so you're >> going need to develop some expertise with that platform to get comfortable >> with how it works. Typically you don't want to mix Ceph with your ACS hosts. >> We in fact deploy 3 separate Ceph Monitors, and then scale OSDs as required >> on a per cluster basis in order to add additional resiliency (So every KVM >> ACS cluster has it's own Ceph "POD"). We also use Ceph for S3 storage (on >> completely separate Ceph clusters) for some other services. >> >> >> NFS is much simpler to maintain for smaller installations in my opinion. If >> the IO load you're looking at isn't going to be insanely high, you could >> look at building a 2 node NFS cluster using pacemaker and DRDB for data >> replication between nodes. That would reduce your storage requirement to 2 >> fairly low power servers (NFS is not very cpu intensive). Currently on a >> host failure when using a storage other than NFS on KVM, you will not see HA >> occur until you take the failed host out of the ACS cluster. This is a >> historical limitation because ACS could not confirm the host had been fenced >> correctly, so to avoid potential data corruption (due to 2 hosts mounting >> the same storage), it doesn't do anything until the operator intervenes. As >> of ACS 4.10, IPMI based fencing is now supported on NFS and we're planning >> on developing similar support for Ceph. >> >> >> Since you're an school district, I'm more than happy to jump on the phone >> with you to talk you through these options if you'd like. 
>> >> >> - Si >> >> >> >> From: McClune, James <mcclu...@norwalktruckers.net> >> Sent: Thursday, November 2, 2017 8:28 AM >> To: users@cloudstack.apache.org >> Subject: Re: Problems with KVM HA & STONITH >> >> Hi Simon, >> >> Thanks for getting back to me. I created one single NFS share and added it >> as primary storage. I think I better understand how the storage works, with >> ACS. >> >> I was able to get HA working with one NFS storage, which is good. However, >> is there a way to incorporate multiple NFS storage pools and still have the >> HA functionality? I think something like GlusterFS or Ceph (like Ivan and >> Dag described
Re: Problems with KVM HA & STONITH
Hello Rohit, does the Host HA provider now work with Ceph? I am asking because I am not able to create a VM with Ceph storage on a KVM host with HA enabled, and I get the following error while creating the VM:

com.cloud.exception.StorageUnavailableException: Resource [StoragePool:2] is unreachable: Unable to create Vol[9|vm=6|DATADISK]: com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: unsupported configuration: only RAW volumes are supported by this storage pool

Regards,
Victor

On 11/04/2017 09:53 PM, Rohit Yadav wrote:
> [...]
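The libvirt message in Victor's stack trace points at a storage-pool limitation rather than an HA problem: libvirt RBD pools accept only raw-format volumes, so a QCOW2 image has to be converted before it can live on Ceph. A minimal sketch of the manual equivalent (the file names and the `cloudstack` pool are made-up placeholders, and exact rbd options depend on your Ceph release):

```shell
# libvirt RBD storage pools only accept raw volumes, so convert
# the QCOW2 image to raw first (-p shows progress).
qemu-img convert -p -f qcow2 -O raw template.qcow2 template.raw

# Import the raw image into the Ceph pool that backs the libvirt pool.
rbd import template.raw cloudstack/template --image-format 2
```

ACS normally performs this conversion itself when it copies a template to RBD primary storage; the error above tends to surface when a volume ends up requested in qcow2 format on an RBD-backed pool.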
Re: KVM HA configuration
Hi, thanks for the pointer. I managed to create the instance after lowering the CPU speed in the offering. After that I tried shutting down one host; it takes a while, but HA does work.

--
Regards,
William

On 20-Nov-17 11:00:46, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:
> [...]
Re: KVM HA configuration
Hi. "Host: 4 doesn't have cpu capability (cpu:12, speed:1895) to support requested CPU: 2 and requested speed: 2000" means the host's cores run at 1895 MHz, below the 2000 MHz your offering requests, so the planner skips the host. You have to decrease the CPU speed constraint in the offering.

On 20 Nov 2017, 10:50 AM, "William Alianto" <will...@xofap.com> wrote:
> [...]
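Ivan's reading of the log can be illustrated with a quick sketch. The numbers are taken from William's error message; the comparison itself is a simplification of what the deployment planner actually does:

```shell
# Simplified sketch of the capacity check behind William's error:
# a host is skipped when its per-core clock is below the offering's
# requested speed, regardless of how much free capacity it has.
host_speed_mhz=1895      # from the log: "cpu:12, speed:1895"
requested_speed_mhz=2000 # from the offering: "requested speed: 2000"

if [ "$host_speed_mhz" -lt "$requested_speed_mhz" ]; then
  verdict="host skipped: lower the offering speed to <= ${host_speed_mhz} MHz"
else
  verdict="host eligible"
fi
echo "$verdict"
```

With a 1895 MHz host, any offering asking for 2000 MHz or more makes every deployment attempt fail with "cannot find suitable host".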
Re: KVM HA configuration
Hi Ivan, thanks for the pointer. I can now see the HA option on the offering. I created a new offering with the HA option and tried to create a new instance from it; unfortunately the deployment failed. Here is the error log: https://pastebin.com/9DKXHdW3 It seems ACS cannot find a suitable host for the HA VM, although I have hypervisor hosts set up in the infrastructure. Did I miss a step in the configuration?

--
Regards,
William

On 17-Nov-17 22:40:43, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:
> [...]
Re: KVM HA configuration
Hi, when you create a service offering (not an instance), it lets you specify whether HA is enabled.

On 17 Nov 2017, 10:36 PM, "William Alianto" <will...@xofap.com> wrote:
> [...]
Re: KVM HA configuration
Hi Dag, I can't find any HA option when I try to create new instances. How do I know whether the HA option is available?

--
Regards,
William

> On 17 Nov 2017, at 17.03, Dag Sonstebo <dag.sonst...@shapeblue.com> wrote:
> [...]
Re: KVM HA configuration
Hi William, HA follows the compute offering of your VMs; it is not attached to the host as such. If the VM uses an HA-enabled offering, ACS will monitor it and bring it back online if it goes down.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 17/11/2017, 10:00, "William Alianto" <will...@xofap.com> wrote:
> [...]

dag.sonst...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
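Since HA rides on the offering, enabling it is a one-flag change. A minimal sketch of creating such an offering from CloudMonkey, assuming the `offerha` parameter of `createServiceOffering` (the name and sizes here are made-up placeholders, and CLI syntax varies between cloudmonkey versions):

```shell
# Create a compute offering with HA enabled; VMs deployed from it
# are monitored by ACS and restarted on another host if they go down.
cloudmonkey create serviceoffering \
  name="2C-2G-HA" displaytext="2 vCPU, 2 GB, HA enabled" \
  cpunumber=2 cpuspeed=1800 memory=2048 \
  offerha=true
```

Existing VMs can be switched to HA by changing them to an HA-enabled offering; the flag is a property of the offering, not of the host or the instance.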
KVM HA configuration
Hi, I'm still learning ACS and would like to know whether any configuration is needed to enable KVM HA on ACS 4.9. I've been searching the documentation but haven't found a clear picture of how to do it. Can anyone give me some guidance on enabling it? I already have 2 KVM hosts added to the cluster.

--
Regards,
William
RE: Problems with KVM HA & STONITH
Yep, very exciting!

Simon Weller / 615-312-6068

-----Original Message-----
From: Rohit Yadav [rohit.ya...@shapeblue.com]
Received: Saturday, 04 Nov 2017, 11:23AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH
> [...]
Re: Problems with KVM HA & STONITH
Hi James (/cc Simon and others),

A new feature, Host HA, exists in the upcoming ACS 4.11: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA You can read more about it here as well: http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/

This feature can use a custom HA provider, with a default provider implemented for KVM with NFS, and uses IPMI-based fencing (STONITH) of the host. The current HA mechanism provides no such way of fencing (powering off) a host, and whether VM HA succeeds depends on the circumstances of the failure (environment issues, ACS version, etc.). As Simon mentioned, we will have a (host) HA provider that works with Ceph in the near future.

Regards.

From: Simon Weller <swel...@ena.com.INVALID>
Sent: Thursday, November 2, 2017 7:27:22 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH
> [...]
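For anyone trying the 4.11 feature: Host HA builds on each host's out-of-band management settings. A rough sketch of the wiring with CloudMonkey follows; the host UUID, IPMI address, and credentials are placeholders, and the provider name and exact CLI verbs may differ between releases, so treat the wiki page above as authoritative:

```shell
# 1. Point ACS at the host's IPMI interface (ipmitool driver),
#    then enable out-of-band management for the host.
cloudmonkey configure outofbandmanagement hostid=<host-uuid> \
  driver=ipmitool address=10.0.0.50 port=623 \
  username=ADMIN password=secret
cloudmonkey enable outofbandmanagementforhost hostid=<host-uuid>

# 2. Configure and enable the Host HA provider for the host.
cloudmonkey configure haforhost hostid=<host-uuid> provider=kvmhaprovider
cloudmonkey enable haforhost hostid=<host-uuid>
```

With OOBM working, the management server can power-cycle an unresponsive host over IPMI before restarting its HA VMs elsewhere, which is exactly the fencing step the pre-4.11 mechanism lacked.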
Re: Problems with KVM HA & STONITH
James,

Ceph is a great solution and we run all of our ACS storage on Ceph. Note that it adds another layer of complexity to your installation, so you're going to need to develop some expertise with that platform to get comfortable with how it works. Typically you don't want to mix Ceph with your ACS hosts. We in fact deploy 3 separate Ceph monitors, and then scale OSDs as required on a per-cluster basis for additional resiliency (so every KVM ACS cluster has its own Ceph "pod"). We also use Ceph for S3 storage (on completely separate Ceph clusters) for some other services.

NFS is much simpler to maintain for smaller installations, in my opinion. If the IO load you're looking at isn't going to be insanely high, you could look at building a 2-node NFS cluster using Pacemaker and DRBD for data replication between nodes. That would reduce your storage requirement to 2 fairly low-power servers (NFS is not very CPU intensive). Currently, on a host failure when using storage other than NFS on KVM, you will not see HA occur until you take the failed host out of the ACS cluster. This is a historical limitation: ACS could not confirm the host had been fenced correctly, so to avoid potential data corruption (due to 2 hosts mounting the same storage), it does nothing until the operator intervenes. As of ACS 4.10, IPMI-based fencing is supported on NFS, and we're planning to develop similar support for Ceph.

Since you're a school district, I'm more than happy to jump on the phone with you to talk through these options if you'd like.

- Si

From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Thursday, November 2, 2017 8:28 AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH
> [...]
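Simon's two-node suggestion can be sketched with Pacemaker's pcs tool. This is an outline under assumed names (DRBD resource `nfsdata`, export path `/export`, floating IP `10.0.0.100`), not a tested configuration; adapt it to your distribution's resource-agent versions:

```shell
# DRBD device replicated between the two nodes, promoted on one of them.
pcs resource create nfs_drbd ocf:linbit:drbd drbd_resource=nfsdata \
  op monitor interval=30s
pcs resource master nfs_drbd_ms nfs_drbd master-max=1 clone-max=2

# Filesystem, NFS server, and floating IP grouped so they move together.
pcs resource create nfs_fs Filesystem device=/dev/drbd0 \
  directory=/export fstype=ext4
pcs resource create nfs_server nfsserver nfs_shared_infodir=/export/nfsinfo
pcs resource create nfs_vip IPaddr2 ip=10.0.0.100 cidr_netmask=24
pcs resource group add nfs_group nfs_fs nfs_server nfs_vip

# Keep the group on the node where DRBD is primary, in the right order.
pcs constraint colocation add nfs_group with master nfs_drbd_ms INFINITY
pcs constraint order promote nfs_drbd_ms then start nfs_group
```

The KVM hosts then mount the floating IP's export as primary storage; a storage-node failure moves the whole group, and the hypervisors never need STONITH of their own.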
Re: Problems with KVM HA & STONITH
Hi Simon, Thanks for getting back to me. I created one single NFS share and added it as primary storage. I think I better understand how the storage works, with ACS. I was able to get HA working with one NFS storage, which is good. However, is there a way to incorporate multiple NFS storage pools and still have the HA functionality? I think something like GlusterFS or Ceph (like Ivan and Dag described) will work better. Thank you Simon, Ivan, and Dag for your assistance! James On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid> wrote: > James, > > > Try just configuring a single NFS server and see if your setup works. If > you have 3 NFS shares, across all 3 hosts, i'm wondering whether ACS is > picking the one you rebooted as the storage for your VMs and when that > storage goes away (when you bounce the host), all storage for your VMs > vanishes and ACS tries to reboot your other hosts. > > > Normally in a simple ACS setup, you would have a separate storage server > that can serve up NFS to all hosts. If a host dies, then a VM would be > brought up on a spare hosts since all hosts have access to the same storage. > > Your other option is to use local storage, but that won't provide HA. > > > - Si > > > > From: McClune, James <mcclu...@norwalktruckers.net> > Sent: Monday, October 30, 2017 2:26 PM > To: users@cloudstack.apache.org > Subject: Re: Problems with KVM HA & STONITH > > Hi Dag, > > Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu > 14.04 VM. I have the three nodes, each having about 1TB of primary storage > (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS. > All nodes are in a cluster. > > Maybe I'm not understanding the setup or misconfigured something. I'm > trying to setup an HA environment where if one node goes down, running an > HA marked VM, the VM will start on another host. When I simulate a network > disconnect or reboot of a host, all of the nodes go down (STONITH?). 
> > I am unsure on how to setup an HA environment, if all the nodes in the > cluster go down. Any help is much appreciated! > > Thanks, > James > > On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com> > wrote: > > > Hi James, > > > > I think you possibly have over-configured your KVM hosts. If you use NFS > > (and no clustered file system like CLVM) then there should be no need to > > configure STONITH. CloudStack takes care of your HA, so this is not > > something you offload to the KVM host. > > > > (As mentioned the only time I have played with STONITH and CloudStack was > > for CLVM – and I eventually found it not fit for purpose, too unstable > and > > causing too many issues like you describe. Note this was for block > storage > > though – not NFS). > > > > Regards, > > Dag Sonstebo > > Cloud Architect > > ShapeBlue > > > > On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> > wrote: > > > > Hi. If the node losts nfs host it reboots (acs agent behaviour). If > you > > really have 3 storages, you'll go clusterwide reboot everytime your > > host is > > down. > > > > 28 окт. 2017 г. 3:02 пользователь "Simon Weller" > > <swel...@ena.com.invalid> > > написал: > > > > > Hi James, > > > > > > > > > Can you elaborate a bit further on the storage? You say you're > > running NFS > > > on all 3 nodes, can you explain how it is setup? > > > > > > Also, what version of ACS are you running? > > > > > > > > > - Si > > > > > > > > > > > > > > > > > > From: McClune, James <mcclu...@norwalktruckers.net> > > > Sent: Friday, October 27, 2017 2:21 PM > > > To: users@cloudstack.apache.org > > > Subject: Problems with KVM HA & STONITH > > > > > > Hello Apache CloudStack Community, > > > > > > My setup consists of the following: > > > > > > - Three nodes (NODE1, NODE2, and NODE3) > > > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, > > and NODE3 > > > is running Ubuntu 14.04.5. 
> > > - Management Server (running on separate VM, not in cluster) > > > > > > The three nodes use KVM as the hypervisor. I also configured > primary > > and > >
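On James's question about multiple storage pools: an alternative to several per-host NFS shares is a distributed backend such as Ceph RBD, which CloudStack can consume as cluster-wide primary storage while HA keeps working. A hedged sketch using cloudmonkey — the monitor host, pool name, and UUIDs are placeholders, and the exact rbd:// URL form should be checked against the storage documentation for your ACS version:

```shell
# Hypothetical values throughout; createStoragePool is the API behind this call.
cloudmonkey create storagepool zoneid=<zone-uuid> podid=<pod-uuid> \
    clusterid=<cluster-uuid> scope=cluster name=ceph-primary \
    url=rbd://cloudstack@ceph-mon.example.com/cloudstack-pool
```

Unlike a single NFS server, the Ceph monitors and OSDs are themselves redundant, so losing one storage node does not take primary storage away from the whole cluster.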
Re: Problems with KVM HA & STONITH
Also, you can run Ceph if you need HA. I have come across setup descriptions that use the compute nodes as Ceph cluster nodes simultaneously. On Nov 1, 2017 at 21:11, "Simon Weller" <swel...@ena.com.invalid> wrote: > James, > > > Try just configuring a single NFS server and see if your setup works. If > you have 3 NFS shares, across all 3 hosts, i'm wondering whether ACS is > picking the one you rebooted as the storage for your VMs and when that > storage goes away (when you bounce the host), all storage for your VMs > vanishes and ACS tries to reboot your other hosts. > > > Normally in a simple ACS setup, you would have a separate storage server > that can serve up NFS to all hosts. If a host dies, then a VM would be > brought up on a spare hosts since all hosts have access to the same storage. > > Your other option is to use local storage, but that won't provide HA. > > > - Si > > > > From: McClune, James <mcclu...@norwalktruckers.net> > Sent: Monday, October 30, 2017 2:26 PM > To: users@cloudstack.apache.org > Subject: Re: Problems with KVM HA & STONITH > > Hi Dag, > > Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu > 14.04 VM. I have the three nodes, each having about 1TB of primary storage > (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS. > All nodes are in a cluster. > > Maybe I'm not understanding the setup or misconfigured something. I'm > trying to setup an HA environment where if one node goes down, running an > HA marked VM, the VM will start on another host. When I simulate a network > disconnect or reboot of a host, all of the nodes go down (STONITH?). > > I am unsure on how to setup an HA environment, if all the nodes in the > cluster go down. Any help is much appreciated! > > Thanks, > James > > On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com> > wrote: > > > Hi James, > > > > I think you possibly have over-configured your KVM hosts. 
If you use NFS > > (and no clustered file system like CLVM) then there should be no need to > > configure STONITH. CloudStack takes care of your HA, so this is not > > something you offload to the KVM host. > > > > (As mentioned the only time I have played with STONITH and CloudStack was > > for CLVM – and I eventually found it not fit for purpose, too unstable > and > > causing too many issues like you describe. Note this was for block > storage > > though – not NFS). > > > > Regards, > > Dag Sonstebo > > Cloud Architect > > ShapeBlue > > > > On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> > wrote: > > > > Hi. If the node losts nfs host it reboots (acs agent behaviour). If > you > > really have 3 storages, you'll go clusterwide reboot everytime your > > host is > > down. > > > > 28 окт. 2017 г. 3:02 пользователь "Simon Weller" > > <swel...@ena.com.invalid> > > написал: > > > > > Hi James, > > > > > > > > > Can you elaborate a bit further on the storage? You say you're > > running NFS > > > on all 3 nodes, can you explain how it is setup? > > > > > > Also, what version of ACS are you running? > > > > > > > > > - Si > > > > > > > > > > > > > > > > > > From: McClune, James <mcclu...@norwalktruckers.net> > > > Sent: Friday, October 27, 2017 2:21 PM > > > To: users@cloudstack.apache.org > > > Subject: Problems with KVM HA & STONITH > > > > > > Hello Apache CloudStack Community, > > > > > > My setup consists of the following: > > > > > > - Three nodes (NODE1, NODE2, and NODE3) > > > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, > > and NODE3 > > > is running Ubuntu 14.04.5. > > > - Management Server (running on separate VM, not in cluster) > > > > > > The three nodes use KVM as the hypervisor. I also configured > primary > > and > > > secondary storage on all three of the nodes. I'm using NFS for the > > primary > > > & secondary storage. VM operations work great. Live migration works > > great. 
> > > > > > However, when a host goes down, the HA functionality does not work > > at all. > > >
Re: Problems with KVM HA & STONITH
James, Try just configuring a single NFS server and see if your setup works. If you have 3 NFS shares, across all 3 hosts, i'm wondering whether ACS is picking the one you rebooted as the storage for your VMs and when that storage goes away (when you bounce the host), all storage for your VMs vanishes and ACS tries to reboot your other hosts. Normally in a simple ACS setup, you would have a separate storage server that can serve up NFS to all hosts. If a host dies, then a VM would be brought up on a spare hosts since all hosts have access to the same storage. Your other option is to use local storage, but that won't provide HA. - Si From: McClune, James <mcclu...@norwalktruckers.net> Sent: Monday, October 30, 2017 2:26 PM To: users@cloudstack.apache.org Subject: Re: Problems with KVM HA & STONITH Hi Dag, Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu 14.04 VM. I have the three nodes, each having about 1TB of primary storage (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS. All nodes are in a cluster. Maybe I'm not understanding the setup or misconfigured something. I'm trying to setup an HA environment where if one node goes down, running an HA marked VM, the VM will start on another host. When I simulate a network disconnect or reboot of a host, all of the nodes go down (STONITH?). I am unsure on how to setup an HA environment, if all the nodes in the cluster go down. Any help is much appreciated! Thanks, James On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com> wrote: > Hi James, > > I think you possibly have over-configured your KVM hosts. If you use NFS > (and no clustered file system like CLVM) then there should be no need to > configure STONITH. CloudStack takes care of your HA, so this is not > something you offload to the KVM host. 
> > (As mentioned the only time I have played with STONITH and CloudStack was > for CLVM – and I eventually found it not fit for purpose, too unstable and > causing too many issues like you describe. Note this was for block storage > though – not NFS). > > Regards, > Dag Sonstebo > Cloud Architect > ShapeBlue > > On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> wrote: > > Hi. If the node losts nfs host it reboots (acs agent behaviour). If you > really have 3 storages, you'll go clusterwide reboot everytime your > host is > down. > > 28 окт. 2017 г. 3:02 пользователь "Simon Weller" > <swel...@ena.com.invalid> > написал: > > > Hi James, > > > > > > Can you elaborate a bit further on the storage? You say you're > running NFS > > on all 3 nodes, can you explain how it is setup? > > > > Also, what version of ACS are you running? > > > > > > - Si > > > > > > > > > > > > From: McClune, James <mcclu...@norwalktruckers.net> > > Sent: Friday, October 27, 2017 2:21 PM > > To: users@cloudstack.apache.org > > Subject: Problems with KVM HA & STONITH > > > > Hello Apache CloudStack Community, > > > > My setup consists of the following: > > > > - Three nodes (NODE1, NODE2, and NODE3) > > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, > and NODE3 > > is running Ubuntu 14.04.5. > > - Management Server (running on separate VM, not in cluster) > > > > The three nodes use KVM as the hypervisor. I also configured primary > and > > secondary storage on all three of the nodes. I'm using NFS for the > primary > > & secondary storage. VM operations work great. Live migration works > great. > > > > However, when a host goes down, the HA functionality does not work > at all. > > Instead of spinning up the VM on another available host, the down > host > > seems to trigger STONITH. When STONITH happens, all hosts in the > cluster go > > down. This not only causes no HA, but also downs perfectly good > VM's. 
I > > have read countless articles and documentation related to this > issue. I > > still cannot find a viable solution for this issue. I really want to > use > > Apache CloudStack, but cannot implement this in production when > STONITH > > happens. > > > > I think I have something misconfigured. I thought I would reach out > to the > > CloudStack community and ask for some friendly assistance. > > > > If there is anything (sy
Re: Problems with KVM HA & STONITH
Hi Dag, Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu 14.04 VM. I have the three nodes, each having about 1TB of primary storage (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS. All nodes are in a cluster. Maybe I'm not understanding the setup or misconfigured something. I'm trying to setup an HA environment where if one node goes down, running an HA marked VM, the VM will start on another host. When I simulate a network disconnect or reboot of a host, all of the nodes go down (STONITH?). I am unsure on how to setup an HA environment, if all the nodes in the cluster go down. Any help is much appreciated! Thanks, James On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com> wrote: > Hi James, > > I think you possibly have over-configured your KVM hosts. If you use NFS > (and no clustered file system like CLVM) then there should be no need to > configure STONITH. CloudStack takes care of your HA, so this is not > something you offload to the KVM host. > > (As mentioned the only time I have played with STONITH and CloudStack was > for CLVM – and I eventually found it not fit for purpose, too unstable and > causing too many issues like you describe. Note this was for block storage > though – not NFS). > > Regards, > Dag Sonstebo > Cloud Architect > ShapeBlue > > On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> wrote: > > Hi. If the node losts nfs host it reboots (acs agent behaviour). If you > really have 3 storages, you'll go clusterwide reboot everytime your > host is > down. > > 28 окт. 2017 г. 3:02 пользователь "Simon Weller" > <swel...@ena.com.invalid> > написал: > > > Hi James, > > > > > > Can you elaborate a bit further on the storage? You say you're > running NFS > > on all 3 nodes, can you explain how it is setup? > > > > Also, what version of ACS are you running? 
> > > > > > - Si > > > > > > > > > > ____ > > From: McClune, James <mcclu...@norwalktruckers.net> > > Sent: Friday, October 27, 2017 2:21 PM > > To: users@cloudstack.apache.org > > Subject: Problems with KVM HA & STONITH > > > > Hello Apache CloudStack Community, > > > > My setup consists of the following: > > > > - Three nodes (NODE1, NODE2, and NODE3) > > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, > and NODE3 > > is running Ubuntu 14.04.5. > > - Management Server (running on separate VM, not in cluster) > > > > The three nodes use KVM as the hypervisor. I also configured primary > and > > secondary storage on all three of the nodes. I'm using NFS for the > primary > > & secondary storage. VM operations work great. Live migration works > great. > > > > However, when a host goes down, the HA functionality does not work > at all. > > Instead of spinning up the VM on another available host, the down > host > > seems to trigger STONITH. When STONITH happens, all hosts in the > cluster go > > down. This not only causes no HA, but also downs perfectly good > VM's. I > > have read countless articles and documentation related to this > issue. I > > still cannot find a viable solution for this issue. I really want to > use > > Apache CloudStack, but cannot implement this in production when > STONITH > > happens. > > > > I think I have something misconfigured. I thought I would reach out > to the > > CloudStack community and ask for some friendly assistance. > > > > If there is anything (system-wise) you request in order to further > > troubleshoot this issue, please let me know and I'll send. I > appreciate any > > help in this issue! > > > > -- > > > > Thanks, > > > > James > > > > > > dag.sonst...@shapeblue.com > www.shapeblue.com > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > @shapeblue > > > > -- James McClune Technical Support Specialist Norwalk City Schools Phone: 419-660-6590 mcclu...@norwalktruckers.net
Re: Problems with KVM HA & STONITH
Hi Simon, Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu 14.04 VM. I have the three nodes, each having about 1TB of primary storage (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS. All nodes are in a cluster. Maybe I'm not understanding the setup or misconfigured something. I'm trying to setup an HA environment where if one node goes down, running an HA marked VM, the VM will start on another host. When I simulate a network disconnect or reboot of a host, all of the nodes go down. If you request more information, please let me know. Again, any help is greatly appreciated! Thanks, James On Fri, Oct 27, 2017 at 4:02 PM, Simon Weller <swel...@ena.com.invalid> wrote: > Hi James, > > > Can you elaborate a bit further on the storage? You say you're running NFS > on all 3 nodes, can you explain how it is setup? > > Also, what version of ACS are you running? > > > - Si > > > > > > From: McClune, James <mcclu...@norwalktruckers.net> > Sent: Friday, October 27, 2017 2:21 PM > To: users@cloudstack.apache.org > Subject: Problems with KVM HA & STONITH > > Hello Apache CloudStack Community, > > My setup consists of the following: > > - Three nodes (NODE1, NODE2, and NODE3) > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3 > is running Ubuntu 14.04.5. > - Management Server (running on separate VM, not in cluster) > > The three nodes use KVM as the hypervisor. I also configured primary and > secondary storage on all three of the nodes. I'm using NFS for the primary > & secondary storage. VM operations work great. Live migration works great. > > However, when a host goes down, the HA functionality does not work at all. > Instead of spinning up the VM on another available host, the down host > seems to trigger STONITH. When STONITH happens, all hosts in the cluster go > down. This not only causes no HA, but also downs perfectly good VM's. 
I > have read countless articles and documentation related to this issue. I > still cannot find a viable solution for this issue. I really want to use > Apache CloudStack, but cannot implement this in production when STONITH > happens. > > I think I have something misconfigured. I thought I would reach out to the > CloudStack community and ask for some friendly assistance. > > If there is anything (system-wise) you request in order to further > troubleshoot this issue, please let me know and I'll send. I appreciate any > help in this issue! > > -- > > Thanks, > > James > -- James McClune Technical Support Specialist Norwalk City Schools Phone: 419-660-6590 mcclu...@norwalktruckers.net
Re: Problems with KVM HA & STONITH
Hi James, I think you possibly have over-configured your KVM hosts. If you use NFS (and no clustered file system like CLVM) then there should be no need to configure STONITH. CloudStack takes care of your HA, so this is not something you offload to the KVM host. (As mentioned the only time I have played with STONITH and CloudStack was for CLVM – and I eventually found it not fit for purpose, too unstable and causing too many issues like you describe. Note this was for block storage though – not NFS). Regards, Dag Sonstebo Cloud Architect ShapeBlue On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> wrote: Hi. If the node losts nfs host it reboots (acs agent behaviour). If you really have 3 storages, you'll go clusterwide reboot everytime your host is down. 28 окт. 2017 г. 3:02 пользователь "Simon Weller" <swel...@ena.com.invalid> написал: > Hi James, > > > Can you elaborate a bit further on the storage? You say you're running NFS > on all 3 nodes, can you explain how it is setup? > > Also, what version of ACS are you running? > > > - Si > > > > > > From: McClune, James <mcclu...@norwalktruckers.net> > Sent: Friday, October 27, 2017 2:21 PM > To: users@cloudstack.apache.org > Subject: Problems with KVM HA & STONITH > > Hello Apache CloudStack Community, > > My setup consists of the following: > > - Three nodes (NODE1, NODE2, and NODE3) > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3 > is running Ubuntu 14.04.5. > - Management Server (running on separate VM, not in cluster) > > The three nodes use KVM as the hypervisor. I also configured primary and > secondary storage on all three of the nodes. I'm using NFS for the primary > & secondary storage. VM operations work great. Live migration works great. > > However, when a host goes down, the HA functionality does not work at all. > Instead of spinning up the VM on another available host, the down host > seems to trigger STONITH. 
When STONITH happens, all hosts in the cluster go > down. This not only causes no HA, but also downs perfectly good VM's. I > have read countless articles and documentation related to this issue. I > still cannot find a viable solution for this issue. I really want to use > Apache CloudStack, but cannot implement this in production when STONITH > happens. > > I think I have something misconfigured. I thought I would reach out to the > CloudStack community and ask for some friendly assistance. > > If there is anything (system-wise) you request in order to further > troubleshoot this issue, please let me know and I'll send. I appreciate any > help in this issue! > > -- > > Thanks, > > James > dag.sonst...@shapeblue.com www.shapeblue.com 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue
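Dag's point that STONITH should not be configured alongside CloudStack HA can be sanity-checked on each KVM host. A small sketch — the service names are the common Pacemaker/Corosync defaults, so adjust for your distro:

```shell
#!/bin/sh
# Report whether a host-level cluster stack is running on this KVM host.
# With NFS primary storage, CloudStack handles HA itself, so neither
# service should be active here.
check_stack() {
  for svc in pacemaker corosync; do
    if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet "$svc" 2>/dev/null; then
      echo "$svc: ACTIVE - this can fence hosts independently of CloudStack"
    else
      echo "$svc: not active"
    fi
  done
}

check_stack
```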
Re: Problems with KVM HA & STONITH
Hi. If a node loses its NFS host, it reboots (ACS agent behaviour). If you really have 3 storages, you'll get a cluster-wide reboot every time a host is down. On Oct 28, 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote: > Hi James, > > > Can you elaborate a bit further on the storage? You say you're running NFS > on all 3 nodes, can you explain how it is setup? > > Also, what version of ACS are you running? > > > - Si > > > > > > From: McClune, James <mcclu...@norwalktruckers.net> > Sent: Friday, October 27, 2017 2:21 PM > To: users@cloudstack.apache.org > Subject: Problems with KVM HA & STONITH > > Hello Apache CloudStack Community, > > My setup consists of the following: > > - Three nodes (NODE1, NODE2, and NODE3) > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3 > is running Ubuntu 14.04.5. > - Management Server (running on separate VM, not in cluster) > > The three nodes use KVM as the hypervisor. I also configured primary and > secondary storage on all three of the nodes. I'm using NFS for the primary > & secondary storage. VM operations work great. Live migration works great. > > However, when a host goes down, the HA functionality does not work at all. > Instead of spinning up the VM on another available host, the down host > seems to trigger STONITH. When STONITH happens, all hosts in the cluster go > down. This not only causes no HA, but also downs perfectly good VM's. I > have read countless articles and documentation related to this issue. I > still cannot find a viable solution for this issue. I really want to use > Apache CloudStack, but cannot implement this in production when STONITH > happens. > > I think I have something misconfigured. I thought I would reach out to the > CloudStack community and ask for some friendly assistance. > > If there is anything (system-wise) you request in order to further > troubleshoot this issue, please let me know and I'll send. I appreciate any > help in this issue! 
> > -- > > Thanks, > > James >
Re: Problems with KVM HA & STONITH
Hi James, Can you elaborate a bit further on the storage? You say you're running NFS on all 3 nodes, can you explain how it is setup? Also, what version of ACS are you running? - Si From: McClune, James <mcclu...@norwalktruckers.net> Sent: Friday, October 27, 2017 2:21 PM To: users@cloudstack.apache.org Subject: Problems with KVM HA & STONITH Hello Apache CloudStack Community, My setup consists of the following: - Three nodes (NODE1, NODE2, and NODE3) NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3 is running Ubuntu 14.04.5. - Management Server (running on separate VM, not in cluster) The three nodes use KVM as the hypervisor. I also configured primary and secondary storage on all three of the nodes. I'm using NFS for the primary & secondary storage. VM operations work great. Live migration works great. However, when a host goes down, the HA functionality does not work at all. Instead of spinning up the VM on another available host, the down host seems to trigger STONITH. When STONITH happens, all hosts in the cluster go down. This not only causes no HA, but also downs perfectly good VM's. I have read countless articles and documentation related to this issue. I still cannot find a viable solution for this issue. I really want to use Apache CloudStack, but cannot implement this in production when STONITH happens. I think I have something misconfigured. I thought I would reach out to the CloudStack community and ask for some friendly assistance. If there is anything (system-wise) you request in order to further troubleshoot this issue, please let me know and I'll send. I appreciate any help in this issue! -- Thanks, James
Problems with KVM HA & STONITH
Hello Apache CloudStack Community, My setup consists of the following: - Three nodes (NODE1, NODE2, and NODE3) NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3 is running Ubuntu 14.04.5. - Management Server (running on separate VM, not in cluster) The three nodes use KVM as the hypervisor. I also configured primary and secondary storage on all three of the nodes. I'm using NFS for the primary & secondary storage. VM operations work great. Live migration works great. However, when a host goes down, the HA functionality does not work at all. Instead of spinning up the VM on another available host, the down host seems to trigger STONITH. When STONITH happens, all hosts in the cluster go down. This not only causes no HA, but also downs perfectly good VM's. I have read countless articles and documentation related to this issue. I still cannot find a viable solution for this issue. I really want to use Apache CloudStack, but cannot implement this in production when STONITH happens. I think I have something misconfigured. I thought I would reach out to the CloudStack community and ask for some friendly assistance. If there is anything (system-wise) you request in order to further troubleshoot this issue, please let me know and I'll send. I appreciate any help in this issue! -- Thanks, James
Re: KVM+HA
Apologies for the fragmented messages. In the existing framework, CloudStack does not know for certain whether your VMs are dead, the KVM hypervisor crashed, it's just a network blip, or perhaps you stopped the KVM agent (or the agent died). It takes a conservative approach and does not restart the VMs on other hypervisors, to avoid a split-brain scenario. The only time it will restart a KVM hypervisor and move the VMs over is when one of the hypervisors in the cluster loses access to primary storage - using the NFS heartbeat method I mentioned earlier. The new framework addresses the limitations above by 1) checking whether there is any disk activity on the VMs that are in an uncertain state - if there is no activity for ALL VMs for "x" number of seconds, 2) CloudStack will issue an IPMI fence command to power down/reboot the host (via iLO or DRAC or something similar), and 3) the VMs will be restarted elsewhere. Regards ilya On Tue, Jul 18, 2017 at 6:10 AM, ilya musayev <ilya.mailing.li...@gmail.com> wrote: > What share primary storage backend do you have for your VMs? > > If it is NFS - cloudstack agent writes heartbeat. When issue occurs - the > neighbor hosts will check if the hypervisor that failed - still writes to > heartbeat file. There are bunch of corner case where cloudstack HA does not > kick in - due to uncertainty. > > The new framework should address those uncertainties. > > KVM HA with IPMI Fencing - Apache Cloudstack - Apache Software ... 
> <https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing> > [CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA > <https://issues.apache.org/jira/browse/CLOUDSTACK-8943> > > Regards > ilya > > On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev < > ilya.mailing.li...@gmail.com> wrote: > >> Hi Victor >> >> We recently rewrote KVM HA framework. Its being merged into latest build. >> >> >> On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote: >> >>> Hello Guys, >>> >>> I am facing the same issue that mentioned in the following url . >>> >>> - >>> >>> https://issues.apache.org/jira/browse/CLOUDSTACK-3535 >>> >>> - >>> >>> When the host is put in maintenance mode , then ha enabled VM's are >>> automatically migrated to available host. But when the kvm host is down, no >>> HA is done. The vm's are still down until I put the host node back up. >>> >>> >>> I have tried everything like the following. >>> >>> = >>> >>> 1, system VM's and client vm's are created in shared storage >>> >>> 3, Added ha.tag host tags >>> >>> 2, Created host by adding ha tag >>> >>> 3, Created VE's in Ha enabled host with ha enabled service offering >>> >>> >>> >>> Do you guys have successfully tested Ha. I am really stuck at this part. >>> >>> Regards >>> >>> >>> >> >
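Step 2 of the flow ilya describes boils down to one IPMI command from the management side. A minimal sketch — the BMC address and user are placeholders, DRY_RUN=1 (the default) only prints the ipmitool invocation instead of sending it, and the password is taken from the IPMI_PASSWORD environment variable via ipmitool's -E option:

```shell
#!/bin/sh
# Hedged sketch of fencing a host over IPMI before its VMs are restarted
# elsewhere. BMC_HOST and BMC_USER are placeholder values.
BMC_HOST="${BMC_HOST:-192.0.2.10}"
BMC_USER="${BMC_USER:-admin}"
DRY_RUN="${DRY_RUN:-1}"

fence_host() {
  # "chassis power reset" is a standard ipmitool subcommand; iLO/DRAC BMCs accept it.
  set -- ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -E chassis power reset
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

fence_host
```

Set DRY_RUN=0 and export IPMI_PASSWORD only once you have verified the command against a test box.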
Re: KVM+HA
What shared primary storage backend do you have for your VMs? If it is NFS, the CloudStack agent writes a heartbeat. When an issue occurs, the neighbour hosts check whether the hypervisor that failed is still writing to its heartbeat file. There are a bunch of corner cases where CloudStack HA does not kick in, due to uncertainty. The new framework should address those uncertainties. KVM HA with IPMI Fencing - Apache Cloudstack - Apache Software ... <https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing> [CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA <https://issues.apache.org/jira/browse/CLOUDSTACK-8943> Regards ilya On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev <ilya.mailing.li...@gmail.com> wrote: > Hi Victor > > We recently rewrote KVM HA framework. Its being merged into latest build. > > > On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote: > >> Hello Guys, >> >> I am facing the same issue that mentioned in the following url . >> >> - >> >> https://issues.apache.org/jira/browse/CLOUDSTACK-3535 >> >> - >> >> When the host is put in maintenance mode , then ha enabled VM's are >> automatically migrated to available host. But when the kvm host is down, no >> HA is done. The vm's are still down until I put the host node back up. >> >> >> I have tried everything like the following. >> >> = >> >> 1, system VM's and client vm's are created in shared storage >> >> 3, Added ha.tag host tags >> >> 2, Created host by adding ha tag >> >> 3, Created VE's in Ha enabled host with ha enabled service offering >> >> >> >> Do you guys have successfully tested Ha. I am really stuck at this part. >> >> Regards >> >> >> >> >
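The heartbeat mechanism ilya describes can be illustrated with a toy script. This is not CloudStack's actual heartbeat logic — the directory and the 60-second threshold are invented for the demo — but the idea is the same: each host keeps touching a per-host file on the shared mount, and a neighbour treats a file that stops being updated as a dead host.

```shell
#!/bin/sh
# Toy NFS-heartbeat demo. HB_DIR stands in for the NFS primary-storage mount.
HB_DIR="${HB_DIR:-/tmp/hb-demo}"
STALE_AFTER=60    # seconds without an update => presumed dead (invented value)

write_heartbeat() {                 # run periodically by the host itself
  mkdir -p "$HB_DIR" && touch "$HB_DIR/hb-$1"
}

check_heartbeat() {                 # run by a neighbour host
  f="$HB_DIR/hb-$1"
  [ -f "$f" ] || { echo "dead"; return; }
  # GNU stat first, BSD stat as a fallback, for the file's mtime.
  mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f")
  if [ $(( $(date +%s) - mtime )) -gt "$STALE_AFTER" ]; then
    echo "dead"
  else
    echo "alive"
  fi
}

write_heartbeat node1
check_heartbeat node1   # alive: the file was just touched
check_heartbeat node2   # dead: node2 never wrote a heartbeat
```

The uncertainty ilya mentions is visible even here: a host that merely lost its NFS mount also stops updating the file, which is exactly why the old framework resorted to self-rebooting and the new one adds disk-activity checks plus IPMI fencing.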
Re: KVM+HA
Hi Victor We recently rewrote KVM HA framework. Its being merged into latest build. On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote: > Hello Guys, > > I am facing the same issue that mentioned in the following url . > > - > > https://issues.apache.org/jira/browse/CLOUDSTACK-3535 > > - > > When the host is put in maintenance mode , then ha enabled VM's are > automatically migrated to available host. But when the kvm host is down, no > HA is done. The vm's are still down until I put the host node back up. > > > I have tried everything like the following. > > = > > 1, system VM's and client vm's are created in shared storage > > 3, Added ha.tag host tags > > 2, Created host by adding ha tag > > 3, Created VE's in Ha enabled host with ha enabled service offering > > > > Do you guys have successfully tested Ha. I am really stuck at this part. > > Regards > > > >
KVM+HA
Hello Guys, I am facing the same issue that is mentioned in the following URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535 When a host is put into maintenance mode, the HA-enabled VMs are automatically migrated to an available host. But when the KVM host is down, no HA is done. The VMs stay down until I bring the host node back up. I have tried everything, like the following: 1. System VMs and client VMs are created on shared storage 2. Added the ha.tag host tag 3. Created the host with the ha tag 4. Created VEs on the HA-enabled host with an HA-enabled service offering Have you guys successfully tested HA? I am really stuck at this part. Regards
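For anyone retracing victor's checklist, the steps map onto a few concrete knobs. A sketch using cloudmonkey (the CloudStack CLI) — UUIDs and the tag value "ha" are placeholders, and command spelling varies a little between cloudmonkey versions:

```shell
cloudmonkey update configuration name=ha.tag value=ha        # global HA host tag
cloudmonkey update host id=<host-uuid> hosttags=ha           # mark the dedicated HA host
cloudmonkey create serviceoffering name=ha-small displaytext="HA small" \
    cpunumber=1 cpuspeed=1000 memory=1024 offerha=true       # offering with HA enabled
cloudmonkey deploy virtualmachine zoneid=<zone-uuid> templateid=<template-uuid> \
    serviceofferingid=<offering-uuid>                        # VM inherits HA from the offering
```

Note that, as victor's first item says, HA across hosts still requires the VM disks to live on shared primary storage.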
Re: KVM HA is broken, let's fix it
Hi; This IPMI fencing is the technology that most cloud platforms, such as oVirt, use, so it's good. How can we test this IPMI fencing feature, and where can I find its scripts and its usage/test documents? I have some test hardware and I would really like to try it. Thanks Özhan On Sat, Oct 17, 2015 at 2:44 AM, ilya <ilya.mailing.li...@gmail.com> wrote: > Please see another thread on DEV that proposes the fix for KVM HA -> > [DISCUSS] KVM HA with IPMI Fencing > > > > > We propose the following solution that in our understanding should cover > all use cases and provide a fencing mechanism. > > NOTE: Proposed IPMI fencing, is just a script. If you are using HP > hardware with ILO, it could be an ILO executable with specific > parameters. In theory - this can be *any* script not just IPMI. > > Please take few minutes to read this through, to avoid duplicate efforts... > > > Proposed FS below: > ---- > > > https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing > > > On 10/12/15 12:54 AM, Frank Louwers wrote: > > > >> On 10 Oct 2015, at 12:35, Remi Bergsma <rberg...@schubergphilis.com> > wrote: > >> > >> Can you please explain what the issue is with KVM HA? In my tests, HA > starts all VMs just fine without the hypervisor coming back. At least that > is on current 4.6. Assuming a cluster of multiple nodes of course. It will > then do a neighbor check from another host in the same cluster. > >> > >> Also, malfunctioning NFS leads to corruption and therefore we fence a > box when the shared storage is unreliable. Combining primary and secondary > NFS is not a good idea for production in my opinion. > > > > Well, it depends how you look at it, and what your situation is. > > > > If you use 1 NFS export als primary storage (and only NFS), then yes, > the system works as one would expect, and doesn’t need to be fixed. 
> > > > However, HA is “not functioning” in any of these scenario’s: > > > > - you don’t use NFS as your only primary storage > > - you use more than one NFS primary storage > > > > Even worse: imagine you only use local storage as primary storage, but > have 1 NFS configured (as the UI “wizard” forces you to configure one). You > don’t have any active VM configured on the primary storage. You then > perform maintenance on the NFS storage, and take it offline… > > > > All your hosts will then reboot, resulting in major downtime, that’s > completely unnecessary. There’s not even an option to disable this at this > point… We’ve removed the reboot instructions from the HA script on all our > instances… > > > > Regards, > > > > Frank > > >
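Frank's workaround of removing the reboot from the HA script can be done reversibly by commenting the line out instead. A sketch, run here against a dummy local copy so nothing real is touched; on an agent host the script in question is typically kvmheartbeat.sh under the cloudstack-common scripts tree (an assumption — locate it on your install first):

```shell
#!/bin/sh
# Build a stand-in for the heartbeat script so nothing real is modified.
SCRIPT=./kvmheartbeat-demo.sh
printf '%s\n' '#!/bin/sh' 'echo "heartbeat write failed"' '/sbin/reboot' > "$SCRIPT"

# Comment the reboot out rather than deleting it, so the change is reviewable.
sed -i.bak 's|^/sbin/reboot|# /sbin/reboot   # disabled: do not self-fence on NFS loss|' "$SCRIPT"
grep reboot "$SCRIPT"
```

Bear in mind the reboot exists to prevent split-brain writes to shared storage; disabling it trades that protection for uptime, which is only reasonable when, as in Frank's scenario, the NFS pool carries no running VM disks.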
Re: KVM HA is broken, let's fix it
Please see another thread on DEV that proposes the fix for KVM HA ->
[DISCUSS] KVM HA with IPMI Fencing

We propose the following solution that, in our understanding, should cover all use cases and provide a fencing mechanism.

NOTE: The proposed IPMI fencing is just a script. If you are using HP hardware with ILO, it could be an ILO executable with specific parameters. In theory, this can be *any* script, not just IPMI.

Please take a few minutes to read this through, to avoid duplicate efforts...

Proposed FS below:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing

On 10/12/15 12:54 AM, Frank Louwers wrote:
> [quoted text trimmed; Frank's message appears in full later in this thread]
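ilya's point above is that the fencing mechanism is deliberately just a pluggable script. As a rough, hypothetical illustration of that contract (not the actual FS implementation), a minimal fencing routine built on the stock `ipmitool` CLI might look like the sketch below; `BMC_IP`, `BMC_USER`, and `BMC_PASS` are assumed environment settings, and `IPMI` is overridable only so a different tool (an ILO utility, or a test stub) can be dropped in:

```shell
# Hypothetical fencing sketch -- not the actual FS implementation.
# BMC_IP / BMC_USER / BMC_PASS are assumed environment settings; IPMI is
# overridable so an ILO tool (or a test stub) can be substituted, per the
# "any script" note above.

IPMI="${IPMI:-ipmitool}"

ipmi_cmd() {
    "$IPMI" -I lanplus -H "$BMC_IP" -U "$BMC_USER" -P "$BMC_PASS" "$@"
}

# fence_host: hard power-off the target, then poll until the BMC confirms.
# Succeeds only when the host is verifiably down -- HA must never restart
# a VM elsewhere while the original host might still be writing to disk.
fence_host() {
    ipmi_cmd chassis power off || return 1
    for _ in 1 2 3 4 5; do
        if ipmi_cmd chassis power status | grep -q 'is off'; then
            echo "host fenced"
            return 0
        fi
        sleep 2
    done
    echo "fence failed: host still powered on" >&2
    return 1
}
```

The key property is the exit code: the caller treats the host as fenced only when the BMC confirms the power is off, never on a timeout alone.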
Re: KVM HA is broken, let's fix it
> On 10 Oct 2015, at 12:35, Remi Bergsma <rberg...@schubergphilis.com> wrote:
>
> Can you please explain what the issue is with KVM HA? In my tests, HA starts all VMs just fine without the hypervisor coming back. At least, that is on current 4.6, assuming a cluster of multiple nodes of course. It will then do a neighbor check from another host in the same cluster.
>
> Also, malfunctioning NFS leads to corruption, and therefore we fence a box when the shared storage is unreliable. Combining primary and secondary NFS is not a good idea for production, in my opinion.

Well, it depends how you look at it, and what your situation is.

If you use one NFS export as primary storage (and only NFS), then yes, the system works as one would expect and doesn't need to be fixed.

However, HA is "not functioning" in any of these scenarios:

- you don't use NFS as your only primary storage
- you use more than one NFS primary storage

Even worse: imagine you only use local storage as primary storage, but have one NFS configured (as the UI "wizard" forces you to configure one). You don't have any active VM on that NFS primary storage. You then perform maintenance on the NFS storage and take it offline...

All your hosts will then reboot, resulting in major downtime that's completely unnecessary. There's not even an option to disable this at this point... We've removed the reboot instructions from the HA script on all our instances...

Regards,

Frank
Re: KVM HA is broken, let's fix it
Hi Remi,

So we started here with Andrei (v4.5) complaining that a slow NFS causes a mass reboot:
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201510.mbox/%3C18886119.904.1444382474932.JavaMail.andrei%40tuchka%3E

My claim that the VM is not started until the HV is back is not based on personal testing, alas, but on Marcus' statement as well as Simon Weller's reply in the "slow nfs = reboot all hosts" thread above:
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201508.mbox/%3CCALFpzo5CotX0Qz%2Bd_OXEZJGYTau%2BfA%2Bmzxg_yQEUzswi_9gz5w%40mail.gmail.com%3E

If what you say is true about the HV not having to come back, then this is great; we need to double-check that this is actually the case.

We could then try to tweak the settings in the heartbeat script to be more forgiving re timeouts, and/or add additional logic such as checking whether other nodes or the mgmt server are online (and therefore the HV has network) before rebooting.

Any further thoughts are welcome. I'll try to set up HA on my test deployment and check.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Remi Bergsma" <rberg...@schubergphilis.com>
> To: d...@cloudstack.apache.org
> Cc: "Cloudstack Users List" <users@cloudstack.apache.org>
> Sent: Saturday, 10 October, 2015 11:35:36
> Subject: Re: KVM HA is broken, let's fix it
>
> [quoted text trimmed; Remi's message appears in full later in this thread]
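Lucian's suggested tweak, checking whether the management server or a neighbouring host is still reachable before self-rebooting, can be sketched roughly as follows. This is a hypothetical sketch, not CloudStack's actual heartbeat script; `MGMT_SERVER` and `NEIGHBOR_HOST` are assumed configuration values, and `PING` is overridable only for testing:

```shell
# Hypothetical pre-reboot sanity check, sketching the logic suggested above;
# not CloudStack's actual heartbeat script. MGMT_SERVER and NEIGHBOR_HOST
# are assumed configuration values; PING is overridable for testing.

PING="${PING:-ping}"

reachable() {
    "$PING" -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Called after the NFS heartbeat write has already failed. Only self-fence
# (reboot) when we are also cut off from the management server and a
# neighbour -- i.e. when the fault is plausibly this host, not the storage.
should_self_fence() {
    if reachable "$MGMT_SERVER" || reachable "$NEIGHBOR_HOST"; then
        echo "network is up; storage itself is suspect -- skipping reboot"
        return 1
    fi
    echo "isolated from mgmt server and neighbours -- self-fencing"
    return 0
}
```

With a check like this, a slow or offline NFS server would no longer take every healthy, well-connected host down with it.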
Re: KVM HA is broken, let's fix it
Hi Lucian,

Can you please explain what the issue is with KVM HA? In my tests, HA starts all VMs just fine without the hypervisor coming back. At least, that is on current 4.6, assuming a cluster of multiple nodes of course. It will then do a neighbor check from another host in the same cluster.

Also, malfunctioning NFS leads to corruption, and therefore we fence a box when the shared storage is unreliable. Combining primary and secondary NFS is not a good idea for production, in my opinion.

I'm happy to help, and if you have a scenario I can replay, I will try that in my lab.

Regards,
Remi

Sent from my iPhone

> On 10 Oct 2015, at 00:19, Nux! <n...@li.nux.ro> wrote:
>
> Hello,
>
> Following a recent thread on the users ml where slow NFS caused a mass reboot, I have opened the following issue about improving HA on KVM.
> https://issues.apache.org/jira/browse/CLOUDSTACK-8943
>
> I know there are many people around here who use KVM and are interested in a more robust way of doing HA.
>
> Please share your ideas, comments, suggestions; let's see what we can come up with to make this better.
>
> Regards,
> Lucian
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
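The neighbor check Remi describes, where another host in the same cluster probes the suspect host before HA acts, might look roughly like the sketch below. The heartbeat-file location, naming, and staleness threshold are illustrative assumptions, not CloudStack's actual paths or defaults:

```shell
# Hypothetical neighbour-check sketch: run from a healthy host in the same
# cluster to decide whether a suspect host is really dead. HEARTBEAT_DIR,
# the hb-<ip> naming, and STALE_AFTER are illustrative assumptions; PING is
# overridable for testing.

PING="${PING:-ping}"
HEARTBEAT_DIR="${HEARTBEAT_DIR:-/mnt/primary/KVMHA}"   # assumed location
STALE_AFTER="${STALE_AFTER:-60}"                       # seconds

# neighbor_check <host-ip>: returns 0 if the host shows any sign of life.
neighbor_check() {
    host_ip="$1"
    # 1. Does the suspect host still answer on the network?
    if "$PING" -c 1 -W 2 "$host_ip" >/dev/null 2>&1; then
        echo "alive: host answers ping"
        return 0
    fi
    # 2. Has it recently touched its heartbeat file on shared storage?
    hb="$HEARTBEAT_DIR/hb-$host_ip"
    if [ -f "$hb" ]; then
        age=$(( $(date +%s) - $(stat -c %Y "$hb") ))
        if [ "$age" -lt "$STALE_AFTER" ]; then
            echo "alive: heartbeat is ${age}s old"
            return 0
        fi
    fi
    echo "dead: no ping and no fresh heartbeat"
    return 1
}
```

Only when both checks fail would HA proceed to fence the host and restart its VMs elsewhere.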
KVM HA is broken, let's fix it
Hello,

Following a recent thread on the users ml where slow NFS caused a mass reboot, I have opened the following issue about improving HA on KVM:
https://issues.apache.org/jira/browse/CLOUDSTACK-8943

I know there are many people around here who use KVM and are interested in a more robust way of doing HA.

Please share your ideas, comments, and suggestions; let's see what we can come up with to make this better.

Regards,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
CS 4.2 kvm ha issues - NullPointerException
Hi all,

I'm trying to test HA on CS 4.2/KVM and I get a java.lang.NullPointerException.

Environment used:
- CS 4.2 (rev 2963) management on CentOS 6.4
- CS 4.2 (rev 2963) agent on CentOS 6.4 (node1, node2)
- separate NFS server as primary/secondary storage
- zone type: KVM

Scenario: I create several VMs with an HA-enabled offering, then cut the power to node2 via IPMI.

Expected behaviour: the HA-enabled VMs from node2 should start on node1.

Actual behaviour: the VMs remain in the Stopped state, unassigned to any host, with a NullPointerException in the management log:

2013-09-24 11:21:25,500 ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-4) Terminating HAWork[4-HA-6-Running-Scheduled]
java.lang.NullPointerException
        at com.cloud.storage.VolumeManagerImpl.canVmRestartOnAnotherServer(VolumeManagerImpl.java:2641)
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:516)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)

Full log is here: http://pastebin.com/upnEA601

Any thoughts?

--
Regards,
Valery
http://protocol.by/slayer
CS 4.1 + KVM + HA
Hello,

Does KVM-based HA actually work? I set up HA following the official documentation: HA is enabled both for the VM and for a second host (one host runs the VM, the other acts as the HA host).

Test 1: I shut down the VM, and CloudStack successfully restarted it on the HA host automatically.

Test 2: When I powered off the host running the VM directly, CloudStack could not restart the VM on the HA host; it could not even correctly determine that the host and VM should be in the disconnected state.

Is this a bug?

Regards,
Zhang