Re: CloudStack HA
Hi Martins,

Yes, this is a typical example of host self-fencing, where a host will reboot when it loses connectivity to a primary storage pool. As you have already found, this is controlled by the xenheartbeat.sh script.

Dag Sonstebo
Cloud Architect, ShapeBlue

On 30/03/2016, 10:04, "Mārtiņš Jakubovičs" wrote:

>Looks like I found the issue: it is the /opt/cloud/bin/xenheartbeat.sh script,
>which is running on all hosts.
>
>On 2016.03.30. 11:14, Mārtiņš Jakubovičs wrote:
>> Hello,
>>
>> This morning I faced an unexpected problem: one of the XenServer hosts
>> rebooted. I checked the logs and it looks like it was due to a network
>> issue, but the question is why the host rebooted itself. CloudStack's XS
>> pool is not HA enabled, and as far as I know, in ACS 4.3.2 CloudStack did
>> not manage host HA, or am I wrong?
>>
>> Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with
>> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
>> not reachable since 65 seconds
>> Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with
>> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
>> not reachable for 65 seconds, rebooting system!
>>
>> [root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
>>                   ha-enabled ( RO): false
>>             ha-configuration ( RO):
>>                ha-statefiles ( RO):
>> ha-host-failures-to-tolerate ( RW): 0
>>           ha-plan-exists-for ( RO): 0
>>          ha-allow-overcommit ( RW): false
>>             ha-overcommitted ( RO): false
>>
>> So does ACS manage some kind of host HA?
>>
>> XenServer 6.2
>> ACS 4.3.2
>>
>> Best regards,
>> Martins

Regards,
Dag Sonstebo
dag.sonst...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
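For context, the self-fencing behaviour Dag describes can be sketched roughly as follows. This is an illustrative shell model, not the actual /opt/cloud/bin/xenheartbeat.sh: the function name `heartbeat_loop`, the demo path, and the 2-second timeout are made up for the example, and the real script fences with a forced reboot rather than printing a message.

```shell
#!/bin/sh
# Illustrative sketch (hypothetical, not the real xenheartbeat.sh):
# keep writing a timestamp into a heartbeat file on the primary storage
# SR mount; once writes have failed for longer than the timeout, the
# host fences itself so HA can restart its VMs elsewhere.

# heartbeat_loop HB_FILE TIMEOUT_SECONDS INTERVAL_SECONDS
heartbeat_loop() {
    hb_file=$1
    timeout_s=$2
    interval=$3
    unreachable=0

    while [ "$unreachable" -lt "$timeout_s" ]; do
        # A successful write resets the counter; a failure accumulates.
        if ( date +%s > "$hb_file" ) 2>/dev/null; then
            unreachable=0
        else
            unreachable=$((unreachable + interval))
            echo "heartbeat: Potential problem with $hb_file: not reachable since $unreachable seconds"
        fi
        sleep "$interval"
    done

    echo "heartbeat: Problem with $hb_file: not reachable for $unreachable seconds, rebooting system!"
    # reboot -f   # the real script fences here; this sketch only prints
}

# Demo: a path that can never be written, with a short 2-second timeout.
heartbeat_loop /nonexistent-dir/hb-demo 2 1
```

Note the shape of the messages mirrors the /var/log/messages excerpt in this thread: a warning while the counter climbs, then the fence once the timeout is exceeded.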
Re: CloudStack HA
Looks like I found the issue: it is the /opt/cloud/bin/xenheartbeat.sh script, which is running on all hosts.

On 2016.03.30. 11:14, Mārtiņš Jakubovičs wrote:
> Hello,
>
> This morning I faced an unexpected problem: one of the XenServer hosts
> rebooted. I checked the logs and it looks like it was due to a network
> issue, but the question is why the host rebooted itself. CloudStack's XS
> pool is not HA enabled, and as far as I know, in ACS 4.3.2 CloudStack did
> not manage host HA, or am I wrong?
>
> Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with
> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
> not reachable since 65 seconds
> Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with
> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
> not reachable for 65 seconds, rebooting system!
>
> [root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
>                   ha-enabled ( RO): false
>             ha-configuration ( RO):
>                ha-statefiles ( RO):
> ha-host-failures-to-tolerate ( RW): 0
>           ha-plan-exists-for ( RO): 0
>          ha-allow-overcommit ( RW): false
>             ha-overcommitted ( RO): false
>
> So does ACS manage some kind of host HA?
>
> XenServer 6.2
> ACS 4.3.2
>
> Best regards,
> Martins
AW: CloudStack HA
Hi Martins,

You need to check the XenServer logs. CS will not reboot any hypervisor. XenServer will also reboot in some situations where Dom0 has no resources (CPU, RAM) left. Which version of XS are you using?

Mit freundlichen Grüßen / With kind regards,
Swen

-----Original Message-----
From: Mārtiņš Jakubovičs [mailto:martins-li...@hostnet.lv]
Sent: Wednesday, 30 March 2016 10:14
To: users@cloudstack.apache.org
Subject: CloudStack HA

> Hello,
>
> This morning I faced an unexpected problem: one of the XenServer hosts
> rebooted. I checked the logs and it looks like it was due to a network
> issue, but the question is why the host rebooted itself. CloudStack's XS
> pool is not HA enabled, and as far as I know, in ACS 4.3.2 CloudStack did
> not manage host HA, or am I wrong?
>
> Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with
> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
> not reachable since 65 seconds
> Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with
> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
> not reachable for 65 seconds, rebooting system!
>
> [root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
>                   ha-enabled ( RO): false
>             ha-configuration ( RO):
>                ha-statefiles ( RO):
> ha-host-failures-to-tolerate ( RW): 0
>           ha-plan-exists-for ( RO): 0
>          ha-allow-overcommit ( RW): false
>             ha-overcommitted ( RO): false
>
> So does ACS manage some kind of host HA?
>
> XenServer 6.2
> ACS 4.3.2
>
> Best regards,
> Martins
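Checking the dom0 logs for these fence events can be scripted. The helper below is illustrative: `find_fence_events` and the `/tmp/messages.sample` file are made-up names for the example; on the host itself you would point it at /var/log/messages, where the heartbeat messages in Martins' excerpt appeared.

```shell
# find_fence_events LOGFILE
# Print the CloudStack heartbeat warnings and the final fence message,
# matching the format seen in /var/log/messages in this thread.
find_fence_events() {
    grep -E 'heartbeat: (Potential problem|Problem) with' "$1"
}

# Demo against a small sample log; on a real host you would run:
#   find_fence_events /var/log/messages
cat > /tmp/messages.sample <<'EOF'
Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with /var/run/sr-mount/SR-UUID/hb-HOST-UUID: not reachable since 65 seconds
Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with /var/run/sr-mount/SR-UUID/hb-HOST-UUID: not reachable for 65 seconds, rebooting system!
Mar 30 07:00:34 cloudstack-1 kernel: unrelated log line
EOF
find_fence_events /tmp/messages.sample
```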
CloudStack HA
Hello,

This morning I faced an unexpected problem: one of the XenServer hosts rebooted. I checked the logs and it looks like it was due to a network issue, but the question is why the host rebooted itself. CloudStack's XS pool is not HA enabled, and as far as I know, in ACS 4.3.2 CloudStack did not manage host HA, or am I wrong?

Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: not reachable since 65 seconds
Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: not reachable for 65 seconds, rebooting system!

[root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
                  ha-enabled ( RO): false
            ha-configuration ( RO):
               ha-statefiles ( RO):
ha-host-failures-to-tolerate ( RW): 0
          ha-plan-exists-for ( RO): 0
         ha-allow-overcommit ( RW): false
            ha-overcommitted ( RO): false

So does ACS manage some kind of host HA?

XenServer 6.2
ACS 4.3.2

Best regards,
Martins
CloudStack HA & One VM startup failure
Hi all.

My environment:
CloudStack 4.3.0 (CentOS 6)
XenServer 6.2 SP1

1. One host failed.
2. VMs were started on another host by CloudStack HA (VM001, VM002, VM004, VM005).
3. Only VM003 failed to start.

I want to know why VM003 was not started by CloudStack HA. The host had enough resources, and I could start VM003 manually after several hours. I think the following log line is related, but what does it mean?

2015-10-11 20:39:45,960 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (HostReservationReleaseChecker:ctx-a058591d) Cannot release reservation, Found VM: VM[User|VM003] Stopped but reserved on host 11

Why?

Best regards.

log (excerpt):

2015-10-11 20:28:55,770 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-bc69642d) Notifying HA Mgr of to restart vm 451-VM003
2015-10-11 20:28:55,776 INFO [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-11:ctx-bc69642d) Schedule vm for HA: VM[User|VM003]
2015-10-11 20:28:55,789 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) HA on VM[User|VM003]
2015-10-11 20:28:55,806 DEBUG [c.c.h.CheckOnAgentInvestigator] (HA-Worker-0:ctx-71125e79 work-139) Unable to reach the agent for VM[User|VM003]: Resource [Host:11] is unreachable: Host 11: Host with specified id is not in the right state: Down
2015-10-11 20:28:55,806 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) SimpleInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:28:58,135 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) XenServerInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:28:58,135 DEBUG [c.c.h.UserVmDomRInvestigator] (HA-Worker-0:ctx-71125e79 work-139) testing if VM[User|VM003] is alive
2015-10-11 20:29:10,585 DEBUG [c.c.h.UserVmDomRInvestigator] (HA-Worker-0:ctx-71125e79 work-139) VM[User|VM003] could not be pinged, returning that it is unknown
2015-10-11 20:29:15,040 DEBUG [c.c.h.UserVmDomRInvestigator] (HA-Worker-0:ctx-71125e79 work-139) VM[User|VM003] could not be pinged, returning that it is unknown
2015-10-11 20:29:15,040 DEBUG [c.c.h.UserVmDomRInvestigator] (HA-Worker-0:ctx-71125e79 work-139) Returning null since we're unable to determine state of VM[User|VM003]
2015-10-11 20:29:15,040 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) PingInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:29:15,040 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-0:ctx-71125e79 work-139) Not a System Vm, unable to determine state of VM[User|VM003] returning null
2015-10-11 20:29:15,040 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-0:ctx-71125e79 work-139) Testing if VM[User|VM003] is alive
2015-10-11 20:29:15,041 DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-0:ctx-71125e79 work-139) Unable to find a management nic, cannot ping this system VM, unable to determine state of VM[User|VM003] returning null
2015-10-11 20:29:15,041 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) ManagementIPSysVMInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:29:15,041 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) KVMInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:29:15,041 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) HypervInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:29:15,041 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) VMwareInvestigator found VM[User|VM003]to be alive? null
2015-10-11 20:29:17,190 WARN [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) Unable to actually stop VM[User|VM003] but continue with release because it's a force stop
2015-10-11 20:29:17,194 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) VM[User|VM003] is stopped on the host. Proceeding to release resource held.
2015-10-11 20:29:17,205 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) Successfully released network resources for the vm VM[User|VM003]
2015-10-11 20:29:17,205 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) Successfully released storage resources for the vm VM[User|VM003]
2015-10-11 20:29:17,276 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) Successfully transitioned to start state for VM[User|VM003] reservation id = 74b7e3a7-80b4-4153-a6c8-993fd67c23b4
2015-10-11 20:29:17,280 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) Trying to deploy VM, vm has dcId: 1 and podId: 1
2015-10-11 20:29:17,280 DEBUG [c.c.v.VirtualMachineManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) Deploy avoids pods: null, clusters: null, hosts: null
2015-10-11 20:29:17,290 DEBUG [c.c.c.CapacityManagerImpl] (HA-Worker-0:ctx-71125e79 work-139) VM state transitted from :Starting to Stopped with event
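For what it's worth, the log shows every HA investigator returning null (unknown) for VM003, after which HA force-stops the VM and tries to reschedule it. The chain can be modelled roughly as below. This is an illustrative shell sketch, not CloudStack code: the investigator names are taken from the log, but the `check_*` stub functions and their hard-coded null answers are hypothetical, chosen to mirror this particular incident.

```shell
#!/bin/sh
# Rough model of the HA investigator chain seen in the log above.
# Each investigator answers true, false, or null (= cannot determine);
# HA walks the chain until it gets a definite answer. When everyone
# returns null, HA proceeds with a force stop and a restart attempt.

check_SimpleInvestigator()            { echo null; }  # agent on host 11 is Down
check_XenServerInvestigator()         { echo null; }
check_PingInvestigator()              { echo null; }  # VM could not be pinged
check_ManagementIPSysVMInvestigator() { echo null; }  # not a system VM
check_KVMInvestigator()               { echo null; }  # wrong hypervisor type
check_HypervInvestigator()            { echo null; }
check_VMwareInvestigator()            { echo null; }

investigate() {
    vm=$1
    for inv in SimpleInvestigator XenServerInvestigator PingInvestigator \
               ManagementIPSysVMInvestigator KVMInvestigator \
               HypervInvestigator VMwareInvestigator; do
        answer=$("check_$inv" "$vm")
        echo "$inv found $vm to be alive? $answer" >&2
        if [ "$answer" != "null" ]; then
            echo "$answer"
            return 0
        fi
    done
    echo null   # undecided: HA force-stops the VM, then reschedules it
}

investigate "VM[User|VM003]"
```

The "Cannot release reservation ... Stopped but reserved on host 11" line then suggests the restart attempt stalled on a host reservation left over from the failed host, which the HostReservationReleaseChecker would not release until later.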