Re: CloudStack HA

2016-03-30 Thread Dag Sonstebo
Hi Martins,

Yes, this is a typical example of host self-fencing, where a host will reboot
when it loses connectivity to a primary storage pool. As you have already found,
this is controlled by the xenheartbeat.sh script.
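
To illustrate the mechanism for anyone following along, here is a minimal
sketch of the touch-and-fence logic your log messages imply. This is not the
actual CloudStack script, only an illustration; the heartbeat file path and
the 65 second timeout are taken straight from your excerpt below.

#!/bin/bash
# Illustrative self-fencing loop (NOT the real xenheartbeat.sh).
# Usage: ./fence-sketch.sh /var/run/sr-mount/<SR-UUID>/hb-<HOST-UUID> [timeout]
HB_FILE="$1"         # per-SR heartbeat file, as seen in the log below
TIMEOUT="${2:-65}"   # seconds; matches the "not reachable for 65 seconds" message

last_ok=$(date +%s)
while true; do
    if touch "$HB_FILE" 2>/dev/null; then
        last_ok=$(date +%s)                   # primary storage still writable
    elif [ $(( $(date +%s) - last_ok )) -ge "$TIMEOUT" ]; then
        logger -t heartbeat "Problem with $HB_FILE: not reachable for $TIMEOUT seconds, rebooting system!"
        reboot -f                             # self-fence the host
    fi
    sleep 5
done

The point of the fence is to stop a host that has lost its primary storage
from keeping VMs running while HA may restart them elsewhere.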

Dag Sonstebo
Cloud Architect
ShapeBlue

On 30/03/2016, 10:04, "Mārtiņš Jakubovičs"  wrote:

>Looks like I found the issue: it is the /opt/cloud/bin/xenheartbeat.sh script,
>which is running on all hosts.
>
>On 2016.03.30. 11:14, Mārtiņš Jakubovičs wrote:
>> Hello,
>>
>> This morning I faced an unexpected problem: one of the XenServer hosts
>> rebooted. I checked the logs and it looks like it was due to a network
>> issue, but the question is why the host rebooted itself. CloudStack's XS
>> pool is not HA-enabled, and as far as I know ACS 4.3.2 does not manage
>> host HA, or am I wrong?
>>
>> Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with
>> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
>> not reachable since 65 seconds
>> Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with
>> /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
>> not reachable for 65 seconds, rebooting system!
>>
>> [root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
>>                       ha-enabled ( RO): false
>>                 ha-configuration ( RO):
>>                    ha-statefiles ( RO):
>>     ha-host-failures-to-tolerate ( RW): 0
>>               ha-plan-exists-for ( RO): 0
>>              ha-allow-overcommit ( RW): false
>>                 ha-overcommitted ( RO): false
>>
>> So does ACS manage some kind of host HA?
>>
>> XenServer 6.2
>> ACS 4.3.2
>>
>> Best regards,
>> Martins
>>
>

Regards,

Dag Sonstebo

dag.sonst...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue


Re: CloudStack HA

2016-03-30 Thread Mārtiņš Jakubovičs
Looks like I found the issue: it is the /opt/cloud/bin/xenheartbeat.sh script,
which is running on all hosts.
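
For reference, one way to see which heartbeat instances are running on a host
and which per-SR heartbeat files they touch (plain ps and ls, nothing
version-specific assumed):

# list running instances of the heartbeat script and their arguments
ps -ef | grep '[x]enheartbeat'

# the per-SR heartbeat files from the log live under the SR mounts
ls -l /var/run/sr-mount/*/hb-* 2>/dev/null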


On 2016.03.30. 11:14, Mārtiņš Jakubovičs wrote:

Hello,

This morning I faced an unexpected problem: one of the XenServer hosts
rebooted. I checked the logs and it looks like it was due to a network issue,
but the question is why the host rebooted itself. CloudStack's XS pool is not
HA-enabled, and as far as I know ACS 4.3.2 does not manage host HA, or am I
wrong?


Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with 
/var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: 
not reachable since 65 seconds
Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with 
/var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: 
not reachable for 65 seconds, rebooting system!


[root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
                      ha-enabled ( RO): false
                ha-configuration ( RO):
                   ha-statefiles ( RO):
    ha-host-failures-to-tolerate ( RW): 0
              ha-plan-exists-for ( RO): 0
             ha-allow-overcommit ( RW): false
                ha-overcommitted ( RO): false

So does ACS manage some kind of host HA?

XenServer 6.2
ACS 4.3.2

Best regards,
Martins





Re: CloudStack HA

2016-03-30 Thread S . Brüseke - proIO GmbH
Hi Martins,

You need to check the XenServer logs; CS will not reboot any hypervisor.
XenServer will also reboot in some situations where Dom0 has run out of
resources (CPU, RAM). Which version of XS are you using?
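
In the meantime, a few standard places to look on the host itself; these are
the usual XenServer 6.x locations, so adjust if your install differs:

# search the XenServer logs around the time of the reboot
grep -iE 'reboot|heartbeat' /var/log/messages
less /var/log/xensource.log

# check Dom0 memory and CPU headroom
free -m
xentop -b -i 1 | head -n 20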

Mit freundlichen Grüßen / With kind regards,

Swen


-----Original Message-----
From: Mārtiņš Jakubovičs [mailto:martins-li...@hostnet.lv]
Sent: Wednesday, 30 March 2016 10:14
To: users@cloudstack.apache.org
Subject: CloudStack HA

Hello,

This morning I faced an unexpected problem: one of the XenServer hosts rebooted.
I checked the logs and it looks like it was due to a network issue, but the
question is why the host rebooted itself. CloudStack's XS pool is not HA-enabled,
and as far as I know ACS 4.3.2 does not manage host HA, or am I wrong?

Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with
/var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
not reachable since 65 seconds
Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with
/var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68:
not reachable for 65 seconds, rebooting system!

[root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
                      ha-enabled ( RO): false
                ha-configuration ( RO):
                   ha-statefiles ( RO):
    ha-host-failures-to-tolerate ( RW): 0
              ha-plan-exists-for ( RO): 0
             ha-allow-overcommit ( RW): false
                ha-overcommitted ( RO): false

So does ACS manage some kind of host HA?

XenServer 6.2
ACS 4.3.2

Best regards,
Martins



- proIO GmbH -
Managing Director: Swen Brüseke
Registered office: Frankfurt am Main

VAT ID no.: DE 267 075 918
Register court: Frankfurt am Main - HRB 86239


This e-mail may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this e-mail in error) 
please notify 
the sender immediately and destroy this e-mail.  
Any unauthorized copying, disclosure or distribution of the material in this 
e-mail is strictly forbidden. 




CloudStack HA

2016-03-30 Thread Mārtiņš Jakubovičs

Hello,

This morning I faced an unexpected problem: one of the XenServer hosts
rebooted. I checked the logs and it looks like it was due to a network issue,
but the question is why the host rebooted itself. CloudStack's XS pool is not
HA-enabled, and as far as I know ACS 4.3.2 does not manage host HA, or am I
wrong?


Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with 
/var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: 
not reachable since 65 seconds
Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with 
/var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: 
not reachable for 65 seconds, rebooting system!


[root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
                      ha-enabled ( RO): false
                ha-configuration ( RO):
                   ha-statefiles ( RO):
    ha-host-failures-to-tolerate ( RW): 0
              ha-plan-exists-for ( RO): 0
             ha-allow-overcommit ( RW): false
                ha-overcommitted ( RO): false

So does ACS manage some kind of host HA?

XenServer 6.2
ACS 4.3.2

Best regards,
Martins
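
For readers checking the CloudStack side of the question above: the heartbeat
script is deployed to the hosts by the management server, and the related
global settings can be listed from the CloudStack database. The default
database name and user are assumed below, and the LIKE pattern is kept loose
because the exact setting names vary between versions:

# list XenServer heartbeat-related global settings (prompts for the DB password)
mysql -u cloud -p cloud -e \
  "SELECT name, value FROM configuration WHERE name LIKE 'xenserver.heartbeat%';"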



CloudStack HA & One VM startup failure

2015-10-20 Thread giraffeg forestg
Hi all.

My environment:
 CloudStack 4.3.0 (CentOS6)
 XenServer 6.2 SP1


1. One host fails.
2. VMs are restarted on another host by CloudStack HA (VM001, VM002, VM004, VM005).
3. Only VM003 fails to start.

I want to know why VM003 was not started by CloudStack HA.

The host has enough resources.
I could start VM003 manually after several hours.

I think the following log entry is related, but what does it mean?

2015-10-11 20:39:45,960 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(HostReservationReleaseChecker:ctx-a058591d) Cannot release reservation,
Found VM: VM[User|VM003] Stopped but reserved on host 11

Why?
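
Two things that might narrow this down: tracing the HA work item end to end in
the management server log, and inspecting the planner reservation that the
message above refers to. The log path below is the standard one for an RPM
install; the reservation table and column names are an assumption for 4.3.x,
so verify them against your schema first:

# trace this HA work item through the management server log
grep 'work-139' /var/log/cloudstack/management/management-server.log

# inspect the host reservation flagged by HostReservationReleaseChecker
# (table/column names assumed; verify before relying on them)
mysql -u cloud -p cloud -e \
  "SELECT * FROM op_host_planner_reservation WHERE host_id = 11;"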

Best regards.


log (excerpt):

2015-10-11 20:28:55,770 DEBUG [c.c.h.HighAvailabilityManagerImpl]
(AgentTaskPool-11:ctx-bc69642d) Notifying HA Mgr of to restart vm 451-VM003
2015-10-11 20:28:55,776 INFO  [c.c.h.HighAvailabilityManagerImpl]
(AgentTaskPool-11:ctx-bc69642d) Schedule vm for HA:  VM[User|VM003]
2015-10-11 20:28:55,789 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) HA on VM[User|VM003]
2015-10-11 20:28:55,806 DEBUG [c.c.h.CheckOnAgentInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) Unable to reach the agent for
VM[User|VM003]: Resource [Host:11] is unreachable: Host 11: Host with
specified id is not in the right state: Down
2015-10-11 20:28:55,806 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) SimpleInvestigator found
VM[User|VM003]to be alive? null
2015-10-11 20:28:58,135 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) XenServerInvestigator found
VM[User|VM003]to be alive? null
2015-10-11 20:28:58,135 DEBUG [c.c.h.UserVmDomRInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) testing if VM[User|VM003] is alive
2015-10-11 20:29:10,585 DEBUG [c.c.h.UserVmDomRInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) VM[User|VM003] could not be pinged,
returning that it is unknown
2015-10-11 20:29:15,040 DEBUG [c.c.h.UserVmDomRInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) VM[User|VM003] could not be pinged,
returning that it is unknown
2015-10-11 20:29:15,040 DEBUG [c.c.h.UserVmDomRInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) Returning null since we're unable to
determine state of VM[User|VM003]
2015-10-11 20:29:15,040 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) PingInvestigator found VM[User|VM003]to
be alive? null
2015-10-11 20:29:15,040 DEBUG [c.c.h.ManagementIPSystemVMInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) Not a System Vm, unable to determine
state of VM[User|VM003] returning null
2015-10-11 20:29:15,040 DEBUG [c.c.h.ManagementIPSystemVMInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) Testing if VM[User|VM003] is alive
2015-10-11 20:29:15,041 DEBUG [c.c.h.ManagementIPSystemVMInvestigator]
(HA-Worker-0:ctx-71125e79 work-139) Unable to find a management nic, cannot
ping this system VM, unable to determine state of VM[User|VM003] returning
null
2015-10-11 20:29:15,041 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) ManagementIPSysVMInvestigator found
VM[User|VM003]to be alive? null
2015-10-11 20:29:15,041 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) KVMInvestigator found VM[User|VM003]to
be alive? null
2015-10-11 20:29:15,041 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) HypervInvestigator found
VM[User|VM003]to be alive? null
2015-10-11 20:29:15,041 INFO  [c.c.h.HighAvailabilityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) VMwareInvestigator found
VM[User|VM003]to be alive? null
2015-10-11 20:29:17,190 WARN  [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) Unable to actually stop VM[User|VM003]
but continue with release because it's a force stop
2015-10-11 20:29:17,194 DEBUG [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) VM[User|VM003] is stopped on the host.
Proceeding to release resource held.
2015-10-11 20:29:17,205 DEBUG [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) Successfully released network resources
for the vm VM[User|VM003]
2015-10-11 20:29:17,205 DEBUG [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) Successfully released storage resources
for the vm VM[User|VM003]
2015-10-11 20:29:17,276 DEBUG [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) Successfully transitioned to start
state for VM[User|VM003] reservation id =
74b7e3a7-80b4-4153-a6c8-993fd67c23b4
2015-10-11 20:29:17,280 DEBUG [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) Trying to deploy VM, vm has dcId: 1 and
podId: 1
2015-10-11 20:29:17,280 DEBUG [c.c.v.VirtualMachineManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) Deploy avoids pods: null, clusters:
null, hosts: null
2015-10-11 20:29:17,290 DEBUG [c.c.c.CapacityManagerImpl]
(HA-Worker-0:ctx-71125e79 work-139) VM state transitted from :Starting to
Stopped with event