KVM HA fails continuously | Dell R440

2020-02-25 Thread Cloud Udupi
Hi,

I am facing the same issue mentioned here.
Re: KVM Host HA and power lost to host.
<http://mail-archives.apache.org/mod_mbox/cloudstack-users/201903.mbox/%3ccwlp123mb2497e9a14efb930067be397df8...@cwlp123mb2497.gbrp123.prod.outlook.com%3E>

I have configured everything required for HA to work, using KVM as the
hypervisor. We are using Dell PowerEdge R440 servers; each server has
redundant PSUs. The issue described below occurs only on Dell R440 servers;
on other models (PowerEdge R730 & R230) with the same configuration, KVM HA
works fine.

*Testing Done:*
1. Powered off the host that was running the VM.
2. The host state stays in *Disconnected* and the VM never migrates.
(Ideally the host should go into maintenance mode and its VMs should migrate
to an available host, as per ACS.)
3. The management server logs show this error: "OOBM is not configured or
enabled for host" (OOBM was set up; see the sketch after this list).
4. When I clicked Power On in the iDRAC dashboard, the host went into
maintenance mode and VM fencing took place, but the server was still not
pingable and the power state in the iDRAC dashboard showed *OFF*. After I
clicked Power On again, the power state showed *ON*.
5. We have tested the server with one PSU at a time, on different power
sources, and still see the above issue.
6. The Dell team has isolated the issue to ACS, since they were able to
execute the IPMI commands successfully via the CLI.
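
For reference, OOBM for a host is normally wired up and enabled along these
lines with CloudMonkey (a sketch with placeholder values; the ipmitool driver
and port 623 are the usual defaults for iDRAC-style BMCs):

configure outofbandmanagement hostid=<host-uuid> address=<idrac-ip> port=623 username=<user> password=<password> driver=ipmitool
enable outofbandmanagementforhost hostid=<host-uuid>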


*Management server logs:*

2020-02-25 20:25:11,173 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl]
(pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) Out-of-band Management
action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with
error: Using best available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to reset failed: Command not supported in
present state

2020-02-25 20:25:11,178 WARN  [o.a.c.k.h.KVMHAProvider]
(pool-4-thread-6:ctx-c9e50e03) (logid:5b57b3b3) OOBM service is not
configured or enabled for this host host3.hyclon3.com error is
Out-of-band Management action (RESET) on host
(a295403a-d87e-4294-8c8a-f9085f36248b) failed with error: Using best
available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to reset failed: Command not supported in
present state

2020-02-25 20:25:11,178 WARN  [o.a.c.h.t.BaseHATask]
(pool-4-thread-5:null) (logid:5b57b3b3) Exception occurred while running
RecoveryTask on a resource: org.apache.cloudstack.ha.provider.HARecoveryException:
OOBM service is not configured or enabled for this host host3.hyclon3.com

org.apache.cloudstack.ha.provider.HARecoveryException: OOBM service is
not configured or enabled for this host host3.hyclon3.com

Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band
Management action (RESET) on host (a295403a-d87e-4294-8c8a-f9085f36248b)
failed with error: Using best available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to reset failed: Command not supported in
present state

2020-02-25 20:25:11,182 WARN  [c.c.a.AlertManagerImpl]
(pool-4-thread-5:null) (logid:5b57b3b3) AlertType:: 30 | dataCenterId:: 1 |
podId:: 1 | clusterId:: null | message:: HA Recovery of host id=13, in dc
id=1 performed

2020-02-25 20:25:15,056 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl]
(pool-5-thread-24:ctx-08e29baf) (logid:8e05151d) Out-of-band Management
action (OFF) on host (a295403a-d87e-4294-8c8a-f9085f36248b) failed with
error: Using best available cipher suite 3

Invalid completion code received: Invalid command

Set Chassis Power Control to down/off failed: Command not supported in
present state
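
Two things stand out in the log above. First, the KVMHAProvider WARN "OOBM
service is not configured or enabled for this host" appears misleading here:
OOBM is configured, and the message simply wraps the underlying IPMI failure.
Second, "Set Chassis Power Control to reset failed: Command not supported in
present state" is what an iDRAC typically returns when a chassis that is
already powered off is asked to reset rather than power on, which would also
explain why the Power On button in the iDRAC dashboard eventually brought the
host back. A quick way to compare against what ACS sends is to run the
equivalent IPMI commands by hand (the iDRAC address and credentials below are
placeholders):

ipmitool -I lanplus -H <idrac-ip> -U <user> -P <password> chassis power status
ipmitool -I lanplus -H <idrac-ip> -U <user> -P <password> chassis power reset
ipmitool -I lanplus -H <idrac-ip> -U <user> -P <password> chassis power on

If "chassis power reset" fails with the same "Command not supported in present
state" while the box is off, but "chassis power on" succeeds, the behaviour
matches the log rather than an ACS-specific fault.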


Any help in resolving this issue would be greatly appreciated.


Regards,

Mark


kvm ha test in suspect degraded loop

2020-02-22 Thread Piotr Pisz
 

Hi,

We are trying to test host HA; we have correctly configured OOBM and the
KVMHAProvider (together with the NFS pool). Unfortunately, during testing we
cannot get out of the Suspect/Degraded state loop. All values in the global
settings are at their defaults. My question is: how have you organized HA for
KVM? How should the HA options be set?

The test we did was very simple: we disconnected the network (except IPMI)
and then additionally powered the host off.
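
For reference, the Host HA state machine (roughly Available -> Suspect ->
Checking -> Degraded -> Recovering/Fencing) is tuned through the kvm.ha.*
family of global settings that shipped with the 4.11 Host HA feature. A
sketch of the knobs most relevant to a host stuck cycling through
Suspect/Degraded, with illustrative values (confirm exact names and defaults
on your version):

kvm.ha.activity.check.interval=60
kvm.ha.activity.check.max.attempts=10
kvm.ha.activity.check.failure.ratio=0.7
kvm.ha.degraded.max.period=300

It is also worth confirming OOBM is enabled on the host, since both recovery
and fencing go through IPMI.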

 

Regards,

Piotr





Re: KVM HA fails under multiple management services

2019-06-24 Thread Andrija Panic
Li,

please test with indirect.agent.lb.check.interval=60 or similar, not 0
(zero), since zero means the agents won't re-check and reconnect - this
should address your concern.
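
For reference, a minimal sketch of changing those two settings with
CloudMonkey (the values are illustrative):

update configuration name=indirect.agent.lb.algorithm value=roundrobin
update configuration name=indirect.agent.lb.check.interval value=60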

As for what is in which rack, it is your responsibility to disperse
infrastructure components appropriately, i.e. across racks and such.
We can't handle every case in that regard - hope you understand

Andrija

On Mon, 24 Jun 2019 at 02:24, li jerry  wrote:

> Thank you Nicolas and Andrija.
>
> Even if indirect.agent.lb.algorithm is configured as roundrobin, that only
> reduces the probability of failure; it does not eliminate KVM HA failures
> entirely.
>
> That is because in extreme cases the management server and the KVM host may
> fail at the same time (for example, when the management server and the KVM
> host are placed in the same rack and the whole rack loses power at once).
>
>
> E.g.:
>
> H1 is assigned and connected to M2
> H2 is assigned and connected to M3
> H3 is assigned and connected to M1
>
> When H1 and M2 fail simultaneously, host HA for H1 will not take effect.
>
> Should we have other protection mechanisms to avoid this?
>
> From: Nicolas Vazquez<mailto:nicolas.vazq...@shapeblue.com>
> Sent: June 23, 2019 23:31
> To: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>;
> users<mailto:users@cloudstack.apache.org>
> Cc: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>
> Subject: Re: KVM HA fails under multiple management services
>
> As Andrija mentioned, that is expected behavior, as the global setting is
> 'static'. It is also expected that your agents connect to the next
> management server on the 'host' list once the management server they are
> connected to is down.
> You can find more information on this feature at this link:
> https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/
>
> Please note this is a different feature from host HA, in which CloudStack
> will try to recover hosts that are off via IPMI.
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
>
>
> From: Andrija Panic
> Sent: Sunday, June 23, 11:03
> Subject: Re: KVM HA fails under multiple management services
> To: users
> Cc: d...@cloudstack.apache.org
>
>
> Li,
>
> based on the Global Setting description for those two, I would say that is
> the expected behaviour,
> i.e. change indirect.agent.lb.check.interval to some other value, since 0
> means "don't check, don't reconnect" per what I read.
>
> Also, you might want to change from indirect.agent.lb.algorithm=static to
> some other value, since static means all your KVM agents will always
> connect to the one management host that is first in the "host"
> list.
>
> Regards,
> Andrija
>
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
> On Sat, 22 Jun 2019 at 06:19, li jerry  wrote:
>
> >
> > Hello everyone
> > I recently tested multiple management servers with agent-LB-based host
> > HA (KVM). I found that in extreme cases HA fails; the details
> > are as follows:
> >
> >
> > Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an
> > external database cluster
> > Three KVM nodes, H1, H2, H3
> > An external NFS primary storage
> >
> >
> > CLOUDSTACK parameter configuration
> > indirect.agent.lb.algorithm=static
> > indirect.agent.lb.check.interval=0
> > host=172.17.1.141,172.17.1.142
> >
> >
> > Analysis of agent.log shows that all KVM agents connect to the first
> > management node in the list, M1 (172.17.1.141):
> >
> > INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4)
> > Processed new management server list: 172.17.1.141,172.17.1.142@static
> >
> >
> >
> > In extreme cases, when a KVM host and its preferred management server
> > fail at the same time, HA detection is not triggered for that host.
> >
> > E.g.:
> >
> > M1+H1 powered off at the same time: the state of H1 remains Disconnected,
> > and all VMs on H1 will not restart on other KVM nodes;
> > M1+H2 powered off at the same time: the state of H2 remains Disconnected,
> > and all VMs on H2 will not restart on other KVM nodes;
> > M1+H3 powered off at the same time: the state of H3 remains Disconnected,
> > and all VMs on H3 will not restart on other KVM nodes;
> >
>
>
> --
>
> Andrija Panić
>
>
>

-- 

Andrija Panić


KVM HA fails under multiple management services

2019-06-23 Thread li jerry
Thank you Nicolas and Andrija.

Even if indirect.agent.lb.algorithm is configured as roundrobin, that only
reduces the probability of failure; it does not eliminate KVM HA failures
entirely.

That is because in extreme cases the management server and the KVM host may
fail at the same time (for example, when the management server and the KVM
host are placed in the same rack and the whole rack loses power at once).


E.g.:

H1 is assigned and connected to M2
H2 is assigned and connected to M3
H3 is assigned and connected to M1

When H1 and M2 fail simultaneously, host HA for H1 will not take effect.

Should we have other protection mechanisms to avoid this?

From: Nicolas Vazquez<mailto:nicolas.vazq...@shapeblue.com>
Sent: June 23, 2019 23:31
To: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>;
users<mailto:users@cloudstack.apache.org>
Cc: d...@cloudstack.apache.org<mailto:d...@cloudstack.apache.org>
Subject: Re: KVM HA fails under multiple management services

As Andrija mentioned, that is expected behavior, as the global setting is
'static'. It is also expected that your agents connect to the next management
server on the 'host' list once the management server they are connected to is
down.
You can find more information on this feature at this link:
https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/

Please note this is a different feature from host HA, in which CloudStack will
try to recover hosts that are off via IPMI.

Get Outlook for Android<https://aka.ms/ghei36>



From: Andrija Panic
Sent: Sunday, June 23, 11:03
Subject: Re: KVM HA fails under multiple management services
To: users
Cc: d...@cloudstack.apache.org


Li,

based on the Global Setting description for those two, I would say that is
the expected behaviour,
i.e. change indirect.agent.lb.check.interval to some other value, since 0
means "don't check, don't reconnect" per what I read.

Also, you might want to change from indirect.agent.lb.algorithm=static to
some other value, since static means all your KVM agents will always
connect to the one management host that is first in the "host"
list.

Regards,
Andrija


nicolas.vazq...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue



On Sat, 22 Jun 2019 at 06:19, li jerry  wrote:

>
> Hello everyone
> I recently tested multiple management servers with agent-LB-based host
> HA (KVM). I found that in extreme cases HA fails; the details
> are as follows:
>
>
> Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an
> external database cluster
> Three KVM nodes, H1, H2, H3
> An external NFS primary storage
>
>
> CLOUDSTACK parameter configuration
> indirect.agent.lb.algorithm=static
> indirect.agent.lb.check.interval=0
> host=172.17.1.141,172.17.1.142
>
>
> Analysis of agent.log shows that all KVM agents connect to the first
> management node in the list, M1 (172.17.1.141):
>
> INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4)
> Processed new management server list: 172.17.1.141,172.17.1.142@static
>
>
>
> In extreme cases, when a KVM host and its preferred management server
> fail at the same time, HA detection is not triggered for that host.
>
> E.g.:
>
> M1+H1 powered off at the same time: the state of H1 remains Disconnected,
> and all VMs on H1 will not restart on other KVM nodes;
> M1+H2 powered off at the same time: the state of H2 remains Disconnected,
> and all VMs on H2 will not restart on other KVM nodes;
> M1+H3 powered off at the same time: the state of H3 remains Disconnected,
> and all VMs on H3 will not restart on other KVM nodes;
>


--

Andrija Panić




Re: KVM HA fails under multiple management services

2019-06-23 Thread Nicolas Vazquez
As Andrija mentioned, that is expected behavior, as the global setting is
'static'. It is also expected that your agents connect to the next management
server on the 'host' list once the management server they are connected to is
down.
You can find more information on this feature at this link:
https://www.shapeblue.com/software-based-agent-lb-for-cloudstack/

Please note this is a different feature from host HA, in which CloudStack will
try to recover hosts that are off via IPMI.

Get Outlook for Android<https://aka.ms/ghei36>



From: Andrija Panic
Sent: Sunday, June 23, 11:03
Subject: Re: KVM HA fails under multiple management services
To: users
Cc: d...@cloudstack.apache.org


Li,

based on the Global Setting description for those two, I would say that is
the expected behaviour,
i.e. change indirect.agent.lb.check.interval to some other value, since 0
means "don't check, don't reconnect" per what I read.

Also, you might want to change from indirect.agent.lb.algorithm=static to
some other value, since static means all your KVM agents will always
connect to the one management host that is first in the "host"
list.

Regards,
Andrija


nicolas.vazq...@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 

On Sat, 22 Jun 2019 at 06:19, li jerry  wrote:

>
> Hello everyone
> I recently tested multiple management servers with agent-LB-based host
> HA (KVM). I found that in extreme cases HA fails; the details
> are as follows:
>
>
> Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an
> external database cluster
> Three KVM nodes, H1, H2, H3
> An external NFS primary storage
>
>
> CLOUDSTACK parameter configuration
> indirect.agent.lb.algorithm=static
> indirect.agent.lb.check.interval=0
> host=172.17.1.141,172.17.1.142
>
>
> Analysis of agent.log shows that all KVM agents connect to the first
> management node in the list, M1 (172.17.1.141):
>
> INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4)
> Processed new management server list: 172.17.1.141,172.17.1.142@static
>
>
>
> In extreme cases, when a KVM host and its preferred management server
> fail at the same time, HA detection is not triggered for that host.
>
> E.g.:
>
> M1+H1 powered off at the same time: the state of H1 remains Disconnected,
> and all VMs on H1 will not restart on other KVM nodes;
> M1+H2 powered off at the same time: the state of H2 remains Disconnected,
> and all VMs on H2 will not restart on other KVM nodes;
> M1+H3 powered off at the same time: the state of H3 remains Disconnected,
> and all VMs on H3 will not restart on other KVM nodes;
>


--

Andrija Panić




Re: KVM HA fails under multiple management services

2019-06-23 Thread Andrija Panic
Li,

based on the Global Setting description for those two, I would say that is
the expected behaviour,
i.e. change indirect.agent.lb.check.interval to some other value, since 0
means "don't check, don't reconnect" per what I read.

Also, you might want to change from indirect.agent.lb.algorithm=static to
some other value, since static means all your KVM agents will always
connect to the one management host that is first in the "host"
list.

Regards,
Andrija

On Sat, 22 Jun 2019 at 06:19, li jerry  wrote:

>
> Hello everyone
> I recently tested multiple management servers with agent-LB-based host
> HA (KVM). I found that in extreme cases HA fails; the details
> are as follows:
>
>
> Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an
> external database cluster
> Three KVM nodes, H1, H2, H3
> An external NFS primary storage
>
>
> CLOUDSTACK parameter configuration
> indirect.agent.lb.algorithm=static
> indirect.agent.lb.check.interval=0
> host=172.17.1.141,172.17.1.142
>
>
> Analysis of agent.log shows that all KVM agents connect to the first
> management node in the list, M1 (172.17.1.141):
>
> INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4)
> Processed new management server list: 172.17.1.141,172.17.1.142@static
>
>
>
> In extreme cases, when a KVM host and its preferred management server
> fail at the same time, HA detection is not triggered for that host.
>
> E.g.:
>
> M1+H1 powered off at the same time: the state of H1 remains Disconnected,
> and all VMs on H1 will not restart on other KVM nodes;
> M1+H2 powered off at the same time: the state of H2 remains Disconnected,
> and all VMs on H2 will not restart on other KVM nodes;
> M1+H3 powered off at the same time: the state of H3 remains Disconnected,
> and all VMs on H3 will not restart on other KVM nodes;
>


-- 

Andrija Panić


KVM HA fails under multiple management services

2019-06-21 Thread li jerry

Hello everyone
I recently tested multiple management servers with agent-LB-based host HA
(KVM). I found that in extreme cases HA fails; the details are as
follows:


Two management nodes, M1 (172.17.1.141) and M2 (172.17.1.142), share an 
external database cluster
Three KVM nodes, H1, H2, H3
An external NFS primary storage


CLOUDSTACK parameter configuration
indirect.agent.lb.algorithm=static
indirect.agent.lb.check.interval=0
host=172.17.1.141,172.17.1.142


Analysis of agent.log shows that all KVM agents connect to the first
management node in the list, M1 (172.17.1.141):

INFO [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:b30323e4) 
Processed new management server list: 172.17.1.141,172.17.1.142@static



In extreme cases, when a KVM host and its preferred management server fail
at the same time, HA detection is not triggered for that host.

E.g.:

M1+H1 powered off at the same time: the state of H1 remains Disconnected, and
all VMs on H1 will not restart on other KVM nodes;
M1+H2 powered off at the same time: the state of H2 remains Disconnected, and
all VMs on H2 will not restart on other KVM nodes;
M1+H3 powered off at the same time: the state of H3 remains Disconnected, and
all VMs on H3 will not restart on other KVM nodes;
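
A quick way to confirm which management server a given agent is attached to
(assuming the default agent log location and the default agent port 8250):

grep -i "management server" /var/log/cloudstack/agent/agent.log | tail -1
ss -tnp | grep 8250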


Re: Problems with KVM HA & STONITH

2018-04-05 Thread McClune, James
Hi Victor,

If I may interject, I read your email and understand you're running KVM
with Ceph storage. As far as I know, ACS only supports HA on NFS or iSCSI
primary storage.

http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/4.11/reliability.html

However, if you wanted to use Ceph, you could create an RBD block device
and export it over NFS. Here is an article I referenced in the past:

https://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/

You could then add that NFS storage into ACS and utilize HA. I hope I'm
understanding you correctly.
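
Roughly, that approach looks like the following (the pool/image names, the
size and the export path are placeholders; tune the export options for your
environment):

rbd create rbd/nfs-primary --size 1048576   # 1 TiB image; size is in MB
rbd map rbd/nfs-primary                     # exposes e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /export/primary && mount /dev/rbd0 /export/primary
echo '/export/primary *(rw,async,no_root_squash)' >> /etc/exports
exportfs -ra

/export/primary can then be added to ACS as ordinary NFS primary storage.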

Best Regards,
James

On Thu, Apr 5, 2018 at 12:53 PM, victor <vic...@ihnetworks.com> wrote:

> Hello Boris,
>
> I am able to create VMs with NFS+HA and NFS without HA. The issue is with
> creating VMs on Ceph storage.
>
> Regards
> Victor
>
>
>
> On 04/05/2018 01:18 PM, Boris Stoyanov wrote:
>
>> Hi Victor,
>> Host HA is working only with KVM + NFS. Ceph is not supported at this
>> stage. Obviously RAW volumes are not supported on your pool, but I’m not
>> sure if that’s because of Ceph or HA in general. Are you able to deploy a
>> non-ha VM?
>>
>> Boris Stoyanov
>>
>>
>> boris.stoya...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
>> @shapeblue
>>
>>
>>> On 5 Apr 2018, at 4:19, victor <vic...@ihnetworks.com> wrote:
>>>
>>> Hello Rohit,
>>>
>>> Has the Host HA provider started working with Ceph? The reason I am asking
>>> is that I am not able to create a VM with Ceph storage on a KVM host
>>> with HA enabled, and I get the following error while creating the VM.
>>>
>>> 
>>> .cloud.exception.StorageUnavailableException: Resource [StoragePool:2]
>>> is unreachable: Unable to create Vol[9|vm=6|DATADISK]:com.cloud
>>> .utils.exception.CloudRuntimeException: org.libvirt.LibvirtException:
>>> unsupported configuration: only RAW volumes are supported by this storage
>>> pool
>>> 
>>>
>>> Regards
>>> Victor
>>>
>>> On 11/04/2017 09:53 PM, Rohit Yadav wrote:
>>>
>>>> Hi James, (/cc Simon and others),
>>>>
>>>>
>>>> A new feature exists in upcoming ACS 4.11, Host HA:
>>>>
>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>>>
>>>> You can read more about it here as well: http://www.shapeblue.com/host-
>>>> ha-for-kvm-hosts-in-cloudstack/
>>>>
>>>> This feature can use a custom HA provider, with a default HA provider
>>>> implemented for KVM and NFS, and uses IPMI-based fencing (STONITH) of the
>>>> host. The current (VM) HA mechanism provides no such method of fencing
>>>> (powering off) a host, and whether it helps depends on the circumstances
>>>> under which VM HA is failing (environment issues, ACS version, etc.).
>>>>
>>>> As Simon mentioned, we expect a (host) HA provider that works with Ceph
>>>> in the near future.
>>>>
>>>> Regards.
>>>>
>>>> 
>>>> From: Simon Weller <swel...@ena.com.INVALID>
>>>> Sent: Thursday, November 2, 2017 7:27:22 PM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: Problems with KVM HA & STONITH
>>>>
>>>> James,
>>>>
>>>>
>>>> Ceph is a great solution and we run all of our ACS storage on Ceph.
>>>> Note that it adds another layer of complexity to your installation, so
>>>> you're going to need to develop some expertise with that platform to get
>>>> comfortable with how it works. Typically you don't want to mix Ceph with
>>>> your ACS hosts. We in fact deploy 3 separate Ceph Monitors, and then scale
>>>> OSDs as required on a per-cluster basis in order to add additional
>>>> resiliency (so every KVM ACS cluster has its own Ceph "POD"). We also use
>>>> Ceph for S3 storage (on completely separate Ceph clusters) for some other
>>>> services.
>>>>
>>>>
>>>> NFS is much simpler to maintain for smaller installations, in my
>>>> opinion. If the IO load you're looking at isn't going to be insanely high,
>>>> you could look at building a 2-node NFS cluster using pacemaker and DRBD
>>>> for data replication between nodes. That would reduce your storage
>>>> requirement to 2 fairly low-power servers (NFS is not very cpu intensive).
>>>> 

Re: Problems with KVM HA & STONITH

2018-04-05 Thread victor

Hello Boris,

I am able to create VMs with NFS+HA and NFS without HA. The issue is with
creating VMs on Ceph storage.


Regards
Victor


On 04/05/2018 01:18 PM, Boris Stoyanov wrote:

Hi Victor,
Host HA is working only with KVM + NFS. Ceph is not supported at this stage. 
Obviously RAW volumes are not supported on your pool, but I’m not sure if 
that’s because of Ceph or HA in general. Are you able to deploy a non-ha VM?

Boris Stoyanov


boris.stoya...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
   
  


On 5 Apr 2018, at 4:19, victor <vic...@ihnetworks.com> wrote:

Hello Rohit,

Has the Host HA provider started working with Ceph? The reason I am asking is
that I am not able to create a VM with Ceph storage on a KVM host with HA
enabled, and I get the following error while creating the VM.


.cloud.exception.StorageUnavailableException: Resource [StoragePool:2] is 
unreachable: Unable to create 
Vol[9|vm=6|DATADISK]:com.cloud.utils.exception.CloudRuntimeException: 
org.libvirt.LibvirtException: unsupported configuration: only RAW volumes are 
supported by this storage pool


Regards
Victor

On 11/04/2017 09:53 PM, Rohit Yadav wrote:

Hi James, (/cc Simon and others),


A new feature exists in upcoming ACS 4.11, Host HA:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA

You can read more about it here as well: 
http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/

This feature can use a custom HA provider, with a default HA provider
implemented for KVM and NFS, and uses IPMI-based fencing (STONITH) of the
host. The current (VM) HA mechanism provides no such method of fencing
(powering off) a host, and whether it helps depends on the circumstances
under which VM HA is failing (environment issues, ACS version, etc.).

As Simon mentioned, we expect a (host) HA provider that works with Ceph in
the near future.

Regards.


From: Simon Weller <swel...@ena.com.INVALID>
Sent: Thursday, November 2, 2017 7:27:22 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

James,


Ceph is a great solution and we run all of our ACS storage on Ceph. Note that it adds
another layer of complexity to your installation, so you're going to need to develop
some expertise with that platform to get comfortable with how it works. Typically you
don't want to mix Ceph with your ACS hosts. We in fact deploy 3 separate Ceph Monitors,
and then scale OSDs as required on a per-cluster basis in order to add additional
resiliency (so every KVM ACS cluster has its own Ceph "POD"). We also use Ceph for S3
storage (on completely separate Ceph clusters) for some other services.


NFS is much simpler to maintain for smaller installations, in my opinion. If the
IO load you're looking at isn't going to be insanely high, you could look at
building a 2-node NFS cluster using pacemaker and DRBD for data replication
between nodes. That would reduce your storage requirement to 2 fairly low-power
servers (NFS is not very cpu intensive). Currently on a host failure when using
storage other than NFS on KVM, you will not see HA occur until you take the
failed host out of the ACS cluster. This is a historical limitation because ACS
could not confirm the host had been fenced correctly, so to avoid potential
data corruption (due to 2 hosts mounting the same storage), it doesn't do
anything until the operator intervenes. As of ACS 4.10, IPMI-based fencing is
now supported on NFS and we're planning on developing similar support for Ceph.


Since you're a school district, I'm more than happy to jump on the phone with
you to talk you through these options if you'd like.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Thursday, November 2, 2017 8:28 AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Simon,

Thanks for getting back to me. I created one single NFS share and added it
as primary storage. I think I better understand how the storage works, with
ACS.

I was able to get HA working with one NFS storage, which is good. However,
is there a way to incorporate multiple NFS storage pools and still have the
HA functionality? I think something like GlusterFS or Ceph (like Ivan and
Dag described) will work better.

Thank you Simon, Ivan, and Dag for your assistance!
James

On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid>
wrote:


James,


Try just configuring a single NFS server and see if your setup works. If
you have 3 NFS shares, across all 3 hosts, I'm wondering whether ACS is
picking the one you rebooted as the storage for your VMs and when that
storage goes away (when you bounce the host), all storage for your VMs
vanishes and ACS tries to reboot your other hosts.


Normally in a simple ACS setup, you would have a separate storage server
that can serve up N

Re: Problems with KVM HA & STONITH

2018-04-05 Thread Boris Stoyanov
Hi Victor, 
Host HA is working only with KVM + NFS. Ceph is not supported at this stage. 
Obviously RAW volumes are not supported on your pool, but I’m not sure if 
that’s because of Ceph or HA in general. Are you able to deploy a non-ha VM?

Boris Stoyanov


boris.stoya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

> On 5 Apr 2018, at 4:19, victor <vic...@ihnetworks.com> wrote:
> 
> Hello Rohit,
> 
> Has the Host HA provider started working with Ceph? The reason I am asking is
> that I am not able to create a VM with Ceph storage on a KVM host with HA
> enabled, and I get the following error while creating the VM.
> 
> 
> .cloud.exception.StorageUnavailableException: Resource [StoragePool:2] is 
> unreachable: Unable to create 
> Vol[9|vm=6|DATADISK]:com.cloud.utils.exception.CloudRuntimeException: 
> org.libvirt.LibvirtException: unsupported configuration: only RAW volumes are 
> supported by this storage pool
> 
> 
> Regards
> Victor
> 
> On 11/04/2017 09:53 PM, Rohit Yadav wrote:
>> Hi James, (/cc Simon and others),
>> 
>> 
>> A new feature exists in upcoming ACS 4.11, Host HA:
>> 
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>> 
>> You can read more about it here as well: 
>> http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/
>> 
>> This feature can use a custom HA provider, with a default HA provider
>> implemented for KVM and NFS, and uses IPMI-based fencing (STONITH) of the
>> host. The current (VM) HA mechanism provides no such method of fencing
>> (powering off) a host, and whether it helps depends on the circumstances
>> under which VM HA is failing (environment issues, ACS version, etc.).
>>
>> As Simon mentioned, we expect a (host) HA provider that works with Ceph in
>> the near future.
>> 
>> Regards.
>> 
>> ____
>> From: Simon Weller <swel...@ena.com.INVALID>
>> Sent: Thursday, November 2, 2017 7:27:22 PM
>> To: users@cloudstack.apache.org
>> Subject: Re: Problems with KVM HA & STONITH
>> 
>> James,
>> 
>> 
>> Ceph is a great solution and we run all of our ACS storage on Ceph. Note
>> that it adds another layer of complexity to your installation, so you're
>> going to need to develop some expertise with that platform to get comfortable
>> with how it works. Typically you don't want to mix Ceph with your ACS hosts.
>> We in fact deploy 3 separate Ceph Monitors, and then scale OSDs as required
>> on a per-cluster basis in order to add additional resiliency (so every KVM
>> ACS cluster has its own Ceph "POD"). We also use Ceph for S3 storage (on
>> completely separate Ceph clusters) for some other services.
>> 
>> 
>> NFS is much simpler to maintain for smaller installations, in my opinion. If
>> the IO load you're looking at isn't going to be insanely high, you could
>> look at building a 2-node NFS cluster using pacemaker and DRBD for data
>> replication between nodes. That would reduce your storage requirement to 2
>> fairly low-power servers (NFS is not very cpu intensive). Currently on a
>> host failure when using storage other than NFS on KVM, you will not see HA
>> occur until you take the failed host out of the ACS cluster. This is a
>> historical limitation because ACS could not confirm the host had been fenced
>> correctly, so to avoid potential data corruption (due to 2 hosts mounting
>> the same storage), it doesn't do anything until the operator intervenes. As
>> of ACS 4.10, IPMI-based fencing is now supported on NFS and we're planning
>> on developing similar support for Ceph.
>> 
>> 
>> Since you're a school district, I'm more than happy to jump on the phone
>> with you to talk you through these options if you'd like.
>> 
>> 
>> - Si
>> 
>> 
>> 
>> From: McClune, James <mcclu...@norwalktruckers.net>
>> Sent: Thursday, November 2, 2017 8:28 AM
>> To: users@cloudstack.apache.org
>> Subject: Re: Problems with KVM HA & STONITH
>> 
>> Hi Simon,
>> 
>> Thanks for getting back to me. I created one single NFS share and added it
>> as primary storage. I think I better understand how the storage works, with
>> ACS.
>> 
>> I was able to get HA working with one NFS storage, which is good. However,
>> is there a way to incorporate multiple NFS storage pools and still have the
>> HA functionality? I think something like GlusterFS or Ceph (like Ivan and
>> Dag described

Re: Problems with KVM HA & STONITH

2018-04-04 Thread victor

Hello Rohit,

Has the Host HA provider started working with Ceph? The reason I am asking
is that I am not able to create a VM with Ceph storage on a KVM host
with HA enabled, and I get the following error while creating the VM.



.cloud.exception.StorageUnavailableException: Resource [StoragePool:2] 
is unreachable: Unable to create 
Vol[9|vm=6|DATADISK]:com.cloud.utils.exception.CloudRuntimeException: 
org.libvirt.LibvirtException: unsupported configuration: only RAW 
volumes are supported by this storage pool



Regards
Victor

On 11/04/2017 09:53 PM, Rohit Yadav wrote:

Hi James, (/cc Simon and others),


A new feature exists in upcoming ACS 4.11, Host HA:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA

You can read more about it here as well: 
http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/

This feature can use a custom HA provider, with a default HA provider
implemented for KVM and NFS, and uses IPMI-based fencing (STONITH) of the
host. The current (VM) HA mechanism provides no such method of fencing
(powering off) a host, and whether it helps depends on the circumstances
under which VM HA is failing (environment issues, ACS version, etc.).

As Simon mentioned, we expect a (host) HA provider that works with Ceph in
the near future.

Regards.


From: Simon Weller <swel...@ena.com.INVALID>
Sent: Thursday, November 2, 2017 7:27:22 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

James,


Ceph is a great solution and we run all of our ACS storage on Ceph. Note that it adds
another layer of complexity to your installation, so you're going to need to develop
some expertise with that platform to get comfortable with how it works. Typically you
don't want to mix Ceph with your ACS hosts. We in fact deploy 3 separate Ceph Monitors,
and then scale OSDs as required on a per-cluster basis in order to add additional
resiliency (so every KVM ACS cluster has its own Ceph "POD"). We also use Ceph for S3
storage (on completely separate Ceph clusters) for some other services.


NFS is much simpler to maintain for smaller installations, in my opinion. If the
IO load you're looking at isn't going to be insanely high, you could look at
building a 2-node NFS cluster using pacemaker and DRBD for data replication
between nodes. That would reduce your storage requirement to 2 fairly low-power
servers (NFS is not very cpu intensive). Currently on a host failure when using
storage other than NFS on KVM, you will not see HA occur until you take the
failed host out of the ACS cluster. This is a historical limitation because ACS
could not confirm the host had been fenced correctly, so to avoid potential
data corruption (due to 2 hosts mounting the same storage), it doesn't do
anything until the operator intervenes. As of ACS 4.10, IPMI-based fencing is
now supported on NFS and we're planning on developing similar support for Ceph.


Since you're a school district, I'm more than happy to jump on the phone with
you to talk you through these options if you'd like.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Thursday, November 2, 2017 8:28 AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Simon,

Thanks for getting back to me. I created one single NFS share and added it
as primary storage. I think I better understand how the storage works, with
ACS.

I was able to get HA working with one NFS storage, which is good. However,
is there a way to incorporate multiple NFS storage pools and still have the
HA functionality? I think something like GlusterFS or Ceph (like Ivan and
Dag described) will work better.

Thank you Simon, Ivan, and Dag for your assistance!
James

On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid>
wrote:


James,


Try just configuring a single NFS server and see if your setup works. If
you have 3 NFS shares, across all 3 hosts, I'm wondering whether ACS is
picking the one you rebooted as the storage for your VMs and when that
storage goes away (when you bounce the host), all storage for your VMs
vanishes and ACS tries to reboot your other hosts.


Normally in a simple ACS setup, you would have a separate storage server
that can serve up NFS to all hosts. If a host dies, then a VM would be
brought up on a spare host, since all hosts have access to the same storage.

Your other option is to use local storage, but that won't provide HA.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Monday, October 30, 2017 2:26 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Dag,

Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
14.04 VM. I have three nodes, each having about 1TB of primary storage
(NFS) and 1TB of secondary storage (NFS). I added each 

Re: KVM HA configuration

2017-11-19 Thread William Alianto
Hi,

Thanks for the pointer. I managed to create an instance after lowering the
speed in the offering. After that, I tried shutting down one host. It takes a
while, but HA does work.

--
Regards,

William


On 20-Nov-17 11:00:46, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:
Hi.

Host: 4 doesn't have cpu capability (cpu:12, speed:1895) to support
requested CPU: 2 and requested speed: 2000

It seems you have to decrease the cpu speed constraint in the offering.

20 нояб. 2017 г. 10:50 ДП пользователь "William Alianto"
написал:

> Hi Ivan,
>
> Thanks for the pointer. I can now see the HA option on the offering. I
> created a new offering with the HA option and tried to create a new instance
> using it. Unfortunately the action failed. Here is the error log from the
> action:
>
> https://pastebin.com/9DKXHdW3
>
> It seems ACS cannot find a suitable host for HA, although I have
> set up hypervisor hosts in the infrastructure. Did I miss any step in
> the configuration?
>
> --
> Regards,
>
> William
>
>
> On 17-Nov-17 22:40:43, Ivan Kudryavtsev wrote:
> Hi, when you create a service offering (not an instance) it allows
> specifying HA or not.
>
> 17 нояб. 2017 г. 10:36 ПП пользователь "William Alianto"
> написал:
>
> > Hi Dag,
> >
> > I can’t find any option for HA when I try to create new instances. How do
> > I know whether the HA option is available or not?
> >
> > --
> > Regards,
> >
> > William
> >
> > > On 17 Nov 2017, at 17.03, Dag Sonstebo
> > wrote:
> > >
> > > Hi William,
> > >
> > > HA follows the compute offering of your VMs, it’s not attached to the
> > host as such. So if the VM uses a HA offering then ACS will monitor it
> and
> > bring it back online if offline.
> > >
> > > Regards,
> > > Dag Sonstebo
> > > Cloud Architect
> > > ShapeBlue
> > >
> > > On 17/11/2017, 10:00, "William Alianto" wrote:
> > >
> > > Hi,
> > >
> > > I'm still learning more about ACS and I would like to know if there
> > is any configuration needed to have KVM HA enabled on ACS 4.9. I've been
> > searching for documentation but still haven't found a clear picture of
> how
> > to do that. Can anyone please give me some guidance on how to enable it? I
> > already have 2 KVM hosts added in the cluster.
> > >
> > > --
> > > Regards,
> > >
> > > William
> > >
> > >
> > >
> > >
> > >
> > > dag.sonst...@shapeblue.com
> > > www.shapeblue.com
> > > 53 Chandos Place, Covent Garden, London WC2N 4HSUK
> > > @shapeblue
> > >
> > >
> > >
> >
>


Re: KVM HA configuration

2017-11-19 Thread Ivan Kudryavtsev
Hi.

Host: 4 doesn't have cpu capability (cpu:12, speed:1895) to support
requested CPU: 2 and requested speed: 2000

It seems you have to decrease the cpu speed constraint in the offering.
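
For example, with CloudMonkey, something like this (name and sizes are
illustrative) keeps the requested speed below the host's reported 1895 MHz:

create serviceoffering name=HA-2x1800 displaytext="2 vCPU @ 1.8 GHz, HA" cpunumber=2 cpuspeed=1800 memory=2048 offerha=true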

20 нояб. 2017 г. 10:50 ДП пользователь "William Alianto" <will...@xofap.com>
написал:

> Hi Ivan,
>
> Thanks for the pointer. I can now see the HA option on the offering. I
> created a new offering with the HA option and tried to create a new instance
> using it. Unfortunately the action failed. Here is the error log from the
> action:
>
> https://pastebin.com/9DKXHdW3
>
> It seems ACS cannot find a suitable host for HA, although I have
> set up hypervisor hosts in the infrastructure. Did I miss any step in
> the configuration?
>
> --
> Regards,
>
> William
>
>
> On 17-Nov-17 22:40:43, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:
> Hi, when you create a service offering (not an instance) it allows
> specifying HA or not.
>
> 17 нояб. 2017 г. 10:36 ПП пользователь "William Alianto"
> написал:
>
> > Hi Dag,
> >
> > I can’t find any option for HA when I try to create new instances. How do
> > I know whether the HA option is available or not?
> >
> > --
> > Regards,
> >
> > William
> >
> > > On 17 Nov 2017, at 17.03, Dag Sonstebo
> > wrote:
> > >
> > > Hi William,
> > >
> > > HA follows the compute offering of your VMs, it’s not attached to the
> > host as such. So if the VM uses a HA offering then ACS will monitor it
> and
> > bring it back online if offline.
> > >
> > > Regards,
> > > Dag Sonstebo
> > > Cloud Architect
> > > ShapeBlue
> > >
> > > On 17/11/2017, 10:00, "William Alianto" wrote:
> > >
> > > Hi,
> > >
> > > I'm still learning more about ACS and I would like to know if there
> > is any configuration needed to have KVM HA enabled on ACS 4.9. I've been
> > searching for documentation but still haven't found a clear picture of
> how
> > to do that. Can anyone please give me some guidance on how to enable it? I
> > already have 2 KVM hosts added in the cluster.
> > >
> > > --
> > > Regards,
> > >
> > > William
> > >
> > >
> > >
> > >
> > >
> > > dag.sonst...@shapeblue.com
> > > www.shapeblue.com
> > > 53 Chandos Place, Covent Garden, London WC2N 4HSUK
> > > @shapeblue
> > >
> > >
> > >
> >
>


Re: KVM HA configuration

2017-11-19 Thread William Alianto
Hi Ivan,

Thanks for the pointer. I can now see the HA option on the offering. I created
a new offering with the HA option and tried to create a new instance using it.
Unfortunately the action failed. Here is the error log from the action:

https://pastebin.com/9DKXHdW3

It seems ACS cannot find a suitable host for HA, although I have set up
hypervisor hosts in the infrastructure. Did I miss any step in the
configuration?

--
Regards,

William


On 17-Nov-17 22:40:43, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:
Hi, when you create a service offering (not an instance) it allows
specifying HA or not.

17 нояб. 2017 г. 10:36 ПП пользователь "William Alianto"
написал:

> Hi Dag,
>
> I can’t find any option for HA when I try to create new instances. How do
> I know whether the HA option is available or not?
>
> --
> Regards,
>
> William
>
> > On 17 Nov 2017, at 17.03, Dag Sonstebo
> wrote:
> >
> > Hi William,
> >
> > HA follows the compute offering of your VMs, it’s not attached to the
> host as such. So if the VM uses a HA offering then ACS will monitor it and
> bring it back online if offline.
> >
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >
> > On 17/11/2017, 10:00, "William Alianto" wrote:
> >
> > Hi,
> >
> > I'm still learning more about ACS and I would like to know if there
> is any configuration needed to have KVM HA enabled on ACS 4.9. I've been
> searching for documentation but still haven't found a clear picture of how
> to do that. Can anyone please give me some guidance on how to enable it? I
> already have 2 KVM hosts added in the cluster.
> >
> > --
> > Regards,
> >
> > William
> >
> >
> >
> >
> >
> > dag.sonst...@shapeblue.com
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London WC2N 4HSUK
> > @shapeblue
> >
> >
> >
>


Re: KVM HA configuration

2017-11-17 Thread Ivan Kudryavtsev
Hi, when you create a service offering (not an instance) it allows
specifying HA or not.

17 нояб. 2017 г. 10:36 ПП пользователь "William Alianto" <will...@xofap.com>
написал:

> Hi Dag,
>
> I can’t find any option for HA when I try to create new instances. How do
> I know whether the HA option is available or not?
>
> --
> Regards,
>
> William
>
> > On 17 Nov 2017, at 17.03, Dag Sonstebo <dag.sonst...@shapeblue.com>
> wrote:
> >
> > Hi William,
> >
> > HA follows the compute offering of your VMs, it’s not attached to the
> host as such. So if the VM uses a HA offering then ACS will monitor it and
> bring it back online if offline.
> >
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >
> > On 17/11/2017, 10:00, "William Alianto" <will...@xofap.com> wrote:
> >
> >Hi,
> >
> >I'm still learning more about ACS and I would like to know if there
> is any configuration needed to have KVM HA enabled on ACS 4.9. I've been
> searching for documentation but still haven't found a clear picture of how
> to do that. Can anyone please give me some guidance on how to enable it? I
> already have 2 KVM hosts added in the cluster.
> >
> >--
> >Regards,
> >
> >William
> >
> >
> >
> >
> >
> > dag.sonst...@shapeblue.com
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > @shapeblue
> >
> >
> >
>


Re: KVM HA configuration

2017-11-17 Thread William Alianto
Hi Dag,

I can’t find any option for HA when I try to create new instances. How do I
know whether the HA option is available or not?

--
Regards,

William

> On 17 Nov 2017, at 17.03, Dag Sonstebo <dag.sonst...@shapeblue.com> wrote:
> 
> Hi William,
> 
> HA follows the compute offering of your VMs; it’s not attached to the host as
> such. So if the VM uses an HA offering then ACS will monitor it and bring it
> back online if offline.
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> On 17/11/2017, 10:00, "William Alianto" <will...@xofap.com> wrote:
> 
>Hi,
> 
>I'm still learning more about ACS and I would like to know if there is any
> configuration needed to have KVM HA enabled on ACS 4.9. I've been searching
> for documentation but still haven't found a clear picture of how to do that.
> Can anyone please give me some guidance on how to enable it? I already have 2 KVM
> hosts added in the cluster.
> 
>--
>Regards,
> 
>William
> 
> 
> 
> 
> 
> dag.sonst...@shapeblue.com 
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
> 
> 
> 


Re: KVM HA configuration

2017-11-17 Thread Dag Sonstebo
Hi William,

HA follows the compute offering of your VMs; it’s not attached to the host as
such. So if the VM uses an HA offering then ACS will monitor it and bring it
back online if offline.
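
For example, with CloudMonkey, an HA-enabled offering and a VM using it look
roughly like this (the UUIDs are placeholders):

create serviceoffering name=2x1000-ha displaytext="2 vCPU, 1 GB, HA" cpunumber=2 cpuspeed=1000 memory=1024 offerha=true
deploy virtualmachine serviceofferingid=<offering-uuid> templateid=<template-uuid> zoneid=<zone-uuid>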

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 17/11/2017, 10:00, "William Alianto" <will...@xofap.com> wrote:

Hi,

I'm still learning more about ACS and I would like to know if there is any
configuration needed to have KVM HA enabled on ACS 4.9. I've been searching for
documentation but still haven't found a clear picture of how to do that. Can
anyone please give me some guidance on how to enable it? I already have 2 KVM hosts
added in the cluster.

--
Regards,

William





dag.sonst...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 



KVM HA configuration

2017-11-17 Thread William Alianto
Hi,

I'm still learning more about ACS and I would like to know if there is any
configuration needed to have KVM HA enabled on ACS 4.9. I've been searching for
documentation but still haven't found a clear picture of how to do that. Can
anyone please give me some guidance on how to enable it? I already have 2 KVM hosts
added in the cluster.

--
Regards,

William




RE: Problems with KVM HA & STONITH

2017-11-04 Thread Simon Weller
Yep, very exciting!

Simon Weller/615-312-6068

-Original Message-
From: Rohit Yadav [rohit.ya...@shapeblue.com]
Received: Saturday, 04 Nov 2017, 11:23AM
To: users@cloudstack.apache.org [users@cloudstack.apache.org]
Subject: Re: Problems with KVM HA & STONITH

Hi James, (/cc Simon and others),


A new feature exists in upcoming ACS 4.11, Host HA:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA

You can read more about it here as well: 
http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/

This feature can use a custom HA provider, with a default HA provider
implemented for KVM and NFS, and uses IPMI-based fencing (STONITH) of the
host. The current (VM) HA mechanism provides no such method of fencing
(powering off) a host, and whether it helps depends on the circumstances
under which VM HA is failing (environment issues, ACS version, etc.).

As Simon mentioned, we expect a (host) HA provider that works with Ceph in
the near future.

Regards.


From: Simon Weller <swel...@ena.com.INVALID>
Sent: Thursday, November 2, 2017 7:27:22 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

James,


Ceph is a great solution and we run all of our ACS storage on Ceph. Note that
it adds another layer of complexity to your installation, so you're going to need
to develop some expertise with that platform to get comfortable with how it
works. Typically you don't want to mix Ceph with your ACS hosts. We in fact
deploy 3 separate Ceph Monitors, and then scale OSDs as required on a per-cluster
basis in order to add additional resiliency (so every KVM ACS cluster
has its own Ceph "POD"). We also use Ceph for S3 storage (on completely
separate Ceph clusters) for some other services.


NFS is much simpler to maintain for smaller installations, in my opinion. If the
IO load you're looking at isn't going to be insanely high, you could look at
building a 2-node NFS cluster using pacemaker and DRBD for data replication
between nodes. That would reduce your storage requirement to 2 fairly low-power
servers (NFS is not very cpu intensive). Currently on a host failure when using
storage other than NFS on KVM, you will not see HA occur until you take the
failed host out of the ACS cluster. This is a historical limitation because ACS
could not confirm the host had been fenced correctly, so to avoid potential
data corruption (due to 2 hosts mounting the same storage), it doesn't do
anything until the operator intervenes. As of ACS 4.10, IPMI-based fencing is
now supported on NFS and we're planning on developing similar support for Ceph.


Since you're a school district, I'm more than happy to jump on the phone with
you to talk you through these options if you'd like.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Thursday, November 2, 2017 8:28 AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Simon,

Thanks for getting back to me. I created one single NFS share and added it
as primary storage. I think I better understand how the storage works, with
ACS.

I was able to get HA working with one NFS storage, which is good. However,
is there a way to incorporate multiple NFS storage pools and still have the
HA functionality? I think something like GlusterFS or Ceph (like Ivan and
Dag described) will work better.

Thank you Simon, Ivan, and Dag for your assistance!
James

On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid>
wrote:

> James,
>
>
> Try just configuring a single NFS server and see if your setup works. If
> you have 3 NFS shares, across all 3 hosts, I'm wondering whether ACS is
> picking the one you rebooted as the storage for your VMs and when that
> storage goes away (when you bounce the host), all storage for your VMs
> vanishes and ACS tries to reboot your other hosts.
>
>
> Normally in a simple ACS setup, you would have a separate storage server
> that can serve up NFS to all hosts. If a host dies, then a VM would be
> brought up on a spare host, since all hosts have access to the same storage.
>
> Your other option is to use local storage, but that won't provide HA.
>
>
> - Si
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Monday, October 30, 2017 2:26 PM
> To: users@cloudstack.apache.org
> Subject: Re: Problems with KVM HA & STONITH
>
> Hi Dag,
>
> Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
14.04 VM. I have three nodes, each having about 1TB of primary storage
> (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
> All nodes are in a cluster.
>
> Maybe I'm not understanding the setup or misconfigured something. I'm
> trying to setup an HA environment where if one node go

Re: Problems with KVM HA & STONITH

2017-11-04 Thread Rohit Yadav
Hi James, (/cc Simon and others),


A new feature exists in upcoming ACS 4.11, Host HA:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA

You can read more about it here as well: 
http://www.shapeblue.com/host-ha-for-kvm-hosts-in-cloudstack/

This feature can use a custom HA provider, with a default HA provider
implemented for KVM and NFS, and uses IPMI-based fencing (STONITH) of the
host. The current (VM) HA mechanism provides no such method of fencing
(powering off) a host, and whether it helps depends on the circumstances
under which VM HA is failing (environment issues, ACS version, etc.).
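
For the archives, turning Host HA on for a host looks roughly like this with
CloudMonkey (API names per the Host HA feature; the provider name for KVM can
be confirmed with "list hosthaproviders" on your build):

configure haforhost hostid=<host-uuid> provider=kvmhaprovider
enable haforhost hostid=<host-uuid>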

As Simon mentioned, we expect a (host) HA provider that works with Ceph in
the near future.

Regards.


From: Simon Weller <swel...@ena.com.INVALID>
Sent: Thursday, November 2, 2017 7:27:22 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

James,


Ceph is a great solution and we run all of our ACS storage on Ceph. Note that
it adds another layer of complexity to your installation, so you're going to need
to develop some expertise with that platform to get comfortable with how it
works. Typically you don't want to mix Ceph with your ACS hosts. We in fact
deploy 3 separate Ceph Monitors, and then scale OSDs as required on a per-cluster
basis in order to add additional resiliency (so every KVM ACS cluster
has its own Ceph "POD"). We also use Ceph for S3 storage (on completely
separate Ceph clusters) for some other services.


NFS is much simpler to maintain for smaller installations, in my opinion. If the
IO load you're looking at isn't going to be insanely high, you could look at
building a 2-node NFS cluster using pacemaker and DRBD for data replication
between nodes. That would reduce your storage requirement to 2 fairly low-power
servers (NFS is not very cpu intensive). Currently on a host failure when using
storage other than NFS on KVM, you will not see HA occur until you take the
failed host out of the ACS cluster. This is a historical limitation because ACS
could not confirm the host had been fenced correctly, so to avoid potential
data corruption (due to 2 hosts mounting the same storage), it doesn't do
anything until the operator intervenes. As of ACS 4.10, IPMI-based fencing is
now supported on NFS and we're planning on developing similar support for Ceph.


Since you're a school district, I'm more than happy to jump on the phone with
you to talk you through these options if you'd like.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Thursday, November 2, 2017 8:28 AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Simon,

Thanks for getting back to me. I created one single NFS share and added it
as primary storage. I think I better understand how the storage works, with
ACS.

I was able to get HA working with one NFS storage, which is good. However,
is there a way to incorporate multiple NFS storage pools and still have the
HA functionality? I think something like GlusterFS or Ceph (like Ivan and
Dag described) will work better.

Thank you Simon, Ivan, and Dag for your assistance!
James

On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid>
wrote:

> James,
>
>
> Try just configuring a single NFS server and see if your setup works. If
> you have 3 NFS shares, across all 3 hosts, I'm wondering whether ACS is
> picking the one you rebooted as the storage for your VMs and when that
> storage goes away (when you bounce the host), all storage for your VMs
> vanishes and ACS tries to reboot your other hosts.
>
>
> Normally in a simple ACS setup, you would have a separate storage server
> that can serve up NFS to all hosts. If a host dies, then a VM would be
> brought up on a spare host, since all hosts have access to the same storage.
>
> Your other option is to use local storage, but that won't provide HA.
>
>
> - Si
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Monday, October 30, 2017 2:26 PM
> To: users@cloudstack.apache.org
> Subject: Re: Problems with KVM HA & STONITH
>
> Hi Dag,
>
> Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
> 14.04 VM. I have three nodes, each having about 1TB of primary storage
> (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
> All nodes are in a cluster.
>
> Maybe I'm not understanding the setup or misconfigured something. I'm
> trying to set up an HA environment where, if a node running an HA-marked
> VM goes down, the VM will start on another host. When I simulate a network
> disconnect or reboot of a host, all of the nodes go down (STONITH?).
>
> I am unsure how to set up an HA environment if all the nodes in the
> cluster go down. Any help is much appreciated!

Re: Problems with KVM HA & STONITH

2017-11-02 Thread Simon Weller
James,


Ceph is a great solution and we run all of our ACS storage on Ceph. Note that 
it adds another layer of complexity to your installation, so you're going to need 
to develop some expertise with that platform to get comfortable with how it 
works. Typically you don't want to mix Ceph with your ACS hosts. We in fact 
deploy 3 separate Ceph monitors, and then scale OSDs as required on a 
per-cluster basis in order to add additional resiliency (so every KVM ACS 
cluster has its own Ceph "pod"). We also use Ceph for S3 storage (on completely 
separate Ceph clusters) for some other services.


NFS is much simpler to maintain for smaller installations, in my opinion. If the 
IO load you're looking at isn't going to be insanely high, you could look at 
building a two-node NFS cluster using Pacemaker and DRBD for data replication 
between the nodes. That would reduce your storage requirement to two fairly 
low-power servers (NFS is not very CPU intensive). Currently, on a host failure 
when using storage other than NFS on KVM, you will not see HA occur until you 
take the failed host out of the ACS cluster. This is a historical limitation: 
because ACS could not confirm that the host had been fenced correctly, to avoid 
potential data corruption (due to 2 hosts mounting the same storage) it doesn't 
do anything until the operator intervenes. As of ACS 4.10, IPMI-based fencing is 
now supported on NFS, and we're planning on developing similar support for Ceph.


Since you're a school district, I'm more than happy to jump on the phone with 
you to talk through these options if you'd like.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Thursday, November 2, 2017 8:28 AM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Simon,

Thanks for getting back to me. I created a single NFS share and added it
as primary storage. I think I now better understand how the storage works
with ACS.

I was able to get HA working with one NFS storage pool, which is good. However,
is there a way to incorporate multiple NFS storage pools and still have the
HA functionality? I think something like GlusterFS or Ceph (like Ivan and
Dag described) would work better.

Thank you Simon, Ivan, and Dag for your assistance!
James

On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid>
wrote:

> James,
>
>
> Try just configuring a single NFS server and see if your setup works. If
> you have 3 NFS shares across all 3 hosts, I'm wondering whether ACS is
> picking the one you rebooted as the storage for your VMs, and when that
> storage goes away (when you bounce the host), all storage for your VMs
> vanishes and ACS tries to reboot your other hosts.
>
>
> Normally in a simple ACS setup, you would have a separate storage server
> that can serve up NFS to all hosts. If a host dies, then a VM would be
> brought up on a spare host since all hosts have access to the same storage.
>
> Your other option is to use local storage, but that won't provide HA.
>
>
> - Si
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Monday, October 30, 2017 2:26 PM
> To: users@cloudstack.apache.org
> Subject: Re: Problems with KVM HA & STONITH
>
> Hi Dag,
>
> Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
> 14.04 VM. I have three nodes, each having about 1TB of primary storage
> (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
> All nodes are in a cluster.
>
> Maybe I'm not understanding the setup or misconfigured something. I'm
> trying to set up an HA environment where, if a node running an HA-marked
> VM goes down, the VM will start on another host. When I simulate a network
> disconnect or reboot of a host, all of the nodes go down (STONITH?).
>
> I am unsure how to set up an HA environment if all the nodes in the
> cluster go down. Any help is much appreciated!
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
> wrote:
>
> > Hi James,
> >
> > I think  you possibly have over-configured your KVM hosts. If you use NFS
> > (and no clustered file system like CLVM) then there should be no need to
> > configure STONITH. CloudStack takes care of your HA, so this is not
> > something you offload to the KVM host.
> >
> > (As mentioned the only time I have played with STONITH and CloudStack was
> > for CLVM – and I eventually found it not fit for purpose, too unstable
> and
> > causing too many issues like you describe. Note this was for block
> storage
> > though – not NFS).
> >
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue

Re: Problems with KVM HA & STONITH

2017-11-02 Thread McClune, James
Hi Simon,

Thanks for getting back to me. I created a single NFS share and added it
as primary storage. I think I now better understand how the storage works
with ACS.

I was able to get HA working with one NFS storage pool, which is good. However,
is there a way to incorporate multiple NFS storage pools and still have the
HA functionality? I think something like GlusterFS or Ceph (like Ivan and
Dag described) would work better.

Thank you Simon, Ivan, and Dag for your assistance!
James

On Wed, Nov 1, 2017 at 10:10 AM, Simon Weller <swel...@ena.com.invalid>
wrote:

> James,
>
>
> Try just configuring a single NFS server and see if your setup works. If
> you have 3 NFS shares across all 3 hosts, I'm wondering whether ACS is
> picking the one you rebooted as the storage for your VMs, and when that
> storage goes away (when you bounce the host), all storage for your VMs
> vanishes and ACS tries to reboot your other hosts.
>
>
> Normally in a simple ACS setup, you would have a separate storage server
> that can serve up NFS to all hosts. If a host dies, then a VM would be
> brought up on a spare host since all hosts have access to the same storage.
>
> Your other option is to use local storage, but that won't provide HA.
>
>
> - Si
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Monday, October 30, 2017 2:26 PM
> To: users@cloudstack.apache.org
> Subject: Re: Problems with KVM HA & STONITH
>
> Hi Dag,
>
> Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
> 14.04 VM. I have three nodes, each having about 1TB of primary storage
> (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
> All nodes are in a cluster.
>
> Maybe I'm not understanding the setup or misconfigured something. I'm
> trying to set up an HA environment where, if a node running an HA-marked
> VM goes down, the VM will start on another host. When I simulate a network
> disconnect or reboot of a host, all of the nodes go down (STONITH?).
>
> I am unsure how to set up an HA environment if all the nodes in the
> cluster go down. Any help is much appreciated!
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
> wrote:
>
> > Hi James,
> >
> > I think  you possibly have over-configured your KVM hosts. If you use NFS
> > (and no clustered file system like CLVM) then there should be no need to
> > configure STONITH. CloudStack takes care of your HA, so this is not
> > something you offload to the KVM host.
> >
> > (As mentioned the only time I have played with STONITH and CloudStack was
> > for CLVM – and I eventually found it not fit for purpose, too unstable
> and
> > causing too many issues like you describe. Note this was for block
> storage
> > though – not NFS).
> >
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >
> > On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com>
> wrote:
> >
> > Hi. If the node loses its NFS host, it reboots (ACS agent behaviour). If you
> > really have 3 storages, you'll get a cluster-wide reboot every time your
> > host is down.
> >
> > On 28 Oct 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote:
> >
> > > Hi James,
> > >
> > >
> > > Can you elaborate a bit further on the storage? You say you're
> > running NFS
> >     > on all 3 nodes, can you explain how it is setup?
> > >
> > > Also, what version of ACS are you running?
> > >
> > >
> > > - Si
> > >
> > >
> > >
> > >
> > > 
> > > From: McClune, James <mcclu...@norwalktruckers.net>
> > > Sent: Friday, October 27, 2017 2:21 PM
> > > To: users@cloudstack.apache.org
> > > Subject: Problems with KVM HA & STONITH
> > >
> > > Hello Apache CloudStack Community,
> > >
> > > My setup consists of the following:
> > >
> > > - Three nodes (NODE1, NODE2, and NODE3)
> > > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3,
> > and NODE3
> > > is running Ubuntu 14.04.5.
> > > - Management Server (running on separate VM, not in cluster)
> > >
> > > The three nodes use KVM as the hypervisor. I also configured
> primary
> > and
> >

Re: Problems with KVM HA & STONITH

2017-11-01 Thread Ivan Kudryavtsev
Also you can run ceph if you need HA. I met setup description which uses
compute nodes for ceph cluster nodes simultaneously.

On 1 Nov 2017 at 21:11, "Simon Weller" <swel...@ena.com.invalid> wrote:

> James,
>
>
> Try just configuring a single NFS server and see if your setup works. If
> you have 3 NFS shares across all 3 hosts, I'm wondering whether ACS is
> picking the one you rebooted as the storage for your VMs, and when that
> storage goes away (when you bounce the host), all storage for your VMs
> vanishes and ACS tries to reboot your other hosts.
>
>
> Normally in a simple ACS setup, you would have a separate storage server
> that can serve up NFS to all hosts. If a host dies, then a VM would be
> brought up on a spare host since all hosts have access to the same storage.
>
> Your other option is to use local storage, but that won't provide HA.
>
>
> - Si
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Monday, October 30, 2017 2:26 PM
> To: users@cloudstack.apache.org
> Subject: Re: Problems with KVM HA & STONITH
>
> Hi Dag,
>
> Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
> 14.04 VM. I have three nodes, each having about 1TB of primary storage
> (NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
> All nodes are in a cluster.
>
> Maybe I'm not understanding the setup or misconfigured something. I'm
> trying to set up an HA environment where, if a node running an HA-marked
> VM goes down, the VM will start on another host. When I simulate a network
> disconnect or reboot of a host, all of the nodes go down (STONITH?).
>
> I am unsure how to set up an HA environment if all the nodes in the
> cluster go down. Any help is much appreciated!
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
> wrote:
>
> > Hi James,
> >
> > I think  you possibly have over-configured your KVM hosts. If you use NFS
> > (and no clustered file system like CLVM) then there should be no need to
> > configure STONITH. CloudStack takes care of your HA, so this is not
> > something you offload to the KVM host.
> >
> > (As mentioned the only time I have played with STONITH and CloudStack was
> > for CLVM – and I eventually found it not fit for purpose, too unstable
> and
> > causing too many issues like you describe. Note this was for block
> storage
> > though – not NFS).
> >
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >
> > On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com>
> wrote:
> >
> > Hi. If the node loses its NFS host, it reboots (ACS agent behaviour). If you
> > really have 3 storages, you'll get a cluster-wide reboot every time your
> > host is down.
> >
> > On 28 Oct 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote:
> >
> > > Hi James,
> > >
> > >
> > > Can you elaborate a bit further on the storage? You say you're
> > running NFS
> > > on all 3 nodes, can you explain how it is setup?
> > >
> > > Also, what version of ACS are you running?
> > >
> > >
> > > - Si
> > >
> > >
> > >
> > >
> > > 
> > > From: McClune, James <mcclu...@norwalktruckers.net>
> > > Sent: Friday, October 27, 2017 2:21 PM
> > > To: users@cloudstack.apache.org
> > > Subject: Problems with KVM HA & STONITH
> > >
> > > Hello Apache CloudStack Community,
> > >
> > > My setup consists of the following:
> > >
> > > - Three nodes (NODE1, NODE2, and NODE3)
> > > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3,
> > and NODE3
> > > is running Ubuntu 14.04.5.
> > > - Management Server (running on separate VM, not in cluster)
> > >
> > > The three nodes use KVM as the hypervisor. I also configured
> primary
> > and
> > > secondary storage on all three of the nodes. I'm using NFS for the
> > primary
> > > & secondary storage. VM operations work great. Live migration works
> > great.
> > >
> > > However, when a host goes down, the HA functionality does not work
> > at all.
> > >

Re: Problems with KVM HA & STONITH

2017-11-01 Thread Simon Weller
James,


Try just configuring a single NFS server and see if your setup works. If you 
have 3 NFS shares across all 3 hosts, I'm wondering whether ACS is picking the 
one you rebooted as the storage for your VMs, and when that storage goes away 
(when you bounce the host), all storage for your VMs vanishes and ACS tries to 
reboot your other hosts.


Normally in a simple ACS setup, you would have a separate storage server that 
can serve up NFS to all hosts. If a host dies, then a VM would be brought up on 
a spare host since all hosts have access to the same storage.

Your other option is to use local storage, but that won't provide HA.


- Si



From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Monday, October 30, 2017 2:26 PM
To: users@cloudstack.apache.org
Subject: Re: Problems with KVM HA & STONITH

Hi Dag,

Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
14.04 VM. I have three nodes, each having about 1TB of primary storage
(NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
All nodes are in a cluster.

Maybe I'm not understanding the setup or misconfigured something. I'm
trying to set up an HA environment where, if a node running an HA-marked
VM goes down, the VM will start on another host. When I simulate a network
disconnect or reboot of a host, all of the nodes go down (STONITH?).

I am unsure how to set up an HA environment if all the nodes in the
cluster go down. Any help is much appreciated!

Thanks,
James

On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
wrote:

> Hi James,
>
> I think  you possibly have over-configured your KVM hosts. If you use NFS
> (and no clustered file system like CLVM) then there should be no need to
> configure STONITH. CloudStack takes care of your HA, so this is not
> something you offload to the KVM host.
>
> (As mentioned the only time I have played with STONITH and CloudStack was
> for CLVM – and I eventually found it not fit for purpose, too unstable and
> causing too many issues like you describe. Note this was for block storage
> though – not NFS).
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> wrote:
>
> Hi. If the node loses its NFS host, it reboots (ACS agent behaviour). If you
> really have 3 storages, you'll get a cluster-wide reboot every time your
> host is down.
>
> On 28 Oct 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote:
>
> > Hi James,
> >
> >
> > Can you elaborate a bit further on the storage? You say you're
> running NFS
> > on all 3 nodes, can you explain how it is setup?
> >
> > Also, what version of ACS are you running?
> >
> >
> > - Si
> >
> >
> >
> >
>     > 
> > From: McClune, James <mcclu...@norwalktruckers.net>
> > Sent: Friday, October 27, 2017 2:21 PM
> > To: users@cloudstack.apache.org
> > Subject: Problems with KVM HA & STONITH
> >
> > Hello Apache CloudStack Community,
> >
> > My setup consists of the following:
> >
> > - Three nodes (NODE1, NODE2, and NODE3)
> > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3,
> and NODE3
> > is running Ubuntu 14.04.5.
> > - Management Server (running on separate VM, not in cluster)
> >
> > The three nodes use KVM as the hypervisor. I also configured primary
> and
> > secondary storage on all three of the nodes. I'm using NFS for the
> primary
> > & secondary storage. VM operations work great. Live migration works
> great.
> >
> > However, when a host goes down, the HA functionality does not work
> at all.
> > Instead of spinning up the VM on another available host, the down
> host
> > seems to trigger STONITH. When STONITH happens, all hosts in the
> cluster go
> > down. This not only causes no HA, but also downs perfectly good
> VM's. I
> > have read countless articles and documentation related to this
> issue. I
> > still cannot find a viable solution for this issue. I really want to
> use
> > Apache CloudStack, but cannot implement this in production when
> STONITH
> > happens.
> >
> > I think I have something misconfigured. I thought I would reach out
> to the
> > CloudStack community and ask for some friendly assistance.
> >
> > If there is anything (sy

Re: Problems with KVM HA & STONITH

2017-10-30 Thread McClune, James
Hi Dag,

Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
14.04 VM. I have three nodes, each having about 1TB of primary storage
(NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
All nodes are in a cluster.

Maybe I'm not understanding the setup or misconfigured something. I'm
trying to set up an HA environment where, if a node running an HA-marked
VM goes down, the VM will start on another host. When I simulate a network
disconnect or reboot of a host, all of the nodes go down (STONITH?).

I am unsure how to set up an HA environment if all the nodes in the
cluster go down. Any help is much appreciated!

Thanks,
James

On Mon, Oct 30, 2017 at 3:49 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
wrote:

> Hi James,
>
> I think  you possibly have over-configured your KVM hosts. If you use NFS
> (and no clustered file system like CLVM) then there should be no need to
> configure STONITH. CloudStack takes care of your HA, so this is not
> something you offload to the KVM host.
>
> (As mentioned the only time I have played with STONITH and CloudStack was
> for CLVM – and I eventually found it not fit for purpose, too unstable and
> causing too many issues like you describe. Note this was for block storage
> though – not NFS).
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> wrote:
>
> Hi. If the node loses its NFS host, it reboots (ACS agent behaviour). If you
> really have 3 storages, you'll get a cluster-wide reboot every time your
> host is down.
>
> On 28 Oct 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote:
>
> > Hi James,
> >
> >
> > Can you elaborate a bit further on the storage? You say you're
> running NFS
> > on all 3 nodes, can you explain how it is setup?
> >
> > Also, what version of ACS are you running?
> >
> >
> > - Si
> >
> >
> >
> >
> > ____
> > From: McClune, James <mcclu...@norwalktruckers.net>
> > Sent: Friday, October 27, 2017 2:21 PM
> > To: users@cloudstack.apache.org
> > Subject: Problems with KVM HA & STONITH
> >
> > Hello Apache CloudStack Community,
> >
> > My setup consists of the following:
> >
> > - Three nodes (NODE1, NODE2, and NODE3)
> > NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3,
> and NODE3
> > is running Ubuntu 14.04.5.
> > - Management Server (running on separate VM, not in cluster)
> >
> > The three nodes use KVM as the hypervisor. I also configured primary
> and
> > secondary storage on all three of the nodes. I'm using NFS for the
> primary
> > & secondary storage. VM operations work great. Live migration works
> great.
> >
> > However, when a host goes down, the HA functionality does not work
> at all.
> > Instead of spinning up the VM on another available host, the down
> host
> > seems to trigger STONITH. When STONITH happens, all hosts in the
> cluster go
> > down. This not only causes no HA, but also downs perfectly good
> VM's. I
> > have read countless articles and documentation related to this
> issue. I
> > still cannot find a viable solution for this issue. I really want to
> use
> > Apache CloudStack, but cannot implement this in production when
> STONITH
> > happens.
> >
> > I think I have something misconfigured. I thought I would reach out
> to the
> > CloudStack community and ask for some friendly assistance.
> >
> > If there is anything (system-wise) you request in order to further
> > troubleshoot this issue, please let me know and I'll send. I
> appreciate any
> > help in this issue!
> >
> > --
> >
> > Thanks,
> >
> > James
> >
>
>
>
> dag.sonst...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>


-- 



James McClune

Technical Support Specialist

Norwalk City Schools

Phone: 419-660-6590

mcclu...@norwalktruckers.net


Re: Problems with KVM HA & STONITH

2017-10-30 Thread McClune, James
Hi Simon,

Thank you for responding back. I am currently running ACS 4.9 on an Ubuntu
14.04 VM. I have three nodes, each having about 1TB of primary storage
(NFS) and 1TB of secondary storage (NFS). I added each NFS share into ACS.
All nodes are in a cluster.

Maybe I'm not understanding the setup or misconfigured something. I'm
trying to set up an HA environment where, if a node running an HA-marked
VM goes down, the VM will start on another host. When I simulate a network
disconnect or reboot of a host, all of the nodes go down.

If you request more information, please let me know. Again, any help is
greatly appreciated!

Thanks,
James

On Fri, Oct 27, 2017 at 4:02 PM, Simon Weller <swel...@ena.com.invalid>
wrote:

> Hi James,
>
>
> Can you elaborate a bit further on the storage? You say you're running NFS
> on all 3 nodes, can you explain how it is setup?
>
> Also, what version of ACS are you running?
>
>
> - Si
>
>
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Friday, October 27, 2017 2:21 PM
> To: users@cloudstack.apache.org
> Subject: Problems with KVM HA & STONITH
>
> Hello Apache CloudStack Community,
>
> My setup consists of the following:
>
> - Three nodes (NODE1, NODE2, and NODE3)
> NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3
> is running Ubuntu 14.04.5.
> - Management Server (running on separate VM, not in cluster)
>
> The three nodes use KVM as the hypervisor. I also configured primary and
> secondary storage on all three of the nodes. I'm using NFS for the primary
> & secondary storage. VM operations work great. Live migration works great.
>
> However, when a host goes down, the HA functionality does not work at all.
> Instead of spinning up the VM on another available host, the down host
> seems to trigger STONITH. When STONITH happens, all hosts in the cluster go
> down. This not only causes no HA, but also downs perfectly good VM's. I
> have read countless articles and documentation related to this issue. I
> still cannot find a viable solution for this issue. I really want to use
> Apache CloudStack, but cannot implement this in production when STONITH
> happens.
>
> I think I have something misconfigured. I thought I would reach out to the
> CloudStack community and ask for some friendly assistance.
>
> If there is anything (system-wise) you request in order to further
> troubleshoot this issue, please let me know and I'll send. I appreciate any
> help in this issue!
>
> --
>
> Thanks,
>
> James
>



-- 



James McClune

Technical Support Specialist

Norwalk City Schools

Phone: 419-660-6590

mcclu...@norwalktruckers.net


Re: Problems with KVM HA & STONITH

2017-10-30 Thread Dag Sonstebo
Hi James,

I think you possibly have over-configured your KVM hosts. If you use NFS (and 
no clustered file system like CLVM) then there should be no need to configure 
STONITH. CloudStack takes care of your HA, so this is not something you offload 
to the KVM host.

(As mentioned, the only time I have played with STONITH and CloudStack was for 
CLVM – and I eventually found it not fit for purpose: too unstable and causing 
too many issues like the ones you describe. Note this was for block storage 
though – not NFS.)

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 28/10/2017, 03:40, "Ivan Kudryavtsev" <kudryavtsev...@bw-sw.com> wrote:

Hi. If the node loses its NFS host, it reboots (ACS agent behaviour). If you
really have 3 storages, you'll get a cluster-wide reboot every time your host
is down.

On 28 Oct 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote:

> Hi James,
>
>
> Can you elaborate a bit further on the storage? You say you're running NFS
> on all 3 nodes, can you explain how it is setup?
>
> Also, what version of ACS are you running?
>
>
> - Si
>
>
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Friday, October 27, 2017 2:21 PM
> To: users@cloudstack.apache.org
> Subject: Problems with KVM HA & STONITH
>
> Hello Apache CloudStack Community,
>
> My setup consists of the following:
>
> - Three nodes (NODE1, NODE2, and NODE3)
> NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and 
NODE3
> is running Ubuntu 14.04.5.
> - Management Server (running on separate VM, not in cluster)
>
> The three nodes use KVM as the hypervisor. I also configured primary and
> secondary storage on all three of the nodes. I'm using NFS for the primary
> & secondary storage. VM operations work great. Live migration works great.
>
> However, when a host goes down, the HA functionality does not work at all.
> Instead of spinning up the VM on another available host, the down host
> seems to trigger STONITH. When STONITH happens, all hosts in the cluster 
go
> down. This not only causes no HA, but also downs perfectly good VM's. I
> have read countless articles and documentation related to this issue. I
> still cannot find a viable solution for this issue. I really want to use
> Apache CloudStack, but cannot implement this in production when STONITH
> happens.
>
> I think I have something misconfigured. I thought I would reach out to the
> CloudStack community and ask for some friendly assistance.
>
> If there is anything (system-wise) you request in order to further
> troubleshoot this issue, please let me know and I'll send. I appreciate 
any
> help in this issue!
>
> --
>
> Thanks,
>
> James
>



dag.sonst...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 



Re: Problems with KVM HA & STONITH

2017-10-27 Thread Ivan Kudryavtsev
Hi. If the node loses its NFS host, it reboots (ACS agent behaviour). If you
really have 3 storages, you'll get a cluster-wide reboot every time your host
is down.

On 28 Oct 2017 at 3:02, "Simon Weller" <swel...@ena.com.invalid> wrote:

> Hi James,
>
>
> Can you elaborate a bit further on the storage? You say you're running NFS
> on all 3 nodes, can you explain how it is setup?
>
> Also, what version of ACS are you running?
>
>
> - Si
>
>
>
>
> 
> From: McClune, James <mcclu...@norwalktruckers.net>
> Sent: Friday, October 27, 2017 2:21 PM
> To: users@cloudstack.apache.org
> Subject: Problems with KVM HA & STONITH
>
> Hello Apache CloudStack Community,
>
> My setup consists of the following:
>
> - Three nodes (NODE1, NODE2, and NODE3)
> NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3
> is running Ubuntu 14.04.5.
> - Management Server (running on separate VM, not in cluster)
>
> The three nodes use KVM as the hypervisor. I also configured primary and
> secondary storage on all three of the nodes. I'm using NFS for the primary
> & secondary storage. VM operations work great. Live migration works great.
>
> However, when a host goes down, the HA functionality does not work at all.
> Instead of spinning up the VM on another available host, the down host
> seems to trigger STONITH. When STONITH happens, all hosts in the cluster go
> down. This not only causes no HA, but also downs perfectly good VM's. I
> have read countless articles and documentation related to this issue. I
> still cannot find a viable solution for this issue. I really want to use
> Apache CloudStack, but cannot implement this in production when STONITH
> happens.
>
> I think I have something misconfigured. I thought I would reach out to the
> CloudStack community and ask for some friendly assistance.
>
> If there is anything (system-wise) you request in order to further
> troubleshoot this issue, please let me know and I'll send. I appreciate any
> help in this issue!
>
> --
>
> Thanks,
>
> James
>


Re: Problems with KVM HA & STONITH

2017-10-27 Thread Simon Weller
Hi James,


Can you elaborate a bit further on the storage? You say you're running NFS on 
all 3 nodes; can you explain how it is set up?

Also, what version of ACS are you running?


- Si





From: McClune, James <mcclu...@norwalktruckers.net>
Sent: Friday, October 27, 2017 2:21 PM
To: users@cloudstack.apache.org
Subject: Problems with KVM HA & STONITH

Hello Apache CloudStack Community,

My setup consists of the following:

- Three nodes (NODE1, NODE2, and NODE3)
NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3
is running Ubuntu 14.04.5.
- Management Server (running on separate VM, not in cluster)

The three nodes use KVM as the hypervisor. I also configured primary and
secondary storage on all three of the nodes. I'm using NFS for the primary
& secondary storage. VM operations work great. Live migration works great.

However, when a host goes down, the HA functionality does not work at all.
Instead of spinning up the VM on another available host, the down host
seems to trigger STONITH. When STONITH happens, all hosts in the cluster go
down. This not only gives us no HA, but also takes down perfectly good VMs. I
have read countless articles and documentation related to this issue, and I
still cannot find a viable solution. I really want to use
Apache CloudStack, but cannot put this into production when STONITH
happens.

I think I have something misconfigured. I thought I would reach out to the
CloudStack community and ask for some friendly assistance.

If there is anything (system-wise) you need in order to further
troubleshoot this issue, please let me know and I'll send it. I appreciate any
help with this issue!

--

Thanks,

James


Problems with KVM HA & STONITH

2017-10-27 Thread McClune, James
Hello Apache CloudStack Community,

My setup consists of the following:

- Three nodes (NODE1, NODE2, and NODE3)
NODE1 is running Ubuntu 16.04.3, NODE2 is running Ubuntu 16.04.3, and NODE3
is running Ubuntu 14.04.5.
- Management Server (running on separate VM, not in cluster)

The three nodes use KVM as the hypervisor. I also configured primary and
secondary storage on all three of the nodes. I'm using NFS for the primary
& secondary storage. VM operations work great. Live migration works great.

However, when a host goes down, the HA functionality does not work at all.
Instead of spinning up the VM on another available host, the down host
seems to trigger STONITH. When STONITH happens, all hosts in the cluster go
down. This not only gives us no HA, but also takes down perfectly good VMs. I
have read countless articles and documentation related to this issue, and I
still cannot find a viable solution. I really want to use
Apache CloudStack, but cannot put this into production when STONITH
happens.

I think I have something misconfigured. I thought I would reach out to the
CloudStack community and ask for some friendly assistance.

If there is anything (system-wise) you need in order to further
troubleshoot this issue, please let me know and I'll send it. I appreciate any
help with this issue!

-- 

Thanks,

James


Re: KVM+HA

2017-07-18 Thread ilya musayev
Apologies for the fragmented messages. In the existing framework, CloudStack does
not know for certain whether your VMs are dead, or the KVM hypervisor crashed, or
it's just a network blip, or perhaps you stopped the KVM agent (or the agent
died). It takes a conservative approach and does not restart the VMs on other
hypervisors, to avoid a split-brain scenario.

The only time it will restart the KVM hypervisor and move the VMs over is when
you lose primary storage access on one of the hypervisors in the cluster, using
the NFS heartbeat method I mentioned earlier.

The new framework addresses the limitations above (sketched below) by:
1) checking whether there is any disk activity on VMs that are in an uncertain
state - if there is no activity for ALL VMs for "x" number of seconds,
2) CloudStack will issue an IPMI fence command to power down/reboot the host
(via iLO or DRAC or something similar), and
3) the VMs will be restarted elsewhere.
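
[Editor's note: to make that three-step flow concrete, here is a minimal Python
sketch under stated assumptions: the disk-activity probe is a stub, and the
ipmitool target, credentials, and timing are placeholders, not anything
CloudStack actually ships.]

import subprocess, time

IPMI = ["ipmitool", "-I", "lanplus", "-H", "10.0.0.50", "-U", "admin", "-P", "secret"]
QUIET_SECONDS = 60  # how long ALL VMs must be silent before we dare fence

def vms_show_disk_activity(host):
    return False  # stub: True would mean some VM on the host is still writing

def fence_and_recover(host):
    # Step 1: only consider the host dead if no VM writes for the whole window.
    deadline = time.time() + QUIET_SECONDS
    while time.time() < deadline:
        if vms_show_disk_activity(host):
            return False  # signs of life: do not fence, do not restart elsewhere
        time.sleep(5)
    # Step 2: reset the host over IPMI so it can no longer touch shared storage.
    subprocess.run(IPMI + ["chassis", "power", "reset"], check=True)
    # Step 3: with the host fenced, HA can safely restart its VMs elsewhere.
    return True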

Regards
ilya

On Tue, Jul 18, 2017 at 6:10 AM, ilya musayev <ilya.mailing.li...@gmail.com>
wrote:

> What shared primary storage backend do you have for your VMs?
>
> If it is NFS, the CloudStack agent writes a heartbeat. When an issue occurs, the
> neighbouring hosts check whether the hypervisor that failed is still writing to
> its heartbeat file. There are a bunch of corner cases where CloudStack HA does
> not kick in, due to uncertainty.
>
> The new framework should address those uncertainties.
>
> KVM HA with IPMI Fencing - Apache Cloudstack - Apache Software ...
> <https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing>
> [CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA
> <https://issues.apache.org/jira/browse/CLOUDSTACK-8943>
>
> Regards
> ilya
>
> On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev <
> ilya.mailing.li...@gmail.com> wrote:
>
>> Hi Victor
>>
>> We recently rewrote the KVM HA framework. It's being merged into the latest build.
>>
>>
>> On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote:
>>
>>> Hello Guys,
>>>
>>> I am facing the same issue that is mentioned at the following URL:
>>>
>>> -
>>>
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>>>
>>> -
>>>
>>> When the host is put into maintenance mode, the HA-enabled VMs are
>>> automatically migrated to an available host. But when the KVM host is down, no
>>> HA is done. The VMs stay down until I bring the host node back up.
>>>
>>>
>>> I have tried everything, like the following.
>>>
>>> =
>>>
>>> 1. System VMs and client VMs are created on shared storage
>>>
>>> 2. Added ha.tag host tags
>>>
>>> 3. Created the host by adding the ha tag
>>>
>>> 4. Created VMs on an HA-enabled host with an HA-enabled service offering
>>>
>>> 
>>>
>>> Have you guys successfully tested HA? I am really stuck at this part.
>>>
>>> Regards
>>>
>>>
>>>
>>>
>>
>


Re: KVM+HA

2017-07-18 Thread ilya musayev
What shared primary storage backend do you have for your VMs?

If it is NFS, the CloudStack agent writes a heartbeat. When an issue occurs, the
neighbouring hosts check whether the hypervisor that failed is still writing to
its heartbeat file. There are a bunch of corner cases where CloudStack HA does
not kick in, due to uncertainty.

The new framework should address those uncertainties.
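
[Editor's note: for illustration, a minimal Python sketch of the heartbeat idea,
assuming a per-host timestamp file on the shared primary storage mount. The
directory, file naming, and staleness threshold are placeholders, not the
agent's actual heartbeat implementation.]

import os, time

HEARTBEAT_DIR = "/mnt/primary/hb"   # placeholder path, not the agent's real one
STALE_AFTER = 60                    # seconds without a refresh = presumed dead

def write_heartbeat(host_id):
    # Agent side, conceptually: keep bumping our timestamp file.
    with open(os.path.join(HEARTBEAT_DIR, host_id), "w") as f:
        f.write(str(time.time()))

def neighbour_thinks_dead(host_id):
    # Neighbour side: has the failed host stopped writing its heartbeat?
    path = os.path.join(HEARTBEAT_DIR, host_id)
    try:
        return time.time() - os.path.getmtime(path) > STALE_AFTER
    except FileNotFoundError:
        return True  # no heartbeat file at all: treat as dead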

The new framework should address those uncertainties.

KVM HA with IPMI Fencing - Apache Cloudstack - Apache Software ...
<https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing>
[CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA
<https://issues.apache.org/jira/browse/CLOUDSTACK-8943>

Regards
ilya

On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev <ilya.mailing.li...@gmail.com>
wrote:

> Hi Victor
>
> We recently rewrote the KVM HA framework. It's being merged into the latest build.
>
>
> On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote:
>
>> Hello Guys,
>>
>> I am facing the same issue that is mentioned at the following URL:
>>
>> -
>>
>> https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>>
>> -
>>
>> When the host is put into maintenance mode, the HA-enabled VMs are
>> automatically migrated to an available host. But when the KVM host is down, no
>> HA is done. The VMs stay down until I bring the host node back up.
>>
>>
>> I have tried everything, like the following.
>>
>> =
>>
>> 1. System VMs and client VMs are created on shared storage
>>
>> 2. Added ha.tag host tags
>>
>> 3. Created the host by adding the ha tag
>>
>> 4. Created VMs on an HA-enabled host with an HA-enabled service offering
>>
>> 
>>
>> Have you guys successfully tested HA? I am really stuck at this part.
>>
>> Regards
>>
>>
>>
>>
>


Re: KVM+HA

2017-07-18 Thread ilya musayev
Hi Victor

We recently rewrote the KVM HA framework. It's being merged into the latest build.


On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote:

> Hello Guys,
>
> I am facing the same issue that is mentioned at the following URL:
>
> -
>
> https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>
> -
>
> When the host is put into maintenance mode, the HA-enabled VMs are
> automatically migrated to an available host. But when the KVM host is down, no
> HA is done. The VMs stay down until I bring the host node back up.
>
>
> I have tried everything, like the following.
>
> =
>
> 1. System VMs and client VMs are created on shared storage
>
> 2. Added ha.tag host tags
>
> 3. Created the host by adding the ha tag
>
> 4. Created VMs on an HA-enabled host with an HA-enabled service offering
>
> 
>
> Have you guys successfully tested HA? I am really stuck at this part.
>
> Regards
>
>
>
>


KVM+HA

2017-07-18 Thread victor

Hello Guys,

I am facing the same issue that is mentioned at the following URL:

-

https://issues.apache.org/jira/browse/CLOUDSTACK-3535

-

When the host is put into maintenance mode, the HA-enabled VMs are 
automatically migrated to an available host. But when the KVM host is down, 
no HA is done. The VMs stay down until I bring the host node back up.



I have tried everything, like the following.

=

1. System VMs and client VMs are created on shared storage

2. Added ha.tag host tags

3. Created the host by adding the ha tag

4. Created VMs on an HA-enabled host with an HA-enabled service offering (see the sketch after this message)



Have you guys successfully tested HA? I am really stuck at this part.

Regards
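
[Editor's note: to illustrate the last item of the checklist above, here is a
hedged sketch of creating an HA-enabled service offering through the API, using
the third-party "cs" Python client (pip install cs). The endpoint, keys, and
offering sizes are placeholders; offerha and hosttags are the relevant API
parameters as I understand them.]

from cs import CloudStack

api = CloudStack(endpoint="http://mgmt.example.com:8080/client/api",
                 key="your-api-key", secret="your-secret-key")

# An HA-enabled compute offering pinned to hosts carrying the "ha" tag.
offering = api.createServiceOffering(
    name="HA-2GB", displaytext="2 GB offering with HA",
    cpunumber=2, cpuspeed=2000, memory=2048,
    offerha=True,    # marks VMs deployed with this offering as HA-managed
    hosttags="ha")   # should match the tag referenced by the ha.tag setting
print(offering)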





Re: KVM HA is broken, let's fix it

2015-10-19 Thread Özhan Rüzgar Karaman
Hi;
This IPMI fencing is the technology that most cloud platforms, like
oVirt, use, so it's good. How can we test this IPMI fencing feature, and where
can I find its scripts and usage/test documents? I have some test
hardware and would really like to try it.
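
[Editor's note: one low-level way to test the hardware side before involving
CloudStack at all is to probe the BMC with ipmitool, which is what the fencing
ultimately drives. A small Python wrapper; all BMC details are placeholders.]

import subprocess

# Point these at a host's iDRAC/iLO/IPMI interface.
BMC = ["ipmitool", "-I", "lanplus", "-H", "10.0.0.50", "-U", "admin", "-P", "secret"]

# If these succeed from the management server, a fencing script has what it needs.
for probe in (["chassis", "power", "status"],  # is the box powered on?
              ["mc", "info"]):                 # can we talk to the BMC at all?
    result = subprocess.run(BMC + probe, capture_output=True, text=True)
    print(" ".join(probe), "->", (result.stdout or result.stderr).strip())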

Thanks
Özhan

On Sat, Oct 17, 2015 at 2:44 AM, ilya <ilya.mailing.li...@gmail.com> wrote:

> Please see another thread on DEV that proposes the fix for KVM HA ->
> [DISCUSS] KVM HA with IPMI Fencing
>
>
> 
>
> We propose the following solution that, in our understanding, should cover
> all use cases and provide a fencing mechanism.
>
> NOTE: the proposed IPMI fencing is just a script. If you are using HP
> hardware with iLO, it could be an iLO executable with specific
> parameters. In theory this can be *any* script, not just IPMI.
>
> Please take a few minutes to read this through, to avoid duplicated effort...
>
>
> Proposed FS below:
> ----
>
>
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing
>
>
> On 10/12/15 12:54 AM, Frank Louwers wrote:
> >
> >> On 10 Oct 2015, at 12:35, Remi Bergsma <rberg...@schubergphilis.com>
> wrote:
> >>
> >> Can you please explain what the issue is with KVM HA? In my tests, HA
> starts all VMs just fine without the hypervisor coming back. At least that
> is on current 4.6. Assuming a cluster of multiple nodes of course. It will
> then do a neighbor check from another host in the same cluster.
> >>
> >> Also, malfunctioning NFS leads to corruption and therefore we fence a
> box when the shared storage is unreliable. Combining primary and secondary
> NFS is not a good idea for production in my opinion.
> >
> > Well, it depends how you look at it, and what your situation is.
> >
> > If you use 1 NFS export as primary storage (and only NFS), then yes,
> the system works as one would expect, and doesn’t need to be fixed.
> >
> > However, HA is “not functioning” in any of these scenarios:
> >
> > - you don’t use NFS as your only primary storage
> > - you use more than one NFS primary storage
> >
> > Even worse: imagine you only use local storage as primary storage, but
> have 1 NFS configured (as the UI “wizard” forces you to configure one). You
> don’t have any active VM configured on the primary storage. You then
> perform maintenance on the NFS storage, and take it offline…
> >
> > All your hosts will then reboot, resulting in major downtime, that’s
> completely unnecessary. There’s not even an option to disable this at this
> point… We’ve removed the reboot instructions from the HA script on all our
> instances…
> >
> > Regards,
> >
> > Frank
> >
>


Re: KVM HA is broken, let's fix it

2015-10-16 Thread ilya
Please see another thread on DEV that proposes the fix for KVM HA ->
[DISCUSS] KVM HA with IPMI Fencing




We propose the following solution that, in our understanding, should cover
all use cases and provide a fencing mechanism.

NOTE: the proposed IPMI fencing is just a script. If you are using HP
hardware with iLO, it could be an iLO executable with specific
parameters. In theory this can be *any* script, not just IPMI.
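
[Editor's note: a sketch of that "any script" idea in Python. The fence method
is just a configurable command, so IPMI, iLO, or a home-grown script all plug in
the same way; both command strings and the script path are illustrative.]

import shlex, subprocess

FENCE_COMMANDS = {
    "ipmi": "ipmitool -I lanplus -H {addr} -U {user} -P {passwd} chassis power off",
    "custom": "/usr/local/bin/my-fence-script {addr}",  # hypothetical script
}

def fence(method, **details):
    # Run whichever fencing command is configured; exit code 0 = host fenced.
    cmd = shlex.split(FENCE_COMMANDS[method].format(**details))
    return subprocess.run(cmd).returncode == 0

# Example: fence("ipmi", addr="10.0.0.50", user="admin", passwd="secret")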

Please take a few minutes to read this through, to avoid duplicated effort...


Proposed FS below:


https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing


On 10/12/15 12:54 AM, Frank Louwers wrote:
> 
>> On 10 Oct 2015, at 12:35, Remi Bergsma <rberg...@schubergphilis.com> wrote:
>>
>> Can you please explain what the issue is with KVM HA? In my tests, HA starts 
>> all VMs just fine without the hypervisor coming back. At least that is on 
>> current 4.6. Assuming a cluster of multiple nodes of course. It will then do 
>> a neighbor check from another host in the same cluster. 
>>
>> Also, malfunctioning NFS leads to corruption and therefore we fence a box 
>> when the shared storage is unreliable. Combining primary and secondary NFS 
>> is not a good idea for production in my opinion. 
> 
> Well, it depends how you look at it, and what your situation is.
> 
> If you use 1 NFS export as primary storage (and only NFS), then yes, the
> system works as one would expect, and doesn’t need to be fixed.
> 
> However, HA is “not functioning” in any of these scenarios:
> 
> - you don’t use NFS as your only primary storage
> - you use more than one NFS primary storage
> 
> Even worse: imagine you only use local storage as primary storage, but have 1 
> NFS configured (as the UI “wizard” forces you to configure one). You don’t 
> have any active VM configured on the primary storage. You then perform 
> maintenance on the NFS storage, and take it offline…
> 
> All your hosts will then reboot, resulting in major downtime, that’s 
> completely unnecessary. There’s not even an option to disable this at this 
> point… We’ve removed the reboot instructions from the HA script on all our 
> instances…
> 
> Regards,
> 
> Frank
> 


Re: KVM HA is broken, let's fix it

2015-10-12 Thread Frank Louwers

> On 10 Oct 2015, at 12:35, Remi Bergsma <rberg...@schubergphilis.com> wrote:
> 
> Can you please explain what the issue is with KVM HA? In my tests, HA starts 
> all VMs just fine without the hypervisor coming back. At least that is on 
> current 4.6. Assuming a cluster of multiple nodes of course. It will then do 
> a neighbor check from another host in the same cluster. 
> 
> Also, malfunctioning NFS leads to corruption and therefore we fence a box 
> when the shared storage is unreliable. Combining primary and secondary NFS is 
> not a good idea for production in my opinion. 

Well, it depends how you look at it, and what your situation is.

If you use 1 NFS export as primary storage (and only NFS), then yes, the 
system works as one would expect, and doesn’t need to be fixed.

However, HA is “not functioning” in any of these scenarios:

- you don’t use NFS as your only primary storage
- you use more than one NFS primary storage

Even worse: imagine you only use local storage as primary storage, but have 1 
NFS configured (as the UI “wizard” forces you to configure one). You don’t have 
any active VM configured on the primary storage. You then perform maintenance 
on the NFS storage, and take it offline…

All your hosts will then reboot, resulting in major downtime that's completely 
unnecessary. There's not even an option to disable this at this point… We've 
removed the reboot instructions from the HA script on all our instances…

Regards,

Frank

Re: KVM HA is broken, let's fix it

2015-10-10 Thread Nux!
Hi Remi,

So we started here with Andrei (v4.5) complaining that a slow NFS caused a mass 
reboot:
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201510.mbox/%3C18886119.904.1444382474932.JavaMail.andrei%40tuchka%3E


My claim that the VM is not started until the HV is back is not based on 
personal testing, alas, but on Marcus' statement below as well as Simon Weller's 
reply in the "slow nfs = reboot all hosts" thread above:
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201508.mbox/%3CCALFpzo5CotX0Qz%2Bd_OXEZJGYTau%2BfA%2Bmzxg_yQEUzswi_9gz5w%40mail.gmail.com%3E

If what you say is true about the HV not having to come back, then this is 
great; we need to double-check that this is actually the case.
We could then try to tweak the settings in the heartbeat script to be more 
forgiving re timeouts, and/or to add additional logic such as checking whether 
other nodes or the mgmt server are online (therefore the HV has network) before 
rebooting - see the sketch below.
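
[Editor's note: as an illustration of that "more forgiving" idea, a Python
sketch under stated assumptions: retry the heartbeat write with backoff, then
check reachability of the management server before the drastic self-reboot.
The hostname and port are placeholders and write_heartbeat is left to the
caller; this is not the actual heartbeat script.]

import socket, subprocess, time

MGMT = ("mgmt.example.com", 8250)  # agent<->management port; host is a placeholder

def network_alive():
    # If we can still reach the management server, the problem is likely the
    # NFS box rather than this hypervisor, so rebooting ourselves helps nobody.
    try:
        socket.create_connection(MGMT, timeout=5).close()
        return True
    except OSError:
        return False

def maybe_self_fence(write_heartbeat, retries=5, backoff=10):
    for attempt in range(retries):
        try:
            write_heartbeat()  # e.g. touch the file on the NFS heartbeat mount
            return             # storage recovered: no reboot needed
        except OSError:
            time.sleep(backoff * (attempt + 1))  # be forgiving about timeouts
    if network_alive():
        print("storage down but network up: alert the operator, don't reboot")
    else:
        subprocess.run(["reboot"])  # the original self-fence, as a last resort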

Any further thoughts are welcome. I'll try to set up HA on my test deployment 
and check.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Remi Bergsma" <rberg...@schubergphilis.com>
> To: d...@cloudstack.apache.org
> Cc: "Cloudstack Users List" <users@cloudstack.apache.org>
> Sent: Saturday, 10 October, 2015 11:35:36
> Subject: Re: KVM HA is broken, let's fix it

> Hi Lucian,
> 
> Can you please explain what the issue is with KVM HA? In my tests, HA starts 
> all
> VMs just fine without the hypervisor coming back. At least that is on current
> 4.6. Assuming a cluster of multiple nodes of course. It will then do a 
> neighbor
> check from another host in the same cluster.
> 
> Also, malfunctioning NFS leads to corruption and therefore we fence a box when
> the shared storage is unreliable. Combining primary and secondary NFS is not a
> good idea for production in my opinion.
> 
> I'm happy to help and if you have a scenario I can replay I will try that in 
> my
> lab.
> 
> Regards, Remi
> 
> Sent from my iPhone
> 
>> On 10 Oct 2015, at 00:19, Nux! <n...@li.nux.ro> wrote:
>> 
>> Hello,
>> 
>> Following a recent thread on the users ml where slow NFS caused a mass 
>> reboot, I
>> have opened the following issue about improving HA on KVM.
>> https://issues.apache.org/jira/browse/CLOUDSTACK-8943
>> 
>> I know there are many people around here who use KVM and are interested in a
>> more robust way of doing HA.
>> 
>> Please share your ideas, comments, suggestions, let's see what we can come up
>> with to make this better.
>> 
>> Regards,
>> Lucian
>> 
>> --
>> Sent from the Delta quadrant using Borg technology!
>> 
>> Nux!
> > www.nux.ro


Re: KVM HA is broken, let's fix it

2015-10-10 Thread Remi Bergsma
Hi Lucian,

Can you please explain what the issue is with KVM HA? In my tests, HA starts 
all VMs just fine without the hypervisor coming back. At least that is on 
current 4.6. Assuming a cluster of multiple nodes of course. It will then do a 
neighbor check from another host in the same cluster. 

Also, malfunctioning NFS leads to corruption and therefore we fence a box when 
the shared storage is unreliable. Combining primary and secondary NFS is not a 
good idea for production in my opinion. 

I'm happy to help and if you have a scenario I can replay I will try that in my 
lab. 

Regards, Remi 

Sent from my iPhone

> On 10 Oct 2015, at 00:19, Nux! <n...@li.nux.ro> wrote:
> 
> Hello, 
> 
> Following a recent thread on the users ml where slow NFS caused a mass 
> reboot, I have opened the following issue about improving HA on KVM.
> https://issues.apache.org/jira/browse/CLOUDSTACK-8943
> 
> I know there are many people around here who use KVM and are interested in a 
> more robust way of doing HA.
> 
> Please share your ideas, comments, suggestions, let's see what we can come up 
> with to make this better.
> 
> Regards,
> Lucian
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro


KVM HA is broken, let's fix it

2015-10-09 Thread Nux!
Hello, 

Following a recent thread on the users ml where slow NFS caused a mass reboot, 
I have opened the following issue about improving HA on KVM.
https://issues.apache.org/jira/browse/CLOUDSTACK-8943

I know there are many people around here who use KVM and are interested in a 
more robust way of doing HA.

Please share your ideas, comments, suggestions, let's see what we can come up 
with to make this better.

Regards,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


CS 4.2 kvm ha issues - NullPointerException

2013-09-24 Thread Valery Ciareszka
Hi all,

I am trying to test HA on CS 4.2/KVM and I am getting a java.lang.NullPointerException.

ENV used:
CS 4.2 (rev 2963) management server on CentOS 6.4
CS 4.2 (rev 2963) agent on CentOS 6.4 (node1, node2)
separate NFS server as primary/secondary storage
zone type: KVM

Scenario:
I create several VMs with an HA-enabled offering.
Then I cut power to node2 via IPMI.
Expected behaviour:
  HA-enabled VMs from node2 should start on node1.
Actual behaviour:
  VMs remain in the Stopped state, unassigned to any host, with a NullPointerException
in the mgmt log:

2013-09-24 11:21:25,500 ERROR [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-4:work-4) Terminating HAWork[4-HA-6-Running-Scheduled]
java.lang.NullPointerException
at
com.cloud.storage.VolumeManagerImpl.canVmRestartOnAnotherServer(VolumeManagerImpl.java:2641)
at
com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:516)
at
com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)

Full log is here: http://pastebin.com/upnEA601

Any thoughts ?




-- 
Regards,
Valery

http://protocol.by/slayer


CS 4.1 + KVM + HA

2013-07-08 Thread Jingyi Zhang
Hello,

Does KVM-based HA actually work? I set up the HA feature following the official
documentation; HA is enabled both on the VM and on another host (one host runs
the VM, the other serves as the HA host).
Test 1: I shut the VM down, and CS automatically restarted it on the HA host.
Test 2: When I power off the host running the VM directly, CS cannot restart
the VM on the HA host; it cannot even correctly determine that the host and VM
states should be disconnected.

Is this a bug?

Regards,
Zhang