Re: Cold VM migration across datacenters
Hello Adam,

We don't want to use Galera. The master-master setup is just more convenient than a master-slave one in case of a rebuild, but we will not write to both sides at the same time.

Thanks for your insight.

BR
A

On Wed, 21 Jul 2021 at 17:13, Adam Witwicki wrote:

> Hi Vladimir
>
> I would like to mention we have had issues with master-master-master
> Galera replication for the database, locks are not replicated. We had to
> make sure all CS management servers are using the same master server at the
> same time.
> A proxy and a VIP work fine.
> Was just a little gotcha for us.
>
> Adam
RE: Cold VM migration across datacenters
Hi Vladimir,

I would like to mention that we have had issues with master-master-master Galera replication for the database: locks are not replicated. We had to make sure all CS management servers are using the same master server at the same time. A proxy and a VIP work fine. It was just a little gotcha for us.

Adam

-----Original Message-----
From: Vladimir Dombrovski
Sent: 21 July 2021 16:09
To: users@cloudstack.apache.org
Subject: Re: Cold VM migration across datacenters
Re: Cold VM migration across datacenters
Hello Andrija,

Thank you for the swift response. You did correctly understand our setup, which is aimed towards disaster recovery (DRP), rather than business continuity (BCP).

From a strictly technical point of view, we know that running a single cluster is the only way to migrate compute resources between our datacenters when using KVM. Our Ceph storage is indeed running as a stretch cluster, and is provisioned with a CRUSH map that allows for the loss of a datacenter.

We have currently tested the 4.15 release of Cloudstack, which doesn't yet integrate the changes you've described in (1). However, because the latency between our datacenters is quite low (~2ms), we could allow less isolation between entities.

What we are looking into, however, is how to ensure that in our configuration any and all resources are pointed towards the correct local endpoint. To be more specific, these endpoints include:

- The address of the manager, which we could technically move to a VIP, as our hosts are plugged on the same subnet. From there, as we have master-master MySQL replication set up, the agents on the DR site should reconnect with the backup manager without much issue (we are still testing this hypothesis).
- The address of our primary storage, with which we're struggling. Technically we have a Ceph endpoint on each site, and we would like to keep it that way. This implies that we add multiple addresses for the same storage. This is possible in libvirt, as you can define multiple <host> tags in the declaration XML. We haven't found a way to do the same via Cloudstack.
- The address of our secondary storage, which is in our case NFS and is simply plugged into Ceph using a different pool.

All this being said, I believe that we would eventually brute-force our way into a working setup that would look similar to the solution (2) you've described. We are still looking for ways to do this more elegantly, so we would be glad to hear any more ideas.
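To make the primary storage point concrete, here is a minimal Python sketch of a libvirt disk definition that lists one Ceph monitor per site as separate <host> elements under an RBD <source>. It only illustrates the libvirt side; it is not something Cloudstack generates or accepts. The function name, pool name, image name and monitor hostnames are all made up for the example, and the <auth> section a real Ceph setup would normally need is left out.

```python
import xml.etree.ElementTree as ET


def rbd_disk_xml(pool, image, monitors, target_dev="vda"):
    """Build a libvirt <disk> element for an RBD image with several monitors.

    `monitors` is a list of (hostname, port) pairs; each pair becomes one
    <host> child of <source>, so the client can fall back to another
    monitor when the first one is unreachable.
    """
    disk = ET.Element("disk", {"type": "network", "device": "disk"})
    ET.SubElement(disk, "driver", {"name": "qemu", "type": "raw"})
    source = ET.SubElement(
        disk, "source", {"protocol": "rbd", "name": f"{pool}/{image}"}
    )
    for name, port in monitors:
        ET.SubElement(source, "host", {"name": name, "port": str(port)})
    ET.SubElement(disk, "target", {"dev": target_dev, "bus": "virtio"})
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical monitor addresses, one per datacenter.
    monitors = [("ceph-mon.dc1.example", 6789), ("ceph-mon.dc2.example", 6789)]
    print(rbd_disk_xml("cloudstack", "vm-disk-1", monitors))
```

The open question in this thread is unchanged by the sketch: libvirt accepts several <host> entries, but we have not found a way to make Cloudstack register more than one address for a primary storage pool.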
As a side note, are the DB hacks you have mentioned simple IP replacements? Or are there deeper modifications to be made?

Vladimir

On 2021/07/21 13:48:29, Andrija Panic wrote:
Re: Cold VM migration across datacenters
Migration between zones is NOT possible in any shape or form, so this is a route you should, IMO, abandon (you can always export VMs one way or another, but this is not feasible in production).

I understand you have 2 DCs and you want VMs to, eventually, come alive in the 2nd DC if a plane crashes on the 1st DC? (Well, your data is there, unless Ceph is stretched/distributed across the 2 DCs and could survive the whole of DC1 going down.)

If you insist on that HA level, then you could do it in 2 ways that cross my mind right now. Both use Ceph as distributed, zone-wide storage, with some nodes in DC1 and some in DC2. Make sure your Ceph setup survives a whole DC going down (this requires that the CRUSH maps are correctly configured, etc.).

(1) DC1 = Pod1 (1/2/3) and DC2 = Pod2 (or Pod 4/5/6 etc.), i.e. multiple Pods per DC. They will all be using zone-wide Ceph storage. Your VMs are on your storage; that is the crucial part to not lose data.
-- you can't really migrate VMs between Pods, only within a cluster (and in some cases between clusters in the same Pod, starting from 4.16)
-- this is OK if you have not-low-enough latency between DC1 and DC2 (but then Ceph will also suffer from that higher latency)

(2) A very atypical, not recommended, but technically possible setup: DC1+DC2 = one large DC = 1 Pod = 1 cluster (or more clusters if needed), still using Ceph as before.
-- requires ultra-low latency between DC1 and DC2. If a plane crashes on DC1 (taking this example, as I've been to some Zurich DCs next to the airport...), you can still start VMs on hosts in DC2, provided it was a single cluster. If you had multiple clusters, then it gets more complicated (minor DB hacks), etc.

In both cases you still have to sort out Secondary Storage NFS HA.

In general, you can't achieve what you want that easily, nor should you be stretching the possibilities (which I just explained, though I would probably never use them in production).

I guess I didn't help - but there you go.

Andrija

On Tue, 20 Jul 2021 at 16:28, Vladimir Dombrovski <vladimir.dombrov...@bso.co> wrote:

> Hello,
>
> We're trying to draw a multisite architecture where any VM could be relocated to the secondary site whenever the primary site fails (primary/backup for disaster recovery purposes). We don't require live migration, and we are okay with shutting down machines in order to relocate them.
>
> We are using Cloudstack 4.15 on Ubuntu Focal. In our current setup, each datacenter has a Cloudstack management node, as well as a few hypervisors running KVM and a Cloudstack agent. We're using Ceph as our primary storage, and NFS as our secondary storage on each site.
>
> To ensure metadata resiliency, we've replicated the MySQL database across both sites, much as described in this guide:
>
> https://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.11/choosing_deployment_architecture.html#multi-site-deployment
>
> We tried setting up multiple zones, one for each datacenter, each one having its own primary storage, but we are faced with the issue where we are not able to migrate VMs across zones (only Pod/Cluster/Host level is available via the GUI and the Cloudmonkey CLI).
>
> Are we using the right level of abstraction for our case? If so, how can we migrate a VM (compute + storage) from one zone to another? If not, what is the right level to use that allows us to use two separate primary storage endpoints and ensures that only the primary site gets used for compute resource allocation in normal conditions?
>
> Also, we would like to know whether there is some documentation already touching on the subject of best practices when performing these "more advanced" deployments.
>
> Kind regards,
>
> Vladimir DOMBROVSKI

--
Andrija Panić
Cloudstack Usage and Quota Plugin. How it works?
Hello all.

I'm using the Usage and Quota Plugin to try out billing for my future customers' accounts, but I don't understand the calculation and the results it is showing me, like these two below:

*Balance*     *Quota*
-599755.85    599755.85

What is "Balance"? Why is the number negative? Is it possible to make the Quota number shorter?

Best regards.