Hello Adam

We don't want to use Galera; a master-master setup is simply more
convenient to rebuild as master-slave if needed, and we will not write to
both sides at the same time.
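
For reference, here is roughly how we intend to enforce that on the passive
side (a minimal sketch using stock MySQL options; paths and values are
illustrative):

    # my.cnf fragment on the standby site's database server (illustrative)
    [mysqld]
    read_only       = 1   # reject writes from regular clients
    super_read_only = 1   # also reject writes from SUPER/privileged accounts

We would only lift these settings when promoting the DR site.
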
Thanks for your insight

BR
A


On Wed, 21 Jul 2021 at 17:13, Adam Witwicki <awitwi...@oakfordis.com> wrote:

> Hi Vladimir
>
> I would like to mention that we have had issues with master-master-master
> Galera replication for the database: locks are not replicated. We had to
> make sure all CloudStack management servers use the same master server at
> the same time.
> A proxy and a VIP work fine.
> It was just a little gotcha for us.
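>
> To illustrate, a minimal HAProxy sketch of that kind of setup (hostnames and
> addresses are made up):
>
>     listen mysql-cloudstack
>         bind 10.0.0.10:3306                         # the VIP
>         mode tcp
>         option tcp-check
>         server db-dc1 10.0.1.11:3306 check
>         server db-dc2 10.0.2.11:3306 check backup   # used only if db-dc1 is down
>
> With the second server marked as backup, every management server talks to
> the same master at any given time.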
>
> Adam
>
> -----Original Message-----
> From: Vladimir Dombrovski <vladimir.dombrov...@bso.co>
> Sent: 21 July 2021 16:09
> To: users@cloudstack.apache.org
> Subject: Re: Cold VM migration across datacenters
>
> Hello Andrija,
>
> Thank you for the swift response. You understood our setup correctly: it is
> aimed at disaster recovery (DRP) rather than business continuity (BCP).
>
> From a strictly technical point of view, we know that running a single
> cluster is the only way to migrate compute resources between our
> datacenters when using KVM. Our Ceph storage is indeed running as a stretch
> cluster, and is provisioned with a CRUSH map that allows for the loss of a
> datacenter.
>
> So far we have tested the 4.15 release of CloudStack, which doesn't yet
> include the changes you've described in (1). However, since the latency
> between our datacenters is quite low (~2ms), we could accept less isolation
> between entities.
>
> What we are looking into, however, is ensuring that in our configuration
> all resources point to the correct local endpoint. To be more specific,
> these endpoints include:
>
> - The address of the manager, which we could technically move to a VIP, as
> our hosts are connected to the same subnet. From there, as we have a
> master-master MySQL replication setup, the agents on the DR site should
> reconnect to the backup manager without much issue (we are still testing
> this hypothesis; see the agent.properties sketch after this list).
> - The address of our primary storage, with which we're struggling.
> Technically we have a Ceph endpoint on each site, and we would like to keep
> it that way. This implies adding multiple addresses for the same storage.
> This is possible in libvirt, as you can define multiple <host> tags in the
> <source> declaration XML (see the example after this list), but we haven't
> found a way to do the same via CloudStack.
> - The address of our secondary storage, which in our case is NFS, simply
> backed by Ceph using a different pool.
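>
> On the first point, the idea (still untested) is simply to point each agent
> at the VIP in /etc/cloudstack/agent/agent.properties instead of at a single
> manager (the address is illustrative):
>
>     # agent.properties fragment on each KVM host
>     host=10.0.0.10
>
> On the second point, the libvirt XML we would like CloudStack to generate
> looks roughly like this (pool, image and monitor names are made up; auth
> elements omitted):
>
>     <disk type='network' device='disk'>
>       <source protocol='rbd' name='cloudstack/vm-volume-1'>
>         <host name='ceph-mon-dc1' port='6789'/>
>         <host name='ceph-mon-dc2' port='6789'/>
>       </source>
>       <target dev='vda' bus='virtio'/>
>     </disk>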
>
> All this being said, I believe we could eventually brute-force our way
> into a working setup that would look similar to solution (2) you've
> described. We are still looking for more elegant ways to do this, so we
> would be glad to hear any further ideas.
>
> As a side note, are the DB hacks you have mentioned simple IP
> replacements? Or are there deeper modifications to be made?
>
> Vladimir
>
> On 2021/07/21 13:48:29, Andrija Panic <andrija.pa...@gmail.com> wrote:
> > Migration between zones is NOT possible in any shape or form, so this
> > is a route you should, IMO, abandon (you can always export VMs one way
> > or another, but this is not feasible in production).
> >
> > I understand you have 2 DCs and you want VMs to, eventually, come back
> > up in the 2nd DC if a plane crashes into the 1st DC? (Well, your data
> > stays there, unless CEPH is stretched/distributed across the 2 DCs and
> > can survive the whole of DC1 going down.)
> >
> > If you are insisting on that HA level, then you could do it in the 2
> > ways that cross my mind right now, with CEPH as distributed, zone-wide
> > storage, some nodes in DC1 and some in DC2 - make sure your CEPH setup
> > survives a whole DC going down (this requires correctly configured CRUSH
> > maps, etc.).
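> >
> > For illustration only, a stretched CRUSH rule typically looks something
> > like this (bucket names are made up; combine with size=4 / min_size=2 on
> > the pool):
> >
> >     rule stretch_rule {
> >         id 1
> >         type replicated
> >         step take default
> >         step choose firstn 2 type datacenter
> >         step chooseleaf firstn 2 type host
> >         step emit
> >     }
> >
> > i.e. two replicas in each DC, so losing an entire DC still leaves a full
> > copy of the data.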
> >
> >
> > (1) DC1 = Pod1 (1/2/3) and DC2 = Pod2 (or Pods 4/5/6 etc.) - i.e.
> > multiple Pods per DC - they will all be using zone-wide Ceph storage -
> > your VMs are on your storage, which is the crucial part for not losing
> > data.
> > -- you can't really migrate VMs between Pods, only within a cluster (and
> > in some cases between clusters in the same Pod, starting from 4.16)
> > -- this is OK even if the latency between DC1 and DC2 is not that low
> > (but then CEPH will also suffer from that higher latency)
> >
> > (2) A very untypical, not recommended, but technically possible setup -
> > DC1+DC2 = one large DC = 1 Pod = 1 cluster (or more clusters if needed) -
> > still using CEPH as before
> > -- requires ultra-low latency between DC1 and DC2 - and if a plane
> > crashes on DC1 (taking this example, as I've been to some Zurich DCs next
> > to the airport...) you can still start the VMs on hosts in DC2, provided
> > it was a single cluster. If you had multiple clusters, then it gets more
> > complicated (minor DB hacks) etc.
> >
> > In both cases you still have to sort out Secondary Storage NFS HA....
> >
> > In general, you can't achieve what you want that easily, nor should you
> > stretch the possibilities I just explained, as I would probably never use
> > them in production.
> >
> > I guess I didn't help - but there you go.
> >
> > Andrija
> >
> >
> >
> > On Tue, 20 Jul 2021 at 16:28, Vladimir Dombrovski <
> > vladimir.dombrov...@bso.co> wrote:
> >
> > > Hello,
> > >
> > > We're trying to design a multisite architecture in which any VM can be
> > > relocated to the secondary site whenever the primary site fails
> > > (primary/backup, for disaster recovery purposes). We don't require
> > > live migration, and we are okay with shutting down machines in order
> > > to relocate them.
> > >
> > > We are using Cloudstack 4.15 on Ubuntu Focal. In our current setup,
> > > each datacenter has a Cloudstack management node, as well as a few
> > > hypervisors running KVM and a Cloudstack agent. We're using Ceph as
> > > our primary storage, and NFS as our secondary storage on each site.
> > >
> > > To ensure metadata resiliency, we've replicated the MySQL database
> > > across both sites, much as described in this guide:
> > >
> > >
> > > https://docs.cloudstack.apache.org/projects/cloudstack-installation/
> > > en/4.11/choosing_deployment_architecture.html#multi-site-deployment
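> > >
> > > For context, the replication itself is a classic asynchronous
> > > master-master setup, configured roughly like this on site A and
> > > mirrored on site B with server-id = 2 and auto_increment_offset = 2
> > > (the values are illustrative):
> > >
> > >     [mysqld]
> > >     server-id                = 1
> > >     log_bin                  = mysql-bin
> > >     auto_increment_increment = 2   # avoid auto-increment collisions
> > >     auto_increment_offset    = 1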
> > >
> > > We tried setting up multiple zones, one for each datacenter, each
> > > with its own primary storage, but we ran into the problem that we are
> > > not able to migrate VMs across zones (only the Pod/Cluster/Host level
> > > is available via the GUI and the CloudMonkey CLI).
> > >
> > > Are we using the right level of abstraction for our case? If so, how
> > > can we migrate a VM (compute + storage) from one zone to another? If
> > > not, what is the right level to use that allows us to use two
> > > separate primary storage endpoints and ensures that only the primary
> > > site gets used for compute resource allocation in normal conditions?
> > >
> > > Also, we would like to know whether there is some documentation
> > > already touching on the subject of best practices when performing
> > > these "more advanced" deployments.
> > >
> > > Kind regards,
> > >
> > > Vladimir DOMBROVSKI
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >

-- 
Alexandre Legrix / +33 632232109
Global Systems & Managed Services Director
4 av Pablo Picasso 92000 Nanterre - France
Hybrid Cloud Fabric : bso.cloud / bso.st / kb8s.io
