Bruce,
  Thanks, it helps clarify.

thanx,
deepak
On Thu, Mar 20, 2014 at 10:07 PM, Bruce Montague <[email protected]> wrote:

> Hi, Deepak. With the caveat that both the etherpad and Ron's presentation are pretty high-level, my guess is:
>
> 1) "DR middleware" refers to the orchestration engine managing the entire DR process between the primary and secondary sites (something like two Heat workflows interacting, or a workflow that works across multiple OpenStack deployments). The replication agent is what does something resembling continually cloning a volume from the primary to the secondary, with snapshots appearing on the secondary at times when the volumes' contents are application-consistent and consistent with each other (for all the volumes of a VM or a multi-tier app). These secondary-site snapshots appear at specified rates (so you know how recent your oldest snapshots there will be). For instance, the replication agent might take some sort of snapshot(s) on the primary and then update the corresponding volume(s) on the secondary using the primary snapshot(s). This resembles (maybe it could even be) something like DRBD or NBD. Many SAN vendors provide some form of replication agent between SANs.
>
> 2) Regarding metadata, the replication agent might only be replicating the volumes of some tenant VMs. It might not be replicating any volumes containing OpenStack metadata. (This is for the smaller tenant use case, not complete OpenStack deployment mirroring or some such. If complete mirroring were done, maybe you wouldn't have to sync metadata if you designed the system just for that.) DR is often something that a tenant might apply only to a set of core servers (key pets). In this use case the two (or more) DR sites might not be symmetrical. The secondary site needs to know it is in the secondary role. Things like IP addresses, and maybe security and firewall rules, might have to change for the workload to run at the secondary site. Applying this metadata to VMs on the secondary site (what needs to change in the personality) when they boot is probably something Heat can do.
>
> -bruce
>
> From: Deepak Shetty [mailto:[email protected]]
> Sent: Wednesday, March 19, 2014 11:54 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder - discussion reminder
>
> Hi List,
> I was looking at the etherpad and the March 19 notes and have a few questions.
>
> 1) How is the "DR middleware" (depicted in Ron's YouTube video) different from the "replication agent" (noted in the March 19 etherpad notes)? Are they the same? If not, how/why are they different?
>
> 2) Maybe a dumb question, but still: why do we need to worry about syncing metadata separately? If all the storage used across OpenStack services (in the typical case it might be just one backend, say GlusterFS) is being replicated during DR, wouldn't the metadata be replicated too? Why do we need to be concerned about it as a separate entity?
>
> thanx,
> deepak
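As a purely illustrative reading of the replication-agent cycle Bruce sketches in point 1 above, the inner loop might look roughly like the Python sketch below. Everything named quiesce_guest, unquiesce_guest, push_snapshot_to_secondary, and secondary_volume_for is a hypothetical stand-in for whatever mechanism (qemu-ga, DRBD/NBD, SAN replication, image copy) a real agent would use; only the python-cinderclient snapshot call is an actual API, and client construction is omitted since it varies by client version.

    # Hypothetical sketch of one replication-agent cycle between two sites.
    # 'primary' and 'secondary' are python-cinderclient Client objects for the
    # two OpenStack deployments; everything else is a labeled stand-in.
    import time

    def replication_cycle(primary, secondary, volume_ids, interval=900):
        """Periodically ship application-consistent snapshots to the secondary."""
        while True:
            quiesce_guest()                     # hypothetical: VSS / fsfreeze via qemu-ga
            primary_snaps = [primary.volume_snapshots.create(vid, force=True)
                             for vid in volume_ids]
            unquiesce_guest()                   # hypothetical: let guest I/O resume quickly
            for snap in primary_snaps:
                # Hypothetical transport: DRBD/NBD, SAN replication, or image copy.
                push_snapshot_to_secondary(snap, secondary)
            for vid in volume_ids:
                # Record an application-consistent recovery point at the secondary.
                secondary.volume_snapshots.create(secondary_volume_for(vid), force=True)
            time.sleep(interval)                # the interval bounds the achievable RPO

    # Placeholders only -- a real agent would implement these.
    def quiesce_guest(): pass
    def unquiesce_guest(): pass
    def push_snapshot_to_secondary(snap, secondary): pass
    def secondary_volume_for(volume_id): return volume_id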
> On Wed, Mar 19, 2014 at 2:11 PM, Ronen Kat <[email protected]> wrote:
>
> For those who are interested, we will discuss the disaster recovery use cases and how to proceed toward the Juno summit on March 19 at 17:00 UTC (invitation below).
>
> Call-in: https://www.teleconference.att.com/servlet/glbAccess?process=1&accessCode=6406941&accessNumber=1809417783#C2
> Passcode: 6406941
>
> Etherpad: https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders
> Wiki: https://wiki.openstack.org/wiki/DisasterRecovery
>
> Regards,
> __________________________________________
> Ronen I. Kat, PhD
> Storage Research
> IBM Research - Haifa
> Phone: +972.3.7689493
> Email: [email protected]
>
> From: "Luohao (brian)" <[email protected]>
> To: "OpenStack Development Mailing List (not for usage questions)" <[email protected]>
> Date: 14/03/2014 03:59 AM
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder
>
> 1. fsfreeze with VSS has been added to qemu upstream; see http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html for usage.
> 2. libvirt allows a client to send any command to qemu-ga; see http://wiki.libvirt.org/page/Qemu_guest_agent
> 3. Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux fsfreeze offers filesystem consistency only, while Windows VSS allows agents like SQL Server to register their plugins to flush their caches to disk when a snapshot occurs.
> 4. My understanding is that XenServer does not support fsfreeze+VSS now, because XenServer normally does not use the block backend in qemu.
>
> -----Original Message-----
> From: Bruce Montague [mailto:[email protected]]
> Sent: Thursday, March 13, 2014 10:35 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder
>
> Hi, about OpenStack and VSS. Does anyone have experience with the qemu project's implementation of VSS support? They appear to have a within-guest agent, qemu-ga, that perhaps can work as a VSS requestor. Does it also work with KVM? Does qemu-ga work with libvirt (can a VSS quiesce be triggered via libvirt)? I think there was an effort for qemu-ga to use fsfreeze as an equivalent to VSS on Linux systems; was that done? If so, could an OpenStack API provide a generic quiesce request that would then get passed to libvirt? (Also, the XenServer VSS support seems different from qemu/KVM's; is this true? Can it also be accessed through libvirt?)
>
> Thanks,
>
> -bruce
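On Bruce's question about triggering a quiesce through libvirt: a minimal sketch along the lines of Luohao's points 1 and 2 is below, assuming a libvirt/qemu recent enough to expose the guest-agent fsfreeze calls (virDomainFSFreeze/virDomainFSThaw) and a guest with qemu-ga installed; on Windows guests qemu-ga's fsfreeze path goes through VSS. Older libvirt can get a similar effect by creating a snapshot with the VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE flag. The domain name is a hypothetical example.

    # Minimal sketch: quiesce/unquiesce a guest via libvirt + qemu-ga.
    # Assumes qemu-ga runs inside the guest and libvirt exposes fsFreeze/fsThaw;
    # on Windows guests qemu-ga implements fsfreeze on top of VSS.
    import libvirt

    def quiesced_window(domain_name, uri='qemu:///system'):
        conn = libvirt.open(uri)
        try:
            dom = conn.lookupByName(domain_name)
            frozen = dom.fsFreeze()      # freezes all mounted guest filesystems
            print('froze %d filesystem(s)' % frozen)
            try:
                # ... take volume snapshots here while the guest is quiesced ...
                pass
            finally:
                dom.fsThaw()             # always thaw, even if snapshotting fails
        finally:
            conn.close()

    if __name__ == '__main__':
        quiesced_window('example-guest')  # hypothetical domain name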
> -----Original Message-----
> From: Alessandro Pilotti [mailto:[email protected]]
> Sent: Thursday, March 13, 2014 6:49 AM
> To: [email protected]
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder
>
> Those use cases are very important in enterprise scenario requirements, but there's an important missing piece in the current OpenStack APIs: support for application-consistent backups via Volume Shadow Copy (or other solutions) at the instance level, including differential / incremental backups.
>
> VSS can be seamlessly added to the Nova Hyper-V driver (it's included with the free Hyper-V Server), with e.g. vSphere and XenServer supporting it as well (quiescing) and with the option for third-party vendors to add drivers for their solutions.
>
> A generic Nova backup / restore API supporting those features is quite straightforward to design. The main question at this stage is whether the OpenStack community wants to support those use cases or not. Cinder backup/restore support [1] and volume replication [2] are surely a great starting point in this direction.
>
> Alessandro
>
> [1] https://review.openstack.org/#/c/69351/
> [2] https://review.openstack.org/#/c/64026/
>
> On 12/mar/2014, at 20:45, "Bruce Montague" <[email protected]> wrote:
>
> > Hi, regarding the call to create a list of disaster recovery (DR) use cases (http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html), the following list sketches some speculative OpenStack DR use cases. These use cases do not reflect any specific product behavior and span a wide spectrum. This list is not a proposal; it is intended primarily to solicit additional discussion. The first basic use case, (1), is described in a bit more detail than the others; many of the others are elaborations on this basic theme.
> >
> > * (1) [Single VM]
> >
> > A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy Service) installed runs a key application and integral database. VSS can quiesce the app, database, filesystem, and I/O on demand and can be invoked external to the guest.
> >
> > a. The VM's volumes, including the boot volume, are replicated to a remote DR site (another OpenStack deployment).
> >
> > b. Some form of replicated VM or VM metadata exists at the remote site. This VM/description includes the replicated volumes. Some systems might use cold migration or some form of wide-area live VM migration to establish this remote-site VM/description.
> >
> > c. When specified by an SLA or policy, VSS is invoked, putting the VM's volumes in an application-consistent state. This state is flushed all the way through to the remote volumes. As each remote volume reaches its application-consistent state, this is recognized in some fashion, perhaps by an in-band signal, and a snapshot of the volume is made at the remote site. Volume replication is re-enabled immediately following the snapshot. A backup is then made of the snapshot on the remote site. At the completion of this cycle, application-consistent volume snapshots and backups exist on the remote site.
> >
> > d. When a disaster or firedrill happens, the replication network connection is cut. The remote-site VM, pre-created or defined so as to use the replicated volumes, is then booted, using the latest application-consistent state of the replicated volumes. The entire VM environment (management accounts, networking, external firewalling, console access, etc.), similar to that of the primary, either needs to pre-exist in some fashion on the secondary or be created dynamically by the DR system. The booting VM either needs to attach to a virtual network environment similar to that at the primary site, or it needs to have boot code that can alter its network personality. Networking configuration may occur in conjunction with an update to DNS and other networking infrastructure. It is necessary for all required networking configuration to be pre-specified or done automatically; no manual admin activity should be required. Environment requirements may be stored in a DR configuration or database associated with the replication.
> > e. In a firedrill or test, the virtual network environment at the remote site may be a "test bubble" isolated from the real network, with some provision for protected access (such as NAT). Automatic testing is necessary to verify that replication succeeded. These tests need to be configurable by the end-user and admin and integrated with DR orchestration.
> >
> > f. After the VM has booted and been operational, the network connection between the two sites is re-established. A replication connection between the replicated volumes is re-established, and the replicated volumes are re-synced, with the roles of primary and secondary reversed. (Ongoing replication in this configuration may occur, driven from the new primary.)
> >
> > g. A planned failback of the VM to the old primary proceeds similarly to the failover from the old primary to the old replica, but with roles reversed and with the process minimizing offline time and data loss.
> >
> > * (2) [Core tenant/project infrastructure VMs]
> >
> > Twenty VMs power the core infrastructure of a group using a private cloud (OpenStack in their own datacenter). Not all VMs run Windows with VSS; some run Linux with an equivalent mechanism, such as qemu-ga driving fsfreeze and signal scripts. These VMs are replicated to a remote OpenStack deployment, in a fashion similar to (1). Orchestration occurring at the remote site on failover is more complex (correct VM boot order is orchestrated, DHCP service is configured as expected, all IPs are made available and verified); a sketch of such boot-order orchestration appears after this list. An equivalent virtual network topology consisting of multiple networks or subnets might be pre-created or dynamically created at failover time.
> >
> > a. Storage for all volumes of all VMs might be on a single storage backend (logically a single large volume containing many smaller sub-volumes, examples being a VMware datastore or Hyper-V CSV). This entire large volume might be replicated between similar storage backends at the primary and secondary site. A single replicated large volume thus replicates all the tenant VMs' volumes. The DR system must trigger quiesce of all volumes to an application-consistent state.
> >
> > b. This environment needs to deal with failures of the primary datacenter (as when a trenching tool cuts its connection to the internet), routine firedrill tests that perform failover and failback, and planned migration.
> >
> > c. VSS or fsfreeze may be expected to fail for some VMs; policies and SLAs need to contend with this and alert admins for manual follow-up.
> >
> > d. Network bandwidth used for replication needs to be throttled so as not to overly disrupt the private cloud's gateway capacity.
> >
> > e. DR replication needs to deal with intermittent network replication failure and recover gracefully. In case of a known network issue, such as maintenance, it needs to be possible for the admin to explicitly suspend network replication. Replication I/O is then logged locally at the primary site in some fashion. The remote site needs to stay replication-ready, but failover does not occur. When the network issue is over, replication resumes, perhaps recovering via a log, a map of updated blocks, or an equivalent technique. In this example the RPO window is deliberately ignored and allowed to grow until replication is resumed by the admin.
> >
> > f. This tenant requires encryption of network replication traffic.
> >
> > g. Cost accounting and chargeback is required.
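The boot-order orchestration mentioned in (2) could, very roughly, be scripted with python-novaclient as in the sketch below. The tier ordering, the server names, and the assumption that the failover VMs already exist at the secondary site (stopped, and only needing to be started) are all hypothetical; client construction is omitted since it varies by novaclient version.

    # Illustrative sketch of failover boot-order orchestration at the secondary
    # site: start VMs tier by tier and wait for each tier to become ACTIVE
    # before starting the next. 'nova' is a python-novaclient Client instance;
    # BOOT_TIERS and name_to_id are hypothetical inputs.
    import time

    BOOT_TIERS = [                 # hypothetical dependency ordering
        ['dns-1', 'dhcp-1'],
        ['db-1', 'db-2'],
        ['app-1', 'app-2', 'app-3'],
    ]

    def wait_for_active(nova, server_id, timeout=600, poll=5):
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = nova.servers.get(server_id).status
            if status == 'ACTIVE':
                return
            if status == 'ERROR':
                raise RuntimeError('server %s failed to boot' % server_id)
            time.sleep(poll)
        raise RuntimeError('timed out waiting for %s' % server_id)

    def failover_boot(nova, name_to_id):
        for tier in BOOT_TIERS:
            for name in tier:
                nova.servers.start(name_to_id[name])     # start the pre-created VM
            for name in tier:
                wait_for_active(nova, name_to_id[name])  # whole tier up before next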
> > * (3) [Multi-tier app infrastructure]
> >
> > A tenant has a service consisting of 8 multi-tier apps that each consist of 3 to 5 VMs, with each VM having 2 to 4 disks. Replication snapshots need to be made of the volumes in an application-consistent way across all the volumes of all the VMs in all the multi-tier apps. Again, these volumes may exist on a single large volume or datastore, perhaps simplifying creation of the cross-VM application-consistency snapshot. Not all of the VMs in a multi-tier app may need to be quiesced; some may be stateless and simply need to be recovered to a running state.
> >
> > a. This tenant requires that 3 of the multi-tier apps fail over to one remote OpenStack site and the other 5 multi-tier apps fail over to a different remote site than the first.
> >
> > b. This tenant weekly performs a non-disruptive test-bubble failover test. Real failover is not triggered. Instead, all the multi-tier app VMs that would boot upon failure are booted (from their latest snapshots on the secondary), but the VMs' virtual network environment on the secondary is isolated from external networking. Test bubbles at the two remote OpenStack sites may need to be connected via some VPN/tunnel or equivalent without manual admin activity.
> >
> > * (4) [Tenant failover]
> >
> > An OpenStack tenant has 40 VMs, relatively lightly loaded, used for development. The VMs do not contain VSS, qemu-ga, or standard tools (they may be running any Linux distro, some may be running Plan 9, the tenant may be doing Linux kernel development; that is, the VMs can be anything). A remote OpenStack deployment needs to exist so that in the event of loss of the primary OpenStack site, the tenant can continue development. In addition to volume replication as in (1), subject to policies and SLAs, cold migration may be performed on a VM's volumes upon shutdown (or dismount), and tenant end-users can explicitly request replication of a volume that is in an application-consistent state (when they have quiesced it by VSS, dismount, or equivalent).
> >
> > a. Being down for a short period may be acceptable to this tenant. If all the hosts on the primary site are rebooted, for instance due to power failure, it is the operator's choice whether to fail over or not. If the operator chooses not to fail over, then upon reboot of the VMs at the primary site, any established replication should automatically be continued.
> >
> > * (5) [Scale-out workload]
> >
> > A tenant has a Cassandra cluster (or Hadoop or a similar type of system) consisting of 75 VMs. Use is bursty. The system is used by a pharmaceutical company for design work. Loss of a week's work can be repeated, but weekly replication is mandatory. The application itself may provide some form of built-in geo-replication. Some controller-type VMs may need to be replicated as in (1). Other VMs may partner with replica VMs for explicit application data replication. For weekly replication of Cassandra data, Cassandra user-level snapshots are made into replicated volumes attached to each Cassandra VM. Replication is periodic with respect to the last replication event; that is, only data changed since the last replication event is sent.
> >
> > a. The tenant requires use of a particular aggregated network link for replication.
> > b. The tenant requires custom integration with the DR replication workflow to quiesce Cassandra via user-level commands and scripts developed by the end-user.
> >
> > c. Initial synchronization of replicated primary and secondary volumes need not be over a network link. Secondary volumes can be created initially from physical disks or backups physically moved to the secondary site.
> >
> > * (6) [Degraded-mode Mission-critical single VM]
> >
> > This single-VM use case is similar to (1), but when a network partition occurs between the primary and secondary OpenStack sites, with both sites remaining up, the primary VM remains operational while the secondary replica VM also comes online. Both VMs operate in a mode that resembles replication with a momentary network fault, logging their would-be replication traffic for continuation when the network comes back. When network connectivity is re-established, one site again becomes the primary, and differences in the VM's volumes can optionally (as controlled by policy) be reconciled. (In a simple case, each site might have its own dedicated volume partition or attached volume with its latest state.)
> >
> > * (7) [Self-contained application volume]
> >
> > A Cinder volume contains a complete database application, including the database and all binaries and configuration files. Replication of the entire VM to which this volume is attached is not needed. The VM and its configuration can be recreated on demand at the remote site and attached to the replicated application volume. The DR system still needs to orchestrate the process and create or manage the required network environment. A simple DR strategy can be used in which the volume is quiesced on the primary, a volume snapshot taken, the volume unquiesced (enabling the VM to continue running), and a backup then made of the snapshot. Backups can be transported by whatever means to the DR site, where the volume can be restored to its state at the time of the snapshot. (A sketch of this cycle appears at the end of this list.)
> >
> > * (8) [Stateless]
> >
> > No volumes and VMs need to be replicated, as VMs and their configuration can be recreated on demand, using configuration tools, and application data is accessed over the wide-area network (NFS or object store). The DR process still has to orchestrate creating the VMs, running configuration tools to populate them, creating the network environment, and booting the VMs in the required order.
> >
> > * (9) [Site Evacuation]
> >
> > The holy grail: automatic planned migration of the workload and data from one cloud-scale datacenter to another (or a set of others). In practice, this is likely to include admins in the loop. It applies at both tenant scale and entire-datacenter scale. The entire cloud datacenter is expected to go offline for an extended period (the hurricane scenario).
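To illustrate the simple strategy in (7) -- quiesce, snapshot, unquiesce, then back up the snapshot -- a rough python-cinderclient sketch follows. Since Cinder backups (at least as of this thread) are taken from volumes rather than directly from snapshots, the sketch materializes the snapshot as a temporary volume and backs that up; the quiesce/unquiesce hooks are hypothetical placeholders, and client construction is omitted.

    # Rough sketch of the "quiesce, snapshot, unquiesce, back up" cycle from
    # use case (7). 'cinder' is a python-cinderclient Client; quiesce_app() and
    # unquiesce_app() are hypothetical hooks (VSS, fsfreeze, DB flush, ...).
    import time

    def snapshot_and_backup(cinder, volume_id):
        quiesce_app()                                              # hypothetical
        snap = cinder.volume_snapshots.create(volume_id, force=True)
        unquiesce_app()                                            # hypothetical; VM keeps running

        # Backups operate on volumes, so materialize the snapshot as a
        # temporary volume and back that up (it can then be shipped to the DR site).
        vol = cinder.volumes.get(volume_id)
        tmp = cinder.volumes.create(vol.size, snapshot_id=snap.id)
        while cinder.volumes.get(tmp.id).status == 'creating':
            time.sleep(5)
        backup = cinder.backups.create(tmp.id)
        return snap, backup

    def quiesce_app():      # placeholder for an application/filesystem quiesce
        pass

    def unquiesce_app():    # placeholder for resuming application I/O
        pass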
> > -bruce
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
