Bruce,
  Thanks, it helps clarify.

thanx,
deepak
On Thu, Mar 20, 2014 at 10:07 PM, Bruce Montague <[email protected]> wrote:

> Hi, Deepak. With the caveat that both the etherpad and Ron's presentation are pretty high-level, my guess is:
>
> 1) "DR middleware" refers to the orchestration engine managing the entire DR process between the primary and secondary sites (something like two Heat workflows interacting, or a workflow that works across multiple OpenStack deployments). The replication agent is what does something resembling continually cloning a volume from the primary to the secondary, with snapshots appearing on the secondary at times when the volumes' contents are application-consistent and consistent with each other (for all the volumes of a VM or a multi-tier app). These secondary-site snapshots appear at specified rates (so you know how recent your oldest snapshots there will be). For instance, the replication agent might take some sort of snapshot(s) on the primary and then update the corresponding volume(s) on the secondary using the primary snapshot(s). This resembles (maybe it could even be) something like DRBD or NBD. Many SAN vendors provide some form of replication agent between SANs.
>
> 2) Regarding metadata, the replication agent might only be replicating the volumes of some tenant VMs. It might not be replicating any volumes containing OpenStack metadata. (This is for the smaller tenant use case, not complete OpenStack deployment mirroring or some such. If complete mirroring were done, maybe you wouldn't have to sync metadata if you designed the system just for that.) DR is often something that a tenant might apply only to a set of core servers (key pets). In this use case the two (or more) DR sites might not be symmetrical. The secondary site needs to know it is in the secondary role. Things like IP addresses, and maybe security and firewall rules, might have to change for the workload to run at the secondary site. Applying this metadata to VMs on the secondary site (what needs to change in the personality) when they boot is probably something Heat can do.
>
> -bruce
>
> From: Deepak Shetty [mailto:[email protected]]
> Sent: Wednesday, March 19, 2014 11:54 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder - discussion reminder
>
> Hi List,
> I was looking at the etherpad and the March 19 notes and have a few questions.
>
> 1) How is the "DR middleware" (depicted in Ron's YouTube video) different from the "replication agent" (noted in the March 19 etherpad notes)? Are they the same? If not, how/why are they different?
>
> 2) Maybe a dumb question, but still: why do we need to worry about syncing metadata separately? If all the storage used across OpenStack services (in the typical case it might be just one backend, say GlusterFS) is being replicated during DR, wouldn't the metadata be replicated too? Why do we need to be concerned about it as a separate entity?
>
> thanx,
> deepak
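As a purely illustrative reading of the replication-agent cycle Bruce sketches in point 1 above, the inner loop might look roughly like the Python sketch below. Everything named quiesce_guest, unquiesce_guest, push_snapshot_to_secondary, and secondary_volume_for is a hypothetical stand-in for whatever mechanism (qemu-ga, DRBD/NBD, SAN replication, image copy) a real agent would use; only the python-cinderclient snapshot call is an actual API, and client construction is omitted since it varies by client version.

    # Hypothetical sketch of one replication-agent cycle between two sites.
    # 'primary' and 'secondary' are python-cinderclient Client objects for the
    # two OpenStack deployments; everything else is a labeled stand-in.
    import time

    def replication_cycle(primary, secondary, volume_ids, interval=900):
        """Periodically ship application-consistent snapshots to the secondary."""
        while True:
            quiesce_guest()                     # hypothetical: VSS / fsfreeze via qemu-ga
            primary_snaps = [primary.volume_snapshots.create(vid, force=True)
                             for vid in volume_ids]
            unquiesce_guest()                   # hypothetical: let guest I/O resume quickly
            for snap in primary_snaps:
                # Hypothetical transport: DRBD/NBD, SAN replication, or image copy.
                push_snapshot_to_secondary(snap, secondary)
            for vid in volume_ids:
                # Record an application-consistent recovery point at the secondary.
                secondary.volume_snapshots.create(secondary_volume_for(vid), force=True)
            time.sleep(interval)                # the interval bounds the achievable RPO

    # Placeholders only -- a real agent would implement these.
    def quiesce_guest(): pass
    def unquiesce_guest(): pass
    def push_snapshot_to_secondary(snap, secondary): pass
    def secondary_volume_for(volume_id): return volume_id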
> On Wed, Mar 19, 2014 at 2:11 PM, Ronen Kat <[email protected]> wrote:
>
> For those who are interested, we will discuss the disaster recovery use cases and how to proceed toward the Juno summit on March 19 at 17:00 UTC (invitation below).
>
> Call-in: https://www.teleconference.att.com/servlet/glbAccess?process=1&accessCode=6406941&accessNumber=1809417783#C2
> Passcode: 6406941
>
> Etherpad: https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders
> Wiki: https://wiki.openstack.org/wiki/DisasterRecovery
>
> Regards,
> __________________________________________
> Ronen I. Kat, PhD
> Storage Research
> IBM Research - Haifa
> Phone: +972.3.7689493
> Email: [email protected]
>
> From: "Luohao (brian)" <[email protected]>
> To: "OpenStack Development Mailing List (not for usage questions)" <[email protected]>
> Date: 14/03/2014 03:59 AM
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder
>
> 1. fsfreeze with VSS has been added to qemu upstream; see http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html for usage.
> 2. libvirt allows a client to send any command to qemu-ga; see http://wiki.libvirt.org/page/Qemu_guest_agent
> 3. Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux fsfreeze offers filesystem consistency only, while Windows VSS allows agents like SQL Server to register their plugins to flush their caches to disk when a snapshot occurs.
> 4. My understanding is that XenServer does not support fsfreeze+VSS now, because XenServer normally does not use the block backend in qemu.
>
> -----Original Message-----
> From: Bruce Montague [mailto:[email protected]]
> Sent: Thursday, March 13, 2014 10:35 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder
>
> Hi, about OpenStack and VSS. Does anyone have experience with the qemu project's implementation of VSS support? They appear to have a within-guest agent, qemu-ga, that perhaps can work as a VSS requestor. Does it also work with KVM? Does qemu-ga work with libvirt (can a VSS quiesce be triggered via libvirt)? I think there was an effort for qemu-ga to use fsfreeze as an equivalent to VSS on Linux systems; was that done? If so, could an OpenStack API provide a generic quiesce request that would then get passed to libvirt? (Also, the XenServer VSS support seems different from qemu/KVM's; is this true? Can it also be accessed through libvirt?)
>
> Thanks,
>
> -bruce
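On Bruce's question about triggering a quiesce through libvirt: a minimal sketch along the lines of Luohao's points 1 and 2 is below, assuming a libvirt/qemu recent enough to expose the guest-agent fsfreeze calls (virDomainFSFreeze/virDomainFSThaw) and a guest with qemu-ga installed; on Windows guests qemu-ga's fsfreeze path goes through VSS. Older libvirt can get a similar effect by creating a snapshot with the VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE flag. The domain name is a hypothetical example.

    # Minimal sketch: quiesce/unquiesce a guest via libvirt + qemu-ga.
    # Assumes qemu-ga runs inside the guest and libvirt exposes fsFreeze/fsThaw;
    # on Windows guests qemu-ga implements fsfreeze on top of VSS.
    import libvirt

    def quiesced_window(domain_name, uri='qemu:///system'):
        conn = libvirt.open(uri)
        try:
            dom = conn.lookupByName(domain_name)
            frozen = dom.fsFreeze()      # freezes all mounted guest filesystems
            print('froze %d filesystem(s)' % frozen)
            try:
                # ... take volume snapshots here while the guest is quiesced ...
                pass
            finally:
                dom.fsThaw()             # always thaw, even if snapshotting fails
        finally:
            conn.close()

    if __name__ == '__main__':
        quiesced_window('example-guest')  # hypothetical domain name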
> -----Original Message-----
> From: Alessandro Pilotti [mailto:[email protected]]
> Sent: Thursday, March 13, 2014 6:49 AM
> To: [email protected]
> Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder
>
> Those use cases are very important in enterprise scenario requirements, but there's an important missing piece in the current OpenStack APIs: support for application-consistent backups via Volume Shadow Copy (or other solutions) at the instance level, including differential / incremental backups.
>
> VSS can be seamlessly added to the Nova Hyper-V driver (it's included with the free Hyper-V Server), with e.g. vSphere and XenServer supporting it as well (quiescing) and with the option for third-party vendors to add drivers for their solutions.
>
> A generic Nova backup / restore API supporting those features is quite straightforward to design. The main question at this stage is whether the OpenStack community wants to support those use cases or not. Cinder backup/restore support [1] and volume replication [2] are surely a great starting point in this direction.
>
> Alessandro
>
> [1] https://review.openstack.org/#/c/69351/
> [2] https://review.openstack.org/#/c/64026/
>
> On 12/mar/2014, at 20:45, "Bruce Montague" <[email protected]> wrote:
>
> > Hi, regarding the call to create a list of disaster recovery (DR) use cases (http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html), the following list sketches some speculative OpenStack DR use cases. These use cases do not reflect any specific product behavior and span a wide spectrum. This list is not a proposal; it is intended primarily to solicit additional discussion. The first basic use case, (1), is described in a bit more detail than the others; many of the others are elaborations on this basic theme.
> >
> > * (1) [Single VM]
> >
> > A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy Service) installed runs a key application and integral database. VSS can quiesce the app, database, filesystem, and I/O on demand and can be invoked external to the guest.
> >
> > a. The VM's volumes, including the boot volume, are replicated to a remote DR site (another OpenStack deployment).
> >
> > b. Some form of replicated VM or VM metadata exists at the remote site. This VM/description includes the replicated volumes. Some systems might use cold migration or some form of wide-area live VM migration to establish this remote-site VM/description.
> >
> > c. When specified by an SLA or policy, VSS is invoked, putting the VM's volumes in an application-consistent state. This state is flushed all the way through to the remote volumes. As each remote volume reaches its application-consistent state, this is recognized in some fashion, perhaps by an in-band signal, and a snapshot of the volume is made at the remote site. Volume replication is re-enabled immediately following the snapshot. A backup is then made of the snapshot on the remote site. At the completion of this cycle, application-consistent volume snapshots and backups exist on the remote site.
> >
> > d. When a disaster or firedrill happens, the replication network connection is cut. The remote-site VM, pre-created or defined so as to use the replicated volumes, is then booted, using the latest application-consistent state of the replicated volumes. The entire VM environment (management accounts, networking, external firewalling, console access, etc.), similar to that of the primary, either needs to pre-exist in some fashion on the secondary or be created dynamically by the DR system. The booting VM either needs to attach to a virtual network environment similar to that at the primary site, or it needs to have boot code that can alter its network personality. Networking configuration may occur in conjunction with an update to DNS and other networking infrastructure. It is necessary for all required networking configuration to be pre-specified or done automatically; no manual admin activity should be required. Environment requirements may be stored in a DR configuration or database associated with the replication.
> > e. In a firedrill or test, the virtual network environment at the remote site may be a "test bubble" isolated from the real network, with some provision for protected access (such as NAT). Automatic testing is necessary to verify that replication succeeded. These tests need to be configurable by the end-user and admin and integrated with DR orchestration.
> >
> > f. After the VM has booted and been operational, the network connection between the two sites is re-established. A replication connection between the replicated volumes is re-established, and the replicated volumes are re-synced, with the roles of primary and secondary reversed. (Ongoing replication in this configuration may occur, driven from the new primary.)
> >
> > g. A planned failback of the VM to the old primary proceeds similarly to the failover from the old primary to the old replica, but with roles reversed and with the process minimizing offline time and data loss.
> >
> > * (2) [Core tenant/project infrastructure VMs]
> >
> > Twenty VMs power the core infrastructure of a group using a private cloud (OpenStack in their own datacenter). Not all VMs run Windows with VSS; some run Linux with an equivalent mechanism, such as qemu-ga driving fsfreeze and signal scripts. These VMs are replicated to a remote OpenStack deployment, in a fashion similar to (1). Orchestration occurring at the remote site on failover is more complex (correct VM boot order is orchestrated, DHCP service is configured as expected, all IPs are made available and verified); a sketch of such boot-order orchestration appears after this list. An equivalent virtual network topology consisting of multiple networks or subnets might be pre-created or dynamically created at failover time.
> >
> > a. Storage for all volumes of all VMs might be on a single storage backend (logically a single large volume containing many smaller sub-volumes, examples being a VMware datastore or Hyper-V CSV). This entire large volume might be replicated between similar storage backends at the primary and secondary site. A single replicated large volume thus replicates all the tenant VMs' volumes. The DR system must trigger quiesce of all volumes to an application-consistent state.
> >
> > b. This environment needs to deal with failures of the primary datacenter (as when a trenching tool cuts its connection to the internet), routine firedrill tests that perform failover and failback, and planned migration.
> >
> > c. VSS or fsfreeze may be expected to fail for some VMs; policies and SLAs need to contend with this and alert admins for manual follow-up.
> >
> > d. Network bandwidth used for replication needs to be throttled so as not to overly disrupt the private cloud's gateway capacity.
> >
> > e. DR replication needs to deal with intermittent network replication failure and recover gracefully. In case of a known network issue, such as maintenance, it needs to be possible for the admin to explicitly suspend network replication. Replication I/O is then logged locally at the primary site in some fashion. The remote site needs to stay replication-ready, but failover does not occur. When the network issue is over, replication resumes, perhaps recovering via a log, a map of updated blocks, or an equivalent technique. In this example the RPO window is deliberately ignored and allowed to grow until replication is resumed by the admin.
> >
> > f. This tenant requires encryption of network replication traffic.
> >
> > g. Cost accounting and chargeback is required.
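The boot-order orchestration mentioned in (2) could, very roughly, be scripted with python-novaclient as in the sketch below. The tier ordering, the server names, and the assumption that the failover VMs already exist at the secondary site (stopped, and only needing to be started) are all hypothetical; client construction is omitted since it varies by novaclient version.

    # Illustrative sketch of failover boot-order orchestration at the secondary
    # site: start VMs tier by tier and wait for each tier to become ACTIVE
    # before starting the next. 'nova' is a python-novaclient Client instance;
    # BOOT_TIERS and name_to_id are hypothetical inputs.
    import time

    BOOT_TIERS = [                 # hypothetical dependency ordering
        ['dns-1', 'dhcp-1'],
        ['db-1', 'db-2'],
        ['app-1', 'app-2', 'app-3'],
    ]

    def wait_for_active(nova, server_id, timeout=600, poll=5):
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = nova.servers.get(server_id).status
            if status == 'ACTIVE':
                return
            if status == 'ERROR':
                raise RuntimeError('server %s failed to boot' % server_id)
            time.sleep(poll)
        raise RuntimeError('timed out waiting for %s' % server_id)

    def failover_boot(nova, name_to_id):
        for tier in BOOT_TIERS:
            for name in tier:
                nova.servers.start(name_to_id[name])     # start the pre-created VM
            for name in tier:
                wait_for_active(nova, name_to_id[name])  # whole tier up before next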
> > * (3) [Multi-tier app infrastructure]
> >
> > A tenant has a service consisting of 8 multi-tier apps that each consist of 3 to 5 VMs, with each VM having 2 to 4 disks. Replication snapshots need to be made of the volumes in an application-consistent way across all the volumes of all the VMs in all the multi-tier apps. Again, these volumes may exist on a single large volume or datastore, perhaps simplifying creation of the cross-VM application-consistency snapshot. Not all of the VMs in a multi-tier app may need to be quiesced; some may be stateless and simply need to be recovered to a running state.
> >
> > a. This tenant requires that 3 of the multi-tier apps fail over to one remote OpenStack site and the other 5 multi-tier apps fail over to a different remote site than the first.
> >
> > b. This tenant weekly performs a non-disruptive test-bubble failover test. Real failover is not triggered. Instead, all the multi-tier app VMs that would boot upon failure are booted (from their latest snapshots on the secondary), but the VMs' virtual network environment on the secondary is isolated from external networking. Test bubbles at the two remote OpenStack sites may need to be connected via some VPN/tunnel or equivalent without manual admin activity.
> >
> > * (4) [Tenant failover]
> >
> > An OpenStack tenant has 40 VMs, relatively lightly loaded, used for development. The VMs do not contain VSS, qemu-ga, or standard tools (they may be running any Linux distro, some may be running Plan 9, the tenant may be doing Linux kernel development; that is, the VMs can be anything). A remote OpenStack deployment needs to exist so that in the event of loss of the primary OpenStack site, the tenant can continue development. In addition to volume replication as in (1), subject to policies and SLAs, cold migration may be performed on a VM's volumes upon shutdown (or dismount), and tenant end-users can explicitly request replication of a volume that is in an application-consistent state (when they have quiesced it by VSS, dismount, or equivalent).
> >
> > a. Being down for a short period may be acceptable to this tenant. If all the hosts on the primary site are rebooted, for instance due to power failure, it is the operator's choice whether to fail over or not. If the operator chooses not to fail over, then upon reboot of the VMs at the primary site, any established replication should automatically be continued.
> >
> > * (5) [Scale-out workload]
> >
> > A tenant has a Cassandra cluster (or Hadoop or a similar type of system) consisting of 75 VMs. Use is bursty. The system is used by a pharmaceutical company for design work. Loss of a week's work can be repeated, but weekly replication is mandatory. The application itself may provide some form of built-in geo-replication. Some controller-type VMs may need to be replicated as in (1). Other VMs may partner with replica VMs for explicit application data replication. For weekly replication of Cassandra data, Cassandra user-level snapshots are made into replicated volumes attached to each Cassandra VM. Replication is periodic with respect to the last replication event; that is, only data changed since the last replication event is sent.
> >
> > a. The tenant requires use of a particular aggregated network link for replication.
> > b. The tenant requires custom integration with the DR replication workflow to quiesce Cassandra via user-level commands and scripts developed by the end-user.
> >
> > c. Initial synchronization of replicated primary and secondary volumes need not be over a network link. Secondary volumes can be created initially from physical disks or backups physically moved to the secondary site.
> >
> > * (6) [Degraded-mode Mission-critical single VM]
> >
> > This single-VM use case is similar to (1), but when a network partition occurs between the primary and secondary OpenStack sites, with both sites remaining up, the primary VM remains operational while the secondary replica VM also comes online. Both VMs operate in a mode that resembles replication with a momentary network fault, logging their would-be replication traffic for continuation when the network comes back. When network connectivity is re-established, one site again becomes the primary, and differences in the VM's volumes can optionally (as controlled by policy) be reconciled. (In a simple case, each site might have its own dedicated volume partition or attached volume with its latest state.)
> >
> > * (7) [Self-contained application volume]
> >
> > A Cinder volume contains a complete database application, including the database and all binaries and configuration files. Replication of the entire VM to which this volume is attached is not needed. The VM and its configuration can be recreated on demand at the remote site and attached to the replicated application volume. The DR system still needs to orchestrate the process and create or manage the required network environment. A simple DR strategy can be used in which the volume is quiesced on the primary, a volume snapshot taken, the volume unquiesced (enabling the VM to continue running), and a backup then made of the snapshot. Backups can be transported by whatever means to the DR site, where the volume can be restored to its state at the time of the snapshot. (A sketch of this cycle appears at the end of this list.)
> >
> > * (8) [Stateless]
> >
> > No volumes and VMs need to be replicated, as VMs and their configuration can be recreated on demand, using configuration tools, and application data is accessed over the wide-area network (NFS or object store). The DR process still has to orchestrate creating the VMs, running configuration tools to populate them, creating the network environment, and booting the VMs in the required order.
> >
> > * (9) [Site Evacuation]
> >
> > The holy grail: automatic planned migration of the workload and data from one cloud-scale datacenter to another (or a set of others). In practice, this is likely to include admins in the loop. It applies at both tenant scale and entire-datacenter scale. The entire cloud datacenter is expected to go offline for an extended period (the hurricane scenario).
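To illustrate the simple strategy in (7) -- quiesce, snapshot, unquiesce, then back up the snapshot -- a rough python-cinderclient sketch follows. Since Cinder backups (at least as of this thread) are taken from volumes rather than directly from snapshots, the sketch materializes the snapshot as a temporary volume and backs that up; the quiesce/unquiesce hooks are hypothetical placeholders, and client construction is omitted.

    # Rough sketch of the "quiesce, snapshot, unquiesce, back up" cycle from
    # use case (7). 'cinder' is a python-cinderclient Client; quiesce_app() and
    # unquiesce_app() are hypothetical hooks (VSS, fsfreeze, DB flush, ...).
    import time

    def snapshot_and_backup(cinder, volume_id):
        quiesce_app()                                              # hypothetical
        snap = cinder.volume_snapshots.create(volume_id, force=True)
        unquiesce_app()                                            # hypothetical; VM keeps running

        # Backups operate on volumes, so materialize the snapshot as a
        # temporary volume and back that up (it can then be shipped to the DR site).
        vol = cinder.volumes.get(volume_id)
        tmp = cinder.volumes.create(vol.size, snapshot_id=snap.id)
        while cinder.volumes.get(tmp.id).status == 'creating':
            time.sleep(5)
        backup = cinder.backups.create(tmp.id)
        return snap, backup

    def quiesce_app():      # placeholder for an application/filesystem quiesce
        pass

    def unquiesce_app():    # placeholder for resuming application I/O
        pass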
> > -bruce
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
