Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust
On Fri, Dec 02, 2016 at 05:02:27PM +, 'Viktor Bachraty' via ganeti-devel wrote: > Migration is supported on KVM/Xen hypervisors only. Finalize methods are > called in both success or suspected failure (the master node can't > certainly know about failures - e.g. in case of timeouts). Thus these > functions need to be as robust as possible in order to be able to > recover from failures - in case success=False the functions should be > conservative, trying to detect the actual state and idempotent. The > cleanup workaround works for XM, on XL it is a no-op. > > Signed-off-by: Viktor Bachraty LGTM. Thanks. Brian.
Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust
My thinking was, that I don't have much experience to tell if that happens on XL so I haven't seen if this cleanup is the correct thing to do there. So I meant that the original version would have handled XM and on XL it would do nothing (since there is no migrating-$DOM) domain. But ok, let's make the code generic. I think removing a suffix from the domain name isn't a dangerous operation so we can safely do it on XL as well. On Fri, Dec 2, 2016 at 3:34 PM, Brian Foley wrote: > On Fri, Dec 02, 2016 at 01:59:49PM +, 'Viktor Bachraty' via > ganeti-devel wrote: > > Migration is supported on KVM/Xen hypervisors only. Finalize methods are > > called in both success or suspected failure (the master node can't > > certainly know about failures - e.g. in case of timeouts). Thus these > > functions need to be as robust as possible in order to be able to > > recover from failures - in case success=False the functions should be > > conservative, trying to detect the actual state and idempotent. The > > cleanup workaround works for XM, on XL it is a no-op. > > See inline... > > > +else: > > + # Cleanup the most common failure case when the source instance > fails > > + # to freeze and remains running as 'migrating-${oldname}' This > works with > > + # XM, on XL it turns to a no-op. > > I was going to ask if we could have a slightly less terse comment here > maybe > say "because xl doesn't rename instances during migrate." > > but then I remembered that xl *does* in fact rename source domains. > After the copy, xl renames the source domain to domain--migratedaway, > allows the target domain to resume, then destroys the source domain if the > target resumes properly. See > https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f= > tools/libxl/xl_cmdimpl.c;h=521b0b533833;hb=refs/heads/stable-4.4#l3685 > > There's a small window here where there's a domain--migratedaway on the > source, > and the target didn't resume properly. Maybe we should have xl_temp_name > and > xm_temp_name and attempt to _RenameInstance either one if they happen to > exist? > > Thanks, > Brian. > > > + temp_name = 'migrating-' + instance.name > > + info = self.GetInstanceInfo(temp_name, > hvparams=instance.hvparams) > > + if info: > > +self._RenameInstance(temp_name, instance.name, > instance.hvparams) > > > >def GetMigrationStatus(self, instance): > > """Get the migration status > > -- > > 2.8.0.rc3.226.g39d4020 > > >
Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust
On Fri, Dec 02, 2016 at 01:59:49PM +, 'Viktor Bachraty' via ganeti-devel wrote: > Migration is supported on KVM/Xen hypervisors only. Finalize methods are > called in both success or suspected failure (the master node can't > certainly know about failures - e.g. in case of timeouts). Thus these > functions need to be as robust as possible in order to be able to > recover from failures - in case success=False the functions should be > conservative, trying to detect the actual state and idempotent. The > cleanup workaround works for XM, on XL it is a no-op. See inline... > +else: > + # Cleanup the most common failure case when the source instance fails > + # to freeze and remains running as 'migrating-${oldname}' This works > with > + # XM, on XL it turns to a no-op. I was going to ask if we could have a slightly less terse comment here maybe say "because xl doesn't rename instances during migrate." but then I remembered that xl *does* in fact rename source domains. After the copy, xl renames the source domain to domain--migratedaway, allows the target domain to resume, then destroys the source domain if the target resumes properly. See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/xl_cmdimpl.c;h=521b0b533833;hb=refs/heads/stable-4.4#l3685 There's a small window here where there's a domain--migratedaway on the source, and the target didn't resume properly. Maybe we should have xl_temp_name and xm_temp_name and attempt to _RenameInstance either one if they happen to exist? Thanks, Brian. > + temp_name = 'migrating-' + instance.name > + info = self.GetInstanceInfo(temp_name, hvparams=instance.hvparams) > + if info: > +self._RenameInstance(temp_name, instance.name, instance.hvparams) > >def GetMigrationStatus(self, instance): > """Get the migration status > -- > 2.8.0.rc3.226.g39d4020 >
Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust
On Fri, Oct 07, 2016 at 08:38:00PM +0100, 'Viktor Bachraty' via ganeti-devel wrote: > Migration is supported on KVM/Xen hypervisors only. Finalize methods are > called in both success or suspected failure (the master node can't > certainly know about failures - e.g. in case of timeouts). Thus these > functions need to be as robust as possible in order to be able to > recover from failures - in case success=False the functions should be > conservative, trying to detect the actual state and idempotent. LGTM, thanks. This should make Xen migrates a lot more robust.