Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust

2016-12-02 Thread 'Brian Foley' via ganeti-devel
On Fri, Dec 02, 2016 at 05:02:27PM +, 'Viktor Bachraty' via ganeti-devel 
wrote:
> Migration is supported on KVM/Xen hypervisors only. Finalize methods are
> called in both success or suspected failure (the master node can't
> certainly know about failures - e.g. in case of timeouts). Thus these
> functions need to be as robust as possible in order to be able to
> recover from failures - in case success=False the functions should be
> conservative, trying to detect the actual state and idempotent. The
> cleanup workaround works for XM, on XL it is a no-op.
> 
> Signed-off-by: Viktor Bachraty 

LGTM. Thanks.

Brian.


Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust

2016-12-02 Thread 'Viktor Bachraty' via ganeti-devel
My thinking was, that I don't have much experience to tell if that happens
on XL so I haven't seen if this cleanup is the correct thing to do there.
So I meant that the original version would have handled XM and on XL it
would do nothing (since there is no migrating-$DOM) domain. But ok, let's
make the code generic. I think removing a suffix from the domain name isn't
a dangerous operation so we can safely do it on XL as well.

On Fri, Dec 2, 2016 at 3:34 PM, Brian Foley  wrote:

> On Fri, Dec 02, 2016 at 01:59:49PM +, 'Viktor Bachraty' via
> ganeti-devel wrote:
> > Migration is supported on KVM/Xen hypervisors only. Finalize methods are
> > called in both success or suspected failure (the master node can't
> > certainly know about failures - e.g. in case of timeouts). Thus these
> > functions need to be as robust as possible in order to be able to
> > recover from failures - in case success=False the functions should be
> > conservative, trying to detect the actual state and idempotent. The
> > cleanup workaround works for XM, on XL it is a no-op.
>
> See inline...
>
> > +else:
> > +  # Cleanup the most common failure case when the source instance
> fails
> > +  # to freeze and remains running as 'migrating-${oldname}' This
> works with
> > +  # XM, on XL it turns to a no-op.
>
> I was going to ask if we could have a slightly less terse comment here
> maybe
> say "because xl doesn't rename instances during migrate."
>
> but then I remembered that xl *does* in fact rename source domains.
> After the copy, xl renames the source domain to domain--migratedaway,
> allows the target domain to resume, then destroys the source domain if the
> target resumes properly. See
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=
> tools/libxl/xl_cmdimpl.c;h=521b0b533833;hb=refs/heads/stable-4.4#l3685
>
> There's a small window here where there's a domain--migratedaway on the
> source,
> and the target didn't resume properly. Maybe we should have xl_temp_name
> and
> xm_temp_name and attempt to _RenameInstance either one if they happen to
> exist?
>
> Thanks,
> Brian.
>
> > +  temp_name = 'migrating-' + instance.name
> > +  info = self.GetInstanceInfo(temp_name,
> hvparams=instance.hvparams)
> > +  if info:
> > +self._RenameInstance(temp_name, instance.name,
> instance.hvparams)
> >
> >def GetMigrationStatus(self, instance):
> >  """Get the migration status
> > --
> > 2.8.0.rc3.226.g39d4020
> >
>


Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust

2016-12-02 Thread 'Brian Foley' via ganeti-devel
On Fri, Dec 02, 2016 at 01:59:49PM +, 'Viktor Bachraty' via ganeti-devel 
wrote:
> Migration is supported on KVM/Xen hypervisors only. Finalize methods are
> called in both success or suspected failure (the master node can't
> certainly know about failures - e.g. in case of timeouts). Thus these
> functions need to be as robust as possible in order to be able to
> recover from failures - in case success=False the functions should be
> conservative, trying to detect the actual state and idempotent. The
> cleanup workaround works for XM, on XL it is a no-op.

See inline...

> +else:
> +  # Cleanup the most common failure case when the source instance fails
> +  # to freeze and remains running as 'migrating-${oldname}' This works 
> with
> +  # XM, on XL it turns to a no-op.

I was going to ask if we could have a slightly less terse comment here maybe 
say "because xl doesn't rename instances during migrate."

but then I remembered that xl *does* in fact rename source domains.
After the copy, xl renames the source domain to domain--migratedaway,
allows the target domain to resume, then destroys the source domain if the
target resumes properly. See
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/xl_cmdimpl.c;h=521b0b533833;hb=refs/heads/stable-4.4#l3685

There's a small window here where there's a domain--migratedaway on the source,
and the target didn't resume properly. Maybe we should have xl_temp_name and
xm_temp_name and attempt to _RenameInstance either one if they happen to exist?

Thanks,
Brian.

> +  temp_name = 'migrating-' + instance.name
> +  info = self.GetInstanceInfo(temp_name, hvparams=instance.hvparams)
> +  if info:
> +self._RenameInstance(temp_name, instance.name, instance.hvparams)
>  
>def GetMigrationStatus(self, instance):
>  """Get the migration status
> -- 
> 2.8.0.rc3.226.g39d4020
> 


Re: [PATCH stable-2.16 3/7] Make FinalizeMigration{Src,Dst} more robust

2016-10-13 Thread 'Brian Foley' via ganeti-devel
On Fri, Oct 07, 2016 at 08:38:00PM +0100, 'Viktor Bachraty' via ganeti-devel 
wrote:
> Migration is supported on KVM/Xen hypervisors only. Finalize methods are
> called in both success or suspected failure (the master node can't
> certainly know about failures - e.g. in case of timeouts). Thus these
> functions need to be as robust as possible in order to be able to
> recover from failures - in case success=False the functions should be
> conservative, trying to detect the actual state and idempotent.

LGTM, thanks. This should make Xen migrates a lot more robust.