Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-27 Thread Hongyang Yang



On 11/26/2014 03:54 AM, Andrew Cooper wrote:

Hello,

The purpose of this email is to plan how to progress the migrationv2
series through to being merged.  I believe I have CC'd everyone with a
specific interest in this area, but apologies if I have missed anyone.

Migration v2 is in exclusive use in XenServer 6.5.  We primarily
developed migration v2 because we needed a 32bit -> 64bit toolstack
upgrade path.  The code has all the features XenServer previously
supported, and we consider it fully baked and without any known bugs,
including transparent legacy-to-v2 conversion on upgrade.

We did endeavour to get migration v2 into Xen 4.5, but regrettably this
did not happen.  A consequence of this, along with the code being in
XenServer 6.5, is that the wire format is now set in stone.  Luckily, it
has been explicitly designed to be easy to extend in a forward
compatible manner, so this is not a problem moving forward.

The expectation is that the migration v2 code will completely replace
the existing migration code, which will involve removing
xc_domain_save.c and xc_domain_restore.c, as well as assorted other
orphaned code in libxenctrl and libxenguest.

There are 3 areas of concern which have been identified so far.

1) TMEM support

Migration v2 doesn't currently have any tmem migration support.  The
maintainers have been asked whether they actually expect legacy tmem
migration to work, but I have not heard any reply yet.  At the very
least, migration v2 tmem support would want some new thought put into
the wire protocol.  I am hoping that, as TMEM is still tech preview and
still in the process of having XSA-15 fixed, working tmem migration v2
is not insisted upon as a prerequisite.

2) Remus/COLO support

Migration v2 doesn't currently have any Remus support.  There was a
draft series which added Remus support, and showed that it was
particularly simple to add Remus support to migration v2.  I integrated
several bugfixes as a side effect of that series, but the actual Remus
content needed a refresh.  This got delayed behind the Remus libxl
effort.  It is my hope that the Remus maintainers can refresh that
series and provide assistance while testing.


Sure, I'm planning to refresh the patches as soon as the Xen 4.6 merge
window opens. I'm also going to start work on the libxl side, because the
libxl part of migration v2 is already done (although not fully finished?).
We also hope COLO support will go into Xen 4.6.



3) Libxl and xl support

Libxl and xl have as many problems as the libxc code did when it comes
to incompatible wire formats and layering violations.  In particular, it
is not possible to determine the bitness of the sending
libxl-saverestore-helper, meaning that legacy conversion requires active
administrator input, or at least a passive assumption that the bitness
is the same.

There is an xl/libxl part of the migration v2 series which attempts to
rectify this all in one go, as there is no alternative way of doing so.
The libxl section of the series is certainly not yet complete, but
specific queries to the maintainers have thus far gone unanswered.  On
the other hand, the series does basically WorksForMe, including
transparent legacy upgrade, suggesting that it is at least in an
appropriate ballpark.


*) Specific non-requirements:

There have been issues identified with dynamic (in a p2m sense) guests
and migration, which results in failed migration or image corruption.
While these issues certainly want fixing, they are bugs which exist in
the legacy code.  As such, they are not prerequisites to fix before v2
can be accepted.


Anyway, it is my hope that this planning email can help get things on
track to start pursuing active development again as soon as the 4.6 dev
window opens again, with the aim to get all the code merged as early as
possible in the dev window to allow as much testing as possible.

~Andrew




--
Thanks,
Yang.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-27 Thread Ian Campbell
On Wed, 2014-11-26 at 17:39 +, Andrew Cooper wrote:
  IMHO this is fine. It essentially means that for xl users there is some
  delayed gratification wrt the promise of migration between non-alike
  dom0s. The migration from 4.5(legacy)->4.6(v2) won't support such
  migrations, but the next step from 4.6(v2)->4.7(v2) will.
 
 Two options exist.
 
 1) Assume that the sending bitness is the same as the receiving
 bitness.  This is already the status quo, and will require that the two
 dom0s are the same width.

As I said above I think this is absolutely acceptable as a transitional
step.

 2) Allow the administrator to specify the bitness of the sending side. 
 In this case, xl 4.5(legacy)->4.6(v2) works even cross-bitness.

If this is trivial to plumb in and you are motivated to do so then this
seems like a reasonable enough stretch goal.

Ian.




Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-26 Thread Olaf Hering
On Tue, Nov 25, Andrew Cooper wrote:

 The purpose of this email is to plan how to progress the migrationv2
 series through to being merged.  I believe I have CC'd everyone with a
 specific interest in this area, but apologies if I have missed anyone.

While you mow that lawn, did you guys think of handling downtime of the
migrated VM? I added some knobs to abort migration in a very libxc
specific way. What I would like to see is a simple user interface for
virsh/xl to control the downtime. See the thread "limit downtime during
life migration from xl/virsh":

http://lists.xenproject.org/archives/html/xen-devel/2014-03/msg00785.html


Olaf



Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-26 Thread Olaf Hering
On Wed, Nov 26, Andrew Cooper wrote:

 It is certainly my hope going forward that different knobs can be
 exposed.  One thing I think would be interesting is some proper
 calculations of the delta in the dirty set, and offering a threshold
 which chooses between "pause and complete" or "abort the migration and
 complain that the VM is too active".
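
A minimal sketch of the threshold idea quoted above, assuming the
toolstack already has a count of dirty pages for each pre-copy round.
The function names, the page-count unit and the returned strings are
hypothetical and are not part of any existing xl/libxl interface.

```python
from typing import List, Sequence


def dirty_deltas(dirty_pages: Sequence[int]) -> List[int]:
    """Change in dirty-set size between consecutive pre-copy rounds."""
    return [b - a for a, b in zip(dirty_pages, dirty_pages[1:])]


def choose_final_step(dirty_pages: Sequence[int], max_final_pages: int) -> str:
    """Pick the last migration step from the most recent dirty-set size.

    max_final_pages is the hypothetical knob: the largest dirty set the
    administrator is willing to copy while the guest is paused.
    """
    if not dirty_pages:
        raise ValueError("need at least one dirty-page sample")
    if dirty_pages[-1] <= max_final_pages:
        return "pause-and-complete"
    return "abort: VM too active"
```

For example, choose_final_step([1000, 400, 150], 200) selects
"pause-and-complete", while a guest that keeps re-dirtying around 900
pages per round would be reported as too active.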

The "pause and complete" step is what causes unexpected time jumps in
the guest. It would be nice if that could be controlled with a knob.

Olaf



Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-26 Thread Ian Campbell
On Tue, 2014-11-25 at 19:54 +, Andrew Cooper wrote:
 There is an xl/libxl part of the migration v2 series which attempts to
 rectify this all in one go, as there is no alternative way of doing so. 
 The libxl section of the series is certainly not yet complete, but
 specific queries to the maintainers have thus far gone unanswered.  On
 the other hand, the series does basically WorksForMe, including
 transparent legacy upgrade, suggesting that it is at least in an
 appropriate ballpark.

Is this, from [PATCH 27/29] [VERY RFC] tools/libxl: Support restoring
legacy streams:

This WorksForMe in the success case, but the error handling is
certainly lacking.

Specifically, the conversion script's output fd can't be closed until
the v2 read loop has exited (cleanly or otherwise), without risking a
close()/open() race silently replacing the fd behind the loop's back.

However, it can't be closed when the read loop exits, as the conversion
script child might still be alive, and would prefer terminating cleanly
to failing with a bad FD.

Obviously, having one error handler block for the success/failure of
the other side is a no-go, and would still involve preselecting which
was expected to exit first.

Does anyone have any clever ideas of how to asynchronously collect the
events "the conversion script has exited", "the save helper has exited"
and "the v2 read loop has finished" given the available infrastructure,
to kick off a combined cleanup of all 3?

? I said then:

This is probably one for Ian when he gets back, but a state machine
which is cranked in response to the callbacks from the various
completion events might be one way to approach this.
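
As a rough illustration of that suggestion (not libxl code, and not
using libxl's event infrastructure), a state machine of this kind could
look like the sketch below. The class name, event names and callback
signature are all hypothetical.

```python
class CombinedCleanup:
    """Collects completion events and runs one cleanup after all arrive."""

    EVENTS = frozenset(
        {"conversion_exited", "save_helper_exited", "read_loop_finished"}
    )

    def __init__(self, cleanup):
        self._pending = set(self.EVENTS)  # events not yet seen
        self._failed = False              # sticky: any failure marks the whole run
        self._cleanup = cleanup           # called once as cleanup(success=bool)
        self.done = False

    def on_event(self, name, ok=True):
        """Crank the machine from a completion callback, in any order."""
        if name not in self.EVENTS:
            raise ValueError(f"unknown event {name!r}")
        if name not in self._pending:
            raise RuntimeError(f"event {name!r} reported twice")
        self._pending.discard(name)
        self._failed = self._failed or not ok
        if not self._pending:
            # Only now is it safe to close the shared fd and reap children.
            self.done = True
            self._cleanup(success=not self._failed)
```

No side has to guess which of the three finishes first: each callback
only records its own completion, and the shared fd is released by the
single cleanup call after the last event.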

Prodding Ian again (by moving to the To: line...)

Were there any other questions? I've had a scrobble through the bit of v7
which 00/29 suggests might contain them, but that's the only one I saw.

Ian. 




Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-26 Thread Ian Campbell
On Tue, 2014-11-25 at 19:54 +, Andrew Cooper wrote:
 3) Libxl and xl support
 
 Libxl and xl have as many problems as the libxc code did when it comes
 to incompatible wire formats and layering violations.  In particular, it
 is not possible to determine the bitness of the sending
 libxl-saverestore-helper, meaning that legacy conversion requires active
 administrator input, or at least a passive assumption that the bitness
 is the same.

IOW when migrating legacy->new we have the same restriction as we do
today in the purely legacy world, which is that the two dom0s must
have matching bit widths?

IMHO this is fine. It essentially means that for xl users there is some
delayed gratification wrt the promise of migration between non-alike
dom0s. The migration from 4.5(legacy)->4.6(v2) won't support such
migrations, but the next step from 4.6(v2)->4.7(v2) will.

Ian.




Re: [Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-26 Thread Andrew Cooper
On 26/11/14 16:50, Ian Campbell wrote:
 On Tue, 2014-11-25 at 19:54 +, Andrew Cooper wrote:
 3) Libxl and xl support

 Libxl and xl have as many problems as the libxc code did when it comes
 to incompatible wire formats and layering violations.  In particular, it
 is not possible to determine the bitness of the sending
 libxl-saverestore-helper, meaning that legacy conversion requires active
 administrator input, or at least a passive assumption that the bitness
 is the same.
 IOW when migrating legacy->new we have the same restriction as we do
 today in the purely legacy world, which is that the two dom0s must
 have matching bit widths?

The legacy->new conversion removes bitness from the equation, but the
bitness of the legacy side is an input parameter to conversion.

For XenServer, this is easy, as all older versions of XenServer are
32bit.  This version, and future versions will use the new format, where
bitness is specifically irrelevant.

For xl, this is harder.  There exist both 32 and 64bit versions doing
legacy migration, and on the receiving side it is impossible to
determine, given only the incoming stream.


 IMHO this is fine. It essentially means that for xl users there is some
 delayed gratification wrt the promise of migration between non-alike
 dom0s. The migration from 4.5(legacy)->4.6(v2) won't support such
 migrations, but the next step from 4.6(v2)->4.7(v2) will.

Two options exist.

1) Assume that the sending bitness is the same as the receiving
bitness.  This is already the status quo, and will require that the two
dom0s are the same width.

2) Allow the administrator to specify the bitness of the sending side. 
In this case, xl 4.5(legacy)->4.6(v2) works even cross-bitness.
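
The two options combine naturally into a default plus an override. The
sketch below is purely illustrative: the function names and the idea of
an optional administrator-supplied value are assumptions, not an
existing xl option, and the interpreter's pointer width merely stands in
for the bitness of the receiving toolstack.

```python
import struct
from typing import Optional


def host_bitness() -> int:
    """Pointer width of this process, standing in for the receiving side."""
    return struct.calcsize("P") * 8


def legacy_sender_bitness(admin_override: Optional[int] = None,
                          receiver_bits: Optional[int] = None) -> int:
    """Bitness to feed the legacy conversion as its input parameter."""
    if receiver_bits is None:
        receiver_bits = host_bitness()
    if admin_override is None:
        # Option 1: assume sender and receiver have the same width.
        return receiver_bits
    if admin_override not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    # Option 2: the administrator states the sender's width explicitly.
    return admin_override
```

With no override a 64bit receiver assumes a 64bit legacy sender, and
legacy_sender_bitness(32, 64) covers the cross-bitness case.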

~Andrew




[Xen-devel] [Planning for Xen-4.6] Migration v2

2014-11-25 Thread Andrew Cooper
Hello,

The purpose of this email is to plan how to progress the migrationv2
series through to being merged.  I believe I have CC'd everyone with a
specific interest in this area, but apologies if I have missed anyone.

Migration v2 is in exclusive use in XenServer 6.5.  We primarily
developed migration v2 because we needed a 32bit -> 64bit toolstack
upgrade path.  The code has all the features XenServer previously
supported, and we consider it fully baked and without any known bugs,
including transparent legacy-to-v2 conversion on upgrade.

We did endeavour to get migration v2 into Xen 4.5, but regrettably this
did not happen.  A consequence of this, along with the code being in
XenServer 6.5, is that the wire format is now set in stone.  Luckily, it
has been explicitly designed to be easy to extend in a forward
compatible manner, so this is not a problem moving forward.
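
To illustrate what forward-compatible extension can mean for a
record-based stream, here is a generic sketch. The header layout, the
"optional" flag bit and the record types are assumptions made up for
this example and should not be read as the actual migration v2 wire
format.

```python
import struct

# Hypothetical layout, for illustration only: 32-bit type, 32-bit body length.
HEADER = struct.Struct("<II")
OPTIONAL_FLAG = 0x80000000  # assumed marker: receiver may skip if unknown
KNOWN_TYPES = {0x1: "PAGE_DATA", 0x2: "END"}  # made-up record types


def make_record(rec_type: int, body: bytes) -> bytes:
    """Serialise one record as header followed by body."""
    return HEADER.pack(rec_type, len(body)) + body


def read_records(stream: bytes):
    """Return known records in order, skipping unknown optional ones."""
    offset, out = 0, []
    while offset < len(stream):
        rec_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        body = stream[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        offset += length
        if rec_type in KNOWN_TYPES:
            out.append((KNOWN_TYPES[rec_type], body))
        elif rec_type & OPTIONAL_FLAG:
            continue  # extension from a newer sender: safe to ignore
        else:
            raise ValueError(f"unknown mandatory record type {rec_type:#x}")
    return out
```

An older receiver built this way keeps working when a newer sender adds
optional records, and fails loudly rather than silently on a mandatory
record it does not understand.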

The expectation is that the migration v2 code will completely replace
the existing migration code, which will involve removing
xc_domain_save.c and xc_domain_restore.c, as well as assorted other
orphaned code in libxenctrl and libxenguest.

There are 3 areas of concern which have been identified so far.

1) TMEM support

Migration v2 doesn't currently have any tmem migration support.  The
maintainers have been asked whether they actually expect legacy tmem
migration to work, but I have not heard any reply yet.  At the very
least, migration v2 tmem support would want some new thought put into
the wire protocol.  I am hoping that, as TMEM is still tech preview and
still in the process of having XSA-15 fixed, working tmem migration v2
is not insisted upon as a prerequisite.

2) Remus/COLO support

Migration v2 doesn't currently have any Remus support.  There was a
draft series which added Remus support, and showed that it was
particularly simple to add Remus support to migration v2.  I integrated
several bugfixes as a side effect of that series, but the actual Remus
content needed a refresh.  This got delayed behind the Remus libxl
effort.  It is my hope that the Remus maintainers can refresh that
series and provide assistance while testing.

3) Libxl and xl support

Libxl and xl have as many problems as the libxc code did when it comes
to incompatible wire formats and layering violations.  In particular, it
is not possible to determine the bitness of the sending
libxl-saverestore-helper, meaning that legacy conversion requires active
administrator input, or at least a passive assumption that the bitness
is the same.

There is an xl/libxl part of the migration v2 series which attempts to
rectify this all in one go, as there is no alternative way of doing so. 
The libxl section of the series is certainly not yet complete, but
specific queries to the maintainers have thus far gone unanswered.  On
the other hand, the series does basically WorksForMe, including
transparent legacy upgrade, suggesting that it is at least in an
appropriate ballpark.


*) Specific non-requirements:

There have been issues identified with dynamic (in a p2m sense) guests
and migration, which results in failed migration or image corruption. 
While these issues certainly want fixing, they are bugs which exist in
the legacy code.  As such, they are not prerequisites to fix before v2
can be accepted.


Anyway, it is my hope that this planning email can help get things on
track to start pursuing active development again as soon as the 4.6 dev
window opens again, with the aim to get all the code merged as early as
possible in the dev window to allow as much testing as possible.

~Andrew

