date:20150715

[Xen-devel] [ovmf test] 59592: all pass - PUSHED

2015-07-15 Thread osstest service owner

flight 59592 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59592/

Perfect :-)
All tests in this flight passed
version targeted for testing:
 ovmf 680742607132a7733880407453b5f792699d7143
baseline version:
 ovmf 2ad9cf37a492e69a4e1b7624d92d9a35fce083fc

Last test of basis    59511  2015-07-13 13:47:15 Z    2 days
Testing same since    59592  2015-07-15 22:42:18 Z    0 days    1 attempts


People who touched revisions under test:
  Ard Biesheuvel ard.biesheu...@linaro.org
  Brendan Jackman brendan.jack...@arm.com
  Bruce Cran br...@cran.org.uk
  Chao Zhang chao.b.zh...@intel.com
  fanwang2 fan.w...@intel.com
  Gabriel Somlo so...@cmu.edu
  Hao Wu hao.a...@intel.com
  Jeff Fan jeff@intel.com
  Jiaxin Wu jiaxin...@intel.com
  Laszlo Ersek ler...@redhat.com
  Olivier Martin olivier.mar...@arm.com
  Qiu Shumin shumin@intel.com
  Ronald Cron ronald.c...@arm.com
  Tapan Shah tapands...@hp.com
  Zhang Lubo lubo.zh...@intel.com

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=680742607132a7733880407453b5f792699d7143
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 
680742607132a7733880407453b5f792699d7143
+ branch=ovmf
+ revision=680742607132a7733880407453b5f792699d7143
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case $branch in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ : tested/2.6.39.x
+ . ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{OsstestUpstream} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git
+++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src 
'[fetch=try]'
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local 'options=[fetch=try]'
 getconfig GitCacheProxy
 perl -e '

Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 05/25] libxl/remus: introduce libxl__remus_setup

2015-07-15 Thread Yang Hongyang




On 07/15/2015 07:26 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

Refactor Remus setup by introducing the libxl__remus_setup API.
All Remus setup work is done in this function.

Also remove the libxl__ prefix for static functions.


There is a subtle behavioural change here, which is that if anything
which is now done in _setup fails then the result is a call to
dss->callback( ..,..,ERROR_FAIL) rather than _start returning
AO_CREATE_FAIL(ERROR_FAIL).

I think this is probably a reasonable and correct change, but I think it
is worth mentioning in the commit log.


Yes, will update the commit log.



That said, I also wonder if the actual check for netbuffer_enabled (the
only such failure in practice) ought to be moved up such that it stays
in _start along with the other similar checks, i.e. _start would do:

 if (libxl_defbool_val(info->netbuf) && !libxl__netbuffer_enabled(gc)) {
     LOG(ERROR, "Remus: No support for network buffering");
     rc = ERROR_FAIL;
     goto out;
 }


This check is for Remus only, we want to reuse _start for COLO, so anything
related to Remus only should sit in libxl_remus.c.



while _setup would do:

 if (libxl_defbool_val(info->netbuf)) {
     // MAYBE : assert(libxl__netbuffer_enabled(gc))
     rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
 }

Ian.

.



--
Thanks,
Yang.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 11/25] tools/libxc: support to resume uncooperative HVM guests

2015-07-15 Thread Yang Hongyang




On 07/15/2015 08:26 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

From: Wen Congyang we...@cn.fujitsu.com

1. suspend
a. PVHVM and PV: we use the same way to suspend the guest (send the suspend
request to the guest). If the guest doesn't support evtchn, the xenstore
variant will be used, suspending the guest via XenBus control node.
b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
the guest

2. Resume:
a. fast path
In this case, we don't change the guest's state.
PV: modify the return code to 1, and then call the domctl:
XEN_DOMCTL_resumedomain
PVHVM: same as PV
HVM: do nothing in modify_returncode, and then call the domctl:
 XEN_DOMCTL_resumedomain
b. slow
Used when the guest's state has been changed.
PV: update start info, and reset all secondary CPU states. Then call the
domctl: XEN_DOMCTL_resumedomain
PVHVM and HVM cannot be resumed.

For PVHVM, in my test, calling only the domctl XEN_DOMCTL_resumedomain
works. I am not sure if we should update start info and reset all
secondary CPU states.

For a pure HVM guest, in my test, calling only the domctl
XEN_DOMCTL_resumedomain works.

So we can call libxl__domain_resume(..., 1) if we don't change the guest
state, otherwise call libxl__domain_resume(..., 0).

Under COLO, we will update the guest's state (modify memory, CPU registers,
device status...). In this case, we cannot use the fast path to resume it.
Keep the return code 0, and use a slow path to resume the guest. While
resuming HVM using the slow path is not supported currently, this patch
makes the resume call not fail.


I'm afraid that the addition of this paragraph has not really addressed
my comment on v3:

 I'm afraid I think the commit message for this patch (and the associated
 doc comments) need revisiting almost from scratch, to clearly explain
 what this patch is doing and why and what the constraints on the new
 functionality will be.

 At the moment it mostly talks in a confusing way about the old behaviour
 and adds very specific assumptions to the new function which are not
 made clear.

It also appears that this has not been addressed:

 Hrm, so it sounds here like the correctness of this new functionality
 requires the caller to have not messed with the domain's state? What
 sort of changes to the guest state are we talking about here?


This is used for the secondary. At a checkpoint, we do:
1. suspend the guest
2. sync the guest state with primary  <== here the guest state has been changed
3. resume the guest
The guest state is changed by step 2, so when we then resume the guest we
cannot use the fast path. For the slow path, resuming HVM is not supported
currently; this patch adds that support.
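
To make the fast/slow distinction concrete, here is a rough sketch of the
decision as described in this thread. It is not the libxc code: the enums
and pick_resume_action() are names made up for illustration, and the
"domctl only" slow path for PVHVM/HVM is the behaviour proposed by the
patch, not something already in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guest kinds, for illustration only (not libxc types). */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

/* Hypothetical resume strategies matching the cases discussed above. */
enum resume_action {
    RESUME_FAST,             /* guest state untouched: fast path */
    RESUME_SLOW_PV,          /* update start info, reset secondary CPUs */
    RESUME_SLOW_DOMCTL_ONLY  /* proposed: only XEN_DOMCTL_resumedomain */
};

/*
 * state_changed is true when the toolstack modified guest state while the
 * guest was suspended (for example the secondary syncing from the primary).
 */
static enum resume_action pick_resume_action(enum guest_kind kind,
                                             bool state_changed)
{
    if (!state_changed)
        return RESUME_FAST;
    if (kind == GUEST_PV)
        return RESUME_SLOW_PV;
    /* PVHVM and pure HVM: the case this patch is trying to support. */
    return RESUME_SLOW_DOMCTL_ONLY;
}
```

Under these assumptions a COLO secondary always lands in one of the two
slow cases, because step 2 above changes the guest state.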

Since the XEN_DOMCTL_resumedomain hypercall is a NOP for HVM, it occurs
to me that we could do this in a different way: we can modify
libxl__domain_resume so that, if the domain is HVM, we skip the
xc_domain_resume call. What do you think?



 Isn't that a new requirement for this call? If so then it should be
 documented somewhere, specifically what sorts of changes are and are not
 allowed and the types of guests which are affected.

The two usages of "in my test" in the commit message also do not inspire
confidence that this change is understood to be correct, vs. happening
to be something which works for you.

Ian.


Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
  tools/libxc/xc_resume.c | 22 ++
  1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e67bebd..bd82334 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
  return do_domctl(xch, &domctl);
  }

+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+DECLARE_DOMCTL;
+
+/*
+ * If it is PVHVM, the hypercall return code is 0, because this
+ * is not a fast path resume, we do not modify_returncode as in
+ * xc_domain_resume_cooperative.
+ * (resuming it in a new domain context)
+ *
+ * If it is a HVM, the hypercall is a NOP.
+ */
+domctl.cmd = XEN_DOMCTL_resumedomain;
+domctl.domain = domid;
+return do_domctl(xch, &domctl);
+}
+
  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
  {
  DECLARE_DOMCTL;
@@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
   */
  #if defined(__i386__) || defined(__x86_64__)
  if ( info.hvm )
-{
-ERROR("Cannot resume uncooperative HVM guests");
-return rc;
-}
+return xc_domain_resume_hvm(xch, domid);

  if (

Re: [Xen-devel] [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT

2015-07-15 Thread Yang Hongyang




On 07/15/2015 09:35 PM, Ian Jackson wrote:

When the timeout set for prompting the next remus iteration fires, we
should not treat the ERROR_TIMEDOUT as an error.

Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".

Reported-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
CC: Yang Hongyang yan...@cn.fujitsu.com
CC: Wei Liu wei.l...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com


Acked-by: Yang Hongyang yan...@cn.fujitsu.com


---
  tools/libxl/libxl_dom.c |3 +++
  1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 81adb3d..4cb247a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,

  STATE_AO_GC(dss->ao);

+if (rc == ERROR_TIMEDOUT) /* As intended */
+rc = 0;
+
  /*
   * Time to checkpoint the guest again. We return 1 to libxc
   * (xc_domain_save.c). in order to continue executing the infinite loop



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH] libxl/remus: fix the return value of the checkpoint callback

2015-07-15 Thread Yang Hongyang




On 07/15/2015 09:37 PM, Ian Jackson wrote:

Ian Campbell writes ("Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback"):

Does that mean it won't apply to current staging?


Indeed it doesn't.


I think we probably want this fix ASAP rather than waiting for that
series?


Yes.  Patch just sent.  Untested but fairly obvious.  Yang, do you
want to test this, or do you want us to apply it as-is ?  I don't have
a remus test setup.


Please apply, thanks!



This is the second rc-handling bug in 31c836f4 "libxl: events: Permit
timeouts to signal ao abort".  I am going to re-read that patch to see
if I can find any more.

Ian.
.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 04/25] tools/libxl: rename remus checkpoint callbacks

2015-07-15 Thread Yang Hongyang


On 07/15/2015 07:17 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

There are 2 remus checkpoint callbacks(save/restore), currently, they
both called libxl__remus_domain_checkpoint_callback in diffrent
file, so it is ok. But in the following patch, we will move all of the
remus callback code into a seperate file, the name should be diffrent.


separate and different (twice).


OK, thanks!




So rename them to:
   libxl__remus_domain_{save/restore}_checkpoint_callback

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com


Acked-by: Ian Campbell ian.campb...@citrix.com


CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
  tools/libxl/libxl_create.c | 4 ++--
  tools/libxl/libxl_dom.c| 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 5b4d333..a32e3df 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -677,7 +677,7 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
  static void remus_checkpoint_stream_done(
  libxl__egc *egc, libxl__stream_read_state *srs, int rc);

-static void libxl__remus_domain_checkpoint_callback(void *data)
+static void libxl__remus_domain_restore_checkpoint_callback(void *data)
  {
  libxl__save_helper_state *shs = data;
  libxl__domain_create_state *dcs = shs->caller_state;
@@ -989,7 +989,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
  }

  /* Restore */
-callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;

  rc = libxl__build_pre(gc, domid, d_config, state);
  if (rc)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0788309..9c61fa7 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1586,7 +1586,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
const struct timeval *requested_abs,
int rc);

-static void libxl__remus_domain_checkpoint_callback(void *data)
+static void libxl__remus_domain_save_checkpoint_callback(void *data)
  {
  libxl__save_helper_state *shs = data;
  libxl__domain_suspend_state *dss = shs->caller_state;
@@ -1749,7 +1749,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
  if (r_info != NULL) {
  callbacks->suspend = libxl__remus_domain_suspend_callback;
  callbacks->postcopy = libxl__remus_domain_resume_callback;
-callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
  dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
  } else
  callbacks->suspend = libxl__domain_suspend_callback;



.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 06/25] libxl/remus: introduce libxl__remus_teardown

2015-07-15 Thread Yang Hongyang




On 07/15/2015 07:59 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

Introduce libxl__remus_teardown to tear down Remus devices.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com


Acked-by: Ian Campbell ian.campb...@citrix.com

If you need to respin then you might consider inverting the if remus
check in domain_suspend_done and calling this new function if true, e.g.

 if (dss->remus) {
     libxl__remus_teardown(...)
     return;
 }

 dss->callback(egc, dss, rc);

I think the control flow would feel more natural then.


will do, thanks!




.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure

2015-07-15 Thread Yang Hongyang




On 07/15/2015 09:28 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

@@ -2921,6 +2911,26 @@ _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
  libxl__checkpoint_devices_state *cds);
  _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
  libxl__checkpoint_devices_state *cds);
+
+/*- Remus related state structure -*/
+typedef struct libxl__remus_state libxl__remus_state;
+struct libxl__remus_state {
+/* private */
+libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
+int interval; /* checkpoint interval */
+
+/* abstract layer */
+libxl__checkpoint_devices_state cds;


This mostly makes sense, I think, but this one field feels like it will
be wanted by colo too. Does that mean we will end up with dss->rs.cds
and dss->colo.cds doing effectively the same thing?


Yes, the checkpoint device is an abstract layer, used by both Remus & colo.
In the abstract layer we are not aware of Remus or colo; in Remus or colo,
we can use container_of on cds to retrieve the Remus/colo state.
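
For readers not familiar with the container_of idiom, here is a minimal
standalone sketch of the idea. The struct layouts and helper names below
are stand-ins invented for the example; they are not the libxl definitions,
and the macro is written out locally rather than taken from libxl.

```c
#include <assert.h>
#include <stddef.h>

/* Generic container_of: recover the enclosing struct from a member pointer. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the abstract layer state; the field is invented. */
struct checkpoint_devices_state {
    int num_devices;
};

/* Stand-ins for the concrete layers, each embedding its own cds. */
struct remus_state {
    int interval;
    struct checkpoint_devices_state cds;
};

struct colo_state {
    int checkpoints_done;
    struct checkpoint_devices_state cds;
};

/* The abstract layer only sees cds; each concrete layer maps it back. */
static struct remus_state *remus_from_cds(struct checkpoint_devices_state *cds)
{
    return CONTAINER_OF(cds, struct remus_state, cds);
}

static struct colo_state *colo_from_cds(struct checkpoint_devices_state *cds)
{
    return CONTAINER_OF(cds, struct colo_state, cds);
}
```

The point is that the abstract layer never needs a back pointer: each
embedding struct keeps its own cds member, and only the code that knows
which concrete layer it is in performs the conversion.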



Ian.

.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 12/25] tools/libxl: introduce enum type libxl_checkpointed_stream

2015-07-15 Thread Yang Hongyang




On 07/15/2015 08:34 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

introduce enum type libxl_checkpointed_stream in IDL.
rename the last argument of migrate_receive from remus to
checkpointed since the semantics of this parameter has
changed.

NOTE:
  libxl_domain_restore_params isn't changed here,
  checkpointed_stream is still an int.
  It has to change eventually and other callers will have to be
  updated to cope (and there should be LIBXL_HAVE_...).


Will this be fixed up later in this series? If so please say so.


It's not fixed in this series. I planned to fix this later, but since it seems
there will be another round for this series, I can fix it in the next version.
My main concern is that this is an API change, and it will affect the
existing callers.




@@ -4282,7 +4282,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
  }

  static void migrate_receive(int debug, int daemonize, int monitor,
-int send_fd, int recv_fd, int remus)
+int send_fd, int recv_fd, int checkpointed)


I think you can start using the new enum type in xl straight away even
if dom_info.checkpointed_stream remains an int. So that means here.


@@ -4489,7 +4489,8 @@ int main_restore(int argc, char **argv)

  int main_migrate_receive(int argc, char **argv)
  {
-int debug = 0, daemonize = 1, monitor = 1, remus = 0;
+int debug = 0, daemonize = 1, monitor = 1;
+int checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;


and here.


@@ -4318,7 +4318,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,

  domid = rc;

-if (remus) {
+if (checkpointed) {
  /* If we are here, it means that the sender (primary) has crashed.
   * TODO: Split-Brain Check.
   */


Is it the case that we expect all checkpointing solutions will use the
same failover code here? If yes then this should be
if (checkpointed != ...NONE).

If we think they might differ (even if remus and colo happen to be the
same) then I think a switch where the NONE case does nothing would be
more structurally appropriate.
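
As a rough illustration of the switch variant, assuming the failover
handling might one day differ per solution: the enum below is a local
stand-in mirroring the NONE/REMUS naming of libxl_checkpointed_stream, the
COLO value is an assumption, and failover_action() with its string results
is purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for the IDL enum; the COLO value is an assumption. */
enum checkpointed_stream {
    STREAM_NONE = 0,
    STREAM_REMUS,
    STREAM_COLO
};

/*
 * Returns a label for the failover handling to run once the sender has
 * gone away, or NULL when the stream was not checkpointed at all.
 */
static const char *failover_action(enum checkpointed_stream cs)
{
    switch (cs) {
    case STREAM_NONE:
        return NULL;              /* plain migration: nothing to do */
    case STREAM_REMUS:
        return "remus-failover";
    case STREAM_COLO:
        return "colo-failover";   /* may or may not end up the same */
    }
    return NULL;                  /* unknown value: treat like NONE */
}
```

With no default label, a compiler that warns about unhandled enumerators
will flag this switch when a new checkpointing solution is added to the
enum.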

Ian.

.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 21/25] tools/libxl: rename remus device to checkpoint device

2015-07-15 Thread Yang Hongyang




On 07/15/2015 09:15 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

This patch is auto generated by the following commands:
  1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c


This patch does not appear to have been formatted with git format-patch
-M as requested last time around.


Sorry I missed this :(
Will do in the next version. BTW, I have a dumb question... how do I specify -M
for only this patch while it is in a series?




.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 07/25] libxl/remus: init checkpoint_callback in Remus checkpoint callback

2015-07-15 Thread Yang Hongyang




On 07/15/2015 08:02 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

init stream {read/write} state checkpoint_callback in Remus
checkpoint callback.


Why? Is this earlier or later than previously? Seems later?


There's no functional change, it's just refactoring so that we can move
all remus code into one file.





Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
  tools/libxl/libxl_create.c | 2 +-
  tools/libxl/libxl_dom.c| 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a32e3df..94fe98f 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -684,6 +684,7 @@ static void libxl__remus_domain_restore_checkpoint_callback(void *data)
  libxl__egc *egc = shs->egc;
  STATE_AO_GC(dcs->ao);

+dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
  libxl__stream_read_start_checkpoint(egc, &dcs->srs);
  }

@@ -1000,7 +1001,6 @@ static void domcreate_bootloader_done(libxl__egc *egc,
  dcs->srs.fd = restore_fd;
  dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
  dcs->srs.completion_callback = domcreate_stream_done;
-dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;

  libxl__stream_read_start(egc, &dcs->srs);
  return;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 77a917c..1740bed 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1593,6 +1593,7 @@ static void libxl__remus_domain_save_checkpoint_callback(void *data)
  libxl__egc *egc = shs->egc;
  STATE_AO_GC(dss->ao);

+dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
  libxl__stream_write_start_checkpoint(egc, &dss->sws);
  }

@@ -1750,7 +1751,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
  callbacks->suspend = libxl__remus_domain_suspend_callback;
  callbacks->postcopy = libxl__remus_domain_resume_callback;
  callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
-dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
  } else
  callbacks->suspend = libxl__domain_suspend_callback;




.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 21/25] tools/libxl: rename remus device to checkpoint device

2015-07-15 Thread Yang Hongyang




On 07/15/2015 09:32 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

  tools/libxl/libxl_types.idl   |   4 +-



diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e8d3647..1d676ef 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
  (-15, "LOCK_FAIL"),
  (-16, "JSON_CONFIG_EMPTY"),
  (-17, "DEVICE_EXISTS"),
-(-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
-(-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+(-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
+(-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
  (-20, "VNUMA_CONFIG_INVALID"),
  (-21, "DOMAIN_NOTFOUND"),
  (-22, "ABORTED"),


This is an API change, which I think we discussed before.


Also missed this one, sorry.



In 558bc6ee.60...@cn.fujitsu.com you said you would add an extra patch
to deal with that, and I think that needs to come before this automatic


will add the patch before the automatic renaming.


renaming so that there is no bisect hazard. I don't see any such patch
even after this point though (from grepping your colo-v8 branch).

Ian.

.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH] libxl/remus: fix the return value of the checkpoint callback

2015-07-15 Thread Yang Hongyang




On 07/15/2015 08:13 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 18:32 +0800, Yang Hongyang wrote:

In the checkpoint callback, we wait for the interval and then start
another checkpoint, so ERROR_TIMEDOUT is intended
and should not be treated as an error.

This patch is based on
[PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service


Does that mean it won't apply to current staging?


No. This can apply on top of colo pre series.



I think we probably want this fix ASAP rather than waiting for that
series?


I can resubmit the patch so that it applies to staging, but the colo pre
series will then need to be rebased...




Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
---
  tools/libxl/libxl_remus.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 46dcc3c..ffc92a7 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
   * (xc_domain_save.c). in order to continue executing the infinite loop
   * (suspend, checkpoint, resume) in xc_domain_save().
   */
-
-if (rc)
+if (rc == ERROR_TIMEDOUT) {
+/* This is intended, we set the timeout and start another checkpoint */
+libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);


Please wrap this (slightly) overlong line (and probably the comment too
which is borderline AFAICT).


+} else {
  dss->rc = rc;
-
-libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
+libxl__xc_domain_saverestore_async_callback_done(egc,
+ &dss->sws.shs, !rc);
+}
  }

  /*-- remus callbacks (restore) ---*/



.



--
Thanks,
Yang.


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure

2015-07-15 Thread Yang Hongyang




On 07/15/2015 11:08 PM, Ian Jackson wrote:

Yang Hongyang writes ("[Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure"):

Add a new structure remus state, and move concrete layer's private
member to remus state.
it is pure refactoring and no functional changes.


Thanks.  I don't have much to add to what Ian Campbell has said, but


  if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
-dss->interval = r_info->interval;
  if (libxl_defbool_val(r_info->compression))
  dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;


In your next version it would be worth mentioning the movement of this
initialisation in the commit message.


Ok, thanks!



Ian.
.



--
Thanks,
Yang.


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun


On 2015/7/16 0:14, George Dunlap wrote:

On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
george.dun...@eu.citrix.com wrote:

Would it be possible, on a collision, to have one last stab at
allocating the BAR somewhere else, without relocating memory (or
relocating any other BARs)?  At very least then an administrator could
work around this kind of thing by setting the mmio_hole larger in the
domain config.


If it's not possible to have this last-ditch relocation effort, then


Could you take a look at the original patch #06? Although Jan thought
it was complicated, it is really the one version that I can refine in
the current time slot.



yes, I'd be OK with just disabling the device for the time being.



Let me send out a new patch series based on this idea. We can continue
discussing this over there, but we also need to further review the other
remaining comments based on a new revision.


Thanks
Tiejun


Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 16/25] tools/libxl: Update libxl_domain_unpause() to support qemu-xen

2015-07-15 Thread Yang Hongyang




On 07/15/2015 08:50 PM, Ian Campbell wrote:

On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:

Currently, libxl__domain_unpause() only supports
qemu-xen-traditional. Update it to support qemu-xen.
We use libxl__domain_resume_device_model to unpause guest dm.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
  tools/libxl/libxl.c | 15 +--
  1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5b2d045..799aead 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -941,8 +941,6 @@ out:
  int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
  {
  GC_INIT(ctx);
-char *path;
-char *state;
  int ret, rc = 0;

  libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -952,14 +950,11 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
  }

  if (type == LIBXL_DOMAIN_TYPE_HVM) {
-uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
-
-path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
-state = libxl__xs_read(gc, XBT_NULL, path);
-if (state != NULL && !strcmp(state, "paused")) {
-libxl__qemu_traditional_cmd(gc, domid, "continue");
-libxl__wait_for_device_model_deprecated(gc, domid, "running",
- NULL, NULL, NULL);
+rc = libxl__domain_resume_device_model(gc, domid);
+if (rc < 0) {
+LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to unpause device model "
+   "for domain %u:%d", domid, rc);


Please use the preferred form of LOG(ERROR, "failed to..."), which
should also hopefully allow you to avoid splitting the line in the
middle of a string constant which is discouraged.

If you can't use LOG() then please:
 LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
            "failed to unpause device model for domain %u:%d",
            domid, rc);

Not splitting string constants means you can grep for an error message.


Sorry, the commit message is wrong: it's libxl_domain_unpause, not
libxl__domain_unpause, so LOG() can't be used. I will update the commit message
and use your latter suggestion, thank you!



Ian.

.



--
Thanks,
Yang.


Re: [Xen-devel] [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-15 Thread Chen, Tiejun


I think I would say:

--
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
--
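
The last bullet (sorting the ranges into memory order) could look roughly
like the sketch below. This is hosted C using qsort() purely for
illustration; hvmloader runs without a full libc, so the real code would
need its own loop. struct range, range_cmp() and sort_ranges() are
hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical e820-like entry; field names chosen for the example. */
struct range {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Order entries by start address, lowest first. */
static int range_cmp(const void *a, const void *b)
{
    const struct range *x = a;
    const struct range *y = b;

    if (x->addr < y->addr)
        return -1;
    if (x->addr > y->addr)
        return 1;
    return 0;
}

/* Sort the table in place so that entries appear in memory order. */
static void sort_ranges(struct range *map, size_t nr)
{
    qsort(map, nr, sizeof(map[0]), range_cmp);
}
```

Comparing with explicit < and > rather than subtracting the addresses
avoids truncating a 64-bit difference into the int return value.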


I'll update this and thanks a lot.





CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---


[snip]


+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));


Won't this BUG if the guest was actually given less than 2GiB of RAM?


2u << 20 = 0x200000, so this is 2M, not 2G :)
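
The arithmetic is easy to check with a small sketch. The helper name
mib_to_bytes is invented for this example and does not exist in hvmloader.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of hvmloader: convert MiB to bytes.
 * A left shift by 20 multiplies by 2^20 (1 MiB); GiB needs a shift by 30.
 */
static uint64_t mib_to_bytes(uint64_t mib)
{
    assert(mib <= (UINT64_MAX >> 20)); /* refuse values that would overflow */
    return mib << 20;
}
```

With this, mib_to_bytes(2) is 0x200000 (2 MiB), while 2 GiB would be
2ull << 30.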




+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */


[snip]


+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr > (1ull << 32))
+e820[i].size += add_high_mem;
+}
+}


What if there was originally no high memory, but resizing the pci hole
meant we had to create a high memory region?



You're right. We need to consider this case.
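
One possible way to cover that case, as a sketch only and not the code that
will go into the next version: grow an existing RAM region at or above 4GiB
if there is one, otherwise append a new one. All names here
(struct e820_sketch, SKETCH_E820_RAM, add_high_mem_sketch) are hypothetical,
and stopping at the first matching region is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_E820_RAM 1u          /* assumed value for the RAM type */
#define SKETCH_4GIB     (1ull << 32)

/* Simplified stand-in for an e820 entry. */
struct e820_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Account for add_high_mem bytes of relocated RAM.
 * Returns the new number of entries in map (capacity cap).
 */
static size_t add_high_mem_sketch(struct e820_sketch *map, size_t nr,
                                  size_t cap, uint64_t add_high_mem)
{
    size_t i;

    if (add_high_mem == 0)
        return nr;

    /* Existing high RAM region: just extend it. */
    for (i = 0; i < nr; i++) {
        if (map[i].type == SKETCH_E820_RAM && map[i].addr >= SKETCH_4GIB) {
            map[i].size += add_high_mem;
            return nr;
        }
    }

    /* No high memory yet: create a region starting at 4GiB. */
    assert(nr < cap);
    map[nr].addr = SKETCH_4GIB;
    map[nr].size = add_high_mem;
    map[nr].type = SKETCH_E820_RAM;
    return nr + 1;
}
```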

Thanks
Tiejun



[Xen-devel] [linux-3.18 test] 59587: regressions - FAIL

2015-07-15 Thread osstest service owner

flight 59587 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59587/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail REGR. vs. 58581

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt  6 xen-boot  fail REGR. vs. 58581
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate 
fail baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 14 guest-localmigrate.2 
fail baseline untested
 test-armhf-armhf-xl-rtds 14 guest-start.2   fail baseline untested
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   like 58558
 test-amd64-amd64-libvirt 11 guest-start  fail   like 58558
 test-amd64-i386-rumpuserxen-i386 15 
rumpuserxen-demo-xenstorels/xenstorels.repeat fail like 58558
 test-amd64-amd64-libvirt-xsm 11 guest-start  fail   like 58558
 test-amd64-i386-libvirt  11 guest-start  fail   like 58581
 test-armhf-armhf-xl   6 xen-boot fail   like 58581
 test-armhf-armhf-xl-credit2   6 xen-boot fail   like 58581
 test-armhf-armhf-xl-multivcpu  6 xen-boot fail  like 58581
 test-armhf-armhf-xl-xsm   6 xen-boot fail   like 58581
 test-armhf-armhf-libvirt-xsm  6 xen-boot fail   like 58581
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58581
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 58581
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58581

Tests which did not succeed, but are not blocking:
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail never pass
 test-armhf-armhf-xl-cubietruck  6 xen-boot fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail   never pass

version targeted for testing:
 linux                866cebe251f4fb2b435f4ecfe6d3bb4025938533
baseline version:
 linux                d048c068d00da7d4cfa5ea7651933b99026958cf

Last test of basis    58581  2015-06-15 09:42:22 Z   30 days
Failing since         58976  2015-06-29 19:43:23 Z   16 days   20 attempts
Testing same since    59412  2015-07-11 00:18:42 Z    5 days    8 attempts


308 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm                                          pass
 build-armhf-xsm                                          pass
 build-i386-xsm                                           pass
 build-amd64                                              pass
 build-armhf                                              pass
 build-i386                                               pass
 build-amd64-libvirt                                      pass
 build-armhf-libvirt                                      pass
 build-i386-libvirt                                       pass
 build-amd64-pvops                                        pass
 build-armhf-pvops                                        pass
 build-i386-pvops                                         pass
 build-amd64-rumpuserxen                                  pass
 build-i386-rumpuserxen                                   pass
 test-amd64-amd64-xl                                      pass
 test-armhf-armhf-xl                                      fail
 test-amd64-i386-xl                                       pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm            pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm             pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm            pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm             pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm    fail
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm     fail
 test-amd64-amd64-libvirt-xsm                             fail
 test-armhf-armhf-libvirt-xsm                             fail
 test-amd64-i386-libvirt-xsm                              fail
 test-amd64-amd64-xl-xsm                                  pass
 test-armhf-armhf-xl-xsm                                  fail
 test-amd64-i386-xl-xsm                                   pass
 test-amd64-amd64-xl-pvh-amd

Re: [Xen-devel] [PATCH v6] dmar: device scope mem leak fix

2015-07-15 Thread Zhang, Yang Z

elena.ufimts...@oracle.com wrote on 2015-07-07:
 From: Elena Ufimtseva elena.ufimts...@oracle.com
 
 Release memory allocated for scope.devices dmar units on various
 failure paths and when disabling dmar. Set device count after
 successful memory allocation, not before, in device scope parsing function.
 
 Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com

Acked-by: Yang Zhang yang.z.zh...@intel.com

Best regards,
Yang




Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO

2015-07-15 Thread Yang Hongyang


It seems my reply emails from last night were lost; they didn't appear on
the list, so I'm going to repost them.

On 07/15/2015 03:45 PM, Yang Hongyang wrote:

This patchset is a prerequisite for the COLO feature. Refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

This patchset is based on Andrew Cooper's Libxl migration v4.1:
   
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/libxl-migv2-v4.1

In this version, I moved some of the COLO-specific patches down to the COLO
main series, so most patches in this series are refactoring and can be applied
first.

I've done some simple testing. Both Remus and normal migration work after
applying this patchset. The patch to fix Remus on migration v2 will be sent
later as a separate patch.

You can also get the patchset from:
   https://github.com/macrosheep/xen/tree/colo-v8

v3->v4:
  - Rebased to the latest migration v2 branch
  - Addressed comments from last round

v2->v3:
  - Merge '[PATCH v2 0/6] Misc cleanups for libxl' into this patchset
for easy review
  - Addressed review comments
  - Add back channel to libxc
  - Introduce should_checkpoint callback
  - Introduce DIRTY_BITMAP record on libxc side
  - Introduce COLO_CONTEXT record on libxl side
  - Ported to Libxl migration v2

v1->v2:
  - Rebased to [PATCH v2 0/6] Misc cleanups for libxl
  - Add a bugfix for the error handling of process_record


Wen Congyang (2):
   tools/libxc: support to resume uncooperative HVM guests
   tools/libxl: Add back channel to allow migration target send data back

Yang Hongyang (23):
   tools/libxl: rename libxl__domain_suspend to libxl__domain_save
A  tools/libxl: move domain suspend code into libxl_dom_suspend.c
A  tools/libxl: move domain resume code into libxl_dom_suspend.c
   tools/libxl: rename remus checkpoint callbacks
   libxl/remus: introduce libxl__remus_setup
   libxl/remus: introduce libxl__remus_teardown
   libxl/remus: init checkpoint_callback in Remus checkpoint callback
   tools/libxl: move remus code into libxl_remus.c
A  tools/libxl: move save/restore code into libxl_dom_save.c
   libxl/save: Refactor libxl__domain_suspend_state
   tools/libxl: introduce enum type libxl_checkpointed_stream
   migration/save: pass checkpointed_stream from libxl to libxc
   tools/libxl: introduce libxl__domain_restore_device_model to load qemu
 state
   tools/libxl: check QEMU state before resume dm
   tools/libxl: Update libxl_domain_unpause() to support qemu-xen
A  tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
A  tools/libxl: export logdirty_init
   tools/libx{l,c}: add back channel to libxc
   tools/libxl: rename remus device to checkpoint device
A  tools/libxl: adjust the indentation
   tools/libxl: store remus_ops in checkpoint device state
   tools/libxl: move remus state into a seperate structure
   tools/libxl: seperate device init/cleanup from checkpoint device layer

  tools/libxc/include/xenguest.h|   13 +-
  tools/libxc/xc_domain_restore.c   |4 +-
  tools/libxc/xc_domain_save.c  |6 +-
  tools/libxc/xc_nomigrate.c|3 +-
  tools/libxc/xc_resume.c   |   22 +-
  tools/libxc/xc_sr_common.h|2 +-
  tools/libxc/xc_sr_restore.c   |2 +-
  tools/libxc/xc_sr_save.c  |5 +-
  tools/libxl/Makefile  |5 +-
  tools/libxl/libxl.c   |  119 +---
  tools/libxl/libxl.h   |   30 +-
  tools/libxl/libxl_checkpoint_device.c |  282 
  tools/libxl/libxl_create.c|   33 +-
  tools/libxl/libxl_dom.c   | 1243 -
  tools/libxl/libxl_dom_save.c  |  721 +++
  tools/libxl/libxl_dom_suspend.c   |  503 +
  tools/libxl/libxl_internal.h  |  246 ---
  tools/libxl/libxl_netbuffer.c |  117 ++--
  tools/libxl/libxl_nonetbuffer.c   |   10 +-
  tools/libxl/libxl_qmp.c   |   10 +
  tools/libxl/libxl_remus.c |  395 +++
  tools/libxl/libxl_remus_device.c  |  327 -
  tools/libxl/libxl_remus_disk_drbd.c   |   56 +-
  tools/libxl/libxl_save_callout.c  |   43 +-
  tools/libxl/libxl_save_helper.c   |9 +-
  tools/libxl/libxl_stream_write.c  |   14 +-
  tools/libxl/libxl_types.idl   |   10 +-
  tools/libxl/xl_cmdimpl.c  |   21 +-
  tools/ocaml/libs/xl/xenlight_stubs.c  |2 +-
  29 files changed, 2321 insertions(+), 1932 deletions(-)
  create mode 100644 tools/libxl/libxl_checkpoint_device.c
  create mode 100644 tools/libxl/libxl_dom_save.c
  create mode 100644 tools/libxl/libxl_dom_suspend.c
  create mode 100644 tools/libxl/libxl_remus.c
  delete mode 100644 tools/libxl/libxl_remus_device.c



--
Thanks,
Yang.


[Xen-devel] [PATCH v4 --for 4.6 COLOPre 22/25] tools/libxl: adjust the indentation

2015-07-15 Thread Yang Hongyang

This is just tidying up after the previous automatic renaming.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl_checkpoint_device.c | 21 +++--
 tools/libxl/libxl_internal.h  | 19 +++
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c 
b/tools/libxl/libxl_checkpoint_device.c
index 109cd23..226f159 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -73,9 +73,9 @@ static void devices_teardown_cb(libxl__egc *egc,
 /* checkpoint device setup and teardown */
 
 static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
-  libxl__checkpoint_devices_state 
*cds,
-  libxl__device_kind kind,
-  void *libxl_dev)
+libxl__checkpoint_devices_state *cds,
+libxl__device_kind kind,
+void *libxl_dev)
 {
 libxl__checkpoint_device *dev = NULL;
 
@@ -89,9 +89,10 @@ static libxl__checkpoint_device* 
checkpoint_device_init(libxl__egc *egc,
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-libxl__checkpoint_devices_state *cds);
+ libxl__checkpoint_devices_state *cds);
 
-void libxl__checkpoint_devices_setup(libxl__egc *egc, 
libxl__checkpoint_devices_state *cds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+ libxl__checkpoint_devices_state *cds)
 {
 int i, rc;
 
@@ -137,7 +138,7 @@ out:
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
-libxl__checkpoint_devices_state *cds)
+ libxl__checkpoint_devices_state *cds)
 {
 int i, rc;
 
@@ -285,12 +286,12 @@ static void devices_checkpoint_cb(libxl__egc *egc,
 
 /* API implementations */
 
-#define define_checkpoint_api(api)\
-void libxl__checkpoint_devices_##api(libxl__egc *egc,\
-libxl__checkpoint_devices_state *cds)\
+#define define_checkpoint_api(api)  \
+void libxl__checkpoint_devices_##api(libxl__egc *egc,   \
+libxl__checkpoint_devices_state *cds)   \
 {   \
 int i;  \
-libxl__checkpoint_device *dev;   \
+libxl__checkpoint_device *dev;  \
 \
 STATE_AO_GC(cds->ao);  \
 \
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 901e216..af992fc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2765,7 +2765,8 @@ typedef struct libxl__save_helper_state {
  * Each device type needs to implement the interfaces specified in
  * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
  *
- * The high-level control flow through the checkpoint device layer is shown 
below:
+ * The high-level control flow through the checkpoint device layer is shown
+ * below:
  *
  * xl remus
  *  |-  libxl_domain_remus_start
@@ -2826,7 +2827,8 @@ int 
init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
-   libxl__checkpoint_devices_state *, int rc);
+libxl__checkpoint_devices_state *,
+int rc);
 
 /*
  * State associated with a checkpoint invocation, including parameters
@@ -2834,7 +2836,7 @@ typedef void libxl__checkpoint_callback(libxl__egc *,
  * save/restore machinery.
  */
 struct libxl__checkpoint_devices_state {
-/* must be set by caller of libxl__checkpoint_device_(setup|teardown) 
*/
+/*-- must be set by caller of libxl__checkpoint_device_(setup|teardown) 
--*/
 
 libxl__ao *ao;
 uint32_t domid;
@@ -2847,7 +2849,8 @@ struct libxl__checkpoint_devices_state {
 /*
  * this array is allocated before setup the checkpoint devices by the
  * checkpoint abstract layer.
- * devs may be NULL, means there's no checkpoint devices that has been set 
up.
+ * devs may be NULL, means there's no checkpoint devices that has been
+ * set up.
  *

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 21/25] tools/libxl: rename remus device to checkpoint device

2015-07-15 Thread Yang Hongyang

This patch is auto generated by the following commands:
 1. git mv tools/libxl/libxl_remus_device.c 
tools/libxl/libxl_checkpoint_device.c
 2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' 
tools/libxl/Makefile
 3. perl -pi -e 's/\blibxl__remus_devices/libxl__checkpoint_devices/g' 
tools/libxl/*.[ch]
 4. perl -pi -e 's/\blibxl__remus_device\b/libxl__checkpoint_device/g' 
tools/libxl/*.[ch]
 5. perl -pi -e 
's/\blibxl__remus_device_instance_ops\b/libxl__checkpoint_device_instance_ops/g'
 tools/libxl/*.[ch]
 6. perl -pi -e 's/\blibxl__remus_callback\b/libxl__checkpoint_callback/g' 
tools/libxl/*.[ch]
 7. perl -pi -e 's/\bremus_device_init\b/checkpoint_device_init/g' 
tools/libxl/*.[ch]
 8. perl -pi -e 's/\bremus_devices_setup\b/checkpoint_devices_setup/g' 
tools/libxl/*.[ch]
 9. perl -pi -e 's/\bdefine_remus_checkpoint_api\b/define_checkpoint_api/g' 
tools/libxl/*.[ch]
10. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
11. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] 
tools/libxl/*.idl
12. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] 
tools/libxl/*.idl
13. perl -pi -e 's/\bremus\b/checkpoint/g' 
tools/libxl/libxl_checkpoint_device.[ch]
14. perl -pi -e 's/\bremus device/checkpoint device/g' 
tools/libxl/libxl_internal.h
15. perl -pi -e 's/\bRemus device/checkpoint device/g' 
tools/libxl/libxl_internal.h
16. perl -pi -e 's/\bremus abstract/checkpoint abstract/g' 
tools/libxl/libxl_internal.h
17. perl -pi -e 's/\bremus invocation/checkpoint invocation/g' 
tools/libxl/libxl_internal.h
18. perl -pi -e 's/\blibxl__remus_device_\(/libxl__checkpoint_device_(/g' 
tools/libxl/libxl_internal.h

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/Makefile  |   2 +-
 tools/libxl/libxl_checkpoint_device.c | 327 ++
 tools/libxl/libxl_internal.h  | 112 ++--
 tools/libxl/libxl_netbuffer.c | 108 +--
 tools/libxl/libxl_nonetbuffer.c   |  10 +-
 tools/libxl/libxl_remus.c |  76 
 tools/libxl/libxl_remus_device.c  | 327 --
 tools/libxl/libxl_remus_disk_drbd.c   |  52 +++---
 tools/libxl/libxl_types.idl   |   4 +-
 9 files changed, 509 insertions(+), 509 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 2e4c944..3cb3ae9 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -62,7 +62,7 @@ else
 LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
-LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_checkpoint_device.c 
b/tools/libxl/libxl_checkpoint_device.c
new file mode 100644
index 000..109cd23
--- /dev/null
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang yan...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+&remus_device_nic,
+&remus_device_drbd_disk,
+NULL,
+};
+
+/*- helper functions -*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* init device subkind-specific state in the libxl ctx */
+int rc;
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc)) {
+rc = init_subkind_nic(cds);
+if (rc) goto out;
+}
+
+rc = init_subkind_drbd_disk(cds);
+if (rc) goto out;
+
+rc = 0;
+out:
+return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* cleanup device subkind-specific state in the libxl ctx */
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc))
+cleanup_subkind_nic(cds);
+
+cleanup_subkind_drbd_disk(cds);
+}
+
+/*- setup() and teardown() -*/
+
+/* callbacks */

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 25/25] tools/libxl: seperate device init/cleanup from checkpoint device layer

2015-07-15 Thread Yang Hongyang

We call (init|cleanup)_subkind_nic and (init|cleanup)_subkind_drbd_disk
directly in the checkpoint device layer. Move them to libxl_remus.c and call
them before calling libxl__checkpoint_devices_setup() or after calling
libxl__checkpoint_devices_teardown().
This is pure refactoring with no functional changes.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_checkpoint_device.c | 42 ++-
 tools/libxl/libxl_remus.c | 42 +++
 2 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c 
b/tools/libxl/libxl_checkpoint_device.c
index bbc6dc4..0a16dbb 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,38 +17,6 @@
 
 #include "libxl_internal.h"
 
-/*- helper functions -*/
-
-static int init_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-/* init device subkind-specific state in the libxl ctx */
-int rc;
-STATE_AO_GC(cds->ao);
-
-if (libxl__netbuffer_enabled(gc)) {
-rc = init_subkind_nic(cds);
-if (rc) goto out;
-}
-
-rc = init_subkind_drbd_disk(cds);
-if (rc) goto out;
-
-rc = 0;
-out:
-return rc;
-}
-
-static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-/* cleanup device subkind-specific state in the libxl ctx */
-STATE_AO_GC(cds->ao);
-
-if (libxl__netbuffer_enabled(gc))
-cleanup_subkind_nic(cds);
-
-cleanup_subkind_drbd_disk(cds);
-}
-
 /*- setup() and teardown() -*/
 
 /* callbacks */
@@ -86,14 +54,10 @@ static void checkpoint_devices_setup(libxl__egc *egc,
 void libxl__checkpoint_devices_setup(libxl__egc *egc,
  libxl__checkpoint_devices_state *cds)
 {
-int i, rc;
+int i;
 
 STATE_AO_GC(cds->ao);
 
-rc = init_device_subkind(cds);
-if (rc)
-goto out;
-
 cds->num_devices = 0;
 cds->num_nics = 0;
 cds->num_disks = 0;
@@ -126,7 +90,7 @@ void libxl__checkpoint_devices_setup(libxl__egc *egc,
 return;
 
 out:
-cds->callback(egc, cds, rc);
+cds->callback(egc, cds, 0);
 }
 
 static void checkpoint_devices_setup(libxl__egc *egc,
@@ -263,8 +227,6 @@ static void devices_teardown_cb(libxl__egc *egc,
 cds->disks = NULL;
 cds->num_disks = 0;
 
-cleanup_device_subkind(cds);
-
 cds->callback(egc, cds, rc);
 }
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 91abf8e..46dcc3c 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -26,6 +26,38 @@ static const libxl__checkpoint_device_instance_ops 
*remus_ops[] = {
 NULL,
 };
 
+/*- helper functions -*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* init device subkind-specific state in the libxl ctx */
+int rc;
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc)) {
+rc = init_subkind_nic(cds);
+if (rc) goto out;
+}
+
+rc = init_subkind_drbd_disk(cds);
+if (rc) goto out;
+
+rc = 0;
+out:
+return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* cleanup device subkind-specific state in the libxl ctx */
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc))
+cleanup_subkind_nic(cds);
+
+cleanup_subkind_drbd_disk(cds);
+}
+
 /* Remus setup and teardown -*/
 
 static void remus_setup_done(libxl__egc *egc,
@@ -60,6 +92,12 @@ void libxl__remus_setup(libxl__egc *egc, libxl__remus_state 
*rs)
 cds->ops = remus_ops;
 rs->interval = info->interval;
 
+if (init_device_subkind(cds)) {
+LOG(ERROR, "Remus: failed to init device subkind for guest %u",
+dss->domid);
+goto out;
+}
+
 libxl__checkpoint_devices_setup(egc, cds);
 return;
 
@@ -94,6 +132,8 @@ static void remus_setup_failed(libxl__egc *egc,
 LOG(ERROR, "Remus: failed to teardown device after setup failed"
  " for guest with domid %u, rc %d", dss->domid, rc);
 
+cleanup_device_subkind(cds);
+
 dss->callback(egc, dss, rc);
 }
 
@@ -123,6 +163,8 @@ static void remus_teardown_done(libxl__egc *egc,
 LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
  " rc %d", dss->domid, rc);
 
+cleanup_device_subkind(cds);
+
 dss->callback(egc, dss, rc);
 }
 
-- 
1.9.1



[Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure

2015-07-15 Thread Yang Hongyang

Add a new structure, remus state, and move the concrete layer's private
members into it.
This is pure refactoring with no functional changes.
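
For readers unfamiliar with the pattern, here is a minimal sketch of how
code that is handed only the embedded abstract state can get back to the
enclosing state. The struct names and CONTAINER_OF_SKETCH are simplified
stand-ins invented for this example; libxl's own CONTAINER_OF macro takes
different arguments, as in CONTAINER_OF(rds, *dss, rds) elsewhere in the
series.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the abstract checkpoint device state. */
struct cds_sketch {
    int num_devices;
};

/* Stand-in for the Remus state that embeds the abstract layer. */
struct rs_sketch {
    int interval;               /* checkpoint interval */
    struct cds_sketch cds;      /* abstract layer, embedded by value */
    const char *netbufscript;   /* private to the concrete layer */
};

/* Simplified container-of: step back from a member to its container. */
#define CONTAINER_OF_SKETCH(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing state from a pointer to its cds member. */
static struct rs_sketch *rs_from_cds(struct cds_sketch *cds)
{
    assert(cds != NULL);
    return CONTAINER_OF_SKETCH(cds, struct rs_sketch, cds);
}
```

This only works because cds is embedded by value; a pointer member would
not allow the enclosing structure to be recovered this way.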

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl.c |  2 +-
 tools/libxl/libxl_dom_save.c|  3 +--
 tools/libxl/libxl_internal.h| 38 ---
 tools/libxl/libxl_netbuffer.c   | 51 +
 tools/libxl/libxl_remus.c   | 38 ++-
 tools/libxl/libxl_remus_disk_drbd.c |  8 +++---
 6 files changed, 79 insertions(+), 61 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fcf91f1..5502709 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -845,7 +845,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, 
libxl_domain_remus_info *info,
 assert(info);
 
 /* Point of no return */
-libxl__remus_setup(egc, dss);
+libxl__remus_setup(egc, &dss->rs);
 return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 9364a1d..9b7159f 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -428,7 +428,6 @@ void libxl__domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss)
   | (dss->hvm ? XCFLAGS_HVM : 0);
 
 if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
-dss->interval = r_info->interval;
 if (libxl_defbool_val(r_info->compression))
 dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
 }
@@ -578,7 +577,7 @@ static void domain_save_done(libxl__egc *egc,
  * from sending checkpoints. Teardown the network buffers and
  * release netlink resources.  This is an async op.
  */
-libxl__remus_teardown(egc, dss, rc);
+libxl__remus_teardown(egc, &dss->rs, rc);
 }
 
 /*= Domain restore */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d92eabc..9c81d8d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2864,16 +2864,6 @@ struct libxl__checkpoint_devices_state {
 int num_disks;
 
 libxl__multidev multidev;
-
-/*- private for concrete (device-specific) layer only -*/
-
-/* private for nic device subkind ops */
-char *netbufscript;
-struct nl_sock *nlsock;
-struct nl_cache *qdisc_cache;
-
-/* private for drbd disk subkind ops */
-char *drbd_probe_script;
 };
 
 /*
@@ -2921,6 +2911,26 @@ _hidden void 
libxl__checkpoint_devices_preresume(libxl__egc *egc,
 libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
 libxl__checkpoint_devices_state *cds);
+
+/*- Remus related state structure -*/
+typedef struct libxl__remus_state libxl__remus_state;
+struct libxl__remus_state {
+/* private */
+libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
+int interval; /* checkpoint interval */
+
+/* abstract layer */
+libxl__checkpoint_devices_state cds;
+
+/*- private for concrete (device-specific) layer only -*/
+/* private for nic device subkind ops */
+char *netbufscript;
+struct nl_sock *nlsock;
+struct nl_cache *qdisc_cache;
+
+/* private for drbd disk subkind ops */
+char *drbd_probe_script;
+};
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*- Legacy conversion helper -*/
@@ -3073,9 +3083,7 @@ struct libxl__domain_save_state {
 int hvm;
 int xcflags;
 libxl__domain_suspend_state dsps;
-libxl__checkpoint_devices_state cds;
-libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
-int interval; /* checkpoint interval (for Remus) */
+libxl__remus_state rs;
 libxl__stream_write_state sws;
 libxl__logdirty_switch logdirty;
 /* private for libxl__domain_save_device_model */
@@ -3490,9 +3498,9 @@ _hidden void 
libxl__remus_domain_save_checkpoint_callback(void *data);
 _hidden void libxl__remus_domain_restore_checkpoint_callback(void *data);
 /* Remus setup and teardown*/
 _hidden void libxl__remus_setup(libxl__egc *egc,
-libxl__domain_save_state *dss);
+libxl__remus_state *rs);
 _hidden void libxl__remus_teardown(libxl__egc *egc,
-   libxl__domain_save_state *dss,
+   libxl__remus_state *rs,
int rc);
 
 /*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 33c2a42..f7a8448 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,18 +41,19 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
 int rc, ret;
-libxl__domain_save_state *dss

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 08/25] tools/libxl: move remus code into libxl_remus.c

2015-07-15 Thread Yang Hongyang

After previous refactoring, we are now able to move all remus code
into a separate file libxl_remus.c.

Export following functions for internal use:
- Remus callbacks
  * libxl__remus_domain_suspend_callback
  * libxl__remus_domain_resume_callback
  * libxl__remus_domain_save_checkpoint_callback
  * libxl__remus_domain_restore_checkpoint_callback
- setup/teardown Remus:
  * libxl__remus_setup
  * libxl__remus_teardown

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/Makefile |   2 +-
 tools/libxl/libxl.c  |  67 -
 tools/libxl/libxl_create.c   |  22 ---
 tools/libxl/libxl_dom.c  | 223 
 tools/libxl/libxl_internal.h |  12 ++
 tools/libxl/libxl_remus.c| 339 +++
 6 files changed, 352 insertions(+), 313 deletions(-)
 create mode 100644 tools/libxl/libxl_remus.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 4a5957e..b10f4e7 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -62,7 +62,7 @@ else
 LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index acb5639..f1237d8 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -795,12 +795,6 @@ out:
 return ptr;
 }
 
-static void libxl__remus_setup(libxl__egc *egc,
-   libxl__domain_suspend_state *dss);
-static void remus_setup_done(libxl__egc *egc,
- libxl__remus_devices_state *rds, int rc);
-static void remus_setup_failed(libxl__egc *egc,
-   libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
   libxl__domain_suspend_state *dss, int rc);
 
@@ -857,67 +851,6 @@ int libxl_domain_remus_start(libxl_ctx *ctx, 
libxl_domain_remus_info *info,
 return AO_CREATE_FAIL(rc);
 }
 
-static void libxl__remus_setup(libxl__egc *egc,
-   libxl__domain_suspend_state *dss)
-{
-/* Convenience aliases */
-libxl__remus_devices_state *const rds = &dss->rds;
-const libxl_domain_remus_info *const info = dss->remus;
-
-STATE_AO_GC(dss->ao);
-
-if (libxl_defbool_val(info->netbuf)) {
-if (!libxl__netbuffer_enabled(gc)) {
-LOG(ERROR, "Remus: No support for network buffering");
-goto out;
-}
-rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
-}
-
-if (libxl_defbool_val(info->diskbuf))
-rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
-
-rds->ao = ao;
-rds->domid = dss->domid;
-rds->callback = remus_setup_done;
-
-libxl__remus_devices_setup(egc, rds);
-return;
-
-out:
-dss->callback(egc, dss, ERROR_FAIL);
-}
-
-static void remus_setup_done(libxl__egc *egc,
- libxl__remus_devices_state *rds, int rc)
-{
-libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-STATE_AO_GC(dss->ao);
-
-if (!rc) {
-libxl__domain_save(egc, dss);
-return;
-}
-
-LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
-dss->domid, rc);
-rds->callback = remus_setup_failed;
-libxl__remus_devices_teardown(egc, rds);
-}
-
-static void remus_setup_failed(libxl__egc *egc,
-   libxl__remus_devices_state *rds, int rc)
-{
-libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-STATE_AO_GC(dss->ao);
-
-if (rc)
-LOG(ERROR, "Remus: failed to teardown device after setup failed"
- " for guest with domid %u, rc %d", dss->domid, rc);
-
-dss->callback(egc, dss, rc);
-}
-
 static void remus_failover_cb(libxl__egc *egc,
   libxl__domain_suspend_state *dss, int rc)
 {
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 94fe98f..cbd7693 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -672,28 +672,6 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
 libxl_device_model_version_to_string(b_info->device_model_version));
 }
 
-/*- remus asynchronous checkpoint callback -*/
-
-static void remus_checkpoint_stream_done(
-libxl__egc *egc, libxl__stream_read_state *srs, int rc);
-
-static void libxl__remus_domain_restore_checkpoint_callback(void *data)
-{
-libxl__save_helper_state *shs = data;
-libxl__domain_create_state *dcs = shs->caller_state;
-libxl__egc *egc = shs->egc;
-STATE_AO_GC(dcs->ao);
-
-dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
-

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 20/25] tools/libx{l, c}: add back channel to libxc

2015-07-15 Thread Yang Hongyang

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Let's call the
primary dirty bitmap set A, and the secondary dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (let's call this C) is out-of-date on
the secondary (with respect to the primary) and will not be sent by the
primary, as it was not memory dirtied by the primary.  The secondary
needs the page data for C to reconstruct an exact copy of the primary at
the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by either the primary or the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.

The dirty bitmap is transferred on the libxc side, so we need to
introduce a back channel to libxc.
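
The set arithmetic above can be sketched as follows. This is only an
illustration, assuming each dirty bitmap is a plain array of unsigned long
with one bit per page; set_bit, test_bit, bitmap_union and
bitmap_difference are hypothetical helpers, not libxc functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(nbits) (((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical helper: mark page 'pfn' as dirty. */
static void set_bit(unsigned long *map, size_t pfn)
{
    map[pfn / BITS_PER_LONG] |= 1UL << (pfn % BITS_PER_LONG);
}

/* Hypothetical helper: return 1 if page 'pfn' is marked dirty. */
static int test_bit(const unsigned long *map, size_t pfn)
{
    return (int)((map[pfn / BITS_PER_LONG] >> (pfn % BITS_PER_LONG)) & 1UL);
}

/* D = A | B: every page dirtied by either side. */
static void bitmap_union(unsigned long *d, const unsigned long *a,
                         const unsigned long *b, size_t nbits)
{
    size_t i;

    assert(d && a && b);
    for (i = 0; i < BITMAP_LONGS(nbits); i++)
        d[i] = a[i] | b[i];
}

/* C = B - A: pages dirtied only on the secondary. */
static void bitmap_difference(unsigned long *c, const unsigned long *b,
                              const unsigned long *a, size_t nbits)
{
    size_t i;

    assert(c && a && b);
    for (i = 0; i < BITMAP_LONGS(nbits); i++)
        c[i] = b[i] & ~a[i];
}
```

Only the primary can compute D, because only it sees both A and the B it
receives over the back channel.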

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
commit message:
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/include/xenguest.h   |  8 
 tools/libxc/xc_domain_restore.c  |  4 ++--
 tools/libxc/xc_domain_save.c |  4 ++--
 tools/libxc/xc_sr_restore.c  |  2 +-
 tools/libxc/xc_sr_save.c |  2 +-
 tools/libxl/libxl_save_callout.c | 39 ++-
 tools/libxl/libxl_save_helper.c  |  8 ++--
 7 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6e24b6c..4056955 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -91,13 +91,13 @@ struct save_callbacks {
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
struct save_callbacks* callbacks, int hvm,
-   int checkpointed_stream);
+   int checkpointed_stream, int back_fd);
 
 /* Domain Save v2 */
 int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
 uint32_t max_factor, uint32_t flags,
 struct save_callbacks* callbacks, int hvm,
-int checkpointed_stream);
+int checkpointed_stream, int back_fd);
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
@@ -140,7 +140,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
uint32_t dom,
   unsigned long *console_mfn, domid_t console_domid,
   unsigned int hvm, unsigned int pae, int superpages,
   int checkpointed_stream,
-  struct restore_callbacks *callbacks);
+  struct restore_callbacks *callbacks, int back_fd);
 
 /* Domain Restore v2 */
 int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
@@ -149,7 +149,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, 
uint32_t dom,
unsigned long *console_mfn, domid_t console_domid,
unsigned int hvm, unsigned int pae, int superpages,
int checkpointed_stream,
-   struct restore_callbacks *callbacks);
+   struct restore_callbacks *callbacks, int back_fd);
 /**
  * xc_domain_restore writes a file to disk that contains the device
  * model saved state.
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 3cd3483..63d1e6b 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1515,7 +1515,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
uint32_t dom,
   unsigned long *console_mfn, domid_t console_domid,
   unsigned int hvm, unsigned int pae, int superpages,
   int checkpointed_stream,
-  struct restore_callbacks *callbacks)
+  struct restore_callbacks *callbacks, int back_fd)
 {
 DECLARE_DOMCTL;
 xc_dominfo_t info;
@@ -1578,7 +1578,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
uint32_t dom,
 return xc_domain_restore2(
 xch, io_fd, dom, store_evtchn, store_mfn,
 store_domid, console_evtchn, console_mfn, console_domid,
-hvm,  pae,  superpages, checkpointed_stream, callbacks);
+hvm,  pae,  superpages, checkpointed_stream, callbacks, back_fd);
 }
 
 DPRINTF(%s: starting

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 23/25] tools/libxl: store remus_ops in checkpoint device state

2015-07-15 Thread Yang Hongyang

The checkpoint device is an abstraction layer for performing checkpoints.
COLO can also use it for checkpointing, but the checkpoint device
code still contains some Remus-specific pieces.

This patch and the following 2 will separate Remus from the
checkpoint device layer.

We currently use remus_ops directly in the checkpoint device. Store
the ops in the checkpoint device state instead, so that the checkpoint
device layer no longer needs to know about remus_ops.

This is pure refactoring with no functional changes.
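
As an illustration of the pattern, here is a minimal standalone sketch of a
generic layer walking a NULL-terminated array of ops pointers supplied by its
caller. All type and function names here (device_ops, checkpoint_state,
find_ops and so on) are hypothetical stand-ins, not the libxl definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libxl__checkpoint_device_instance_ops. */
typedef struct {
    const char *name;
    int (*match)(const char *kind);   /* non-zero if this ops handles 'kind' */
} device_ops;

/* Hypothetical stand-in for the checkpoint device state. */
typedef struct {
    const device_ops **ops;           /* pointer array, last entry is NULL */
} checkpoint_state;

static int match_nic(const char *kind)  { return strcmp(kind, "nic") == 0; }
static int match_disk(const char *kind) { return strcmp(kind, "disk") == 0; }

static const device_ops nic_ops  = { "nic", match_nic };
static const device_ops disk_ops = { "drbd-disk", match_disk };

/* Owned by the caller (the Remus side in the patch), not the generic layer. */
static const device_ops *remus_like_ops[] = { &nic_ops, &disk_ops, NULL };

/* Generic layer: only sees whatever array the caller stored in the state. */
static const device_ops *find_ops(const checkpoint_state *cs, const char *kind)
{
    size_t i;

    assert(cs && cs->ops);
    for (i = 0; cs->ops[i]; i++)
        if (cs->ops[i]->match(kind))
            return cs->ops[i];
    return NULL;
}
```

A different caller could store a different array in the state without the
generic lookup changing.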

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_checkpoint_device.c | 10 +-
 tools/libxl/libxl_internal.h  |  2 ++
 tools/libxl/libxl_remus.c |  9 +
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 226f159..bbc6dc4 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,14 +17,6 @@
 
 #include "libxl_internal.h"
 
-extern const libxl__checkpoint_device_instance_ops remus_device_nic;
-extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
-static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
 /*- helper functions -*/
 
 static int init_device_subkind(libxl__checkpoint_devices_state *cds)
@@ -172,7 +164,7 @@ static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
 
     do {
-        dev->ops = remus_ops[++dev->ops_index];
+        dev->ops = dev->cds->ops[++dev->ops_index];
         if (!dev->ops) {
             libxl_device_nic * nic = NULL;
             libxl_device_disk * disk = NULL;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index af992fc..d92eabc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2842,6 +2842,8 @@ struct libxl__checkpoint_devices_state {
     uint32_t domid;
     libxl__checkpoint_callback *callback;
     int device_kind_flags;
+    /* The ops must be pointer array, and the last ops must be NULL */
+    const libxl__checkpoint_device_instance_ops **ops;
 
     /*- private for abstract layer only -*/
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index fb21b6d..d2e4d42 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -18,6 +18,14 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
 /* Remus setup and teardown -*/
 
 static void remus_setup_done(libxl__egc *egc,
@@ -48,6 +56,7 @@ void libxl__remus_setup(libxl__egc *egc,
     cds->ao = ao;
     cds->domid = dss->domid;
     cds->callback = remus_setup_done;
+    cds->ops = remus_ops;
 
     libxl__checkpoint_devices_setup(egc, cds);
     return;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 10/25] libxl/save: Refactor libxl__domain_suspend_state

2015-07-15 Thread Yang Hongyang

Currently struct libxl__domain_suspend_state contains two kinds of state:
save state and suspend state. This patch separates the two.
The motivation is that COLO will need to suspend and resume
continuously, so we need a more general suspend state.

After this change, dss stands for libxl__domain_save_state,
dsps stands for libxl__domain_suspend_state.

Also introduce libxl__domain_suspend_init to initialise the
libxl__domain_suspend_state.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: Andrew Cooper andrew.coop...@citrix.com
---
 tools/libxl/libxl.c  |  10 +-
 tools/libxl/libxl_dom_save.c |  69 +
 tools/libxl/libxl_dom_suspend.c  | 217 +--
 tools/libxl/libxl_internal.h |  60 +++
 tools/libxl/libxl_netbuffer.c|   2 +-
 tools/libxl/libxl_remus.c|  37 ---
 tools/libxl/libxl_save_callout.c |   2 +-
 tools/libxl/libxl_stream_write.c |  14 +--
 8 files changed, 234 insertions(+), 177 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index f1237d8..05688cd 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -796,7 +796,7 @@ out:
 }
 
 static void remus_failover_cb(libxl__egc *egc,
-  libxl__domain_suspend_state *dss, int rc);
+  libxl__domain_save_state *dss, int rc);
 
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
@@ -804,7 +804,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, 
libxl_domain_remus_info *info,
  const libxl_asyncop_how *ao_how)
 {
 AO_CREATE(ctx, domid, ao_how);
-libxl__domain_suspend_state *dss;
+libxl__domain_save_state *dss;
 int rc;
 
 libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -852,7 +852,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, 
libxl_domain_remus_info *info,
 }
 
 static void remus_failover_cb(libxl__egc *egc,
-  libxl__domain_suspend_state *dss, int rc)
+  libxl__domain_save_state *dss, int rc)
 {
 STATE_AO_GC(dss-ao);
 /*
@@ -864,7 +864,7 @@ static void remus_failover_cb(libxl__egc *egc,
 }
 
 static void domain_suspend_cb(libxl__egc *egc,
-  libxl__domain_suspend_state *dss, int rc)
+  libxl__domain_save_state *dss, int rc)
 {
 STATE_AO_GC(dss-ao);
 libxl__ao_complete(egc,ao,rc);
@@ -883,7 +883,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, 
int fd, int flags,
 goto out_err;
 }
 
-libxl__domain_suspend_state *dss;
+libxl__domain_save_state *dss;
 GCNEW(dss);
 
 dss-ao = ao;
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index d8383b1..6348cae 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -41,7 +41,7 @@ struct libxl__physmap_info {
 static void stream_done(libxl__egc *egc,
 libxl__stream_write_state *sws, int rc);
 static void domain_save_done(libxl__egc *egc,
- libxl__domain_suspend_state *dss, int rc);
+ libxl__domain_save_state *dss, int rc);
 
 /*- complicated callback, called by xc_domain_save -*/
 
@@ -59,7 +59,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, 
libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
- libxl__domain_suspend_state *dss, int rc);
+ libxl__domain_save_state *dss, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -73,7 +73,7 @@ static void 
domain_suspend_switch_qemu_xen_traditional_logdirty
 libxl__save_helper_state *shs)
 {
 libxl__egc *egc = shs-egc;
-libxl__domain_suspend_state *dss = shs-caller_state;
+libxl__domain_save_state *dss = shs-caller_state;
 libxl__logdirty_switch *lds = dss-logdirty;
 STATE_AO_GC(dss-ao);
 int rc;
@@ -145,7 +145,7 @@ static void domain_suspend_switch_qemu_xen_logdirty
 libxl__save_helper_state *shs)
 {
 libxl__egc *egc = shs-egc;
-libxl__domain_suspend_state *dss = shs-caller_state;
+libxl__domain_save_state *dss = shs-caller_state;
 STATE_AO_GC(dss-ao);
 int rc;
 
@@ -164,7 +164,7 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
 {
 libxl__save_helper_state *shs = user;
 libxl__egc *egc = shs-egc;
-libxl__domain_suspend_state *dss = shs-caller_state;
+libxl__domain_save_state *dss = shs-caller_state;
 STATE_AO_GC(dss-ao);

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 18/25] tools/libxl: export logdirty_init

2015-07-15 Thread Yang Hongyang

We need to enable logdirty on secondary, so we export logdirty_init
for internal use. Rename it to libxl__logdirty_init.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl_dom_save.c | 4 ++--
 tools/libxl/libxl_internal.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index ba7fc42..9364a1d 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -61,7 +61,7 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 static void switch_logdirty_done(libxl__egc *egc,
                                  libxl__logdirty_switch *lds, int rc);
 
-static void logdirty_init(libxl__logdirty_switch *lds)
+void libxl__logdirty_init(libxl__logdirty_switch *lds)
 {
     lds->cmd_path = 0;
     libxl__ev_xswatch_init(&lds->watch);
@@ -403,7 +403,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     }
 
     dss->rc = 0;
-    logdirty_init(&dss->logdirty);
+    libxl__logdirty_init(&dss->logdirty);
     dss->logdirty.ao = ao;
 
     dsps->ao = ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0b792e3..219176e 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3025,6 +3025,8 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+_hidden void libxl__logdirty_init(libxl__logdirty_switch *lds);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend_init */
     libxl__ao *ao;
-- 
1.9.1



[Xen-devel] [PATCH v4 --for 4.6 COLOPre 19/25] tools/libxl: Add back channel to allow migration target send data back

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

In COLO mode, the slave needs to send data to the master, but io_fd
can only be written on the master and only be read on the slave.
Save recv_fd in domain_suspend_state, and send_fd in
domain_create_state.
Extend the libxl_domain_create_restore API by adding a send_fd
parameter to it.
Add LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD to indicate the API change.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl.c  |  2 +-
 tools/libxl/libxl.h  | 30 --
 tools/libxl/libxl_create.c   |  9 +
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_types.idl  |  1 +
 tools/libxl/xl_cmdimpl.c |  8 +++-
 tools/ocaml/libs/xl/xenlight_stubs.c |  2 +-
 7 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 799aead..fcf91f1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -835,7 +835,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5a7308d..c492d20 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -617,6 +617,15 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
 
 /*
+ * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1
+ *
+ * If this is defined, libxl_domain_create_restore()'s API has changed to
+ * include a send_fd param which used for libxl migration back channel
+ * during COLO FT.
+ */
+#define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1
+
+/*
  * LIBXL_HAVE_CREATEINFO_PVH
  * If this is defined, then libxl supports creation of a PVH guest.
  */
@@ -1089,7 +1098,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 const libxl_asyncprogress_how *aop_console_how)
 LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
-uint32_t *domid, int restore_fd,
+uint32_t *domid, int restore_fd, int send_fd,
 const libxl_domain_restore_params *params,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
@@ -1110,7 +1119,7 @@ int static inline libxl_domain_create_restore_0x040200(
 libxl_domain_restore_params_init(params);
 
 ret = libxl_domain_create_restore(
-ctx, d_config, domid, restore_fd, params, ao_how, aop_console_how);
+ctx, d_config, domid, restore_fd, -1, params, ao_how, 
aop_console_how);
 
 libxl_domain_restore_params_dispose(params);
 return ret;
@@ -1118,6 +1127,23 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040200
 
+#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040400 \
+                                 && LIBXL_API_VERSION < 0x040600
+
+int static inline libxl_domain_create_restore_0x040400(
+libxl_ctx *ctx, libxl_domain_config *d_config,
+uint32_t *domid, int restore_fd,
+const libxl_domain_restore_params *params,
+const libxl_asyncop_how *ao_how,
+const libxl_asyncprogress_how *aop_console_how)
+LIBXL_EXTERNAL_CALLERS_ONLY
+{
+return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
+   -1, params, ao_how, aop_console_how);
+}
+
+#define libxl_domain_create_restore libxl_domain_create_restore_0x040400
+
 #endif
 
   /* A progress report will be made via ao_console_how, of type
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index cbd7693..1d4b13b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1498,7 +1498,7 @@ static void domain_create_cb(libxl__egc *egc,
  int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-uint32_t *domid, int restore_fd,
+uint32_t *domid, int restore_fd, int send_fd,
 const libxl_domain_restore_params *params,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
@@ -1512,6 +1512,7 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 libxl_domain_config_init(cdcs-dcs.guest_config_saved);
 libxl_domain_config_copy(ctx, cdcs-dcs.guest_config_saved, d_config);
 cdcs-dcs.restore_fd = cdcs-dcs.libxc_fd = restore_fd;
+

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 16/25] tools/libxl: Update libxl_domain_unpause() to support qemu-xen

2015-07-15 Thread Yang Hongyang

Currently, libxl_domain_unpause() only supports
qemu-xen-traditional. Update it to support qemu-xen as well.
We use libxl__domain_resume_device_model() to unpause the guest's
device model.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5b2d045..799aead 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -941,8 +941,6 @@ out:
 int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
 {
     GC_INIT(ctx);
-    char *path;
-    char *state;
     int ret, rc = 0;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -952,14 +950,11 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
-
-        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
-        state = libxl__xs_read(gc, XBT_NULL, path);
-        if (state != NULL && !strcmp(state, "paused")) {
-            libxl__qemu_traditional_cmd(gc, domid, "continue");
-            libxl__wait_for_device_model_deprecated(gc, domid, "running",
-                                                    NULL, NULL, NULL);
+        rc = libxl__domain_resume_device_model(gc, domid);
+        if (rc < 0) {
+            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to unpause device model "
+                       "for domain %u:%d", domid, rc);
+            goto out;
         }
     }
     ret = xc_domain_unpause(ctx->xch, domid);
-- 
1.9.1



[Xen-devel] [PATCH v4 --for 4.6 COLOPre 17/25] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()

2015-07-15 Thread Yang Hongyang

The secondary VM is running in COLO mode, and we need to send the
secondary VM's dirty page information to the master at each checkpoint,
so we have to enable qemu logdirty on the secondary.

libxl__domain_suspend_common_switch_qemu_logdirty() enables
qemu logdirty, but it uses domain_save_state and calls
libxl__xc_domain_saverestore_async_callback_done()
before it returns, so it cannot be used for the secondary VM.

Update libxl__domain_suspend_common_switch_qemu_logdirty() to
introduce a new API, libxl__domain_common_switch_qemu_logdirty().
This API only uses libxl__logdirty_switch, and calls
lds->callback before it returns.
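
The callback arrangement can be sketched in isolation as below. This is a
simplified illustration under stated assumptions: logdirty_switch, save_state
and both functions are hypothetical stand-ins for the libxl types, and
container_of is written out locally rather than taken from libxl.

```c
#include <assert.h>
#include <stddef.h>

/* Local equivalent of a CONTAINER_OF-style macro (assumption: plain C). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct logdirty_switch logdirty_switch;
typedef void logdirty_cb(logdirty_switch *lds, int rc);

/* Hypothetical stand-in for libxl__logdirty_switch. */
struct logdirty_switch {
    logdirty_cb *callback;   /* set by the caller before starting the switch */
};

/* Hypothetical stand-in for the save state that embeds the switch. */
typedef struct {
    int rc;
    logdirty_switch logdirty;
} save_state;

/* Generic layer: knows only the switch, reports the result via callback. */
static void common_switch_logdirty(logdirty_switch *lds, int simulated_rc)
{
    assert(lds && lds->callback);
    lds->callback(lds, simulated_rc);
}

/* Save-specific completion: recovers its own state from the embedded member. */
static void save_logdirty_done(logdirty_switch *lds, int rc)
{
    save_state *ss = container_of(lds, save_state, logdirty);

    ss->rc = rc;
}
```

A restore-side caller would embed the same switch in its own state and
install a different callback, which is what makes the generic function
reusable on the secondary.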

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
CC: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl_dom_save.c | 93 
 tools/libxl/libxl_internal.h |  8 
 2 files changed, 59 insertions(+), 42 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 0926b71..ba7fc42 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -59,7 +59,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, 
libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
- libxl__domain_save_state *dss, int rc);
+ libxl__logdirty_switch *lds, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -69,13 +69,10 @@ static void logdirty_init(libxl__logdirty_switch *lds)
 }
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -137,26 +134,34 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
     libxl__xs_transaction_abort(gc, t);
-    switch_logdirty_done(egc,dss,rc);
+    switch_logdirty_done(egc,lds,rc);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
-    if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
-    } else {
+    if (rc)
         LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
+
+    lds->callback(egc, lds, rc);
+}
+
+static void domain_suspend_switch_qemu_logdirty_done
+                        (libxl__egc *egc, libxl__logdirty_switch *lds, int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    if (rc) {
         dss->rc = rc;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
-    }
+        libxl__xc_domain_saverestore_async_callback_done(egc,
+                                                         &dss->sws.shs, -1);
+    } else
+        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
 }
 
 void libxl__domain_suspend_common_switch_qemu_logdirty
@@ -165,39 +170,49 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
 libxl__save_helper_state *shs = user;
 libxl__egc *egc = shs-egc;
 libxl__domain_save_state *dss = shs-caller_state;
-STATE_AO_GC(dss-ao);
+
+/* convenience aliases */
+libxl__logdirty_switch *const lds = dss-logdirty;
+
+lds-callback = domain_suspend_switch_qemu_logdirty_done;
+libxl__domain_common_switch_qemu_logdirty(egc, domid, enable, lds);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+   int domid, unsigned enable,
+   libxl__logdirty_switch *lds)
+{
+STATE_AO_GC(lds-ao);
 
 switch (libxl__device_model_version_running(gc, domid)) {
 case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, 
shs);
+domain_suspend_switch_qemu_xen_traditional_logdirty(egc, domid, enable,
+

Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-15 Thread Jan Beulich

 On 15.07.15 at 04:40, feng...@intel.com wrote:

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Friday, July 10, 2015 9:08 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org 
 Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
 Posted-Interrupts

  On 24.06.15 at 07:18, feng...@intel.com wrote:
  @@ -81,8 +81,19 @@ struct vmx_domain {

   struct pi_desc {
   DECLARE_BITMAP(pir, NR_VECTORS);
  -u32 control;
  -u32 rsvd[7];
  +union {
  +struct
  +{
  +u16 on : 1,  /* bit 256 - Outstanding Notification */
  +sn : 1,  /* bit 257 - Suppress Notification */
  +rsvd_1 : 14; /* bit 271:258 - Reserved */
  +u8  nv;  /* bit 279:272 - Notification Vector */
  +u8  rsvd_2;  /* bit 287:280 - Reserved */
  +u32 ndst;/* bit 319:288 - Notification Destination */
  +};
  +u64 control;
  +};

 So current code, afaics, uses e.g. test_and_set_bit() to set ON.
 By also declaring this as a bitfield you're opening the structure for
 non-atomic accesses. If that's correct, why is other code not
 being changed to _only_ use the bitfield mechanism (likely also
 eliminating the need for it being a union with the now 64-bit
 control)? If atomic accesses are required, then I'd strongly
 suggest against making this a bit field.

 And in no event can I see why ndst needs to be union-ized
 with control if it doesn't need to be updated atomically with
 e.g. nv.

 When the vCPU is to be blocked, we need to atomically update
 the nv and ndst, then the wakeup notification event can be
 delivered to the right destination.

Okay. Your reply made me go through the patches again to check
where updates to nv/ndst happen - what's the reason they aren't
being updated as a pair in patch 14's RUNSTATE_running handling
(or in the replacement draft's vmx_ctxt_switch_to() adjustment)?
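
For reference, updating nv and ndst "as a pair" could look like the sketch
below. It is only an illustration, assuming the field positions from the
quoted layout (bit 256 of the descriptor being bit 0 of control) and using
C11 atomics rather than Xen's own cmpxchg primitives; pi_set_nv_ndst is a
hypothetical helper, not code from the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions inside the 64-bit control word, per the quoted layout. */
#define PI_NV_SHIFT   16
#define PI_NV_MASK    (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT 32
#define PI_NDST_MASK  (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/*
 * Hypothetical helper: replace NV and NDST in one atomic step, leaving
 * ON, SN and the reserved bits exactly as a concurrent writer left them.
 */
static void pi_set_nv_ndst(_Atomic uint64_t *control, uint8_t nv,
                           uint32_t ndst)
{
    uint64_t old, val;

    assert(control != NULL);
    old = atomic_load(control);
    do {
        val = old & ~(PI_NV_MASK | PI_NDST_MASK);
        val |= ((uint64_t)nv << PI_NV_SHIFT) |
               ((uint64_t)ndst << PI_NDST_SHIFT);
        /* On failure 'old' is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(control, &old, val));
}
```

Writing the two bitfield members separately would instead leave a window in
which a notification could be sent with the new vector to the old
destination, or the other way round.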

Jan


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun


On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"

This is very similar to our current policy for
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6,
since this is also a rare possibility in the real world. I can even
handle the conflict with [RESERVED_MEMORY_DYNAMIC_START,
RESERVED_MEMORY_DYNAMIC_END] in patch #6 in the same way.

Note it's not necessary to worry about high memory, since we already
handle that case in the hv code, and it's also not affected by memory
relocated later, since our previous policy makes sure RAM isn't
overlapping with RDM.
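
To make the check concrete, here is a standalone sketch of the overlap test
the loop relies on. It is an assumption about the semantics (half-open
ranges, no address wrap-around), not a copy of hvmloader's check_overlap();
ranges_overlap, rsvd_entry and first_conflict are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved-range entry, mirroring addr/size in the memory map. */
struct rsvd_entry {
    uint64_t addr;
    uint64_t size;
};

/*
 * Non-zero if [start, start + size) intersects
 * [rsvd_start, rsvd_start + rsvd_size). Assumes neither sum wraps.
 */
static int ranges_overlap(uint64_t start, uint64_t size,
                          uint64_t rsvd_start, uint64_t rsvd_size)
{
    return (start + size > rsvd_start) && (rsvd_start + rsvd_size > start);
}

/* Index of the first entry conflicting with [mem_start, mem_end), or -1. */
static int first_conflict(uint64_t mem_start, uint64_t mem_end,
                          const struct rsvd_entry *map, unsigned int nr)
{
    unsigned int i;

    assert(map != NULL || nr == 0);
    for (i = 0; i < nr; i++)
        if (ranges_overlap(mem_start, mem_end - mem_start,
                           map[i].addr, map[i].size))
            return (int)i;
    return -1;
}
```

Returning the conflicting index rather than calling BUG() is a choice made
for this sketch only, so that the result can be inspected.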


Thanks
Tiejun


to co-maintainers overriding me).

Jan






[Xen-devel] [PATCH v8 --for 4.6 COLO 25/25] cmdline switches and config vars to control colo-proxy

2015-07-15 Thread Yang Hongyang

Add cmdline switches to the 'xl migrate-receive' command to specify
a domain-specific hotplug script to set up the COLO proxy.

Add a new config var 'colo.default.proxyscript' to xl.conf that
allows the user to override the default global script used to
set up the COLO proxy.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 docs/man/xl.conf.pod.5  |  6 ++
 docs/man/xl.pod.1   |  1 -
 tools/libxl/libxl.c |  6 ++
 tools/libxl/libxl_create.c  | 14 +++--
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl.c|  3 +++
 tools/libxl/xl.h|  1 +
 tools/libxl/xl_cmdimpl.c| 50 ++---
 8 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to setup network buffering.
 
 Default: C</etc/xen/scripts/remus-netbuf-setup>
 
+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to setup colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 1effce7..a7ac32f 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -454,7 +454,6 @@ N.B: Remus support in xl is still in experimental 
(proof-of-concept) phase.
  Disk replication support is limited to DRBD disks.
 
  COLO support in xl is still in experimental (proof-of-concept) phase.
- There is no support for network at the moment.
 
 BOPTIONS
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c6cc5aa..75372ea 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3305,6 +3305,11 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t 
domid,
 flexarray_append(back, nic-ifname);
 }
 
+if (nic-forwarddev) {
+flexarray_append(back, forwarddev);
+flexarray_append(back, nic-forwarddev);
+}
+
 flexarray_append(back, mac);
 flexarray_append(back,libxl__sprintf(gc,
 LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nic-mac)));
@@ -3428,6 +3433,7 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
 nic-ip = READ_BACKEND(NOGC, ip);
 nic-bridge = READ_BACKEND(NOGC, bridge);
 nic-script = READ_BACKEND(NOGC, script);
+nic-forwarddev = READ_BACKEND(NOGC, forwarddev);
 
 /* vif_ioemu nics use the same xenstore entries as vif interfaces */
 tmp = READ_BACKEND(gc, type);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d99d5ef..7de2e89 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1089,6 +1089,11 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 crs-recv_fd = restore_fd;
 crs-hvm = (info-type == LIBXL_DOMAIN_TYPE_HVM);
 crs-callback = libxl__colo_restore_setup_done;
+if (dcs-colo_proxy_script)
+crs-colo_proxy_script = libxl__strdup(gc, dcs-colo_proxy_script);
+else
+crs-colo_proxy_script = GCSPRINTF(%s/colo-proxy-setup,
+   libxl__xen_script_dir_path());
 libxl__colo_restore_setup(egc, crs);
 } else
 libxl__stream_read_start(egc, dcs-srs);
@@ -1612,6 +1617,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
 uint32_t *domid, int restore_fd, int send_fd,
 const libxl_domain_restore_params *params,
+const char *colo_proxy_script,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1628,6 +1634,7 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 if (restore_fd  -1)
 cdcs-dcs.restore_params = *params;
 cdcs-dcs.callback = domain_create_cb;
+cdcs-dcs.colo_proxy_script = colo_proxy_script;
 libxl__ao_progress_gethow(cdcs-dcs.aop_console_how, aop_console_how);
 cdcs-domid_out = domid;
 
@@ -1670,7 +1677,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 const libxl_asyncprogress_how *aop_console_how)
 {
 unset_disk_colo_restore(d_config);
-return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
 ao_how, aop_console_how);
 }
 
@@ -1680,14 +1687,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 05/25] libxl/remus: introduce libxl__remus_setup

2015-07-15 Thread Yang Hongyang

Refactor Remus setup by introducing the libxl__remus_setup API.
All Remus setup work is done in this function.

Also remove the libxl__ prefix for static functions.
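
The control-flow change in this patch (the setup function returns void and reports failure through the caller's callback instead of returning rc) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: struct suspend_state, done_cb, example_remus_setup, record_rc and the EXAMPLE_* codes are hypothetical stand-ins, not libxl's real types, and the asynchronous device setup is collapsed into an inline call.

```c
#include <stdio.h>

/* Hypothetical stand-ins for libxl's internal state; not the real types. */
struct suspend_state;
typedef void done_cb(struct suspend_state *dss, int rc);

struct suspend_state {
    int netbuf_requested;   /* caller asked for network buffering */
    int netbuf_supported;   /* this build supports network buffering */
    done_cb *callback;      /* invoked exactly once with the final rc */
    int last_rc;            /* filled in by the example callback below */
};

/* Illustrative return codes only. */
enum { EXAMPLE_OK = 0, EXAMPLE_ERROR_FAIL = -3 };

/* Setup no longer returns rc: every exit path reports via the callback. */
static void example_remus_setup(struct suspend_state *dss)
{
    if (dss->netbuf_requested && !dss->netbuf_supported) {
        fprintf(stderr, "Remus: No support for network buffering\n");
        goto out;
    }

    /* In libxl the device setup is asynchronous; here it completes inline. */
    dss->callback(dss, EXAMPLE_OK);
    return;

out:
    dss->callback(dss, EXAMPLE_ERROR_FAIL);
}

/* Example callback that just records the result. */
static void record_rc(struct suspend_state *dss, int rc)
{
    dss->last_rc = rc;
}
```

With this shape the public entry point can treat the call to setup as its point of no return, since any later failure reaches the caller through the same callback as success.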

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl.c | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 69a6937..acb5639 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -795,10 +795,12 @@ out:
 return ptr;
 }
 
-static void libxl__remus_setup_done(libxl__egc *egc,
-libxl__remus_devices_state *rds, int rc);
-static void libxl__remus_setup_failed(libxl__egc *egc,
-  libxl__remus_devices_state *rds, int rc);
+static void libxl__remus_setup(libxl__egc *egc,
+   libxl__domain_suspend_state *dss);
+static void remus_setup_done(libxl__egc *egc,
+ libxl__remus_devices_state *rds, int rc);
+static void remus_setup_failed(libxl__egc *egc,
+   libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
   libxl__domain_suspend_state *dss, int rc);
 
@@ -847,13 +849,26 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
 assert(info);
 
+/* Point of no return */
+libxl__remus_setup(egc, dss);
+return AO_INPROGRESS;
+
+ out:
+return AO_CREATE_FAIL(rc);
+}
+
+static void libxl__remus_setup(libxl__egc *egc,
+   libxl__domain_suspend_state *dss)
+{
 /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
+    const libxl_domain_remus_info *const info = dss->remus;
+
+    STATE_AO_GC(dss->ao);
 
     if (libxl_defbool_val(info->netbuf)) {
         if (!libxl__netbuffer_enabled(gc)) {
             LOG(ERROR, "Remus: No support for network buffering");
-            rc = ERROR_FAIL;
             goto out;
         }
         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
@@ -863,19 +878,18 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
     rds->ao = ao;
-    rds->domid = domid;
-    rds->callback = libxl__remus_setup_done;
+    rds->domid = dss->domid;
+    rds->callback = remus_setup_done;
 
-    /* Point of no return */
     libxl__remus_devices_setup(egc, rds);
-    return AO_INPROGRESS;
+    return;
 
- out:
-    return AO_CREATE_FAIL(rc);
+out:
+    dss->callback(egc, dss, ERROR_FAIL);
 }
 
-static void libxl__remus_setup_done(libxl__egc *egc,
-libxl__remus_devices_state *rds, int rc)
+static void remus_setup_done(libxl__egc *egc,
+ libxl__remus_devices_state *rds, int rc)
 {
 libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
@@ -887,12 +901,12 @@ static void libxl__remus_setup_done(libxl__egc *egc,
 
     LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
         dss->domid, rc);
-    rds->callback = libxl__remus_setup_failed;
+    rds->callback = remus_setup_failed;
 libxl__remus_devices_teardown(egc, rds);
 }
 
-static void libxl__remus_setup_failed(libxl__egc *egc,
-  libxl__remus_devices_state *rds, int rc)
+static void remus_setup_failed(libxl__egc *egc,
+   libxl__remus_devices_state *rds, int rc)
 {
 libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 02/25] tools/libxl: move domain suspend code into libxl_dom_suspend.c

2015-07-15 Thread Yang Hongyang

Move domain suspend code into a separate file libxl_dom_suspend.c.
Add an API libxl__domain_suspend() which wraps the static
function domain_suspend_callback_common() for internal use.
Export the existing API libxl__domain_suspend_callback() used by
libxc to suspend the guest during migration.

Note that the newly added file libxl_dom_suspend.c is used for
suspend/resume code.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/Makefile|   3 +-
 tools/libxl/libxl_dom.c | 342 +---
 tools/libxl/libxl_dom_suspend.c | 380 
 tools/libxl/libxl_internal.h|   6 +
 4 files changed, 389 insertions(+), 342 deletions(-)
 create mode 100644 tools/libxl/libxl_dom_suspend.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 0150ec7..4a5957e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -102,7 +102,8 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
libxl_stream_read.o libxl_stream_write.o \
libxl_save_callout.o _libxl_save_msgs_callout.o \
-   libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
+   libxl_qmp.o libxl_event.o libxl_fork.o \
+   libxl_dom_suspend.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 3bbec99..e21e110 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1157,8 +1157,6 @@ static void stream_done(libxl__egc *egc,
 libxl__stream_write_state *sws, int rc);
 static void domain_save_done(libxl__egc *egc,
  libxl__domain_suspend_state *dss, int rc);
-static void domain_suspend_callback_common_done(libxl__egc *egc,
-libxl__domain_suspend_state *dss, int rc);
 
 /*- complicated callback, called by xc_domain_save -*/
 
@@ -1386,35 +1384,6 @@ static void switch_logdirty_done(libxl__egc *egc,
 
 /*- callbacks, called by xc_domain_save -*/
 
-int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                       libxl__domain_suspend_state *dss)
-{
-    int ret = 0;
-    uint32_t const domid = dss->domid;
-    const char *const filename = dss->dm_savefile;
-
-    switch (libxl__device_model_version_running(gc, domid)) {
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
-        LOG(DEBUG, "Saving device model state to %s", filename);
-        libxl__qemu_traditional_cmd(gc, domid, "save");
-        libxl__wait_for_device_model_deprecated(gc, domid, "paused", NULL, NULL, NULL);
-        break;
-    }
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        if (libxl__qmp_stop(gc, domid))
-            return ERROR_FAIL;
-        /* Save DM state into filename */
-        ret = libxl__qmp_save(gc, domid, filename);
-        if (ret)
-            unlink(filename);
-        break;
-    default:
-        return ERROR_INVAL;
-    }
-
-    return ret;
-}
-
-
 int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 {
 
@@ -1435,298 +1404,6 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 return 0;
 }
 
-static void domain_suspend_common_wait_guest(libxl__egc *egc,
- libxl__domain_suspend_state *dss);
-static void domain_suspend_common_guest_suspended(libxl__egc *egc,
- libxl__domain_suspend_state *dss);
-
-static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
-  libxl__xswait_state *xswa, int rc, const char *state);
-static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
-libxl__ev_evtchn *evev);
-static void suspend_common_wait_guest_watch(libxl__egc *egc,
-  libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path);
-static void suspend_common_wait_guest_check(libxl__egc *egc,
-libxl__domain_suspend_state *dss);
-static void suspend_common_wait_guest_timeout(libxl__egc *egc,
-  libxl__ev_time *ev, const struct timeval *requested_abs, int rc);
-
-static void domain_suspend_common_done(libxl__egc *egc,
-   libxl__domain_suspend_state *dss,
-   int rc);
-
-static bool domain_suspend_pvcontrol_acked(const char *state) {
-    /* any value other than "suspend", including ENOENT (i.e. !state), is OK */
-    if (!state) return 1;
-    return strcmp(state,"suspend");
-}
-
-/* calls dss-callback_common_done when done */
-static void domain_suspend_callback_common(libxl__egc *egc,
-

[Xen-devel] [PATCH v4 --for 4.6 COLOPre 03/25] tools/libxl: move domain resume code into libxl_dom_suspend.c

2015-07-15 Thread Yang Hongyang

Move the domain resume code into libxl_dom_suspend.c.
This is a pure code move.

libxl__domain_resume_device_model() will be used later by COLO,
so it is not made static.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl.c | 33 -
 tools/libxl/libxl_dom.c | 20 ---
 tools/libxl/libxl_dom_suspend.c | 55 +
 3 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fa42c1c..69a6937 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -513,39 +513,6 @@ int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid,
 return rc;
 }
 
-int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
-{
-    int rc = 0;
-
-    if (xc_domain_resume(CTX->xch, domid, suspend_cancel)) {
-        LOGE(ERROR, "xc_domain_resume failed for domain %u", domid);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    libxl_domain_type type = libxl__domain_type(gc, domid);
-    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_resume_device_model(gc, domid);
-        if (rc) {
-            LOG(ERROR, "failed to resume device model for domain %u:%d",
-                domid, rc);
-            goto out;
-        }
-    }
-
-    if (!xs_resume_domain(CTX->xsh, domid)) {
-        LOGE(ERROR, "xs_resume_domain failed for domain %u", domid);
-        rc = ERROR_FAIL;
-    }
-out:
-    return rc;
-}
-
 int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel,
 const libxl_asyncop_how *ao_how)
 {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e21e110..0788309 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1384,26 +1384,6 @@ static void switch_logdirty_done(libxl__egc *egc,
 
 /*- callbacks, called by xc_domain_save -*/
 
-int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
-{
-
-    switch (libxl__device_model_version_running(gc, domid)) {
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
-        libxl__qemu_traditional_cmd(gc, domid, "continue");
-        libxl__wait_for_device_model_deprecated(gc, domid, "running", NULL, NULL, NULL);
-        break;
-    }
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        if (libxl__qmp_resume(gc, domid))
-            return ERROR_FAIL;
-        break;
-    default:
-        return ERROR_INVAL;
-    }
-
-    return 0;
-}
-
 static inline char *physmap_path(libxl__gc *gc, uint32_t dm_domid,
  uint32_t domid,
  char *phys_offset, char *node)
diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c
index 5146402..a90800d 100644
--- a/tools/libxl/libxl_dom_suspend.c
+++ b/tools/libxl/libxl_dom_suspend.c
@@ -371,6 +371,61 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
 }
 
+/*=== Domain resume */
+
+int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
+{
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
+        libxl__qemu_traditional_cmd(gc, domid, "continue");
+        libxl__wait_for_device_model_deprecated(gc, domid, "running", NULL, NULL, NULL);
+        break;
+    }
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        if (libxl__qmp_resume(gc, domid))
+            return ERROR_FAIL;
+        break;
+    default:
+        return ERROR_INVAL;
+    }
+
+    return 0;
+}
+
+int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
+{
+    int rc = 0;
+
+    if (xc_domain_resume(CTX->xch, domid, suspend_cancel)) {
+        LOGE(ERROR, "xc_domain_resume failed for domain %u", domid);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_resume_device_model(gc, domid);
+        if (rc) {
+            LOG(ERROR, "failed to resume device model for domain %u:%d",
+                domid, rc);
+            goto out;
+        }
+    }
+
+    if (!xs_resume_domain(CTX->xsh, domid)) {
+        LOGE(ERROR, "xs_resume_domain failed for domain %u", domid);
+        rc = ERROR_FAIL;
+    }
+out:
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1



[Xen-devel] [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO

2015-07-15 Thread Yang Hongyang

This patchset is a prerequisite for the COLO feature. Refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

This patchset is based on Andrew Cooper's Libxl migration v4.1:
  http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/libxl-migv2-v4.1

In this version, I moved some of the COLO-specific patches down to the COLO
main series, so most patches in this series are refactoring and can be applied
first.

I've done some simple tests. Both Remus and normal migration work after applying
this patchset. The patch to fix Remus on migration v2 will be sent later as
a separate patch.

You can also get the patchset from:
  https://github.com/macrosheep/xen/tree/colo-v8

v3-v4:
 - Rebased to the latest migration v2 branch
 - Addressed comments from last round

v2-v3:
 - Merge '[PATCH v2 0/6] Misc cleanups for libxl' into this patchset
   for easy review
 - Addressed review comments
 - Add back channel to libxc
 - Introduce should_checkpoint callback
 - Introduce DIRTY_BITMAP record on libxc side
 - Introduce COLO_CONTEXT record on libxl side
 - Ported to Libxl migration v2

v1-v2:
 - Rebased to [PATCH v2 0/6] Misc cleanups for libxl
 - Add a bugfix for the error handling of process_record


Wen Congyang (2):
  tools/libxc: support to resume uncooperative HVM guests
  tools/libxl: Add back channel to allow migration target send data back

Yang Hongyang (23):
  tools/libxl: rename libxl__domain_suspend to libxl__domain_save
A  tools/libxl: move domain suspend code into libxl_dom_suspend.c
A  tools/libxl: move domain resume code into libxl_dom_suspend.c
  tools/libxl: rename remus checkpoint callbacks
  libxl/remus: introduce libxl__remus_setup
  libxl/remus: introduce libxl__remus_teardown
  libxl/remus: init checkpoint_callback in Remus checkpoint callback
  tools/libxl: move remus code into libxl_remus.c
A  tools/libxl: move save/restore code into libxl_dom_save.c
  libxl/save: Refactor libxl__domain_suspend_state
  tools/libxl: introduce enum type libxl_checkpointed_stream
  migration/save: pass checkpointed_stream from libxl to libxc
  tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  tools/libxl: check QEMU state before resume dm
  tools/libxl: Update libxl_domain_unpause() to support qemu-xen
A  tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
A  tools/libxl: export logdirty_init
  tools/libx{l,c}: add back channel to libxc
  tools/libxl: rename remus device to checkpoint device
A  tools/libxl: adjust the indentation
  tools/libxl: store remus_ops in checkpoint device state
  tools/libxl: move remus state into a seperate structure
  tools/libxl: seperate device init/cleanup from checkpoint device layer

 tools/libxc/include/xenguest.h|   13 +-
 tools/libxc/xc_domain_restore.c   |4 +-
 tools/libxc/xc_domain_save.c  |6 +-
 tools/libxc/xc_nomigrate.c|3 +-
 tools/libxc/xc_resume.c   |   22 +-
 tools/libxc/xc_sr_common.h|2 +-
 tools/libxc/xc_sr_restore.c   |2 +-
 tools/libxc/xc_sr_save.c  |5 +-
 tools/libxl/Makefile  |5 +-
 tools/libxl/libxl.c   |  119 +---
 tools/libxl/libxl.h   |   30 +-
 tools/libxl/libxl_checkpoint_device.c |  282 
 tools/libxl/libxl_create.c|   33 +-
 tools/libxl/libxl_dom.c   | 1243 -
 tools/libxl/libxl_dom_save.c  |  721 +++
 tools/libxl/libxl_dom_suspend.c   |  503 +
 tools/libxl/libxl_internal.h  |  246 ---
 tools/libxl/libxl_netbuffer.c |  117 ++--
 tools/libxl/libxl_nonetbuffer.c   |   10 +-
 tools/libxl/libxl_qmp.c   |   10 +
 tools/libxl/libxl_remus.c |  395 +++
 tools/libxl/libxl_remus_device.c  |  327 -
 tools/libxl/libxl_remus_disk_drbd.c   |   56 +-
 tools/libxl/libxl_save_callout.c  |   43 +-
 tools/libxl/libxl_save_helper.c   |9 +-
 tools/libxl/libxl_stream_write.c  |   14 +-
 tools/libxl/libxl_types.idl   |   10 +-
 tools/libxl/xl_cmdimpl.c  |   21 +-
 tools/ocaml/libs/xl/xenlight_stubs.c  |2 +-
 29 files changed, 2321 insertions(+), 1932 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 create mode 100644 tools/libxl/libxl_dom_save.c
 create mode 100644 tools/libxl/libxl_dom_suspend.c
 create mode 100644 tools/libxl/libxl_remus.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

-- 
1.9.1



[Xen-devel] [PATCH v4 --for 4.6 COLOPre 06/25] libxl/remus: introduce libxl__remus_teardown

2015-07-15 Thread Yang Hongyang

Introduce libxl__remus_teardown to tear down Remus devices.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl_dom.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 9c61fa7..77a917c 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1865,6 +1865,9 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void libxl__remus_teardown(libxl__egc *egc,
+  libxl__domain_suspend_state *dss,
+  int rc);
 static void remus_teardown_done(libxl__egc *egc,
libxl__remus_devices_state *rds,
int rc);
@@ -1894,6 +1897,15 @@ static void domain_save_done(libxl__egc *egc,
  * from sending checkpoints. Teardown the network buffers and
  * release netlink resources.  This is an async op.
  */
+libxl__remus_teardown(egc, dss, rc);
+}
+
+static void libxl__remus_teardown(libxl__egc *egc,
+  libxl__domain_suspend_state *dss,
+  int rc)
+{
+EGC_GC;
+
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
     dss->rds.callback = remus_teardown_done;
-- 
1.9.1



[Xen-devel] [PATCH v4 --for 4.6 COLOPre 04/25] tools/libxl: rename remus checkpoint callbacks

2015-07-15 Thread Yang Hongyang

There are two Remus checkpoint callbacks (save and restore). Currently both
are named libxl__remus_domain_checkpoint_callback, which is fine because they
live in different files. A following patch moves all of the Remus callback
code into a separate file, so the names must differ.
Rename them to:
  libxl__remus_domain_{save/restore}_checkpoint_callback

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl_create.c | 4 ++--
 tools/libxl/libxl_dom.c| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 5b4d333..a32e3df 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -677,7 +677,7 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
 static void remus_checkpoint_stream_done(
 libxl__egc *egc, libxl__stream_read_state *srs, int rc);
 
-static void libxl__remus_domain_checkpoint_callback(void *data)
+static void libxl__remus_domain_restore_checkpoint_callback(void *data)
 {
 libxl__save_helper_state *shs = data;
     libxl__domain_create_state *dcs = shs->caller_state;
@@ -989,7 +989,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
 }
 
 /* Restore */
-    callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+    callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;
 
 rc = libxl__build_pre(gc, domid, d_config, state);
 if (rc)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0788309..9c61fa7 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1586,7 +1586,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
   const struct timeval *requested_abs,
   int rc);
 
-static void libxl__remus_domain_checkpoint_callback(void *data)
+static void libxl__remus_domain_save_checkpoint_callback(void *data)
 {
 libxl__save_helper_state *shs = data;
     libxl__domain_suspend_state *dss = shs->caller_state;
@@ -1749,7 +1749,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
     if (r_info != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
-        callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+        callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
         dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
     } else
         callbacks->suspend = libxl__domain_suspend_callback;
-- 
1.9.1



Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-15 Thread Julien Grall


Hi Vijay,

On 15/07/2015 09:16, Vijay Kilari wrote:

On Tue, Jul 14, 2015 at 2:48 AM, Julien Grall julien.gr...@citrix.com wrote:

Hi,

On 10/07/2015 09:42, vijay.kil...@gmail.com wrote:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index c41e82e..4f3801b 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
+static inline hw_irq_controller *get_host_hw_irq_controller(unsigned int
irq)
+{
+    if ( is_lpi(irq) )
+        return its_hw_ops->lpi_host_irq_type;
+    else
+        return gic_hw_ops->gic_host_irq_type;
+}



This is not what I asked on v3 [1]. The ITS hardware controller shouldn't be
exposed to the common GIC. We have to keep a clean and comprehensible
interface.

What I asked is to replace the gic_host_irq_type variable by a new callback
which will return the correct hw_irq_controller.

For GICv2, it will return the same hw_irq_controller as today. For GICv3, it
will check if the IRQ is an LPI and return the correct controller.

FWIW, it was ack by Ian [2].
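
A minimal sketch of the callback approach requested above, where the common GIC code asks the active driver for the right hw_irq_controller and never touches ITS state itself. All definitions here are simplified, hypothetical stand-ins rather than the real Xen structures; the callback name get_host_irq_type and the per-driver objects are assumptions made for illustration. The LPI base of 8192 follows the GICv3 architecture.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in; not the real Xen definition. */
typedef struct { const char *name; } hw_irq_controller;

static hw_irq_controller gicv2_host_irq_type = { "gic-v2" };
static hw_irq_controller gicv3_host_irq_type = { "gic-v3" };
static hw_irq_controller its_host_lpi_type   = { "gic-its" };

/* LPI interrupt IDs start at 8192 in the GICv3 architecture. */
#define FIRST_LPI 8192u

static bool is_lpi(unsigned int irq)
{
    return irq >= FIRST_LPI;
}

struct gic_hw_operations {
    /* Callback replacing a plain gic_host_irq_type variable. */
    hw_irq_controller *(*get_host_irq_type)(unsigned int irq);
};

/* GICv2 has no LPIs: always the same controller as today. */
static hw_irq_controller *gicv2_get_host_irq_type(unsigned int irq)
{
    (void)irq;
    return &gicv2_host_irq_type;
}

/* GICv3 decides internally whether the ITS controller applies. */
static hw_irq_controller *gicv3_get_host_irq_type(unsigned int irq)
{
    return is_lpi(irq) ? &its_host_lpi_type : &gicv3_host_irq_type;
}

static const struct gic_hw_operations gicv2_ops = { gicv2_get_host_irq_type };
static const struct gic_hw_operations gicv3_ops = { gicv3_get_host_irq_type };

/* Selected at probe time in a real system; fixed here for the example. */
static const struct gic_hw_operations *gic_hw_ops = &gicv3_ops;

/* Common GIC code: no ITS knowledge required. */
static hw_irq_controller *get_host_hw_irq_controller(unsigned int irq)
{
    return gic_hw_ops->get_host_irq_type(irq);
}
```

The LPI check lives only in the GICv3 callback, so the common layer keeps a single interface regardless of which driver is active.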


  If we don't want to expose any ITS interfaces to common gic code, then we
have to register callbacks to GICv3 driver.


Why? In fine, the ITS is an integral part of the GICv3, so you could 
directly call the ITS code within the GICv3 without any callback.


Actually, you already do that in some place. So I don't see why you 
can't do it there...



@@ -149,7 +173,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
   goto out;

-desc->handler = gic_hw_ops->gic_guest_irq_type;
+desc->handler = get_guest_hw_irq_controller(desc->irq);
   set_bit(_IRQ_GUEST, &desc->status);

   gic_set_irq_properties(desc, cpumask_of(v_target->processor),
priority);
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 2dd43ee..ba8528a 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -35,7 +35,13 @@ static DEFINE_SPINLOCK(local_irqs_type_lock);
   struct irq_guest
   {
   struct domain *d;
-unsigned int virq;
+union
+{
+/* virq refer to virtual irq in case of spi */
+unsigned int virq;
+/* virq refer to event ID in case of lpi */
+unsigned int vid;



Why can't we store the event ID in the irq_guest? As said on v3, this is not


Are you referring to irq_desc in above statement?


Yes sorry.




domain specific [3]. Furthermore, you add support to route LPI in Xen (see
gic_route_irq_to_xen) where you will clearly need the event ID.


   void irq_set_affinity(struct irq_desc *desc, const cpumask_t *cpu_mask)
   {
   if ( desc != NULL )
diff --git a/xen/include/asm-arm/gic-its.h b/xen/include/asm-arm/gic-its.h
index b5e09bd..e8d244f 100644
--- a/xen/include/asm-arm/gic-its.h
+++ b/xen/include/asm-arm/gic-its.h
@@ -161,6 +161,10 @@ typedef union {
* The ITS view of a device.
*/
   struct its_device {
+/* Physical ITS */
+struct its_node *its;
+/* Number of Physical LPIs assigned */
+int nr_lpis;



Why didn't you add this field directly in the patch #4? It would be more
logical.


   /*
* ITS registers, offsets from ITS_base
diff --git a/xen/include/asm-arm/irq.h b/xen/include/asm-arm/irq.h
index 34b492b..55e219f 100644
--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -17,6 +17,8 @@ struct arch_pirq
   struct arch_irq_desc {
   int eoi_cpu;
   unsigned int type;
+struct its_device *dev;
+u16 col_id;



It has been suggested by Ian to move col_id in the its_device in the
previous version [4]. Any reason to not doing it?


In round robin fashion each plpi is attached to col_id. So storing
in its_device is not possible. In linux latest col_id is stored in its_device
structure for which set_affinity is called.


You could do round robin on its_device... It would be exactly the same 
and save 2 byte if not more with the alignment per irq_desc.


Don't forget that 1 byte in the irq_desc means 1KB added in Xen binary. 
These bytes saved could be used to store the event ID.
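
The size argument above can be checked with a small sketch. It assumes a static array of 1024 descriptors, which is what "1 byte means 1KB" implies, and uses hypothetical stand-in structs that mirror only the fields quoted in the diff. The exact numbers depend on the ABI, so the helpers compute them rather than hard-coding them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: number of statically allocated descriptors. */
#define EXAMPLE_NR_IRQS 1024

struct example_dev;   /* hypothetical stand-in for struct its_device */

/* arch_irq_desc before the patch (fields as quoted in the diff). */
struct arch_desc_before {
    int eoi_cpu;
    unsigned int type;
};

/* arch_irq_desc after the patch adds dev and col_id. */
struct arch_desc_after {
    int eoi_cpu;
    unsigned int type;
    struct example_dev *dev;
    uint16_t col_id;
};

/* Same as "after" but with col_id moved elsewhere (e.g. into the device). */
struct arch_desc_no_col {
    int eoi_cpu;
    unsigned int type;
    struct example_dev *dev;
};

/* Bytes added to the whole static array by the two new fields. */
static size_t extra_bytes_total(void)
{
    return (sizeof(struct arch_desc_after) - sizeof(struct arch_desc_before))
           * EXAMPLE_NR_IRQS;
}

/* Per-descriptor cost of col_id alone, including any tail padding. */
static size_t col_id_cost(void)
{
    return sizeof(struct arch_desc_after) - sizeof(struct arch_desc_no_col);
}
```

On ABIs where the pointer forces 8-byte alignment, col_id_cost() is likely to be larger than the 2 bytes of the field itself, which is the "if not more with the alignment" point made above.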


That reminds me, these 2 new fields should only be defined when GICv3 is 
used (#ifdef HAS_GICV3).


I would be fine if you skip the former for 4.6, but the latter is 
mandatory. ITS code shouldn't be compiled on arm32.


Regards,

--
Julien Grall


Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-15 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:20 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
 Posted-Interrupts

  On 15.07.15 at 04:40, feng...@intel.com wrote:

  -Original Message-
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Friday, July 10, 2015 9:08 PM
  To: Wu, Feng
  Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
  Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
  Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
  Posted-Interrupts

   On 24.06.15 at 07:18, feng...@intel.com wrote:
   @@ -81,8 +81,19 @@ struct vmx_domain {

struct pi_desc {
DECLARE_BITMAP(pir, NR_VECTORS);
   -u32 control;
   -u32 rsvd[7];
   +union {
   +struct
   +{
   +u16 on : 1,  /* bit 256 - Outstanding Notification */
   +sn : 1,  /* bit 257 - Suppress Notification */
   +rsvd_1 : 14; /* bit 271:258 - Reserved */
   +u8  nv;  /* bit 279:272 - Notification Vector */
   +u8  rsvd_2;  /* bit 287:280 - Reserved */
   +u32 ndst;/* bit 319:288 - Notification Destination */
   +};
   +u64 control;
   +};

  So current code, afaics, uses e.g. test_and_set_bit() to set ON.
  By also declaring this as a bitfield you're opening the structure for
  non-atomic accesses. If that's correct, why is other code not
  being changed to _only_ use the bitfield mechanism (likely also
  eliminating the need for it being a union with the now 64-bit
  control? If atomic accesses are required, then I'd strongly
  suggest against making this a bit field.

  And in no event can I see why ndst needs to be union-ized
  with control if it doesn't need to be updated atomically with
  e.g. nv.

  When the vCPU is to be blocked, we need to atomically update
  the nv and ndst, then the wakeup notification event can be
  delivered to the right destination.

 Okay. Your reply made me go through the patches again to check
 where updates to nv/ndst happen - what's the reason they aren't
 being updated as a pair in patch 14's RUNSTATE_running handling
 (or in the replacement draft's vmx_ctxt_switch_to() adjustment)?

That is because we can only enter the running state from runnable, by which
point the NV field has already been changed back to 'posted_intr_vector', so
we don't need to do it here again.
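
For reference, updating NV and NDST as a pair, as discussed earlier in this thread, can be sketched with a compare-and-exchange on the 64-bit control word. This is a minimal illustration using C11 atomics rather than Xen's own primitives. The bit positions are taken from the quoted layout (relative to bit 256); the struct and helper names, and the size of the reserved tail, are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Field positions relative to bit 256, per the layout quoted in the thread. */
#define PI_ON          (UINT64_C(1) << 0)
#define PI_SN          (UINT64_C(1) << 1)
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/* Hypothetical stand-in for struct pi_desc. */
struct example_pi_desc {
    uint32_t pir[8];            /* 256 posted-interrupt request bits */
    _Atomic uint64_t control;   /* ON, SN, NV, NDST */
    uint32_t rsvd[6];
};

/*
 * Update NV and NDST in one atomic step. ON and SN may be set concurrently,
 * so they are carried over from the most recently observed value.
 */
static void example_set_nv_ndst(struct example_pi_desc *pi,
                                uint8_t nv, uint32_t ndst)
{
    uint64_t old = atomic_load(&pi->control);
    uint64_t val;

    do {
        val = old & ~(PI_NV_MASK | PI_NDST_MASK);
        val |= ((uint64_t)nv << PI_NV_SHIFT) |
               ((uint64_t)ndst << PI_NDST_SHIFT);
        /* On failure, 'old' is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(&pi->control, &old, val));
}
```

Working on the whole word with masks avoids mixing bitfield stores, which the compiler may implement as non-atomic read-modify-write sequences, with atomic bit operations on the same memory.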

Thanks,
Feng

 Jan


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich

 On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:
 Furthermore, could we have this solution as follows?

Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject
to co-maintainers overriding me).

Jan



Re: [Xen-devel] [PATCH v8 01/11] xen: introduce SHUTDOWN_soft_reset shutdown reason

2015-07-15 Thread Vitaly Kuznetsov

Ian Jackson ian.jack...@eu.citrix.com writes:

 Konrad Rzeszutek Wilk writes (Re: [PATCH v8 01/11] xen: introduce 
 SHUTDOWN_soft_reset shutdown reason):
 On Tue, Jun 23, 2015 at 06:11:43PM +0200, Vitaly Kuznetsov wrote:
  This special type of shutdown is supposed to be used by PVHVM guests when
  they want to perform some sort of kexec/kdump.
 ...
  +#define SHUTDOWN_soft_reset 5  /* Domain did soft reset. Clean up and 
  resume.*/

 I would like more documentation about the semantics of this new
 request.  (The semantics of the existing shutdown requests are fairly
 well understood because they generally map to real hardware.)

Sure,

would you like me to expand the comment here or should I write something
somewhere else?

Thanks,

-- 
  Vitaly


Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-15 Thread Jan Beulich

 On 15.07.15 at 10:38, feng...@intel.com wrote:

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:25 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org 
 Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

  On 15.07.15 at 08:04, feng...@intel.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Friday, July 10, 2015 10:02 PM
  I'm particularly worried by the call to acpi_find_matched_drhd_unit()
  - is it maybe worth storing the iommu pointer in struct msi_desc?

  I think it worth, Like Andrew also mentioned this point before. I tend
  to make this a independent work and do it later, since the 4.6 release
  is coming, I am still try my best to target it. Could you please share
  your concern here, performance? Or other things? Thanks!

 Interrupt latency in particular.

 This update IRTE operation is not so frequently. It only happens in few 
 times,
 especially in the initialization phase of the guest. And even the guest set
 the affinity, in the MSI/MSIx configuration doesn't change, QEMU will not
 ask Xen to update it.

When the guest sets the affinity, the MSI{,-X} configuration is
rather likely to change (at least for Linux guests).

 There are two possible scenarios:

 1) There are bits that can be updated behind the back of the code
 here. In that case you need to loop, and each iteration of the loop
 needs to re-fetch the current value (not doing so would make the
 loop infinite).

 Oh, yes, I think I made a mistake here, it is too hastily these days,
 Sorry for that! I think I need do it like this:

 do {
 new_ire = *p;

 /* Setup/Update interrupt remapping table entry. */
 setup_posted_irte(new_ire, pi_desc, gvec);

 old_ire = *(uint128_t *)p;
 ret = cmpxchg16b(p, old_ire, new_ire);
 } while ( memcmp(ret, old_ire, sizeof(old_ire)) );

So since you put this in a loop again, would you mind pointing out
which bits can get modified behind our back?

Jan


[Xen-devel] [PATCH V6 1/3] xen/mem_access: Support for memory-content hiding

2015-07-15 Thread Razvan Cojocaru

This patch adds support for memory-content hiding, by modifying the
value returned by emulated instructions that read certain memory
addresses that contain sensitive data. The patch only applies to
cases where VM_FLAG_ACCESS_EMULATE has been set to a vm_event
response.
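
The clamp-and-zero-fill behaviour that set_context_data() in the diff below applies can be sketched in isolation as follows. This is a minimal sketch, not the patch itself: struct read_data and fill_from_reply() are hypothetical stand-ins, and the 16-byte buffer size is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the data supplied in a vm_event reply.
 * The sketch assumes size never exceeds sizeof(data). */
struct read_data {
    unsigned int size;
    unsigned char data[16];
};

/* Copy at most 'size' bytes of replacement data into 'buffer' and zero-fill
 * whatever the reply did not cover. Returns 0 on success, -1 if no
 * replacement data is available. */
static int fill_from_reply(void *buffer, unsigned int size,
                           const struct read_data *rd)
{
    unsigned int safe_size;

    if ( rd == NULL )
        return -1;

    safe_size = size < rd->size ? size : rd->size;
    memcpy(buffer, rd->data, safe_size);
    memset((unsigned char *)buffer + safe_size, 0, size - safe_size);

    return 0;
}
```

Zero-filling the tail means an emulated read never exposes stale bytes from the caller's buffer when the reply carries less data than the instruction asked for.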

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: Tamas K Lengyel tleng...@novetta.com

---
Changes since V5:
 - Renamed set_context_data()'s bytes parameter to size.
 - Inverted if() condition in set_context_data().
 - Removed memcpy() conditional from set_context_data().
 - Removed label from hvmemul_rep_outs_set_context().
 - Now bypassing hvm_copy_from_guest_phys() in hvmemul_rep_movs() if
   hvmemul_ctxt->set_context is true.
 - Fixed for_each_vcpu() coding style (blank before the opening
   parenthesis).
 - Added comments about the serialization status of
   vm_event_init_domain() and vm_event_cleanup_domain().
 - Setting v->arch.vm_event.emul_read_data to NULL after xfree() in
   vcpu_destroy() for safety.
---
 tools/tests/xen-access/xen-access.c |2 +-
 xen/arch/x86/domain.c   |3 +
 xen/arch/x86/hvm/emulate.c  |  117 ---
 xen/arch/x86/hvm/event.c|   50 +++
 xen/arch/x86/mm/p2m.c   |   92 +++
 xen/arch/x86/vm_event.c |   35 +++
 xen/common/vm_event.c   |8 +++
 xen/include/asm-arm/vm_event.h  |   13 
 xen/include/asm-x86/domain.h|1 +
 xen/include/asm-x86/hvm/emulate.h   |   10 ++-
 xen/include/asm-x86/vm_event.h  |4 ++
 xen/include/public/vm_event.h   |   35 ---
 12 files changed, 287 insertions(+), 83 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c
index 12ab921..e6ca9ba 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -530,7 +530,7 @@ int main(int argc, char *argv[])
                 break;
             case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
                 printf("Breakpoint: rip=%016"PRIx64", gfn=%"PRIx64" (vcpu %d)\n",
-                       req.regs.x86.rip,
+                       req.data.regs.x86.rip,
                        req.u.software_breakpoint.gfn,
                        req.vcpu_id);
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 34ecd7c..1ef9fad 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -511,6 +511,9 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
+    xfree(v->arch.vm_event.emul_read_data);
+    v->arch.vm_event.emul_read_data = NULL;
+
     if ( is_pv_32bit_vcpu(v) )
     {
         free_compat_arg_xlat(v);
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 795321c..2766919 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -67,6 +67,24 @@ static int null_write(const struct hvm_io_handler *handler,
 return X86EMUL_OKAY;
 }
 
+static int set_context_data(void *buffer, unsigned int size)
+{
+    struct vcpu *curr = current;
+
+    if ( curr->arch.vm_event.emul_read_data )
+    {
+        unsigned int safe_size =
+            min(size, curr->arch.vm_event.emul_read_data->size);
+
+        memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
+        memset(buffer + safe_size, 0, size - safe_size);
+    }
+    else
+        return X86EMUL_UNHANDLEABLE;
+
+    return X86EMUL_OKAY;
+}
+
 static const struct hvm_io_ops null_ops = {
 .read = null_read,
 .write = null_write
@@ -771,6 +789,12 @@ static int hvmemul_read(
 unsigned int bytes,
 struct x86_emulate_ctxt *ctxt)
 {
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    if ( unlikely(hvmemul_ctxt->set_context) )
+        return set_context_data(p_data, bytes);
+
     return __hvmemul_read(
         seg, offset, p_data, bytes, hvm_access_read,
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
@@ -963,6 +987,17 @@ static int hvmemul_cmpxchg(
 unsigned int bytes,
 struct x86_emulate_ctxt *ctxt)
 {
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    if ( unlikely(hvmemul_ctxt->set_context) )
+    {
+        int rc = set_context_data(p_new, bytes);
+
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+    }
+
     /* Fix this in case the guest is really relying on r-m-w atomicity. */
     return hvmemul_write(seg, offset, p_new, bytes, ctxt);
 }
@@ -1005,6 +1040,38 @@ static int hvmemul_rep_ins(
                                !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
+static int hvmemul_rep_outs_set_context(
+    enum x86_segment src_seg,
+    unsigned long src_offset,
+    uint16_t dst_port,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int bytes = *reps * bytes_per_rep;
+    char *buf;
+    int rc;
+

[Xen-devel] [PATCH V6 2/3] xen/vm_event: Support for guest-requested events

2015-07-15 Thread Razvan Cojocaru

Added support for a new class of vm_events: VM_EVENT_REASON_GUEST_REQUEST,
sent via HVMOP_guest_request_vm_event. The guest can request that a
generic vm_event (containing only the vm_event-filled guest registers
as information) be sent to userspace by setting up the correct
registers and doing a VMCALL. For example, for a 32-bit guest, this
means: EAX = 34 (hvmop), EBX = 24 (HVMOP_guest_request_vm_event),
ECX = 0 (NULL required for the hypercall parameter, reserved).

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: Tamas K Lengyel tleng...@novetta.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com

---
Changes since V5:
 - None.
---
 tools/libxc/include/xenctrl.h   |2 ++
 tools/libxc/xc_monitor.c|   15 +++
 xen/arch/x86/hvm/event.c|   16 
 xen/arch/x86/hvm/hvm.c  |8 +++-
 xen/arch/x86/monitor.c  |   19 ++-
 xen/include/asm-x86/domain.h|   16 +---
 xen/include/asm-x86/hvm/event.h |1 +
 xen/include/public/domctl.h |6 ++
 xen/include/public/hvm/hvm_op.h |2 ++
 xen/include/public/vm_event.h   |2 ++
 10 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0bbae2a..ce9029c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2378,6 +2378,8 @@ int xc_monitor_mov_to_msr(xc_interface *xch, domid_t domain_id, bool enable,
 int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable);
 int xc_monitor_software_breakpoint(xc_interface *xch, domid_t domain_id,
bool enable);
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id,
+ bool enable, bool sync);
 
 /***
  * Memory sharing operations.
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index b64bce3..d5f87da 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -129,3 +129,18 @@ int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id,
 
     return do_domctl(xch, &domctl);
 }
+
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, bool enable,
+                             bool sync)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_monitor_op;
+    domctl.domain = domain_id;
+    domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+                                    : XEN_DOMCTL_MONITOR_OP_DISABLE;
+    domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST;
+    domctl.u.monitor_op.u.guest_request.sync = sync;
+
+    return do_domctl(xch, &domctl);
+}
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 5341937..17638ea 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -126,6 +126,22 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
     hvm_event_traps(1, &req);
 }
 
+void hvm_event_guest_request(void)
+{
+    struct vcpu *curr = current;
+    struct arch_domain *currad = &curr->domain->arch;
+
+    if ( currad->monitor.guest_request_enabled )
+    {
+        vm_event_request_t req = {
+            .reason = VM_EVENT_REASON_GUEST_REQUEST,
+            .vcpu_id = curr->vcpu_id,
+        };
+
+        hvm_event_traps(currad->monitor.guest_request_sync, &req);
+    }
+}
+
 int hvm_event_int3(unsigned long gla)
 {
 int rc = 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8a10111..22dbab1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5999,7 +5999,6 @@ static int hvmop_get_param(
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
 unsigned long start_iter, mask;
 long rc = 0;
@@ -6413,6 +6412,13 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case HVMOP_guest_request_vm_event:
+        if ( guest_handle_is_null(arg) )
+            hvm_event_guest_request();
+        else
+            rc = -EINVAL;
+        break;
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 0da855e..d35907b 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -55,7 +55,8 @@ static inline uint32_t get_capabilities(struct domain *d)
 
     capabilities = (1 << XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG) |
                    (1 << XEN_DOMCTL_MONITOR_EVENT_MOV_TO_MSR) |
-                   (1 << XEN_DOMCTL_MONITOR_EVENT_SOFTWARE_BREAKPOINT);
+                   (1 << XEN_DOMCTL_MONITOR_EVENT_SOFTWARE_BREAKPOINT) |
+                   (1 << XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST);
 
 /* Since we know this is on VMX, we can just call the hvm func */
 if ( hvm_is_singlestep_supported() )
@@ -184,6 +185,22 @@ int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop)
 break;
 }

[Xen-devel] [PATCH V6 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

2015-07-15 Thread Razvan Cojocaru

Deny register writes if a vm_client subscribed to mov_to_msr or
control register write events forbids them. Currently supported for
MSR, CR0, CR3 and CR4 events.
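
The deferred-write pattern this relies on can be sketched in a few lines: the intercepted write is recorded rather than committed, the monitor's reply may cancel it, and whatever is still pending is applied when the vCPU resumes. This is a minimal sketch under stated assumptions, not the patch's code: struct pending_write and the three helpers are hypothetical names, and a plain variable stands in for the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU record of a register write awaiting a verdict. */
struct pending_write {
    bool do_write;      /* a write is waiting to be committed */
    uint64_t value;     /* the value the guest tried to write */
};

/* Called when the guest's write is intercepted: remember it, don't commit. */
static void defer_write(struct pending_write *w, uint64_t value)
{
    w->do_write = true;
    w->value = value;
}

/* Called when the monitor's reply arrives: a denial drops the write. */
static void handle_reply(struct pending_write *w, bool deny)
{
    if ( deny )
        w->do_write = false;
}

/* Called when the vCPU resumes: commit whatever is still pending. */
static void commit_on_resume(struct pending_write *w, uint64_t *reg)
{
    if ( w->do_write )
    {
        *reg = w->value;
        w->do_write = false;
    }
}
```

Clearing the flag after the commit keeps a later resume from replaying a stale write.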

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: Tamas K Lengyel tleng...@novetta.com

---
Changes since V5:
 - Now using vzalloc() / vfree() for d->arch.event_write_data,
   and setting it to NULL after releasing it in arch_domain_destroy()
   for safety.
---
 xen/arch/x86/domain.c |3 +
 xen/arch/x86/hvm/emulate.c|8 +--
 xen/arch/x86/hvm/event.c  |5 +-
 xen/arch/x86/hvm/hvm.c|  118 -
 xen/arch/x86/hvm/svm/nestedsvm.c  |   14 ++---
 xen/arch/x86/hvm/svm/svm.c|2 +-
 xen/arch/x86/hvm/vmx/vmx.c|   15 +++--
 xen/arch/x86/hvm/vmx/vvmx.c   |   18 +++---
 xen/arch/x86/vm_event.c   |   43 ++
 xen/common/vm_event.c |4 ++
 xen/include/asm-arm/vm_event.h|7 +++
 xen/include/asm-x86/domain.h  |   18 +-
 xen/include/asm-x86/hvm/event.h   |9 ++-
 xen/include/asm-x86/hvm/support.h |9 +--
 xen/include/asm-x86/vm_event.h|3 +
 xen/include/public/vm_event.h |5 ++
 16 files changed, 235 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1ef9fad..045f6ff 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -668,6 +668,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
 
 void arch_domain_destroy(struct domain *d)
 {
+    vfree(d->arch.event_write_data);
+    d->arch.event_write_data = NULL;
+
 if ( has_hvm_container_domain(d) )
 hvm_domain_destroy(d);
 
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2766919..bc7514a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1428,14 +1428,14 @@ static int hvmemul_write_cr(
 switch ( reg )
 {
 case 0:
-return hvm_set_cr0(val);
+return hvm_set_cr0(val, 1);
 case 2:
 current->arch.hvm_vcpu.guest_cr[2] = val;
 return X86EMUL_OKAY;
 case 3:
-return hvm_set_cr3(val);
+return hvm_set_cr3(val, 1);
 case 4:
-return hvm_set_cr4(val);
+return hvm_set_cr4(val, 1);
 default:
 break;
 }
@@ -1456,7 +1456,7 @@ static int hvmemul_write_msr(
 uint64_t val,
 struct x86_emulate_ctxt *ctxt)
 {
-return hvm_msr_write_intercept(reg, val);
+return hvm_msr_write_intercept(reg, val, 1);
 }
 
 static int hvmemul_wbinvd(
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 17638ea..042e583 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t *req)
 return 1;
 }
 
-void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
+bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 {
     struct arch_domain *currad = &current->domain->arch;
     unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
@@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 
         hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
                         &req);
+        return 1;
 }
+
+return 0;
 }
 
 void hvm_event_msr(unsigned int msr, uint64_t value)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 22dbab1..c07e3ef 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -52,6 +52,7 @@
 #include <asm/traps.h>
 #include <asm/mc146818rtc.h>
 #include <asm/mce.h>
+#include <asm/monitor.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/support.h>
@@ -519,6 +520,35 @@ void hvm_do_resume(struct vcpu *v)
 break;
 }
 
+    if ( unlikely(d->arch.event_write_data) )
+    {
+        struct monitor_write_data *w = &d->arch.event_write_data[v->vcpu_id];
+
+        if ( w->do_write.msr )
+        {
+            hvm_msr_write_intercept(w->msr, w->value, 0);
+            w->do_write.msr = 0;
+        }
+
+        if ( w->do_write.cr0 )
+        {
+            hvm_set_cr0(w->cr0, 0);
+            w->do_write.cr0 = 0;
+        }
+
+        if ( w->do_write.cr4 )
+        {
+            hvm_set_cr4(w->cr4, 0);
+            w->do_write.cr4 = 0;
+        }
+
+        if ( w->do_write.cr3 )
+        {
+            hvm_set_cr3(w->cr3, 0);
+            w->do_write.cr3 = 0;
+        }
+    }
+
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
 {
@@ -3123,13 +3153,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
 switch ( cr )
 {
 case 0:
-return hvm_set_cr0(val);
+return hvm_set_cr0(val, 1);
 
 case 3:
-return hvm_set_cr3(val);
+return hvm_set_cr3(val, 1);
 
 case 4:
-

Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-15 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:36 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
 Posted-Interrupts

  On 15.07.15 at 10:26, feng...@intel.com wrote:

  -Original Message-
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, July 15, 2015 4:20 PM
  To: Wu, Feng
  Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
  Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
  Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
  Posted-Interrupts

   On 15.07.15 at 04:40, feng...@intel.com wrote:

   -Original Message-
   From: Jan Beulich [mailto:jbeul...@suse.com]
   Sent: Friday, July 10, 2015 9:08 PM
   To: Wu, Feng
   Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian,
 Kevin;
   Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
   Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
   Posted-Interrupts

On 24.06.15 at 07:18, feng...@intel.com wrote:
@@ -81,8 +81,19 @@ struct vmx_domain {

 struct pi_desc {
     DECLARE_BITMAP(pir, NR_VECTORS);
-    u32 control;
-    u32 rsvd[7];
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };

   So current code, afaics, uses e.g. test_and_set_bit() to set ON.
   By also declaring this as a bitfield you're opening the structure for
   non-atomic accesses. If that's correct, why is other code not
   being changed to _only_ use the bitfield mechanism (likely also
   eliminating the need for it being a union with the now 64-bit
   control? If atomic accesses are required, then I'd strongly
   suggest against making this a bit field.

   And in no event can I see why ndst needs to be union-ized
   with control if it doesn't need to be updated atomically with
   e.g. nv.

   When the vCPU is to be blocked, we need to atomically update
   the nv and ndst, then the wakeup notification event can be
   delivered to the right destination.

  Okay. Your reply made me go through the patches again to check
  where updates to nv/ndst happen - what's the reason they aren't
  being updated as a pair in patch 14's RUNSTATE_running handling
  (or in the replacement draft's vmx_ctxt_switch_to() adjustment)?

  This is because we can only enter the running state from runnable, at
  which point the NV field has already been changed back to
  'posted_intr_vector', so we don't need to do it here again.

 Without sitting in the runstate update path anymore, I can't see how
 you would get to see all transitions to runnable.

Sorry, I don't quite understand the above comment. Do you mean that
after switching to the new method (arch hooks) to update the posted-interrupt
descriptor, I can no longer track all the state transitions to runnable?

Thanks,
Feng

 Jan
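
For reference, updating NV and NDST as a pair (the requirement discussed above) can be sketched as a compare-and-swap on the 64-bit control word. This is a minimal sketch in portable C11, not Xen code: the field positions follow the bit layout quoted in the patch, while pi_set_nv_ndst() and the PI_* macros are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Field positions within the 64-bit control word, following the bit layout
 * quoted above (bit 256 of the descriptor is bit 0 of 'control'). */
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/* Update NV and NDST in one atomic step, leaving ON/SN untouched. */
static void pi_set_nv_ndst(_Atomic uint64_t *control, uint8_t nv, uint32_t ndst)
{
    uint64_t old = atomic_load(control);
    uint64_t new;

    do {
        /* Rebuild from the latest value so a concurrent ON/SN change
         * observed by a failed exchange is carried over, not lost. */
        new = old & ~(PI_NV_MASK | PI_NDST_MASK);
        new |= (uint64_t)nv << PI_NV_SHIFT;
        new |= (uint64_t)ndst << PI_NDST_SHIFT;
    } while ( !atomic_compare_exchange_weak(control, &old, new) );
}
```

Writing the two fields through separate bitfield stores would leave a window in which a notification could be sent with the new vector to the old destination, which is the hazard the paired update avoids.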

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [xen-unstable test] 59544: regressions - FAIL

2015-07-15 Thread osstest service owner

flight 59544 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59544/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 14 guest-localmigrate.2 fail REGR. vs. 58958
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965
 test-armhf-armhf-xl   6 xen-boot  fail REGR. vs. 58965
 test-amd64-amd64-xl-qemuu-win7-amd64  9 windows-install   fail REGR. vs. 58965

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 58965
 test-amd64-i386-xl-qemuu-win7-amd64  9 windows-install fail like 58958
 test-armhf-armhf-xl-rtds 11 guest-start  fail   like 58965
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58965

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt        12 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm   12 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-pvh-amd    11 guest-start            fail  never pass
 test-amd64-amd64-libvirt       12 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-pvh-intel  11 guest-start            fail  never pass
 test-amd64-i386-libvirt-xsm    12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2    12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm        12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale    12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt       12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm   12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check  fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop        fail  never pass

version targeted for testing:
 xen  d924ddbf59f54f432f5fb6907d1262ddb9a9070a
baseline version:
 xen  c40317f11b3f05e7c06a2213560c8471081f2662

Last test of basis    58965  2015-06-29 02:08:30 Z   16 days
Failing since         58974  2015-06-29 15:11:59 Z   15 days   17 attempts
Testing same since    59544  2015-07-14 13:41:02 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper andrew.coop...@citrix.com
  Anthony PERARD anthony.per...@citrix.com
  Ard Biesheuvel a...@linaro.org
  Ben Catterall ben.catter...@citrix.com
  Boris Ostrovsky boris.ostrov...@oracle.com
  Chao Peng chao.p.p...@linux.intel.com
  Chen Baozi baoz...@gmail.com
  Daniel De Graaf dgde...@tycho.nsa.gov
  Dario Faggioli dario.faggi...@citrix.com
  David Scott dave.sc...@citrix.com
  David Vrabel david.vra...@citrix.com
  Dietmar Hahn dietmar.h...@ts.fujitsu.com
  Euan Harris euan.har...@citrix.com
  Fabio Fantoni fabio.fant...@m2r.biz
  Feng Wu feng...@intel.com
  George Dunlap george.dun...@eu.citrix.com
  Ian Campbell ian,campb...@citrix.com
  Ian Campbell ian.campb...@citrix.com
  Ian Jackson ian.jack...@eu.citrix.com
  Jan Beulich jbeul...@suse.com
  Jennifer Herbert jennifer.herb...@citrix.com
  Juergen Gross jgr...@suse.com
  Julien Grall julien.gr...@citrix.com
  Julien Grall julien.gr...@linaro.org
  Kevin Tian kevin.t...@intel.com
  Liang Li liang.z...@intel.com
  Paul Durrant paul.durr...@citrix.com
  Razvan Cojocaru rcojoc...@bitdefender.com
  Rob Hoes rob.h...@citrix.com
  Roger Pau Monné roger@citrix.com
  Samuel Thibault samuel.thiba...@ens-lyon.org
  Sander Eikelenboom li...@eikelenboom.it
  Tamas K Lengyel tleng...@novetta.com
  Thomas Leonard tal...@gmail.com
  Tiejun Chen tiejun.c...@intel.com
  Tim Deegan t...@xen.org
  Vitaly Kuznetsov vkuzn...@redhat.com
  Wei Liu wei.l...@citrix.com
  Wen Congyang we...@cn.fujitsu.com
  Yang Zhang yang.z.zh...@intel.com

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-oldkern  pass
 build-i386-oldkern   pass
 build-amd64-pvops                                            pass
 build-armhf-pvops

Re: [Xen-devel] [PATCH v4 15/17] xen/arm: ITS: Map ITS translation space

2015-07-15 Thread Julien Grall


Hi Vijay,

On 10/07/2015 09:42, vijay.kil...@gmail.com wrote:

From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com

The ITS translation space contains the GITS_TRANSLATOR
register, which is written by a device to raise an
LPI. This space needs to be mapped into every domain's
address space for all physical ITSes available,
so that a device can access the GITS_TRANSLATOR
register using the SMMU.

Signed-off-by: Vijaya Kumar K vijaya.ku...@caviumnetworks.com
---
  xen/arch/arm/vgic-v3-its.c |   31 ++-
  1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 74e6ee7..301f065 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -1082,6 +1082,35 @@ static const struct mmio_handler_ops vgic_gits_mmio_handler = {
  .write_handler = vgic_v3_gits_mmio_write,
  };

+/*
+ * Map the 64K ITS translation space in guest.
+ * This is required purely for device smmu writes.
+*/
+
+static int vits_map_translation_space(struct domain *d)
+{
+uint64_t addr, size;
+int ret;
+
+    addr = d->arch.vits->gits_base + SZ_64K;
+    size = SZ_64K;
+
+    ret = map_mmio_regions(d,
+                           paddr_to_pfn(addr & PAGE_MASK),
+                           DIV_ROUND_UP(size, PAGE_SIZE),
+                           paddr_to_pfn(addr & PAGE_MASK));


You are assuming a direct mapping in the guest memory for the ITS 
translation space.


While this may be true for dom0, it won't work for guests.

I'm fine if you don't handle this case for 4.6, although I'd like to at
least see a comment stating that we are using a 1:1 mapping and an assert
to check that the domain is using direct mapping (i.e.
is_domain_direct_mapped(d)).
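
To make the 1:1 assumption concrete, the arithmetic behind the quoted map_mmio_regions() call can be worked through in a standalone sketch. This is only an illustration under stated assumptions: 4K pages are assumed, struct mmio_range and its_translation_range() are hypothetical names, and the base address used in any call is made up.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* assumes 4K pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define SZ_64K     UINT64_C(0x10000)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical container for the three values passed to the mapping call. */
struct mmio_range { uint64_t start_gfn, nr, mfn; };

/* Compute the arguments a 1:1 mapping of the translation space would use. */
static struct mmio_range its_translation_range(uint64_t gits_base)
{
    uint64_t addr = gits_base + SZ_64K;     /* second 64K frame of the ITS */
    struct mmio_range r = {
        .start_gfn = (addr & PAGE_MASK) >> PAGE_SHIFT,
        .nr        = DIV_ROUND_UP(SZ_64K, PAGE_SIZE),
        .mfn       = (addr & PAGE_MASK) >> PAGE_SHIFT,  /* 1:1: gfn == mfn */
    };

    return r;
}
```

With 4K pages the 64K region covers 16 frames, and because the guest frame number is derived from the host address, the result is only meaningful for a direct-mapped domain.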


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun


This is very similar to our current policy to
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6
since actually this is also another rare possibility in real world. Even
I can do this as well when we handle that conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Sorry, here is one typo, s/#6/#5

Thanks
Tiejun



Note it's not necessary to worry about high memory, since we already handle
this case in the hv code, and it's also not affected by the memory
relocated later, since our previous policy makes sure RAM isn't
overlapping with RDM.

Thanks
Tiejun


to co-maintainers overriding me).

Jan





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel





[Xen-devel] [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream

2015-07-15 Thread Yang Hongyang

This is used by the primary to read records sent by the secondary.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_create.c  |  1 +
 tools/libxl/libxl_internal.h|  1 +
 tools/libxl/libxl_stream_read.c | 17 +
 3 files changed, 19 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1d4b13b..1af7103 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -978,6 +978,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.dcs = dcs;
     dcs->srs.fd = restore_fd;
     dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
+    dcs->srs.back_channel = false;
     dcs->srs.completion_callback = domcreate_stream_done;
 
     libxl__stream_read_start(egc, &dcs->srs);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2634836..05cee04 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3358,6 +3358,7 @@ struct libxl__stream_read_state {
 libxl__domain_create_state *dcs;
 int fd;
 bool legacy;
+bool back_channel;
 void (*completion_callback)(libxl__egc *egc,
 libxl__stream_read_state *srs,
 int rc);
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 2d17403..b924f05 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -104,6 +104,15 @@
  * Depending on the contents of the stream, there are likely to be several
  * parallel tasks being managed.  check_all_finished() is used to join all
  * tasks in both success and error cases.
+ *
+ * For back channel stream:
+ * - libxl__stream_read_start()
+ *- Set up the stream to running state
+ *
+ * - libxl__stream_read_continue()
+ * - Set up reading the next record from a started stream.
+ *   Add some codes to process_record() to handle the record.
+ *   Then call stream-checkpoint_callback() to return.
  */
 
 /* Success/error/cleanup handling. */
@@ -200,6 +209,9 @@ void libxl__stream_read_start(libxl__egc *egc,
     stream->running = true;
     stream->phase   = SRS_PHASE_NORMAL;
 
+    if (stream->back_channel)
+        return;
+
     if (stream->legacy) {
         /* Convert the legacy stream. */
         libxl__conversion_helper_state *chs = &stream->chs;
@@ -700,6 +712,11 @@ static void stream_done(libxl__egc *egc,
     assert(!stream->in_checkpoint);
     stream->running = false;
 
+    if (stream->back_channel) {
+        stream->completion_callback(egc, stream, stream->rc);
+        return;
+    }
+
     if (stream->incoming_record)
         free_record(stream->incoming_record);
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v8 --for 4.6 COLO 10/25] tools/libx{l, c}: add postcopy/suspend callback to restore side

2015-07-15 Thread Yang Hongyang

When the secondary (restore side) is running under COLO, it also needs
postcopy/suspend callbacks.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxc/include/xenguest.h | 10 ++
 tools/libxl/libxl_save_msgs_gen.pl |  4 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index fa06d9b..1e7e1bb 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -114,6 +114,16 @@ struct restore_callbacks {
 int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
 uint32_t size, void* data);
 
+    /* Called after a new checkpoint to suspend the guest.
+     */
+    int (*suspend)(void* data);
+
+    /* Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     * returns to xc_domain_restore.
+     */
+    int (*postcopy)(void* data);
+
 /* A checkpoint record has been found in the stream.
  * returns: */
 #define XGR_CHECKPOINT_ERROR0 /* Terminate processing */
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 9107a86..7c9859b 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -23,8 +23,8 @@ our @msgs = (
  STRING doing_what),
 'unsigned long', 'done',
 'unsigned long', 'total'] ],
-    [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxA',   "postcopy", [] ],
+    [  3, 'srcxA',  "suspend", [] ],
+    [  4, 'srcxA',  "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
     [  6, 'srcxA',  "should_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use

2015-07-15 Thread Yang Hongyang

read_record() could be used by primary to read dirty bitmap
record sent by secondary under COLO.
When used by save side, we need to pass the backchannel fd
instead of ctx-fd to read_record(), so we added a fd param to
it.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Andrew Cooper andrew.coop...@citrix.com
---
 tools/libxc/xc_sr_common.c  | 49 +++
 tools/libxc/xc_sr_common.h  | 14 ++
 tools/libxc/xc_sr_restore.c | 63 +
 3 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index becc0f4..0ee607c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -89,6 +89,55 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
 return -1;
 }
 
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rhdr rhdr;
+    size_t datasz;
+
+    if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+    {
+        PERROR("Failed to read Record Header from stream");
+        return -1;
+    }
+    else if ( rhdr.length > REC_LENGTH_MAX )
+    {
+        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
+              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+        return -1;
+    }
+
+    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+
+    if ( datasz )
+    {
+        rec->data = malloc(datasz);
+
+        if ( !rec->data )
+        {
+            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
+                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+
+        if ( read_exact(fd, rec->data, datasz) )
+        {
+            free(rec->data);
+            rec->data = NULL;
+            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
+                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+    }
+    else
+        rec->data = NULL;
+
+    rec->type   = rhdr.type;
+    rec->length = rhdr.length;
+
+    return 0;
+};
+
 static void __attribute__((unused)) build_assertions(void)
 {
 XC_BUILD_BUG_ON(sizeof(struct xc_sr_ihdr) != 24);
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 28755ac..632160e 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -356,6 +356,20 @@ static inline int write_record(struct xc_sr_context *ctx,
 }
 
 /*
+ * Reads a record from the stream, and fills in the record structure.
+ *
+ * Returns 0 on success and non-0 on failure.
+ *
+ * On success, the records type and size shall be valid.
+ * - If size is 0, data shall be NULL.
+ * - If size is non-0, data shall be a buffer allocated by malloc() which must
+ *   be passed to free() by the caller.
+ *
+ * On failure, the contents of the record structure are undefined.
+ */
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+
+/*
  * This would ideally be private in restore.c, but is needed by
  * x86_pv_localise_page() if we receive pagetables frames ahead of the
  * contents of the frames they point at.
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 504463e..d53694b 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -69,67 +69,6 @@ static int read_headers(struct xc_sr_context *ctx)
 }
 
 /*
- * Reads a record from the stream, and fills in the record structure.
- *
- * Returns 0 on success and non-0 on failure.
- *
- * On success, the records type and size shall be valid.
- * - If size is 0, data shall be NULL.
- * - If size is non-0, data shall be a buffer allocated by malloc() which must
- *   be passed to free() by the caller.
- *
- * On failure, the contents of the record structure are undefined.
- */
-static int read_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
-{
-    xc_interface *xch = ctx->xch;
-    struct xc_sr_rhdr rhdr;
-    size_t datasz;
-
-    if ( read_exact(ctx->fd, &rhdr, sizeof(rhdr)) )
-    {
-        PERROR("Failed to read Record Header from stream");
-        return -1;
-    }
-    else if ( rhdr.length > REC_LENGTH_MAX )
-    {
-        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
-        return -1;
-    }
-
-    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
-
-    if ( datasz )
-    {
-        rec->data = malloc(datasz);
-
-        if ( !rec->data )
-        {
-            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
-            return -1;
-        }
-
-        if ( read_exact(ctx->fd, rec->data, datasz) )
-        {
-            free(rec->data);
-            rec->data = NULL;
-            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-                   datasz, rhdr.type,
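
(The archived message is cut off at this point.)

For readers following the contract documented in xc_sr_common.h in this patch (type and length valid on success, data NULL for empty records, otherwise a malloc()ed buffer the caller must free()), here is a minimal, self-contained sketch of a reader that honours it. This is not the libxc code: struct rec, read_all(), read_rec(), ALIGN_ORDER and LENGTH_MAX are hypothetical stand-ins for struct xc_sr_record, read_exact(), read_record(), REC_ALIGN_ORDER and REC_LENGTH_MAX, and the constant values are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN_ORDER 3u          /* assumption: bodies are padded to 8 bytes */
#define LENGTH_MAX  (8u << 20)  /* assumption: upper bound on a record body */
#define ROUNDUP(x, order) \
    (((x) + (1ul << (order)) - 1) & ~((1ul << (order)) - 1))

struct rec_hdr { uint32_t type, length; };
struct rec     { uint32_t type, length; void *data; };

/* Read exactly len bytes; 0 on success, -1 on error or premature EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while ( len )
    {
        ssize_t n = read(fd, p, len);

        if ( n < 0 && errno == EINTR )
            continue;
        if ( n <= 0 )
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical analogue of read_record(): the fd is passed explicitly. */
static int read_rec(int fd, struct rec *rec)
{
    struct rec_hdr hdr;
    size_t datasz;

    assert(rec != NULL);

    if ( read_all(fd, &hdr, sizeof(hdr)) || hdr.length > LENGTH_MAX )
        return -1;

    /* The body on the wire is padded, so read the rounded-up size. */
    datasz = ROUNDUP((size_t)hdr.length, ALIGN_ORDER);
    rec->data = NULL;

    if ( datasz )
    {
        rec->data = malloc(datasz);
        if ( !rec->data )
            return -1;

        if ( read_all(fd, rec->data, datasz) )
        {
            free(rec->data);
            rec->data = NULL;
            return -1;
        }
    }

    rec->type   = hdr.type;
    rec->length = hdr.length;
    return 0;
}
```

Passing the file descriptor as a parameter, as the new prototype in this patch does, lets the same helper serve more than one stream; a caller only has to remember to free() rec.data after each successful call.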

Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-15 Thread Ian Campbell

On Wed, 2015-07-15 at 10:26 +0200, Julien Grall wrote:

  @@ -149,7 +173,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned
  int virq,
  test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
 goto out;
 
  -desc->handler = gic_hw_ops->gic_guest_irq_type;
  +desc->handler = get_guest_hw_irq_controller(desc->irq);
 set_bit(_IRQ_GUEST, &desc->status);
 
 gic_set_irq_properties(desc, cpumask_of(v_target->processor),
  priority);
  diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
  index 2dd43ee..ba8528a 100644
  --- a/xen/arch/arm/irq.c
  +++ b/xen/arch/arm/irq.c
  @@ -35,7 +35,13 @@ static DEFINE_SPINLOCK(local_irqs_type_lock);
 struct irq_guest
 {
 struct domain *d;
  -unsigned int virq;
  +union
  +{
  +/* virq refer to virtual irq in case of spi */
  +unsigned int virq;
  +/* virq refer to event ID in case of lpi */
  +unsigned int vid;
 
 
  Why can't we store the event ID in the irq_guest? As said on v3, this is 
  not
 
  Are you referring to irq_desc in above statement?
 
 Yes sorry.

I'm afraid I don't follow your suggestion here, are you suggesting that
the vid field added above should be moved to irq_desc?

But the vid _is_ domain specific, it is the virtual event ID which is
per-domain (it's the thing looked up in the ITT to get a vLPI to be
injected). I think it is a pretty direct analogue of the virq field used
for non-LPI irq_guest structs.

If we had need for the physical event id then that would likely belong in
the irq_desc.

Your proposal on v3 looks to be around moving the its_device pointer to
the irq_desc, which appears to have been done here, along with turning
the virq+vid into a union as requested there too.

  It has been suggested by Ian to move col_id in the its_device in the
  previous version [4]. Any reason not to do it?
 
  In round robin fashion each plpi is attached to col_id. So storing
  in its_device is not possible. In the latest Linux, col_id is stored in
  the its_device structure for which set_affinity is called.

Are you saying that in Linux all Events/LPIs associated with a given ITS
device are routed to the same collection?

 You could do round robin on its_device... It would be exactly the same 

Routing all LPIs associated with a given its_device to the same
collection is not exactly the same as round robin-ing all LPIs from the
device over the collections.

 and save 2 byte if not more with the alignment per irq_desc.

If this is a concern then I would say we would either want a separate
array of per-pLPI information which we do not want in irq_desc because
it is irq specific, or do add a pointer to its_desc which points to an
array of per-event information.

Ian.
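
To make the distinction in this thread concrete, the sketch below contrasts the two routing schemes: one col_id per device (every event of the device goes to the same collection) versus a per-event array hanging off the device, filled round robin. It is only an illustration: struct its_dev_sketch, NR_COLLECTIONS and both helpers are hypothetical names, not taken from the patch series or from Linux.

```c
#include <assert.h>
#include <stddef.h>

#define NR_COLLECTIONS 4u  /* hypothetical: e.g. one collection per pCPU */

struct its_dev_sketch {
    unsigned int nr_events;
    unsigned int col_id;        /* per-device routing: one value for all events */
    unsigned short *event_col;  /* per-event routing: one entry per event */
};

/* Per-device scheme: the event number does not influence the collection. */
static unsigned int col_per_device(const struct its_dev_sketch *dev,
                                   unsigned int event)
{
    assert(event < dev->nr_events);
    return dev->col_id;
}

/*
 * Per-event scheme: spread the events of one device over the collections
 * in turn.  *next_col carries the rotation on to the next device.
 */
static void assign_round_robin(struct its_dev_sketch *dev,
                               unsigned int *next_col)
{
    for ( unsigned int ev = 0; ev < dev->nr_events; ev++ )
    {
        dev->event_col[ev] = (unsigned short)*next_col;
        *next_col = (*next_col + 1) % NR_COLLECTIONS;
    }
}
```

With the per-event array the irq_desc itself does not need to grow, which is one way to read the last suggestion above; the cost is an extra allocation per device.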


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-15 Thread Jan Beulich

 On 15.07.15 at 10:55, feng...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:46 PM
  On 15.07.15 at 10:38, feng...@intel.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, July 15, 2015 4:25 PM
   On 15.07.15 at 08:04, feng...@intel.com wrote:
   From: Jan Beulich [mailto:jbeul...@suse.com]
   Sent: Friday, July 10, 2015 10:02 PM
   I'm particularly worried by the call to acpi_find_matched_drhd_unit()
   - is it maybe worth storing the iommu pointer in struct msi_desc?

   I think it is worth it; Andrew also mentioned this point before. I tend
   to make this an independent piece of work and do it later: the 4.6 release
   is coming and I am still trying my best to target it. Could you please share
   your concern here? Performance, or other things? Thanks!

  Interrupt latency in particular.

  This IRTE update operation is not frequent. It only happens a few
  times, mostly in the initialization phase of the guest. And even when the
  guest sets the affinity, if the MSI/MSI-X configuration doesn't change,
  QEMU will not ask Xen to update it.

 When the guest sets the affinity, the MSI{,-X} configuration is
 rather likely to change (at least for Linux guests).

 Yes, it is. But I'd say it is not a frequent operation. In my test, it only
 happens in the initialization phase, and some updates don't go to Xen since
 the configuration is the same (QEMU filters them).

Can I please ask you to move away from this way of thinking? What
you see in experiments is useful from a functionality pov, but pretty
meaningless from a security perspective. For that, you'd rather start
thinking about what a _malicious_ guest might be doing.

 And I agree I will change this.
 My question is: can we do this a little later, so that I can focus on some
 other critical issues before 4.6 is released? That may give this patch a
 better chance to catch up with 4.6. Is this okay for you?

As long as the feature (due to the other issue) remains experimental,
is off by default, and the code has a prominent comment outlining the
intended improvement, I'd be fine, yes.

Jan


Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-15 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 5:28 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
 Posted-Interrupts

  On 15.07.15 at 10:43, feng...@intel.com wrote:

  -Original Message-
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, July 15, 2015 4:36 PM
  To: Wu, Feng
  Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
  Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
  Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
  Posted-Interrupts

   On 15.07.15 at 10:26, feng...@intel.com wrote:

   -Original Message-
   From: Jan Beulich [mailto:jbeul...@suse.com]
   Sent: Wednesday, July 15, 2015 4:20 PM
   To: Wu, Feng
   Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian,
 Kevin;
   Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
   Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
   Posted-Interrupts

On 15.07.15 at 04:40, feng...@intel.com wrote:

-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Friday, July 10, 2015 9:08 PM
To: Wu, Feng
Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian,
  Kevin;
Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
Posted-Interrupts

 On 24.06.15 at 07:18, feng...@intel.com wrote:
 @@ -81,8 +81,19 @@ struct vmx_domain {

  struct pi_desc {
  DECLARE_BITMAP(pir, NR_VECTORS);
 -u32 control;
 -u32 rsvd[7];
 +union {
 +struct
 +{
 +u16 on : 1,  /* bit 256 - Outstanding Notification */
 +sn : 1,  /* bit 257 - Suppress Notification */
 +rsvd_1 : 14; /* bit 271:258 - Reserved */
 +u8  nv;  /* bit 279:272 - Notification Vector */
 +u8  rsvd_2;  /* bit 287:280 - Reserved */
 +u32 ndst;/* bit 319:288 - Notification
 Destination
  */
 +};
 +u64 control;
 +};

So current code, afaics, uses e.g. test_and_set_bit() to set ON.
By also declaring this as a bitfield you're opening the structure for
non-atomic accesses. If that's correct, why is other code not
being changed to _only_ use the bitfield mechanism (likely also
eliminating the need for it being a union with the now 64-bit
control? If atomic accesses are required, then I'd strongly
suggest against making this a bit field.

And in no event can I see why ndst needs to be union-ized
with control if it doesn't need to be updated atomically with
e.g. nv.

When the vCPU is to be blocked, we need to atomically update
the nv and ndst, then the wakeup notification event can be
delivered to the right destination.

   Okay. Your reply made me go through the patches again to check
   where updates to nv/ndst happen - what's the reason they aren't
   being updated as a pair in patch 14's RUNSTATE_running handling
   (or in the replacement draft's vmx_ctxt_switch_to() adjustment)?

   It is because we can only enter the running state from runnable, by which
   point the NV field has already been changed back to 'posted_intr_vector',
   so we don't need to do it here again.

  Without sitting in the runstate update path anymore, I can't see how
  you would get to see all transitions to runnable.

  Sorry, I cannot understand the above comments well. Do you mean that
  after using the new method (arch hooks) to update the posted-interrupt
  descriptor, I cannot track all the state transitions to runnable?

 Not sure if track is the right word here, but yes.

The new method is still in development, let's see how it will be then. :)

Thanks,
Feng

 Jan


Re: [Xen-devel] Question about mapping between domains

2015-07-15 Thread Oleksandr Dmytryshyn

Hi, Ian. Thank You for the response.

 Look at how the balloon driver does it, the hypercalls you want are
 XENMEM_(increase|decrease)_reservation.
I'll try to use those hypercalls.


Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-15 Thread Jan Beulich

 On 15.07.15 at 10:26, feng...@intel.com wrote:

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:20 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org 
 Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
 Posted-Interrupts

  On 15.07.15 at 04:40, feng...@intel.com wrote:

  -Original Message-
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Friday, July 10, 2015 9:08 PM
  To: Wu, Feng
  Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
  Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org 
  Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d
  Posted-Interrupts

   On 24.06.15 at 07:18, feng...@intel.com wrote:
   @@ -81,8 +81,19 @@ struct vmx_domain {

struct pi_desc {
DECLARE_BITMAP(pir, NR_VECTORS);
   -u32 control;
   -u32 rsvd[7];
   +union {
   +struct
   +{
   +u16 on : 1,  /* bit 256 - Outstanding Notification */
   +sn : 1,  /* bit 257 - Suppress Notification */
   +rsvd_1 : 14; /* bit 271:258 - Reserved */
   +u8  nv;  /* bit 279:272 - Notification Vector */
   +u8  rsvd_2;  /* bit 287:280 - Reserved */
   +u32 ndst;/* bit 319:288 - Notification Destination */
   +};
   +u64 control;
   +};

  So current code, afaics, uses e.g. test_and_set_bit() to set ON.
  By also declaring this as a bitfield you're opening the structure for
  non-atomic accesses. If that's correct, why is other code not
  being changed to _only_ use the bitfield mechanism (likely also
  eliminating the need for it being a union with the now 64-bit
  control? If atomic accesses are required, then I'd strongly
  suggest against making this a bit field.

  And in no event can I see why ndst needs to be union-ized
  with control if it doesn't need to be updated atomically with
  e.g. nv.

  When the vCPU is to be blocked, we need to atomically update
  the nv and ndst, then the wakeup notification event can be
  delivered to the right destination.

 Okay. Your reply made me go through the patches again to check
 where updates to nv/ndst happen - what's the reason they aren't
 being updated as a pair in patch 14's RUNSTATE_running handling
 (or in the replacement draft's vmx_ctxt_switch_to() adjustment)?

 It is because we can only enter the running state from runnable, by which
 point the NV field has already been changed back to 'posted_intr_vector',
 so we don't need to do it here again.

Without sitting in the runstate update path anymore, I can't see how
you would get to see all transitions to runnable.

Jan
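
As an illustration of the point debated above (NV and NDST must change together, while ON is set concurrently by other agents), here is a sketch that treats the 64-bit control word as a single atomic object and never touches it through bitfields. It uses C11 <stdatomic.h> rather than Xen's own cmpxchg()/test_and_set_bit() helpers, and struct pi_ctl_sketch and both function names are hypothetical. The bit positions follow the comments in the quoted struct (ON = bit 256, SN = bit 257, NV = bits 279:272, NDST = bits 319:288, i.e. bits 0, 1, 23:16 and 63:32 of control).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PI_ON         (1ull << 0)
#define PI_SN         (1ull << 1)
#define PI_NV_SHIFT   16
#define PI_NV_MASK    (0xffull << PI_NV_SHIFT)
#define PI_NDST_SHIFT 32
#define PI_NDST_MASK  (0xffffffffull << PI_NDST_SHIFT)

struct pi_ctl_sketch { _Atomic uint64_t control; };

/* Replace NV and NDST in one atomic step, leaving ON/SN as they were. */
static void pi_set_nv_ndst(struct pi_ctl_sketch *pi, uint8_t nv, uint32_t ndst)
{
    uint64_t old = atomic_load(&pi->control), new;

    do {
        new = (old & ~(PI_NV_MASK | PI_NDST_MASK)) |
              ((uint64_t)nv << PI_NV_SHIFT) |
              ((uint64_t)ndst << PI_NDST_SHIFT);
        /* On failure, old is reloaded with the current value; retry. */
    } while ( !atomic_compare_exchange_weak(&pi->control, &old, new) );
}

/* Set ON atomically; returns non-zero if it was already set. */
static int pi_test_and_set_on(struct pi_ctl_sketch *pi)
{
    return (atomic_fetch_or(&pi->control, PI_ON) & PI_ON) != 0;
}
```

The compare-and-exchange loop is what keeps a concurrent setter of ON from being lost while NV/NDST are rewritten; a plain bitfield store gives no such guarantee.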


[Xen-devel] [PATCH V6 0/3] Vm_event memory introspection helpers

2015-07-15 Thread Razvan Cojocaru

This series addresses the review comments on V5. All patches have
at least one ack, and the modifications are minor.

Patch 2/3 has not been modified at all, and the only modification
in patch 3/3 is that it now uses vzalloc() / vfree() instead of
xzalloc_array() / xfree(), and both patch 3/3 and 1/3 now set
the allocated data to NULL after freeing it on domain destruction
paths.

Patch 1/3 has slightly more modifications; however, they are
mostly cosmetic (the only non-cosmetic one is that the patch now
bypasses a hvm_copy_from_guest_phys() call that did no harm but
was unnecessary).

As discussed, I've kept the better-safe-than-sorry approach of
freeing allocated data on both domain destruction paths and
vm_event_cleanup(), based on the comments in
shadow_final_teardown(), which imply that it is theoretically
possible to end up on a domain destruction path without
domain_kill() being called (and domain_kill() does the
vm_event_cleanup()).

[PATCH V6 1/3] xen/mem_access: Support for memory-content hiding
[PATCH V6 2/3] xen/vm_event: Support for guest-requested events
[PATCH V6 3/3] xen/vm_event: Deny register writes if refused by
vm_event reply


Thanks in advance for your reviews,
Razvan



Re: [Xen-devel] [PATCH v8 11/11] (lib)xl: soft reset support

2015-07-15 Thread Vitaly Kuznetsov

Ian Jackson ian.jack...@eu.citrix.com writes:

 Vitaly Kuznetsov writes ([PATCH v8 11/11] (lib)xl: soft reset support):
 Use existing create/restore path to perform 'soft reset' for HVM domains.
 Tear everything down, e.g. destroy domain's device model, remove the domain
 from xenstore, save toolstack record and start over.

 This patch has a number of long lines (eg in the documentation and
 comments) which make it hard to review.  Can you please keep it to 70
 columns, or 75 if you absolutely must ?

No problem, will do in v9. BTW, libxl/CODING_STYLE states that 'Lines
are limited to 75-80 characters'. I'd suggest we update that in case
70-75 is preferred.


 I'm not sure that this description:

 +=item B<soft-reset>
 +
 +cleanup the domain without destroying it, restart the device
 +model. This action is supported for HVM guests only.

 is really accurate from a user point of view.

Yea, I'm trying hard to avoid mentioning Linux and kexec while
describing soft reset. Will try to come up with something..



 Ian.

-- 
  Vitaly


Re: [Xen-devel] [PATCH] x86/traps: Dump instruction stream in show_execution_state()

2015-07-15 Thread Jan Beulich

 On 14.07.15 at 18:15, andrew.coop...@citrix.com wrote:
 Currently limited to just hypervisor context, but it could be extended
 to vcpus as well.

Considering this ...

 --- a/xen/arch/x86/traps.c
 +++ b/xen/arch/x86/traps.c
 @@ -115,6 +115,31 @@
  #define stack_words_per_line 4
  #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp)
  
 +static void show_code(const struct cpu_user_regs *regs)
 +{
 +    char insns[24];
 +    unsigned int i, not_copied;
 +    void *__user start_ip = (void *)regs->rip - 8;
 +
 +    if ( guest_mode(regs) )
 +        return;
 +
 +    not_copied = __copy_from_user(insns, start_ip, ARRAY_SIZE(insns));
 +
 +    printk("Xen code around %04x:%p (%ps)%s:\n",

... I'd prefer the "Xen " here to be dropped.

 +           regs->cs, _p(regs->rip), _p(regs->rip),
 +           !!not_copied ? " [fault on access]" : "");

Pointless !!.

 +    for ( i = 0; i < ARRAY_SIZE(insns) - not_copied; ++i )
 +    {
 +        if ( (unsigned long)(start_ip + i) == regs->rip )
 +            printk(" <%02x>", (unsigned char)insns[i]);
 +        else
 +            printk(" %02x", (unsigned char)insns[i]);

Why not have insns[] be unsigned char right away?

Also I think you should prevent the subtraction from regs->rip from wrapping
through zero, or even bail when RIP doesn't point into Xen space.

Jan
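
The two suggestions above (use unsigned char for the byte buffer, and do not let rip - 8 wrap through zero) can be sketched in plain user-space C as follows. This is only an illustration of the idea, not the hypervisor code: window_start() and format_code() are hypothetical helpers, and marking the byte at the instruction pointer with angle brackets is an assumption about the intended output format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BYTES_BEFORE 8u

/* Start of the dump window: rip - 8, clamped so it cannot wrap below zero. */
static uintptr_t window_start(uintptr_t rip)
{
    return rip < BYTES_BEFORE ? 0 : rip - BYTES_BEFORE;
}

/*
 * Format n bytes that were read from address start into out, marking the
 * byte located at rip.  Returns the number of characters written.
 */
static size_t format_code(char *out, size_t outsz, const unsigned char *insns,
                          size_t n, uintptr_t start, uintptr_t rip)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for ( size_t i = 0; i < n; i++ )
    {
        int len;

        if ( start + i == rip )
            len = snprintf(out + used, outsz - used, " <%02x>", insns[i]);
        else
            len = snprintf(out + used, outsz - used, " %02x", insns[i]);

        if ( len < 0 || (size_t)len >= outsz - used )
        {
            out[used] = '\0';   /* drop a partially written item */
            break;
        }
        used += (size_t)len;
    }
    return used;
}
```

Because insns is unsigned char, no cast is needed at the print site and bytes above 0x7f cannot be sign-extended into wider output.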



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun


Certainly appreciate your time.

I didn't mean it's wasting time at this point. I just want to express
that it's hard to implement that solution in one or two weeks to get
into 4.6 as an exception.

Note I know this feature is still not accepted as an exception to 4.6
right now so I'm making an assumption.


After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather


Yes, this is not a bug fix again into 4.6.


than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.



Indeed, we'd like to finalize this properly as you said. But apparently
there is not enough time for that to happen. So I suggest we
seek the best solution in the next phase.


Thanks
Tiejun


Re: [Xen-devel] [PATCH] x86/traps: Misc tweaks to several printk()s

2015-07-15 Thread Jan Beulich

 On 14.07.15 at 19:54, andrew.coop...@citrix.com wrote:
 @@ -626,8 +626,9 @@ static void do_trap(struct cpu_user_regs *regs, int use_error_code)
  
      if ( likely((fixup = search_exception_table(regs->eip)) != 0) )
      {
 -        dprintk(XENLOG_ERR, "Trap %d: %p -> %p\n",
 -                trapnr, _p(regs->eip), _p(fixup));
 +        printk(XENLOG_INFO "Exception [#%d, ec=%04x] (%s): %ps %p -> %p\n",
 +               trapnr, use_error_code ? regs->error_code : 0, trapstr(trapnr),
 +               _p(regs->eip), _p(regs->eip), _p(fixup));

But why the transition dprintk() -> printk()?

 @@ -2677,9 +2678,9 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
  
              if ( (rdmsr_safe(regs->ecx, val) != 0) || (msr_content != val) )
      invalid:
 -                gdprintk(XENLOG_WARNING, "Domain attempted WRMSR %p from "
 -                        "0x%016"PRIx64" to 0x%016"PRIx64".\n",
 -                        _p(regs->ecx), val, msr_content);
 +                gprintk(XENLOG_WARNING,
 +                        "attempted WRMSR 0x%08x: 0x%016"PRIx64" -> 0x%016"PRIx64"\n",
 +                        regs->_ecx, val, msr_content);

In cases where the values can't usefully be taken to be decimal I'd
prefer the 0x prefixes to be omitted.

 @@ -2813,10 +2814,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
          case MSR_EFER:
   rdmsr_normal:
              /* Everyone can read the MSR space. */
 -            /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
 -                        _p(regs->ecx));*/
              if ( rdmsr_safe(regs->ecx, val) )
 +            {
 +                gprintk(XENLOG_WARNING, "attempted RDMSR 0x%08x\n", regs->_ecx);
                  goto fail;
 +            }

Do you really see this to be useful in production builds?

Jan



[Xen-devel] [PATCH v8 --for 4.6 COLO 18/25] Support colo mode for qemu disk

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

Usage: disk = ['...,colo,colo-params=xxx,active-disk=xxx,hidden-disk=xxx...']
The format of colo-params: host:port:exportname=xx
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 docs/man/xl.pod.1   |   2 +-
 docs/misc/xl-disk-configuration.txt |  38 ++
 tools/libxl/libxl.c |  42 +-
 tools/libxl/libxl_create.c  |  25 +++-
 tools/libxl/libxl_device.c  |  38 ++
 tools/libxl/libxl_dm.c  | 257 +++-
 tools/libxl/libxl_types.idl |   5 +
 tools/libxl/libxlu_disk_l.l |   5 +
 8 files changed, 403 insertions(+), 9 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2cd34bb..1effce7 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -454,7 +454,7 @@ N.B: Remus support in xl is still in experimental 
(proof-of-concept) phase.
  Disk replication support is limited to DRBD disks.
 
  COLO support in xl is still in experimental (proof-of-concept) phase.
- There is no support for network or disk at the moment.
+ There is no support for network at the moment.
 
 B<OPTIONS>
 
diff --git a/docs/misc/xl-disk-configuration.txt 
b/docs/misc/xl-disk-configuration.txt
index 6a2118d..e366e8d 100644
--- a/docs/misc/xl-disk-configuration.txt
+++ b/docs/misc/xl-disk-configuration.txt
@@ -234,6 +234,44 @@ were intentionally created non-sparse to avoid 
fragmentation of the
 file.
 
 
+===
+COLO PARAMETERS
+===
+
+
+colo
+
+
+Enable COLO HA for disk. For better understanding block replication on
+QEMU, please refer to:
+http://wiki.qemu.org/Features/BlockReplication
+
+
+colo-params=host:port:exportname=name
+---
+
+Description:   Secondary host's address and port information,
+   We will run a nbd server on secondary host,
+   exportname is the nbd server's disk export name.
+Mandatory: Yes when COLO enabled
+
+
+active-disk
+---
+
+Description:   This is used by secondary. Secondary guest's write
+   will be buffered in this disk.
+Mandatory: Yes when COLO enabled
+
+
+hidden-disk
+---
+
+Description:   This is used by secondary. It buffers the original
+   content that is modified by the primary VM.
+Mandatory: Yes when COLO enabled
+
+
 
 DEPRECATED PARAMETERS, PREFIXES AND SYNTAXES
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 791f364..c6cc5aa 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2256,6 +2256,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
     int rc;
 
     libxl_defbool_setdefault(&disk->discard_enable, !!disk->readwrite);
+    libxl_defbool_setdefault(&disk->colo_enable, false);
+    libxl_defbool_setdefault(&disk->colo_restore_enable, false);
 
     rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid);
     if (rc < 0) return rc;
@@ -2456,6 +2458,14 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                 flexarray_append(back, "params");
                 flexarray_append(back, libxl__sprintf(gc, "%s:%s",
                               libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+                if (libxl_defbool_val(disk->colo_enable)) {
+                    flexarray_append(back, "colo-params");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_params));
+                    flexarray_append(back, "active-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->active_disk));
+                    flexarray_append(back, "hidden-disk");
+                    flexarray_append(back, libxl__sprintf(gc, "%s", disk->hidden_disk));
+                }
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_QDISK);
                 break;
             default:
@@ -2570,7 +2580,10 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         goto cleanup;
     }
 
-    /* params may not be present; but everything else must be. */
+    /*
+     * params and colo-params may not be present; but everything
+     * else must be.
+     */
     tmp = xs_read(ctx->xsh, XBT_NULL,
                   libxl__sprintf(gc, "%s/params", be_path), &len);
     if (tmp && strchr(tmp, ':')) {
@@ -2580,6 +2593,33 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc,
         disk->pdev_path = tmp;
     }
 
+    tmp = xs_read(ctx->xsh, XBT_NULL,
+                  libxl__sprintf(gc, "%s/colo-params", be_path), &len);
+    if (tmp) {
+        libxl_defbool_set(&disk->colo_enable, true);
+        disk->colo_params = tmp;
+    } else {
+
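
(The archived message is cut off at this point.)

The documentation hunk in this patch gives the colo-params format as host:port:exportname=name. As a rough illustration of what validating such a string could look like, here is a small stand-alone parser. It is a sketch under stated assumptions, not what libxl does with the value (in the hunks shown, the string is stored and written to xenstore unparsed): struct colo_params_sketch and parse_colo_params() are hypothetical, the 64-byte field sizes are arbitrary, and hosts containing ':' (for example IPv6 literals) are deliberately not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct colo_params_sketch {
    char host[64];
    unsigned int port;
    char exportname[64];
};

/* Parse "host:port:exportname=name"; returns 0 on success, -1 if malformed. */
static int parse_colo_params(const char *s, struct colo_params_sketch *out)
{
    static const char key[] = "exportname=";
    const size_t keylen = sizeof(key) - 1;
    const char *c1, *c2, *name;
    char *end;
    unsigned long port;
    size_t hostlen, namelen;

    assert(s != NULL && out != NULL);

    c1 = strchr(s, ':');                 /* end of host */
    if ( !c1 || c1 == s )
        return -1;
    c2 = strchr(c1 + 1, ':');            /* end of port */
    if ( !c2 || c2 == c1 + 1 )
        return -1;

    hostlen = (size_t)(c1 - s);
    if ( hostlen >= sizeof(out->host) )
        return -1;

    port = strtoul(c1 + 1, &end, 10);
    if ( end != c2 || port == 0 || port > 65535 )
        return -1;

    if ( strncmp(c2 + 1, key, keylen) != 0 )
        return -1;
    name = c2 + 1 + keylen;
    namelen = strlen(name);
    if ( namelen == 0 || namelen >= sizeof(out->exportname) )
        return -1;

    memcpy(out->host, s, hostlen);
    out->host[hostlen] = '\0';
    out->port = (unsigned int)port;
    memcpy(out->exportname, name, namelen + 1);
    return 0;
}
```

For example, "192.168.0.2:8889:exportname=qdisk1" would be accepted, while a string with a non-numeric port or without the exportname= key would be rejected.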

[Xen-devel] [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream

2015-07-15 Thread Yang Hongyang

write colo_context records into the stream, used by both
primary and secondary to send colo context.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 tools/libxl/libxl_internal.h |  5 +++
 tools/libxl/libxl_stream_write.c | 87 
 2 files changed, 92 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a83d6a5..2634836 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3000,6 +3000,7 @@ struct libxl__stream_write_state {
 int rc;
 bool running;
 bool in_checkpoint;
+bool in_colo_context;
 libxl__save_helper_state shs;
 
 /* Main stream-writing data. */
@@ -3019,6 +3020,10 @@ _hidden void libxl__stream_write_start(libxl__egc *egc,
 _hidden void
 libxl__stream_write_start_checkpoint(libxl__egc *egc,
  libxl__stream_write_state *stream);
+_hidden void
+libxl__stream_write_colo_context(libxl__egc *egc,
+ libxl__stream_write_state *stream,
+ libxl_sr_colo_context *colo_context);
 _hidden void libxl__stream_write_abort(libxl__egc *egc,
libxl__stream_write_state *stream,
int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index df55277..e7a32c4 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -96,6 +96,16 @@ static void write_checkpoint_end_record(libxl__egc *egc,
 static void checkpoint_end_record_done(libxl__egc *egc,
libxl__stream_write_state *stream);
 
+/* COLO context */
+static void write_colo_context(libxl__egc *egc,
+   libxl__stream_write_state *stream,
+   libxl_sr_colo_context *colo_context);
+static void write_colo_context_done(libxl__egc *egc,
+libxl__datacopier_state *dc,
+int rc, int onwrite, int errnoval);
+static void colo_context_done(libxl__egc *egc,
+  libxl__stream_write_state *stream, int rc);
+
 /*- Helpers -*/
 
 static void write_done(libxl__egc *egc,
@@ -500,6 +510,11 @@ static void stream_complete(libxl__egc *egc,
 return;
 }
 
+    if (stream->in_colo_context) {
+        colo_context_done(egc, stream, rc);
+        return;
+    }
+
     if (!stream->rc)
         stream->rc = rc;
     stream_done(egc, stream);
@@ -555,6 +570,78 @@ static void check_all_finished(libxl__egc *egc,
     stream->completion_callback(egc, stream, stream->rc);
 }
 
+/*- COLO context -*/
+void libxl__stream_write_colo_context(libxl__egc *egc,
+                                      libxl__stream_write_state *stream,
+                                      libxl_sr_colo_context *colo_context)
+{
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_colo_context);
+    stream->in_colo_context = true;
+
+    write_colo_context(egc, stream, colo_context);
+}
+
+static void write_colo_context(libxl__egc *egc,
+                               libxl__stream_write_state *stream,
+                               libxl_sr_colo_context *colo_context)
+{
+    static const uint8_t zero_padding[1U << REC_ALIGN_ORDER] = { 0 };
+    libxl__datacopier_state *dc = &stream->dc;
+    STATE_AO_GC(stream->ao);
+    struct libxl__sr_rec_hdr rec = { REC_TYPE_COLO_CONTEXT, 0 };
+    int rc = 0;
+    uint32_t padding_len;
+
+    dc->copywhat = "colo context record";
+    dc->writewhat = "save/migration stream";
+    dc->callback = write_colo_context_done;
+
+    rc = libxl__datacopier_start(dc);
+    if (rc)
+        goto err;
+
+    rec.length = sizeof(*colo_context);
+
+    libxl__datacopier_prefixdata(egc, dc, &rec, sizeof(rec));
+    libxl__datacopier_prefixdata(egc, dc, colo_context, rec.length);
+
+    padding_len = ROUNDUP(rec.length, REC_ALIGN_ORDER) - rec.length;
+    if (padding_len)
+        libxl__datacopier_prefixdata(egc, dc, zero_padding, padding_len);
+
+    return;
+
+ err:
+    assert(rc);
+    stream_complete(egc, stream, rc);
+}
+
+static void write_colo_context_done(libxl__egc *egc,
+                                    libxl__datacopier_state *dc,
+                                    int rc, int onwrite, int errnoval)
+{
+    libxl__stream_write_state *stream = CONTAINER_OF(dc, *stream, dc);
+    STATE_AO_GC(stream->ao);
+
+    if (rc || onwrite || errnoval) {
+        stream_complete(egc, stream, rc ?: ERROR_FAIL);
+        return;
+    }
+
+    colo_context_done(egc, stream, rc);
+    return;
+}
+
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_write_state *stream, int rc)
+{
+    assert(stream->in_colo_context);
+    stream->in_colo_context = false;
+    stream->checkpoint_callback(egc, stream, rc);
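
(The archived message is cut off at this point.)

The three prefixdata calls in write_colo_context() emit a header, the body, and then enough zero bytes to bring the body up to the record alignment. The sketch below shows the same layout rule applied to a flat memory buffer so the arithmetic can be checked in isolation. It is an illustration only: emit_record() and struct rec_hdr_sketch are hypothetical, an alignment order of 3 (8 bytes) is assumed for REC_ALIGN_ORDER, and the type value passed in is arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_ORDER_SKETCH 3u  /* assumption: 8-byte record alignment */
#define ROUNDUP(x, order) \
    (((x) + (1u << (order)) - 1) & ~((1u << (order)) - 1))

struct rec_hdr_sketch { uint32_t type, length; };

/*
 * Emit header, body and zero padding into buf.
 * Returns the number of bytes used, or 0 if buf is too small.
 */
static size_t emit_record(unsigned char *buf, size_t bufsz, uint32_t type,
                          const void *body, uint32_t length)
{
    struct rec_hdr_sketch hdr = { type, length };
    uint32_t padded = ROUNDUP(length, ALIGN_ORDER_SKETCH);
    size_t total = sizeof(hdr) + padded;

    assert(buf != NULL && (body != NULL || length == 0));

    if ( total > bufsz )
        return 0;

    memcpy(buf, &hdr, sizeof(hdr));             /* header carries the unpadded length */
    if ( length )
        memcpy(buf + sizeof(hdr), body, length);
    memset(buf + sizeof(hdr) + length, 0, padded - length);  /* zero padding */

    return total;
}
```

Note that the header records the unpadded length; a reader has to round it up again to find the start of the next record.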

[Xen-devel] [PATCH v8 --for 4.6 COLO 17/25] implement the cmdline for COLO

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

Add a new option -c to the command 'xl remus'. If you want
to use COLO HA instead of Remus HA, please use -c option.

Update man pages to reflect the addition of a new option to
'xl remus' command.

Also add a new option -c to the internal command 'xl migrate-receive'.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 docs/man/xl.pod.1 | 12 --
 tools/libxl/libxl.c   | 23 --
 tools/libxl/xl_cmdimpl.c  | 61 ---
 tools/libxl/xl_cmdtable.c |  4 +++-
 4 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f22c3f3..2cd34bb 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -447,12 +447,15 @@ Print huge (!) amount of debug during the migration 
process.
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
 
-Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
-mechanism between the two hosts.
+Enable Remus HA or COLO HA for domain. By default B<xl> relies on ssh as a
+transport mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
  Disk replication support is limited to DRBD disks.
 
+ COLO support in xl is still in experimental (proof-of-concept) phase.
+ There is no support for network or disk at the moment.
+
 B<OPTIONS>
 
 =over 4
@@ -498,6 +501,11 @@ Disable network output buffering. Requires enabling unsafe 
mode.
 
 Disable disk replication. Requires enabling unsafe mode.
 
+=item B<-c>
+
+Enable COLO HA. This conflicts with B<-i> and B<-b>, and memory
+checkpoint compression must be disabled.
+
 =back
 
 =item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c040909..791f364 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -814,12 +814,28 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         goto out;
     }
 
+    /* The caller must set this defbool */
+    if (libxl_defbool_is_default(info->colo)) {
+        LOG(ERROR, "colo mode must be enabled/disabled");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
     libxl_defbool_setdefault(&info->allow_unsafe, false);
     libxl_defbool_setdefault(&info->blackhole, false);
-    libxl_defbool_setdefault(&info->compression, true);
+    libxl_defbool_setdefault(&info->compression,
+                             !libxl_defbool_val(info->colo));
     libxl_defbool_setdefault(&info->netbuf, true);
     libxl_defbool_setdefault(&info->diskbuf, true);
 
+    if (libxl_defbool_val(info->colo)) {
+        if (libxl_defbool_val(info->compression)) {
+            LOG(ERROR, "cannot use memory checkpoint compression in COLO mode");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
     if (!libxl_defbool_val(info->allow_unsafe) &&
         (libxl_defbool_val(info->blackhole) ||
          !libxl_defbool_val(info->netbuf) ||
@@ -841,7 +857,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->live = 1;
     dss->debug = 0;
     dss->remus = info;
-    dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
+    if (libxl_defbool_val(info->colo))
+        dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_COLO;
+    else
+        dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS;
 
     assert(info);
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ace4a65..45ec435 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -4292,6 +4292,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     char rc_buf;
     char *migration_domname;
     struct domain_create dom_info;
+    const char *ha = checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO ?
+                     "COLO" : "Remus";
 
     signal(SIGPIPE, SIG_IGN);
     /* if we get SIGPIPE we'd rather just have it as an error */
@@ -4312,6 +4314,9 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.send_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
     dom_info.checkpointed_stream = checkpointed;
+    if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO)
+        /* COLO uses stdout to send control message to master */
+        dom_info.quiet = 1;
 
     rc = create_domain(&dom_info);
     if (rc < 0) {
@@ -4326,8 +4331,8 @@ static void migrate_receive(int debug, int daemonize, int monitor,
         /* If we are here, it means that the sender (primary) has crashed.
          * TODO: Split-Brain Check.
          */
-        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
-                domid);
+        fprintf(stderr, "migration target: %s Failover for domain %u\n",
+                ha, domid);
 
         /*
          * If domain renaming fails, lets just continue (as we need the domain
@@ -4343,16 +4348,20 @@ static void migrate_receive(int debug, int daemonize, 
int

[Xen-devel] [PATCH v8 --for 4.6 COLO 19/25] COLO: use qemu block replication

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

Use QEMU block replication as the block replication solution for COLO.
Note that the guest must be paused before COLO starts; otherwise the
disks will not be consistent between primary and secondary.
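
The setup code in this patch looks for ":exportname=" inside the disk's
colo_params string and treats everything before it as the NBD address.
A minimal standalone sketch of that split, assuming a parameter string
of the form "address:exportname=name"; the helper name split_colo_params
and the sample address are hypothetical and not part of libxl:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXPORTNAME_KEY ":exportname="

/*
 * Hypothetical helper (not a libxl function): split
 * "<addr>:exportname=<name>" into a freshly allocated address string
 * and a pointer to the export name inside the input.
 * Returns 0 on success, -1 if the key is missing, the export name is
 * empty, or memory cannot be allocated.
 */
int split_colo_params(const char *colo_params, char **addr_out,
                      const char **export_name_out)
{
    const char *key, *name;
    size_t addr_len;
    char *addr;

    assert(addr_out != NULL && export_name_out != NULL);
    if (colo_params == NULL)
        return -1;

    key = strstr(colo_params, EXPORTNAME_KEY);
    if (key == NULL)
        return -1;

    name = key + strlen(EXPORTNAME_KEY);
    if (name[0] == '\0')
        return -1;

    /* Everything before the key is the address part. */
    addr_len = (size_t)(key - colo_params);
    addr = malloc(addr_len + 1);
    if (addr == NULL)
        return -1;
    memcpy(addr, colo_params, addr_len);
    addr[addr_len] = '\0';

    *addr_out = addr;
    *export_name_out = name;
    return 0;
}
```

The caller owns the returned address string and releases it with free();
the export name points into the original string and must not be freed.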

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
for commit message,
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/Makefile |   1 +
 tools/libxl/libxl_colo_qdisk.c   | 209 +++
 tools/libxl/libxl_colo_restore.c |  20 +++-
 tools/libxl/libxl_colo_save.c|  36 ++-
 tools/libxl/libxl_internal.h |  18 
 tools/libxl/libxl_qmp.c  |  31 ++
 6 files changed, 311 insertions(+), 4 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_qdisk.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 71bf7a2..e91ae79 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -64,6 +64,7 @@ endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
+LIBXL_OBJS-y += libxl_colo_qdisk.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo_qdisk.c b/tools/libxl/libxl_colo_qdisk.c
new file mode 100644
index 000..d73572e
--- /dev/null
+++ b/tools/libxl/libxl_colo_qdisk.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Wen Congyang we...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__colo_qdisk {
+libxl__checkpoint_device *dev;
+} libxl__colo_qdisk;
+
+/* == init() and cleanup() == */
+int init_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+/*
+ * We don't know if we use qemu block replication, so
+ * we cannot start block replication here.
+ */
+return 0;
+}
+
+void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* == setup() and teardown() == */
+static void colo_qdisk_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+ bool primary)
+{
+const libxl_device_disk *disk = dev-backend_dev;
+const char *addr = NULL;
+const char *export_name;
+int ret, rc = 0;
+
+/* Convenience aliases */
+libxl__checkpoint_devices_state *const cds = dev-cds;
+const char *colo_params = disk-colo_params;
+const int domid = cds-domid;
+
+EGC_GC;
+
+if (disk-backend != LIBXL_DISK_BACKEND_QDISK ||
+!libxl_defbool_val(disk-colo_enable)) {
+rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+goto out;
+}
+
+export_name = strstr(colo_params, :exportname=);
+if (!export_name) {
+rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+goto out;
+}
+export_name += strlen(:exportname=);
+if (export_name[0] == 0) {
+rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH;
+goto out;
+}
+
+dev-matched = 1;
+
+if (primary) {
+/* NBD server is not ready, so we cannot start block replication now */
+goto out;
+} else {
+libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+int len;
+
+if (crs-qdisk_setuped)
+goto out;
+
+crs-qdisk_setuped = true;
+
+len = export_name - strlen(:exportname=) - colo_params;
+addr = libxl__strndup(gc, colo_params, len);
+}
+
+ret = libxl__qmp_block_start_replication(gc, domid, primary, addr);
+if (ret)
+rc = ERROR_FAIL;
+
+out:
+dev-aodev.rc = rc;
+dev-aodev.callback(egc, dev-aodev);
+}
+
+static void colo_qdisk_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+bool primary)
+{
+int ret, rc = 0;
+
+/* Convenience aliases */
+libxl__checkpoint_devices_state *const cds = dev-cds;
+const int domid = cds-domid;
+
+EGC_GC;
+
+if (primary) {
+libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds);
+
+if (!css-qdisk_setuped)
+goto out;
+
+css-qdisk_setuped = false;
+} else {
+libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds);
+
+if (!crs-qdisk_setuped)
+goto out;
+
+crs-qdisk_setuped = false;
+}
+
+ret = libxl__qmp_block_stop_replication(gc, domid,

[Xen-devel] [PATCH v8 --for 4.6 COLO 11/25] secondary vm suspend/resume/checkpoint code

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

The secondary VM runs in COLO mode, so the following steps are
repeated for every checkpoint:
1. Resume the secondary VM
   a. Send LIBXL_COLO_SVM_READY to the master.
   b. If it is not the first resume, call libxl__checkpoint_devices_preresume().
   c. If it is the first resume (the resume right after live migration),
      - call libxl__xc_domain_restore_done() to build the secondary VM.
      - enable logdirty for the secondary VM.
      - call libxl__domain_resume() to resume the secondary VM.
      - call libxl__checkpoint_devices_setup() to set up checkpoint devices.
   d. Send LIBXL_COLO_SVM_RESUMED to the master.
2. Wait for a new checkpoint
   a. Call libxl__checkpoint_devices_commit().
   b. Read LIBXL_COLO_NEW_CHECKPOINT from the master.
3. Suspend the secondary VM
   a. Suspend the secondary VM.
   b. Call libxl__checkpoint_devices_postsuspend().
   c. Send LIBXL_COLO_SVM_SUSPENDED to the master.
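
The cycle above can be sketched as a fixed step sequence in which only
the resume step depends on whether this is the first resume. This is an
illustration of the ordering only: the enum and the function
svm_plan_cycle are hypothetical names, and the real code drives these
steps through asynchronous callbacks rather than a table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step tags, one per action in the list above. */
enum svm_step {
    STEP_SEND_SVM_READY,
    STEP_FIRST_RESUME_SETUP,   /* restore_done, logdirty, resume, devices setup */
    STEP_DEVICES_PRERESUME,    /* later resumes only */
    STEP_SEND_SVM_RESUMED,
    STEP_DEVICES_COMMIT,
    STEP_READ_NEW_CHECKPOINT,
    STEP_SUSPEND_VM,
    STEP_DEVICES_POSTSUSPEND,
    STEP_SEND_SVM_SUSPENDED,
};

#define SVM_STEPS_PER_CYCLE 8

/*
 * Write the ordered steps of one checkpoint cycle into out[].
 * Returns the number of steps written, or 0 if cap is too small.
 */
size_t svm_plan_cycle(bool first_resume, enum svm_step *out, size_t cap)
{
    const enum svm_step resume_step =
        first_resume ? STEP_FIRST_RESUME_SETUP : STEP_DEVICES_PRERESUME;
    const enum svm_step plan[SVM_STEPS_PER_CYCLE] = {
        STEP_SEND_SVM_READY, resume_step, STEP_SEND_SVM_RESUMED,
        STEP_DEVICES_COMMIT, STEP_READ_NEW_CHECKPOINT,
        STEP_SUSPEND_VM, STEP_DEVICES_POSTSUSPEND, STEP_SEND_SVM_SUSPENDED,
    };
    size_t i;

    assert(out != NULL);
    if (cap < SVM_STEPS_PER_CYCLE)
        return 0;

    for (i = 0; i < SVM_STEPS_PER_CYCLE; i++)
        out[i] = plan[i];
    return SVM_STEPS_PER_CYCLE;
}
```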

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/Makefile |   1 +
 tools/libxl/libxl_colo.h |  27 ++
 tools/libxl/libxl_colo_restore.c | 991 +++
 tools/libxl/libxl_create.c   | 111 -
 tools/libxl/libxl_internal.h |  19 +
 tools/libxl/libxl_save_callout.c |   7 +-
 6 files changed, 1154 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_restore.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 3cb3ae9..97b3753 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -63,6 +63,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_colo_restore.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
new file mode 100644
index 000..54dc835
--- /dev/null
+++ b/tools/libxl/libxl_colo.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang we...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#ifndef LIBXL_COLO_H
+#define LIBXL_COLO_H
+
+extern void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void,
+ int ret, int retval, int errnoval);
+extern void libxl__colo_restore_setup(libxl__egc *egc,
+  libxl__colo_restore_state *crs);
+extern void libxl__colo_restore_teardown(libxl__egc *egc,
+ libxl__colo_restore_state *crs,
+ int rc);
+
+#endif
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
new file mode 100644
index 000..5cda0b2
--- /dev/null
+++ b/tools/libxl/libxl_colo_restore.c
@@ -0,0 +1,991 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang we...@cn.fujitsu.com
+ * Yang Hongyang yan...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include "libxl_sr_stream_format.h"
+
+enum {
+LIBXL_COLO_SETUPED,
+LIBXL_COLO_SUSPENDED,
+LIBXL_COLO_RESUMED,
+};
+
+typedef struct libxl__colo_restore_checkpoint_state libxl__colo_restore_checkpoint_state;
+struct libxl__colo_restore_checkpoint_state {
+libxl__domain_suspend_state dsps;
+libxl__logdirty_switch lds;
+libxl__colo_restore_state *crs;
+libxl__stream_write_state sws;
+int status;
+bool preresume;
+/* used for teardown */
+int teardown_devices;
+int saved_rc;
+
+void (*callback)(libxl__egc *,
+ libxl__colo_restore_checkpoint_state *,
+ int);
+};
+
+
+static void libxl__colo_restore_domain_resume_callback(void *data);
+static

[Xen-devel] [PATCH v8 --for 4.6 COLO 14/25] libxc/restore: send dirty bitmap to primary when checkpoint under colo

2015-07-15 Thread Yang Hongyang

Send the dirty bitmap to the primary at each checkpoint when running
under COLO.
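
The function added below does not ship the raw bitmap; it converts the
set bits into a list of 64-bit pfns and sends that list as the record
payload. A minimal standalone sketch of the conversion, assuming a
bitmap stored as an array of unsigned long; dirty_bitmap_to_pfns and
bit_is_set are hypothetical names, not libxc functions:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Test one bit in a bitmap made of unsigned long words. */
static int bit_is_set(const unsigned long *bitmap, uint64_t bit)
{
    return (int)((bitmap[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/*
 * Return a malloc()ed array holding the index of every set bit below
 * nr_pfns, in ascending order, and store the number of entries in
 * *count_out. Returns NULL only if allocation fails.
 */
uint64_t *dirty_bitmap_to_pfns(const unsigned long *bitmap, uint64_t nr_pfns,
                               size_t *count_out)
{
    uint64_t i, *pfns;
    size_t count = 0, written = 0;

    assert(bitmap != NULL && count_out != NULL);

    /* First pass: count the dirty pfns so the list can be sized. */
    for (i = 0; i < nr_pfns; i++)
        if (bit_is_set(bitmap, i))
            count++;

    /* Allocate at least one element so an empty list is not NULL. */
    pfns = malloc((count ? count : 1) * sizeof(*pfns));
    if (pfns == NULL)
        return NULL;

    /* Second pass: record each dirty pfn. */
    for (i = 0; i < nr_pfns; i++)
        if (bit_is_set(bitmap, i))
            pfns[written++] = i;

    *count_out = count;
    return pfns;
}
```

The payload length of the resulting record would then be the entry count
multiplied by sizeof(uint64_t), matching how rec.length is computed in
the patch.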

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxc/xc_sr_common.h  |   4 ++
 tools/libxc/xc_sr_restore.c | 120 +++-
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index c5603ff..7fc2021 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -213,6 +213,10 @@ struct xc_sr_context
 struct xc_sr_restore_ops ops;
 struct restore_callbacks *callbacks;
 
+int send_fd;
+unsigned long p2m_size;
+xc_hypercall_buffer_t dirty_bitmap_hbuf;
+
 /* From Image Header. */
 uint32_t format_version;
 
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 696bf30..8b13d8d 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -409,6 +409,92 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return rc;
 }
 
+/*
+ * Send dirty_bitmap to primary.
+ */
+static int send_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    int rc = -1;
+    unsigned count, written;
+    uint64_t i, *pfns = NULL;
+    struct iovec *iov = NULL;
+    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    struct xc_sr_record rec =
+    {
+        .type = REC_TYPE_DIRTY_BITMAP,
+    };
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    if ( xc_shadow_control(
+             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+             HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
+             NULL, 0, &stats) != ctx->restore.p2m_size )
+    {
+        PERROR("Failed to retrieve logdirty bitmap");
+        goto err;
+    }
+
+    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    {
+        if ( test_bit(i, dirty_bitmap) )
+            count++;
+    }
+
+
+    pfns = malloc(count * sizeof(*pfns));
+    if ( !pfns )
+    {
+        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
+              count * sizeof(*pfns));
+        goto err;
+    }
+
+    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    {
+        if ( !test_bit(i, dirty_bitmap) )
+            continue;
+
+        if ( written > count )
+        {
+            ERROR("Dirty pfn list exceed");
+            goto err;
+        }
+
+        pfns[written++] = i;
+    }
+
+    /* iovec[] for writev(). */
+    iov = malloc(3 * sizeof(*iov));
+    if ( !iov )
+    {
+        ERROR("Unable to allocate memory for sending dirty bitmap");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+
+    iov[0].iov_base = &rec.type;
+    iov[0].iov_len = sizeof(rec.type);
+
+    iov[1].iov_base = &rec.length;
+    iov[1].iov_len = sizeof(rec.length);
+
+    iov[2].iov_base = pfns;
+    iov[2].iov_len = count * sizeof(*pfns);
+
+    if ( writev_exact(ctx->restore.send_fd, iov, 3) )
+    {
+        PERROR("Failed to write dirty bitmap to stream");
+        goto err;
+    }
+
+    rc = 0;
+ err:
+    return rc;
+}
+
 static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
@@ -494,7 +580,9 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 
 #undef HANDLE_CALLBACK_RETURN_VALUE
 
-/* TODO: send dirty bitmap to primary */
+rc = send_dirty_bitmap(ctx);
+if ( rc )
+goto err;
 }
 
  err:
@@ -566,6 +654,21 @@ static int setup(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx-xch;
 int rc;
+DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+ctx-restore.dirty_bitmap_hbuf);
+
+if ( ctx-restore.checkpointed == MIG_STREAM_COLO )
+{
+dirty_bitmap = xc_hypercall_buffer_alloc_pages(xch, dirty_bitmap,
+NRPAGES(bitmap_size(ctx-restore.p2m_size)));
+
+if ( !dirty_bitmap )
+{
+ERROR(Unable to allocate memory for dirty bitmap);
+rc = -1;
+goto err;
+}
+}
 
 rc = ctx-restore.ops.setup(ctx);
 if ( rc )
@@ -599,10 +702,15 @@ static void cleanup(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx-xch;
 unsigned i;
+DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+ctx-save.dirty_bitmap_hbuf);
 
 for ( i = 0; i  ctx-restore.buffered_rec_num; i++ )
 free(ctx-restore.buffered_records[i].data);
 
+if ( ctx-restore.checkpointed == MIG_STREAM_COLO )
+xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
+   NRPAGES(bitmap_size(ctx-save.p2m_size)));
 free(ctx-restore.buffered_records);
 free(ctx-restore.populated_pfns);
 if ( ctx-restore.ops.cleanup(ctx) )
@@ -713,6 +821,7 @@

[Xen-devel] [PATCH v8 --for 4.6 COLO 16/25] libxc/save: support COLO save

2015-07-15 Thread Yang Hongyang

After suspending the primary VM, get the dirty bitmap from the secondary
VM and send the pages that are dirty on either the primary or the
secondary to the secondary.
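
On the primary side, the received pfn list is folded into the local
dirty bitmap so that the union of both dirty sets is sent. A minimal
standalone sketch of that merge, assuming a bitmap of unsigned long
words; merge_pfns_into_bitmap is a hypothetical name. The sketch rejects
any pfn that is not strictly below the bitmap size and validates the
whole list before touching the bitmap; both are choices made for this
illustration rather than a description of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Set one bit in bitmap for every entry of pfns[].
 * Returns 0 on success, or -1 (leaving bitmap untouched) if any pfn
 * is outside the range [0, nr_pfns).
 */
int merge_pfns_into_bitmap(unsigned long *bitmap, uint64_t nr_pfns,
                           const uint64_t *pfns, size_t count)
{
    size_t i;

    assert(bitmap != NULL && (pfns != NULL || count == 0));

    /* Validate first so a bad record cannot half-update the bitmap. */
    for (i = 0; i < count; i++)
        if (pfns[i] >= nr_pfns)
            return -1;

    for (i = 0; i < count; i++)
        bitmap[pfns[i] / BITS_PER_WORD] |= 1UL << (pfns[i] % BITS_PER_WORD);

    return 0;
}
```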

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
CC: Andrew Cooper andrew.coop...@citrix.com
---
 tools/libxc/xc_sr_common.h |   2 +
 tools/libxc/xc_sr_save.c   | 104 +++--
 2 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 7fc2021..5f2d99b 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -183,6 +183,8 @@ struct xc_sr_context
 {
 struct /* Save data. */
 {
+int recv_fd;
+
 struct xc_sr_save_ops ops;
 struct save_callbacks *callbacks;
 
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index d12e5b1..6f13706 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -515,6 +515,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
     return rc;
 }
 
+static int merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_record rec;
+    uint64_t *pfns = NULL;
+    uint64_t pfn;
+    unsigned count, i;
+    int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    rc = read_record(ctx, ctx->save.recv_fd, &rec);
+    if ( rc )
+        goto err;
+
+    if ( rec.type != REC_TYPE_DIRTY_BITMAP )
+    {
+        PERROR("Expect dirty bitmap record, but received %u", rec.type );
+        rc = -1;
+        goto err;
+    }
+
+    if ( rec.length % sizeof(*pfns) )
+    {
+        PERROR("Invalid dirty bitmap record length %u", rec.length );
+        rc = -1;
+        goto err;
+    }
+
+    count = rec.length / sizeof(*pfns);
+    pfns = rec.data;
+
+    for ( i = 0; i < count; i++ )
+    {
+        pfn = pfns[i];
+        if (pfn > ctx->save.p2m_size)
+        {
+            PERROR("Invalid pfn %#lx", pfn );
+            rc = -1;
+            goto err;
+        }
+
+        set_bit(pfn, dirty_bitmap);
+    }
+
+    rc = 0;
+
+ err:
+    free(rec.data);
+    return rc;
+}
+
 /*
  * Suspend the domain and send dirty memory.
  * This is the last iteration of the live migration and the
@@ -555,6 +607,16 @@ static int suspend_and_send_dirty(struct xc_sr_context 
*ctx)
 
 bitmap_or(dirty_bitmap, ctx-save.deferred_pages, ctx-save.p2m_size);
 
+if ( !ctx-save.live  ctx-save.checkpointed == MIG_STREAM_COLO )
+{
+rc = merge_secondary_dirty_bitmap(ctx);
+if ( rc )
+{
+PERROR(Failed to get secondary vm's dirty pages);
+goto out;
+}
+}
+
 rc = send_dirty_pages(ctx, stats.dirty_count + 
ctx-save.nr_deferred_pages);
 if ( rc )
 goto out;
@@ -784,11 +846,42 @@ static int save(struct xc_sr_context *ctx, uint16_t 
guest_type)
 if ( rc )
 goto err;
 
-ctx-save.callbacks-postcopy(ctx-save.callbacks-data);
+if ( ctx-save.checkpointed == MIG_STREAM_COLO )
+{
+rc = 
ctx-save.callbacks-checkpoint(ctx-save.callbacks-data);
+if ( !rc )
+{
+rc = -1;
+goto err;
+}
+}
 
-rc = ctx-save.callbacks-checkpoint(ctx-save.callbacks-data);
-if ( rc = 0 )
-ctx-save.checkpointed = false;
+rc = ctx-save.callbacks-postcopy(ctx-save.callbacks-data);
+if ( !rc )
+{
+rc = -1;
+goto err;
+}
+
+if ( ctx-save.checkpointed == MIG_STREAM_COLO )
+{
+rc = ctx-save.callbacks-should_checkpoint(
+ctx-save.callbacks-data);
+if ( rc = 0 )
+ctx-save.checkpointed = false;
+}
+else if ( ctx-save.checkpointed == MIG_STREAM_REMUS )
+{
+rc = 
ctx-save.callbacks-checkpoint(ctx-save.callbacks-data);
+if ( rc = 0 )
+ctx-save.checkpointed = false;
+}
+else
+{
+ERROR(Unknown checkpointed stream);
+rc = -1;
+goto err;
+}
 }
 } while ( ctx-save.checkpointed );
 
@@ -835,6 +928,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t 
dom,
 ctx.save.live  = !!(flags  XCFLAGS_LIVE);
 ctx.save.debug = !!(flags  XCFLAGS_DEBUG);
 ctx.save.checkpointed = checkpointed_stream;
+ctx.save.recv_fd = back_fd;
 
 /*
  * TODO: Find some time to better tweak the live migration algorithm.
@@ -850,6 +944,8 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t 
dom,
 assert(callbacks-switch_qemu_logdirty);
 if (

[Xen-devel] [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme

2015-07-15 Thread Yang Hongyang

Add a COLO README. For background, refer to
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 docs/README.colo | 9 +
 1 file changed, 9 insertions(+)
 create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 000..466eb72
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,9 @@
+COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
+project is a high availability solution. Both primary VM (PVM) and secondary VM
+(SVM) run in parallel. They receive the same request from client, and generate
+response in parallel too. If the response packets from PVM and SVM are
+identical, they are released immediately. Otherwise, a VM checkpoint (on demand)
+is conducted.
+
+See the website at http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
+for details.
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 read stream

2015-07-15 Thread Yang Hongyang

Read a colo_context record and call stream->checkpoint_callback() to
handle it.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 tools/libxl/libxl_internal.h|  3 +++
 tools/libxl/libxl_stream_read.c | 51 +
 2 files changed, 54 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 05cee04..1be2a4a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3369,6 +3369,7 @@ struct libxl__stream_read_state {
 int rc;
 bool running;
 bool in_checkpoint;
+bool in_colo_context;
 libxl__save_helper_state shs;
 libxl__conversion_helper_state chs;
 
@@ -3396,6 +3397,8 @@ _hidden void libxl__stream_read_start(libxl__egc *egc,
   libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
  libxl__stream_read_state 
*stream);
+_hidden void libxl__stream_read_colo_context(libxl__egc *egc,
+ libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_abort(libxl__egc *egc,
   libxl__stream_read_state *stream, int 
rc);
 static inline bool
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index b924f05..ab47251 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -152,6 +152,13 @@ static void write_emulator_done(libxl__egc *egc,
 libxl__datacopier_state *dc,
 int rc, int onwrite, int errnoval);
 
+/* Handlers for colo context mini-loop */
+static void handle_colo_context(libxl__egc *egc,
+libxl__stream_read_state *stream,
+libxl__sr_record_buf *rec);
+static void colo_context_done(libxl__egc *egc,
+  libxl__stream_read_state *stream, int rc);
+
 /*- Helpers -*/
 
 /* Helper to set up reading some data from the stream. */
@@ -569,6 +576,15 @@ static bool process_record(libxl__egc *egc,
         checkpoint_done(egc, stream, 0);
         break;
 
+    case REC_TYPE_COLO_CONTEXT:
+        if (!stream->in_colo_context) {
+            LOG(ERROR, "Unexpected COLO_CONTEXT record in stream");
+            rc = ERROR_FAIL;
+            goto err;
+        }
+        handle_colo_context(egc, stream, rec);
+        break;
+
     default:
         LOG(ERROR, "Unrecognised record 0x%08x", rec->hdr.type);
         rc = ERROR_FAIL;
@@ -678,6 +694,11 @@ static void stream_complete(libxl__egc *egc,
         return;
     }
 
+    if (stream->in_colo_context) {
+        colo_context_done(egc, stream, rc);
+        return;
+    }
+
     if (!stream->rc)
         stream->rc = rc;
     stream_done(egc, stream);
@@ -794,6 +815,36 @@ static void check_all_finished(libxl__egc *egc,
     stream->completion_callback(egc, stream, stream->rc);
 }
 
+/*- COLO context handlers -*/
+
+void libxl__stream_read_colo_context(libxl__egc *egc,
+                                     libxl__stream_read_state *stream)
+{
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_colo_context);
+    stream->in_colo_context = true;
+
+    setup_read_record(egc, stream);
+}
+
+static void handle_colo_context(libxl__egc *egc,
+                                libxl__stream_read_state *stream,
+                                libxl__sr_record_buf *rec)
+{
+    libxl_sr_colo_context *colo_context = rec->body;
+
+    colo_context_done(egc, stream, colo_context->id);
+}
+
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_read_state *stream, int rc)
+{
+    assert(stream->in_colo_context);
+    stream->in_colo_context = false;
+    stream->checkpoint_callback(egc, stream, rc);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1



[Xen-devel] [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream

2015-07-15 Thread Yang Hongyang

Add back channel support to the write stream. A write stream marked
as a back channel is used by the secondary to send records back to
the primary.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_dom_save.c |  1 +
 tools/libxl/libxl_internal.h |  1 +
 tools/libxl/libxl_stream_write.c | 16 
 3 files changed, 18 insertions(+)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 9b7159f..25813ce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -445,6 +445,7 @@ void libxl__domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss)
 dss-sws.ao  = dss-ao;
 dss-sws.dss = dss;
 dss-sws.fd  = dss-fd;
+dss-sws.back_channel = false;
 dss-sws.completion_callback = stream_done;
 
 libxl__stream_write_start(egc, dss-sws);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9c81d8d..a83d6a5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2989,6 +2989,7 @@ struct libxl__stream_write_state {
 libxl__ao *ao;
 libxl__domain_save_state *dss;
 int fd;
+bool back_channel;
 void (*completion_callback)(libxl__egc *egc,
 libxl__stream_write_state *sws,
 int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 16f667a..df55277 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -47,6 +47,13 @@
  *  - Toolstack record
  *  - if (hvm), Qemu record
  *  - Checkpoint end record
+ *
+ * For back channel stream:
+ * - libxl__stream_write_start()
+ *- Set up the stream to running state
+ *
+ * - Add a new API to write the record. When the record is written
+ *   out, call stream-checkpoint_callback() to return.
  */
 
 /* Success/error/cleanup handling. */
@@ -178,6 +185,9 @@ void libxl__stream_write_start(libxl__egc *egc,
 
 stream-running = true;
 
+if (stream-back_channel)
+return;
+
 dc-ao= ao;
 dc-readfd= -1;
 dc-writewhat = save/migration stream;
@@ -207,6 +217,7 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
 {
 assert(stream-running);
 assert(!stream-in_checkpoint);
+assert(!stream-back_channel);
 stream-in_checkpoint = true;
 
 write_toolstack_record(egc, stream);
@@ -500,6 +511,11 @@ static void stream_done(libxl__egc *egc,
 assert(stream-running);
 stream-running = false;
 
+if (stream-back_channel) {
+stream-completion_callback(egc, stream, stream-rc);
+return;
+}
+
 if (stream-emu_carefd)
 libxl__carefd_close(stream-emu_carefd);
 free(stream-emu_body);
-- 
1.9.1



[Xen-devel] [PATCH v8 --for 4.6 COLO 20/25] COLO proxy: implement setup/teardown of COLO proxy module

2015-07-15 Thread Yang Hongyang

Implement setup and teardown of the COLO proxy module.
Netlink is used to communicate with the proxy module.
About the colo-proxy module:
https://lkml.org/lkml/2015/6/18/32
How to use:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
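
Each request to the proxy module is a bare netlink header with no
payload, and only the init request asks the kernel for an ACK. A minimal
Linux-only sketch of how such a header is filled, using the operation
codes defined in this patch; colo_fill_header is a hypothetical helper,
and no socket is opened because the COLO netlink protocol is only
available with the out-of-tree proxy module:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Operation codes as defined in libxl_colo_proxy.c in this patch. */
enum colo_netlink_op {
    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
    COLO_CHECKPOINT,
    COLO_FAILOVER,
    COLO_PROXY_INIT,
    COLO_PROXY_RESET,
};

/*
 * Hypothetical helper: fill a payload-free netlink header for one
 * COLO proxy request. Only COLO_PROXY_INIT requests an ACK.
 */
void colo_fill_header(struct nlmsghdr *msg, enum colo_netlink_op op,
                      uint32_t index)
{
    assert(msg != NULL);

    memset(msg, 0, sizeof(*msg));
    msg->nlmsg_len = NLMSG_SPACE(0);      /* header only, no payload */
    msg->nlmsg_flags = NLM_F_REQUEST;
    if (op == COLO_PROXY_INIT)
        msg->nlmsg_flags |= NLM_F_ACK;
    msg->nlmsg_seq = 0;
    msg->nlmsg_pid = index;               /* proxy index, as in the patch */
    msg->nlmsg_type = (uint16_t)op;
}
```

Unlike the patch, the sketch zeroes the header before filling it, which
keeps any field that is not set explicitly in a defined state.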

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/Makefile   |   1 +
 tools/libxl/libxl_colo.h   |   2 +
 tools/libxl/libxl_colo_proxy.c | 210 +
 tools/libxl/libxl_internal.h   |  12 +++
 4 files changed, 225 insertions(+)
 create mode 100644 tools/libxl/libxl_colo_proxy.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index e91ae79..d7a3540 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -65,6 +65,7 @@ endif
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
+LIBXL_OBJS-y += libxl_colo_proxy.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 49a430b..46ca4cf 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -34,4 +34,6 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
   libxl__colo_save_state *css,
   int rc);
 
+extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
+extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
new file mode 100644
index 000..9f1243e
--- /dev/null
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang yan...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include <linux/netlink.h>
+
+#define NETLINK_COLO 28
+
+enum colo_netlink_op {
+COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+COLO_CHECKPOINT,
+COLO_FAILOVER,
+COLO_PROXY_INIT,
+COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
+
+/* = colo-proxy: helper functions == */
+
+static int colo_proxy_send(libxl__colo_proxy_state *cps, uint8_t *buff, uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT) {
+        msg.nlmsg_flags |= NLM_F_ACK;
+    }
+    msg.nlmsg_seq = 0;
+    /* This is untrusty */
+    msg.nlmsg_pid = cps->index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cps->sock_fd, &mh, 0);
+    if (ret <= 0) {
+        LOG(ERROR, "can't send msg to kernel by netlink: %s",
+            strerror(errno));
+    }
+
+    return ret;
+}
+
+/* error: return -1, otherwise return 0 */
+static int64_t colo_proxy_recv(libxl__colo_proxy_state *cps, uint8_t **buff, 
int flags)
+{
+struct sockaddr_nl sa;
+struct iovec iov;
+struct msghdr mh = {
+.msg_name = sa,
+.msg_namelen = sizeof(sa),
+.msg_iov = iov,
+.msg_iovlen = 1,
+};
+uint32_t size = 16384;
+int64_t len = 0;
+int ret;
+
+STATE_AO_GC(cps-ao);
+uint8_t *tmp = libxl__malloc(NOGC, size);
+
+iov.iov_base = tmp;
+iov.iov_len = size;
+next:
+   ret = recvmsg(cps-sock_fd, mh, flags);
+if (ret = 0) {
+goto out;
+}
+
+len += ret;
+if (mh.msg_flags  MSG_TRUNC) {
+size += 16384;
+tmp = libxl__realloc(NOGC, tmp, size);
+iov.iov_base = tmp + len;
+iov.iov_len = size - len;
+goto next;
+}
+
+*buff = tmp;
+return len;
+
+out:
+free(tmp);
+*buff = NULL;
+return ret;
+}
+
+/* = colo-proxy: setup and teardown == */
+
+int colo_proxy_setup(libxl__colo_proxy_state *cps)
+{
+int skfd = 0;
+struct sockaddr_nl

[Xen-devel] [PATCH v8 --for 4.6 COLO 24/25] setup and control colo proxy on secondary side

2015-07-15 Thread Yang Hongyang

Set up and control the COLO proxy on the secondary side.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_colo_restore.c | 28 +---
 tools/libxl/libxl_internal.h |  3 +++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 96ea0b9..da546f9 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -49,9 +49,11 @@ static void libxl__colo_restore_domain_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_should_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_suspend_callback(void *data);
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_nic,
     &colo_restore_device_qdisk,
     NULL,
 };
@@ -151,8 +153,14 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_colo_nic(cds);
+    if (rc) goto out;
+
     rc = init_subkind_qdisk(cds);
-    if (rc)  goto out;
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
 
     rc = 0;
 out:
@@ -164,6 +172,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -351,6 +360,8 @@ static void colo_restore_teardown_done(libxl__egc *egc,
     if (crcs->teardown_devices)
         cleanup_device_subkind(cds);
 
+    colo_proxy_teardown(&crs->cps);
+
     rc = crcs->saved_rc;
     if (!rc) {
         crcs->callback = do_failover_done;
@@ -535,6 +546,8 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&crs->cps);
+
     colo_restore_resume_vm(egc, crcs);
 
     return;
@@ -571,6 +584,8 @@ static void colo_resume_vm_done(libxl__egc *egc,
 
     crcs->status = LIBXL_COLO_RESUMED;
 
+    colo_proxy_postresume(&crs->cps);
+
     /* avoid calling libxl__xc_domain_restore_done() more than once */
     if (crs->saved_cb) {
         dcs->callback = crs->saved_cb;
@@ -690,13 +705,20 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
     cds->ops = colo_restore_ops;
 
+    crs->cps.ao = ao;
+    if (colo_proxy_setup(&crs->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
     if (init_device_subkind(cds))
         goto out;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d12297d..33a93a1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3476,6 +3476,9 @@ struct libxl__colo_restore_state {
 
 /* private, used by qdisk block replication */
 bool qdisk_setuped;
+
+/* private, used by colo proxy */
+libxl__colo_proxy_state cps;
 };
 
 struct libxl__domain_create_state {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

We will call libxl__xc_domain_restore_done() to rebuild the secondary vm, but
rebuilding needs the store mfn and console mfn. So make restore_results a
function pointer in the callback struct and in struct
{save,restore}_callbacks, and use this callback to send the store mfn and
console mfn to xl.
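
As a rough illustration of the callback described above (a sketch only, not the libxc or libxl code): the restore side keeps a function pointer in its callback struct and calls it with the two frame numbers before the secondary vm is resumed. Every name prefixed with demo_ is hypothetical and exists only for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the restore_results member added to
 * struct restore_callbacks; not the real libxc definition. */
struct demo_restore_callbacks {
    void (*restore_results)(unsigned long store_mfn,
                            unsigned long console_mfn,
                            void *data);
    void *data; /* passed as the last argument to the callback */
};

/* Hypothetical receiver state, standing in for what xl would keep. */
struct demo_results {
    unsigned long store_mfn;
    unsigned long console_mfn;
    int calls;
};

/* Callback body: record the two values handed over by the restore side. */
void demo_record_results(unsigned long store_mfn,
                         unsigned long console_mfn,
                         void *data)
{
    struct demo_results *r = data;

    r->store_mfn = store_mfn;
    r->console_mfn = console_mfn;
    r->calls++;
}

/* Restore side: report the values before resuming the secondary vm.
 * Returns 0 on success, -1 if no callback was installed. */
int demo_report_before_resume(const struct demo_restore_callbacks *cb,
                              unsigned long store_mfn,
                              unsigned long console_mfn)
{
    if (cb == NULL || cb->restore_results == NULL)
        return -1;

    cb->restore_results(store_mfn, console_mfn, cb->data);
    return 0;
}
```

The NULL check in the sketch plays the role that the assert() on callbacks->restore_results plays in the patch: the COLO path cannot proceed without the callback.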

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Andrew Cooper andrew.coop...@citrix.com
---
 tools/libxc/include/xenguest.h | 8 
 tools/libxc/xc_sr_restore.c| 7 +--
 tools/libxl/libxl_colo_restore.c   | 5 -
 tools/libxl/libxl_create.c | 2 ++
 tools/libxl/libxl_save_msgs_gen.pl | 2 +-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 1e7e1bb..d7bdfb5 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -140,6 +140,14 @@ struct restore_callbacks {
  */
 int (*should_checkpoint)(void* data);
 
+/*
+ * callback to send store mfn and console mfn to xl
+ * if we want to resume vm before xc_domain_save()
+ * exits.
+ */
+void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
+void *data);
+
 /* to be provided as the last argument to each callback function */
 void* data;
 };
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8b13d8d..fe81acb 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -563,7 +563,9 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
         if ( rc )
             goto err;
 
-        /* TODO: call restore_results */
+        ctx->restore.callbacks->restore_results(ctx->restore.xenstore_gfn,
+                                                ctx->restore.console_gfn,
+                                                ctx->restore.callbacks->data);
 
         /* Resume secondary vm */
         ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
@@ -846,7 +848,8 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
         /* this is COLO restore */
         assert(callbacks->suspend &&
                callbacks->postcopy &&
-               callbacks->should_checkpoint);
+               callbacks->should_checkpoint &&
+               callbacks->restore_results);
     }
 
     IPRINTF("In experimental %s", __func__);
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 5cda0b2..99f06ab 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -137,11 +137,6 @@ static void colo_resume_vm(libxl__egc *egc,
 return;
 }
 
-/*
- * TODO: get store mfn and console mfn
- *  We should call the callback restore_results in
- *  xc_domain_restore() before resuming the guest.
- */
 libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
 
 return;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bf4b55d..34e9362 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1080,6 +1080,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.completion_callback = domcreate_stream_done;
 
     /* colo restore setup */
+    callbacks->restore_results = libxl__srm_callout_callback_restore_results;
+
     if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         crs->ao = ao;
         crs->domid = domid;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 7c9859b..e8943b9 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -29,7 +29,7 @@ our @msgs = (
     [  6, 'srcxA',  "should_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'rcx',    "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
-- 
1.9.1



[Xen-devel] [PATCH v8 --for 4.6 COLO 21/25] COLO proxy: preresume, postresume and checkpoint

2015-07-15 Thread Yang Hongyang

preresume, postresume and checkpoint

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_colo.h   |  3 +++
 tools/libxl/libxl_colo_proxy.c | 57 ++
 2 files changed, 60 insertions(+)

diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 46ca4cf..4e5f02a 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -36,4 +36,7 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
 
 extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
 extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
+extern void colo_proxy_preresume(libxl__colo_proxy_state *cps);
+extern void colo_proxy_postresume(libxl__colo_proxy_state *cps);
+extern int colo_proxy_checkpoint(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
index 9f1243e..c8ff722 100644
--- a/tools/libxl/libxl_colo_proxy.c
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -208,3 +208,60 @@ void colo_proxy_teardown(libxl__colo_proxy_state *cps)
         cps->sock_fd = -1;
     }
 }
+
+/* = colo-proxy: preresume, postresume and checkpoint == */
+
+void colo_proxy_preresume(libxl__colo_proxy_state *cps)
+{
+colo_proxy_send(cps, NULL, 0, COLO_CHECKPOINT);
+/* TODO: need to handle if the call fails... */
+}
+
+void colo_proxy_postresume(libxl__colo_proxy_state *cps)
+{
+/* nothing to do... */
+}
+
+
+typedef struct colo_msg {
+bool is_checkpoint;
+} colo_msg;
+
+/*
+do checkpoint: return 1
+error: return -1
+do not checkpoint: return 0
+*/
+int colo_proxy_checkpoint(libxl__colo_proxy_state *cps)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    size = colo_proxy_recv(cps, &buff, MSG_DONTWAIT);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0) {
+        return 0;
+    }
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    free(buff);
+    return ret;
+}
-- 
1.9.1



[Xen-devel] [PATCH v8 --for 4.6 COLO 12/25] primary vm suspend/resume/checkpoint code

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

We will do the following things again and again:
1. Suspend primary vm
   a. Suspend primary vm
   b. Do postsuspend
   c. Read LIBXL_COLO_SVM_SUSPENDED sent by secondary
2. Resume primary vm
   a. Read LIBXL_COLO_SVM_READY from secondary
   b. Do preresume
   c. Resume primary vm
   d. Read LIBXL_COLO_SVM_RESUMED from secondary
3. Wait for a new checkpoint
   a. Wait for a new checkpoint (not implemented)
   b. Send LIBXL_COLO_NEW_CHECKPOINT to secondary
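
The repeating sequence above can be pictured as a small cyclic state machine. The following is only a sketch under that assumption; the enum and both helpers are hypothetical names made up for illustration and do not appear in libxl_colo_save.c.

```c
/* Hypothetical step names mirroring the list above; not libxl identifiers. */
enum demo_colo_step {
    DEMO_SUSPEND_PRIMARY,
    DEMO_POSTSUSPEND,
    DEMO_READ_SVM_SUSPENDED,
    DEMO_READ_SVM_READY,
    DEMO_PRERESUME,
    DEMO_RESUME_PRIMARY,
    DEMO_READ_SVM_RESUMED,
    DEMO_WAIT_NEW_CHECKPOINT,
    DEMO_SEND_NEW_CHECKPOINT,
    DEMO_STEP_COUNT
};

/* Return the step that follows s. The last step wraps to the first,
 * which models running the sequence "again and again". */
enum demo_colo_step demo_next_step(enum demo_colo_step s)
{
    return (enum demo_colo_step)(((int)s + 1) % DEMO_STEP_COUNT);
}

/* Return which numbered phase (1, 2 or 3) a step belongs to. */
int demo_step_phase(enum demo_colo_step s)
{
    if (s <= DEMO_READ_SVM_SUSPENDED)
        return 1; /* suspend primary vm */
    if (s <= DEMO_READ_SVM_RESUMED)
        return 2; /* resume primary vm */
    return 3;     /* wait for a new checkpoint */
}
```

In the real code each step completes asynchronously through a callback, so the transitions are spread across functions rather than computed by one helper.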

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/Makefile  |   2 +-
 tools/libxl/libxl.c   |   6 +-
 tools/libxl/libxl_colo.h  |  10 +
 tools/libxl/libxl_colo_save.c | 569 ++
 tools/libxl/libxl_dom_save.c  |  13 +-
 tools/libxl/libxl_internal.h  | 167 +++--
 tools/libxl/libxl_types.idl   |   1 +
 7 files changed, 689 insertions(+), 79 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_save.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 97b3753..71bf7a2 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -63,7 +63,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
-LIBXL_OBJS-y += libxl_colo_restore.o
+LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5502709..c040909 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 #define PAGE_TO_MEMKB(pages) ((pages) * 4)
 #define BACKEND_STRING_SIZE 5
@@ -845,7 +846,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, &dss->rs);
+    if (libxl_defbool_val(info->colo))
+        libxl__colo_save_setup(egc, &dss->css);
+    else
+        libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 54dc835..49a430b 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -24,4 +24,14 @@ extern void libxl__colo_restore_teardown(libxl__egc *egc,
  libxl__colo_restore_state *crs,
  int rc);
 
+extern void libxl__colo_save_domain_suspend_callback(void *data);
+extern void libxl__colo_save_domain_checkpoint_callback(void *data);
+extern void libxl__colo_save_domain_resume_callback(void *data);
+extern void libxl__colo_save_domain_should_checkpoint_callback(void *data);
+extern void libxl__colo_save_setup(libxl__egc *egc,
+   libxl__colo_save_state *css);
+extern void libxl__colo_save_teardown(libxl__egc *egc,
+  libxl__colo_save_state *css,
+  int rc);
+
 #endif
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
new file mode 100644
index 0000000..f0ab565
--- /dev/null
+++ b/tools/libxl/libxl_colo_save.c
@@ -0,0 +1,569 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang we...@cn.fujitsu.com
+ * Yang Hongyang yan...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+
+static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    NULL,
+};
+
+/* = helper functions = */
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+/* = colo: setup save environment = */
+static void colo_save_setup_done(libxl__egc *egc,
+ libxl__checkpoint_devices_state *cds,
+ int rc);
+static void colo_save_setup_failed(libxl__egc *egc,
+

[Xen-devel] [PATCH v8 --for 4.6 COLO 23/25] setup and control colo proxy on primary side

2015-07-15 Thread Yang Hongyang

setup and control colo proxy on primary side

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_colo_save.c | 124 +++---
 tools/libxl/libxl_internal.h  |   1 +
 2 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index 1245da7..50a880b 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -19,9 +19,11 @@
 #include "libxl_internal.h"
 #include "libxl_colo.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_nic,
     &colo_save_device_qdisk,
     NULL,
 };
@@ -33,9 +35,15 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
-    rc = init_subkind_qdisk(cds);
+    rc = init_subkind_colo_nic(cds);
     if (rc) goto out;
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
+
     rc = 0;
 out:
     return rc;
@@ -46,6 +54,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -76,9 +85,16 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->svm_running = false;
     css->paused = true;
     css->qdisk_setuped = false;
+    libxl__ev_child_init(&css->child);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    if (dss->remus->netbufscript)
+        css->colo_proxy_script = libxl__strdup(gc, dss->remus->netbufscript);
+    else
+        css->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
+
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -88,6 +104,12 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->srs.fd = css->recv_fd;
     css->srs.back_channel = true;
     libxl__stream_read_start(egc, &css->srs);
+    css->cps.ao = ao;
+    if (colo_proxy_setup(&css->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
 
     if (init_device_subkind(cds))
         goto out;
@@ -162,6 +184,7 @@ static void colo_teardown_done(libxl__egc *egc,
     libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
 
     cleanup_device_subkind(cds);
+    colo_proxy_teardown(&css->cps);
     dss->callback(egc, dss, rc);
 }
 
@@ -378,6 +401,8 @@ static void colo_read_svm_ready_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&css->cps);
+
     css->svm_running = true;
     css->cds.callback = colo_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, &css->cds);
@@ -454,6 +479,8 @@ static void colo_read_svm_resumed_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_postresume(&css->cps);
+
     ok = 1;
 
 out:
@@ -462,6 +489,91 @@ out:
 
 
 /* = colo: wait new checkpoint = */
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+  libxl__checkpoint_devices_state *cds,
+  int rc);
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css);
+static void colo_proxy_async_call_done(libxl__egc *egc,
+   libxl__ev_child *child,
+   int pid,
+   int status);
+
+static void colo_proxy_async_call(libxl__egc *egc,
+                                  libxl__colo_save_state *css,
+                                  void func(libxl__colo_save_state *),
+                                  libxl__ev_child_callback callback)
+{
+    int pid = -1, rc;
+
+    STATE_AO_GC(css->cds.ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &css->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(css);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    callback(egc, &css->child, -1, 1);
+}
+
+static void colo_proxy_wait_for_checkpoint(libxl__egc *egc,
+   libxl__colo_save_state *css)
+{
+colo_proxy_async_call(egc, css,
+  colo_proxy_async_wait_for_checkpoint,
+  colo_proxy_async_call_done);
+}
+
+static void

[Xen-devel] [PATCH v8 --for 4.6 COLO 22/25] COLO nic: implement COLO nic subkind

2015-07-15 Thread Yang Hongyang

implement COLO nic subkind.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 tools/hotplug/Linux/Makefile |   1 +
 tools/hotplug/Linux/colo-proxy-setup | 131 ++
 tools/libxl/Makefile |   1 +
 tools/libxl/libxl_colo_nic.c | 320 +++
 tools/libxl/libxl_internal.h |   5 +
 tools/libxl/libxl_types.idl  |   1 +
 6 files changed, 459 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index bc8ee5e..71b6475 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -26,6 +26,7 @@ XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-tap
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..3096a9c
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,131 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+. "$dir/xen-network-ft.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    do_without_error iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -F
+    do_without_error ip6tables -t mangle -F
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error modprobe xt_SECCOLO
+
+    do_without_error iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -F
+    do_without_error ip6tables -t mangle -F
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ $mode = primary ]
+then
+log debug Successful colo-proxy-setup

[Xen-devel] [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service

2015-07-15 Thread Yang Hongyang

This patchset implements the COLO feature for Xen.
For details on installing and using COLO, refer to:
  http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

In this series, we've rebased to the latest libxl migration v2.

This patchset is based on:
  [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO

Only HVM guests are supported for now. The code is also hosted on github:
  https://github.com/macrosheep/xen/tree/colo-v8

Changelog from v7 to v8:
1. Rebased to the latest libxl migration v2.

Changelog from v6 to v7:
1. Ported to Libxl migration v2
2. Send dirty bitmap from secondary to primary on libxc side
3. Address review comments

Changelog from v5 to v6:
1. based on migration v2(libxc)
2. split the patchset into prerequisite patchset and this main patchset.

Changelog from v4 to v5:
1. rebase to the latest xen upstream
2. disk replication: blktap2-qdisk
3. nic replication: colo-agent-colo-proxy

Changelog from v3 to v4:
1. rebase to newest xen
2. bug fix

Changelog from v2 to v3:
1. rebase to newest remus
2. add nic replication support

Changelog from v1 to v2:
1. rebase to newest remus
2. add disk replication support


Wen Congyang (7):
  docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo
streams
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/resume/checkpoint code
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  Support colo mode for qemu disk
  COLO: use qemu block replication

Yang Hongyang (18):
  docs: add colo readme
  libxc/migration: Specification update for DIRTY_BITMAP records
  libxc/migration: export read_record for common use
  tools/libxl: add back channel support to write stream
  tools/libxl: write colo_context records into the stream
  tools/libxl: add back channel support to read stream
  tools/libxl: handle colo_context records in a libxl migration v2 read
stream
  tools/libx{l,c}: introduce should_checkpoint callback
  tools/libx{l,c}: add postcopy/suspend callback to restore side
  libxc/restore: support COLO restore
  libxc/restore: send dirty bitmap to primary when checkpoint under colo
  libxc/save: support COLO save
  COLO proxy: implement setup/teardown of COLO proxy module
  COLO proxy: preresume, postresume and checkpoint
  COLO nic: implement COLO nic subkind
  setup and control colo proxy on primary side
  setup and control colo proxy on secondary side
  cmdline switches and config vars to control colo-proxy

 docs/README.colo |9 +
 docs/man/xl.conf.pod.5   |6 +
 docs/man/xl.pod.1|   11 +-
 docs/misc/xl-disk-configuration.txt  |   38 ++
 docs/specs/libxc-migration-stream.pandoc |   24 +-
 docs/specs/libxl-migration-stream.pandoc |   22 +-
 tools/hotplug/Linux/Makefile |1 +
 tools/hotplug/Linux/colo-proxy-setup |  131 
 tools/libxc/include/xenguest.h   |   36 ++
 tools/libxc/xc_sr_common.c   |   50 ++
 tools/libxc/xc_sr_common.h   |   36 +-
 tools/libxc/xc_sr_restore.c  |  244 +--
 tools/libxc/xc_sr_save.c |  104 ++-
 tools/libxc/xc_sr_stream_format.h|1 +
 tools/libxl/Makefile |4 +
 tools/libxl/libxl.c  |   77 ++-
 tools/libxl/libxl_colo.h |   42 ++
 tools/libxl/libxl_colo_nic.c |  320 ++
 tools/libxl/libxl_colo_proxy.c   |  267 
 tools/libxl/libxl_colo_qdisk.c   |  209 ++
 tools/libxl/libxl_colo_restore.c | 1024 ++
 tools/libxl/libxl_colo_save.c|  709 +
 tools/libxl/libxl_create.c   |  153 -
 tools/libxl/libxl_device.c   |   38 ++
 tools/libxl/libxl_dm.c   |  257 +++-
 tools/libxl/libxl_dom_save.c |   14 +-
 tools/libxl/libxl_internal.h |  217 +--
 tools/libxl/libxl_qmp.c  |   31 +
 tools/libxl/libxl_save_callout.c |7 +-
 tools/libxl/libxl_save_msgs_gen.pl   |   11 +-
 tools/libxl/libxl_sr_stream_format.h |   11 +
 tools/libxl/libxl_stream_read.c  |   68 ++
 tools/libxl/libxl_stream_write.c |  103 +++
 tools/libxl/libxl_types.idl  |8 +
 tools/libxl/libxlu_disk_l.l  |5 +
 tools/libxl/xl.c |3 +
 tools/libxl/xl.h |1 +
 tools/libxl/xl_cmdimpl.c |  101 ++-
 tools/libxl/xl_cmdtable.c|4 +-
 tools/python/xen/migration/libxl.py  |9 +
 40 files changed, 4224 insertions(+), 182 deletions(-)
 create mode 100644 docs/README.colo
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_proxy.c
 create mode 100644

[Xen-devel] [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams

2015-07-15 Thread Yang Hongyang

From: Wen Congyang we...@cn.fujitsu.com

It is the negotiation record for COLO.
Primary->Secondary:
control_id      0x00000000: Secondary VM is out of sync, start a new checkpoint
Secondary->Primary:
                0x00000001: Secondary VM is suspended
                0x00000002: Secondary VM is ready
                0x00000003: Secondary VM is resumed
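
To make the direction of each control_id concrete, here is a minimal sketch. The four COLO_* macro names are the ones this patch adds to libxl_sr_stream_format.h; the two demo_ helpers are hypothetical and written only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* control_id values as listed in the record description above. */
#define COLO_NEW_CHECKPOINT 0x00000000U
#define COLO_SVM_SUSPENDED  0x00000001U
#define COLO_SVM_READY      0x00000002U
#define COLO_SVM_RESUMED    0x00000003U

/* Hypothetical helper: 1 if the id travels Secondary->Primary,
 * 0 if it travels Primary->Secondary, -1 if it is not a known id. */
int demo_colo_id_from_secondary(uint32_t id)
{
    switch (id) {
    case COLO_NEW_CHECKPOINT:
        return 0;
    case COLO_SVM_SUSPENDED:
    case COLO_SVM_READY:
    case COLO_SVM_RESUMED:
        return 1;
    default:
        return -1;
    }
}

/* Hypothetical helper: printable name, or NULL for an unknown id. */
const char *demo_colo_id_name(uint32_t id)
{
    switch (id) {
    case COLO_NEW_CHECKPOINT: return "NEW_CHECKPOINT";
    case COLO_SVM_SUSPENDED:  return "SVM_SUSPENDED";
    case COLO_SVM_READY:      return "SVM_READY";
    case COLO_SVM_RESUMED:    return "SVM_RESUMED";
    default:                  return NULL;
    }
}
```

A reader of the back channel could use a check like demo_colo_id_from_secondary() to reject a control_id that arrives from the wrong side.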

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 docs/specs/libxl-migration-stream.pandoc | 22 +-
 tools/libxl/libxl_sr_stream_format.h | 11 +++
 tools/python/xen/migration/libxl.py  |  9 +
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxl-migration-stream.pandoc 
b/docs/specs/libxl-migration-stream.pandoc
index c24a434..5986273 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -121,7 +121,9 @@ type         0x00000000: END
 
              0x00000004: CHECKPOINT_END
 
-             0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000005: COLO_CONTEXT
+
+             0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
              records.
 
              0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
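@@ -215,3 +217,21 @@ A checkpoint end record marks the end of a checkpoint in the image.
     +-------------------------------------------------+
 
 The end record contains no fields; its body_length is 0.
+
+COLO\_CONTEXT
+--------------
+
+A COLO context record contains the control information for COLO.
+
+     0     1     2     3     4     5     6     7 octet
+    +------------------------+------------------------+
+    | control_id             | padding                |
+    +------------------------+------------------------+
+
+--------------------------------------------------------------------
+Field        Description
+------------ ---------------------------------------------------
+control_id   0x00000000: Secondary VM is out of sync, start a new checkpoint
+             0x00000001: Secondary VM is suspended
+             0x00000002: Secondary VM is ready
+             0x00000003: Secondary VM is resumed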
diff --git a/tools/libxl/libxl_sr_stream_format.h 
b/tools/libxl/libxl_sr_stream_format.h
index 3f3c497..1dd2ac4 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -36,6 +36,7 @@ typedef struct libxl__sr_rec_hdr
 #define REC_TYPE_XENSTORE_DATA       0x00000002U
 #define REC_TYPE_EMULATOR_CONTEXT    0x00000003U
 #define REC_TYPE_CHECKPOINT_END      0x00000004U
+#define REC_TYPE_COLO_CONTEXT        0x00000005U
 
 typedef struct libxl__sr_emulator_hdr
 {
@@ -47,6 +48,16 @@ typedef struct libxl__sr_emulator_hdr
 #define EMULATOR_QEMU_TRADITIONAL    0x00000001U
 #define EMULATOR_QEMU_UPSTREAM       0x00000002U
 
+typedef struct libxl_sr_colo_context
+{
+    uint32_t id;
+} libxl_sr_colo_context;
+
+#define COLO_NEW_CHECKPOINT          0x00000000U
+#define COLO_SVM_SUSPENDED           0x00000001U
+#define COLO_SVM_READY               0x00000002U
+#define COLO_SVM_RESUMED             0x00000003U
+
 #endif /* LIBXL__SR_STREAM_FORMAT_H */
 
 /*
diff --git a/tools/python/xen/migration/libxl.py 
b/tools/python/xen/migration/libxl.py
index 415502e..57031c6 100644
--- a/tools/python/xen/migration/libxl.py
+++ b/tools/python/xen/migration/libxl.py
@@ -37,6 +37,7 @@ REC_TYPE_libxc_context    = 0x00000001
 REC_TYPE_xenstore_data    = 0x00000002
 REC_TYPE_emulator_context = 0x00000003
 REC_TYPE_checkpoint_end   = 0x00000004
+REC_TYPE_colo_context     = 0x00000005
 
 rec_type_to_str = {
     REC_TYPE_end              : "End",
@@ -44,6 +45,7 @@ rec_type_to_str = {
     REC_TYPE_xenstore_data    : "Xenstore data",
     REC_TYPE_emulator_context : "Emulator context",
     REC_TYPE_checkpoint_end   : "Checkpoint end",
+    REC_TYPE_colo_context     : "COLO context"
 }
 
 # emulator_context
@@ -184,6 +186,11 @@ class VerifyLibxl(VerifyBase):
         if len(content) != 0:
             raise RecordError("Checkpoint end record with non-zero length")
 
+    def verify_record_colo_context(self, content):
+        """ COLO context """
+        if len(content) == 0:
+            raise RecordError("COLO context record with zero length")
+
 
 record_verifiers = {
     REC_TYPE_end:
@@ -196,4 +203,6 @@ record_verifiers = {
         VerifyLibxl.verify_record_emulator_context,
     REC_TYPE_checkpoint_end:
         VerifyLibxl.verify_record_checkpoint_end,
+    REC_TYPE_colo_context:
+        VerifyLibxl.verify_record_colo_context,
 }
-- 
1.9.1



[Xen-devel] [PATCH v8 --for 4.6 COLO 09/25] tools/libx{l, c}: introduce should_checkpoint callback

2015-07-15 Thread Yang Hongyang

Under COLO, checkpoints are taken on demand. If this callback
returns 1, we will take another checkpoint;
0 indicates an unexpected error.
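
A simplified sketch of how such a callback can drive an on-demand checkpoint loop follows. It is not the libxc save loop: the struct, the loop, and the two demo callbacks are hypothetical, and the sketch simply treats any return value other than 1 as a signal to stop.

```c
/* Hypothetical, reduced callback table; the real struct save_callbacks
 * has more members. */
struct demo_callbacks {
    int (*checkpoint)(void *data);
    int (*should_checkpoint)(void *data);
    void *data; /* passed as the last argument to each callback */
};

/* Demo checkpoint callback: always reports success (1). */
int demo_always_checkpoint(void *data)
{
    (void)data;
    return 1;
}

/* Demo should_checkpoint callback: data points to an int budget of
 * further checkpoints; returns 1 while the budget lasts, then 0. */
int demo_budget_should_checkpoint(void *data)
{
    int *remaining = data;

    return (*remaining)-- > 0 ? 1 : 0;
}

/* Take checkpoints until either callback returns something other
 * than 1. Returns the number of checkpoints taken. */
int demo_checkpoint_loop(const struct demo_callbacks *cb)
{
    int taken = 0;

    for (;;) {
        if (cb->checkpoint(cb->data) != 1)
            break;
        taken++;
        /* Called after the checkpoint callback, as in the patch. */
        if (cb->should_checkpoint(cb->data) != 1)
            break;
    }
    return taken;
}
```

With a budget of 2, the loop takes the initial checkpoint plus two more before should_checkpoint stops it.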

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxc/include/xenguest.h | 18 ++
 tools/libxl/libxl_save_msgs_gen.pl |  7 ---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 4056955..fa06d9b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -63,6 +63,15 @@ struct save_callbacks {
  * 1: take another checkpoint */
 int (*checkpoint)(void* data);
 
+/*
+ * Called after the checkpoint callback.
+ *
+ * returns:
+ * 0: terminate checkpointing gracefully
+ * 1: take another checkpoint
+ */
+int (*should_checkpoint)(void* data);
+
 /* Enable qemu-dm logging dirty pages to xen */
     int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
 
@@ -112,6 +121,15 @@ struct restore_callbacks {
 #define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
 int (*checkpoint)(void* data);
 
+/*
+ * Called after the checkpoint callback.
+ *
+ * returns:
+ * 0: terminate checkpointing gracefully
+ * 1: take another checkpoint
+ */
+int (*should_checkpoint)(void* data);
+
 /* to be provided as the last argument to each callback function */
 void* data;
 };
diff --git a/tools/libxl/libxl_save_msgs_gen.pl 
b/tools/libxl/libxl_save_msgs_gen.pl
index d6d2967..9107a86 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -26,11 +26,12 @@ our @msgs = (
     [  3, 'scxA',   "suspend", [] ],
     [  4, 'scxA',   "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
-    [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
+    [  6, 'srcxA',  "should_checkpoint", [] ],
+    [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  7, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
-    [  8, 'srW',    "complete",              [qw(int retval
+    [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
 );
 
-- 
1.9.1



[Xen-devel] [PATCH v8 --for 4.6 COLO 13/25] libxc/restore: support COLO restore

2015-07-15 Thread Yang Hongyang

Call the resume/checkpoint/suspend callbacks once the secondary VM's
state is consistent with the primary's.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
CC: Andrew Cooper andrew.coop...@citrix.com
---
 tools/libxc/xc_sr_common.h  | 16 ++--
 tools/libxc/xc_sr_restore.c | 60 +
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 632160e..c5603ff 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -167,6 +167,18 @@ struct xc_sr_context
 
 xc_dominfo_t dominfo;
 
+/*
+ * migration stream
+ * 0: Plain VM
+ * 1: Remus
+ * 2: COLO
+ */
+enum {
+MIG_STREAM_PLAIN,
+MIG_STREAM_REMUS,
+MIG_STREAM_COLO,
+} migration_stream;
+
 union /* Common save or restore data. */
 {
 struct /* Save data. */
@@ -209,13 +221,13 @@ struct xc_sr_context
 uint32_t guest_page_size;
 
 /* Plain VM, or checkpoints over time. */
-bool checkpointed;
+int checkpointed;
 
 /* Currently buffering records between a checkpoint */
 bool buffer_all_records;
 
 /*
- * With Remus, we buffer the records sent by the primary at checkpoint,
+ * With Remus/COLO, we buffer the records sent by the primary at checkpoint,
  * in case the primary will fail, we can recover from the last
  * checkpoint state.
  * This should be enough for most of the cases because primary only send
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53694b..696bf30 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -454,6 +454,49 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 else
 ctx->restore.buffer_all_records = true;
 
+if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+{
+#define HANDLE_CALLBACK_RETURN_VALUE(ret)   \
+do {\
+if ( ret == 1 ) \
+rc = 0; /* Success */   \
+else\
+{   \
+if ( ret == 2 ) \
+rc = BROKEN_CHANNEL;\
+else\
+rc = -1; /* Some unspecified error */   \
+goto err;   \
+}   \
+} while (0)
+
+/* COLO */
+
+/* We need to resume guest */
+rc = ctx->restore.ops.stream_complete(ctx);
+if ( rc )
+goto err;
+
+/* TODO: call restore_results */
+
+/* Resume secondary vm */
+ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
+HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+/* Wait for a new checkpoint */
+ret = ctx->restore.callbacks->should_checkpoint(
+ctx->restore.callbacks->data);
+HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+/* suspend secondary vm */
+ret = ctx->restore.callbacks->suspend(ctx->restore.callbacks->data);
+HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+#undef HANDLE_CALLBACK_RETURN_VALUE
+
+/* TODO: send dirty bitmap to primary */
+}
+
  err:
 return rc;
 }
@@ -625,6 +668,15 @@ static int restore(struct xc_sr_context *ctx)
 } while ( rec.type != REC_TYPE_END );
 
  remus_failover:
+
+if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+{
+/* With COLO, we have already called stream_complete */
+rc = 0;
+IPRINTF("COLO Failover");
+goto done;
+}
+
 /*
  * With Remus, if we reach here, there must be some error on primary,
  * failover from the last checkpoint state.
@@ -679,6 +731,14 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
 if ( checkpointed_stream )
 assert(callbacks->checkpoint);
 
+if ( ctx.restore.checkpointed == MIG_STREAM_COLO )
+{
+/* this is COLO restore */
+assert(callbacks->suspend &&
+   callbacks->postcopy &&
+   callbacks->should_checkpoint);
+}
+
 IPRINTF("In experimental %s", __func__);
 DPRINTF("fd %d, dom %u, hvm %u, pae %u, superpages %d"
 ", checkpointed_stream %d", io_fd, dom, hvm, pae,
-- 
1.9.1
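
The return-value handling in HANDLE_CALLBACK_RETURN_VALUE() above can be read as a plain mapping from callback result to rc. The following is a minimal sketch of that mapping only, not the libxc implementation: the function name colo_callback_rc() is hypothetical, and the value given to BROKEN_CHANNEL here is an assumption made so the example compiles on its own.

```c
/* Assumed stand-in; the real definition lives in the libxc sources. */
#define BROKEN_CHANNEL 2

/*
 * Map a COLO restore callback result to an rc, following the macro
 * in the patch:
 *   1             -> 0 (success, carry on)
 *   2             -> BROKEN_CHANNEL
 *   anything else -> -1 (unspecified error)
 * The macro additionally jumps to the err label for every non-1 result.
 */
int colo_callback_rc(int ret)
{
    if ( ret == 1 )
        return 0;

    if ( ret == 2 )
        return BROKEN_CHANNEL;

    return -1;
}
```

Writing the mapping as a function rather than a macro avoids evaluating the argument more than once, at the cost of an explicit "if ( rc ) goto err;" at each call site.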



[Xen-devel] [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records

2015-07-15 Thread Yang Hongyang

Used by the secondary to send its dirty bitmap to the primary under COLO.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 docs/specs/libxc-migration-stream.pandoc | 24 +++-
 tools/libxc/xc_sr_common.c   |  1 +
 tools/libxc/xc_sr_stream_format.h|  1 +
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 68fa513..480d357 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -227,7 +227,9 @@ type 0x00000000: END
 
  0x0000000E: CHECKPOINT
 
- 0x0000000F - 0x7FFFFFFF: Reserved for future _mandatory_
+ 0x0000000F: DIRTY_BITMAP
+
+ 0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
  records.
 
  0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -601,6 +603,26 @@ CHECKPOINT record or an END record.
 
 \clearpage
 
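+DIRTY_BITMAP
+------------
+
+A dirty_bitmap record is used by the secondary to send its dirty bitmap
+to the primary while doing a checkpoint under COLO. This record only exists
+in the back channel.
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+
+The count of the pfn is: record-length/sizeof(uint64_t).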
+
+\clearpage
+
 Layout
 ======
 
diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 945cfa6..becc0f4 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -35,6 +35,7 @@ static const char *mandatory_rec_types[] =
 [REC_TYPE_X86_PV_VCPU_MSRS] = "x86 PV vcpu msrs",
 [REC_TYPE_VERIFY]           = "Verify",
 [REC_TYPE_CHECKPOINT]       = "Checkpoint",
+[REC_TYPE_DIRTY_BITMAP]     = "Dirty bitmap",
 };
 
 const char *rec_type_to_str(uint32_t type)
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 6d0f8fd..43a0209 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -75,6 +75,7 @@ struct xc_sr_rhdr
 #define REC_TYPE_X86_PV_VCPU_MSRS 0x0000000cU
 #define REC_TYPE_VERIFY           0x0000000dU
 #define REC_TYPE_CHECKPOINT       0x0000000eU
+#define REC_TYPE_DIRTY_BITMAP     0x0000000fU
 
 #define REC_TYPE_OPTIONAL         0x80000000U
 
-- 
1.9.1



Re: [Xen-devel] [PATCH] x86/traps: Dump instruction stream in show_execution_state()

2015-07-15 Thread Jan Beulich

>>> On 15.07.15 at 11:26, andrew.coop...@citrix.com wrote:
> On 15/07/15 09:53, Jan Beulich wrote:
>> Also I think you should avoid the subtraction from regs->rip to wrap
>> through zero, or even bail when RIP doesn't point into Xen space.
>
> If the instruction stream under eip is accessible, it should be printed,
> even if it doesn't point into Xen space.  Bear in mind that anything
> could have gone wrong by the point we get here; we may have accidentally
> jumped into userspace or jumped into some data.

In which case that fact (seen by RIP itself being off) is enough to
know what happened. What exact instruction caused the fault is
then of no interest anymore.

> The wrapping through zero will be caught by the error handling in
> __copy_from_user(), but I admit that it is not very obvious.  The
> information will be available based on the numeric value of eip.

No, by passing the wrapped pointer to __copy_from_user() you
will get the non-interesting bytes (if any) printed, but not the one
RIP actually points to.

Jan
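
The wrap-through-zero concern above can be made concrete with a small sketch. It assumes the dump wants a fixed number of bytes ahead of RIP; the helper names bytes_before_rip() and dump_start() are hypothetical and do not come from the patch under discussion.

```c
#include <stdint.h>

/*
 * How many bytes ahead of rip can be dumped without the start address
 * wrapping through zero: at most "wanted", but never more than rip.
 */
uint64_t bytes_before_rip(uint64_t rip, uint64_t wanted)
{
    return rip < wanted ? rip : wanted;
}

/* First address to copy from; never wraps because of the clamp above. */
uint64_t dump_start(uint64_t rip, uint64_t wanted)
{
    return rip - bytes_before_rip(rip, wanted);
}
```

With the clamp, a RIP of 4 and a request for 16 leading bytes yields a start address of 0, so the bytes RIP actually points to stay inside the copied range.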



Re: [Xen-devel] [PATCH V6 1/3] xen/mem_access: Support for memory-content hiding

2015-07-15 Thread George Dunlap

On 07/15/2015 09:45 AM, Razvan Cojocaru wrote:
 This patch adds support for memory-content hiding, by modifying the
 value returned by emulated instructions that read certain memory
 addresses that contain sensitive data. The patch only applies to
 cases where VM_FLAG_ACCESS_EMULATE has been set to a vm_event
 response.
 
 Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
 Acked-by: Tamas K Lengyel tleng...@novetta.com

BTW I've looked at an earlier version of this and acked it, and I
haven't seen any changes I want to review; so when the rest of it is
acked/reviewed I'll take another look through and send my ack.

 -George

 
 ---
 Changes since V5:
  - Renamed set_context_data()'s bytes parameter to size.
  - Inverted if() condition in set_context_data().
  - Removed memcpy() conditional from set_context_data().
  - Removed label from hvmemul_rep_outs_set_context().
  - Now bypassing hvm_copy_from_guest_phys() in hvmemul_rep_movs() if
    hvmemul_ctxt->set_context is true.
  - Fixed for_each_vcpu() coding style (blank before the opening
    parenthesis).
  - Added comments about the serialization status of
    vm_event_init_domain() and vm_event_cleanup_domain().
  - Setting v->arch.vm_event.emul_read_data to NULL after xfree() in
    vcpu_destroy() for safety.
 ---
  tools/tests/xen-access/xen-access.c |2 +-
  xen/arch/x86/domain.c   |3 +
  xen/arch/x86/hvm/emulate.c  |  117 
 ---
  xen/arch/x86/hvm/event.c|   50 +++
  xen/arch/x86/mm/p2m.c   |   92 +++
  xen/arch/x86/vm_event.c |   35 +++
  xen/common/vm_event.c   |8 +++
  xen/include/asm-arm/vm_event.h  |   13 
  xen/include/asm-x86/domain.h|1 +
  xen/include/asm-x86/hvm/emulate.h   |   10 ++-
  xen/include/asm-x86/vm_event.h  |4 ++
  xen/include/public/vm_event.h   |   35 ---
  12 files changed, 287 insertions(+), 83 deletions(-)
 
 diff --git a/tools/tests/xen-access/xen-access.c 
 b/tools/tests/xen-access/xen-access.c
 index 12ab921..e6ca9ba 100644
 --- a/tools/tests/xen-access/xen-access.c
 +++ b/tools/tests/xen-access/xen-access.c
 @@ -530,7 +530,7 @@ int main(int argc, char *argv[])
  break;
  case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
  printf("Breakpoint: rip=%016"PRIx64", gfn=%"PRIx64" (vcpu %d)\n",
 -   req.regs.x86.rip,
 +   req.data.regs.x86.rip,
 req.u.software_breakpoint.gfn,
 req.vcpu_id);
  
 diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
 index 34ecd7c..1ef9fad 100644
 --- a/xen/arch/x86/domain.c
 +++ b/xen/arch/x86/domain.c
 @@ -511,6 +511,9 @@ int vcpu_initialise(struct vcpu *v)
  
  void vcpu_destroy(struct vcpu *v)
  {
 +xfree(v->arch.vm_event.emul_read_data);
 +v->arch.vm_event.emul_read_data = NULL;
 +
  if ( is_pv_32bit_vcpu(v) )
  {
  free_compat_arg_xlat(v);
 diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
 index 795321c..2766919 100644
 --- a/xen/arch/x86/hvm/emulate.c
 +++ b/xen/arch/x86/hvm/emulate.c
 @@ -67,6 +67,24 @@ static int null_write(const struct hvm_io_handler *handler,
  return X86EMUL_OKAY;
  }
  
 +static int set_context_data(void *buffer, unsigned int size)
 +{
 +struct vcpu *curr = current;
 +
 +if ( curr->arch.vm_event.emul_read_data )
 +{
 +unsigned int safe_size =
 +min(size, curr->arch.vm_event.emul_read_data->size);
 +
 +memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
 +memset(buffer + safe_size, 0, size - safe_size);
 +}
 +else
 +return X86EMUL_UNHANDLEABLE;
 +
 +return X86EMUL_OKAY;
 +}
 +
  static const struct hvm_io_ops null_ops = {
  .read = null_read,
  .write = null_write
 @@ -771,6 +789,12 @@ static int hvmemul_read(
  unsigned int bytes,
  struct x86_emulate_ctxt *ctxt)
  {
 +struct hvm_emulate_ctxt *hvmemul_ctxt =
 +container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
 +
 +if ( unlikely(hvmemul_ctxt->set_context) )
 +return set_context_data(p_data, bytes);
 +
  return __hvmemul_read(
  seg, offset, p_data, bytes, hvm_access_read,
  container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
 @@ -963,6 +987,17 @@ static int hvmemul_cmpxchg(
  unsigned int bytes,
  struct x86_emulate_ctxt *ctxt)
  {
 +struct hvm_emulate_ctxt *hvmemul_ctxt =
 +container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
 +
 +if ( unlikely(hvmemul_ctxt->set_context) )
 +{
 +int rc = set_context_data(p_new, bytes);
 +
 +if ( rc != X86EMUL_OKAY )
 +return rc;
 +}
 +
  /* Fix this in case the guest is really relying on r-m-w atomicity. */
  return hvmemul_write(seg, offset, p_new, bytes, ctxt);
  }
 @@ -1005,6 +1040,38 @@ static int hvmemul_rep_ins(
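
The copy-then-zero-pad behaviour of set_context_data() in the quoted patch can be sketched in standalone form. This is an illustration under stated assumptions, not the hypervisor code: struct emul_read_data, its 64-byte capacity, the function name fill_from_context() and the plain 0 / -1 return values (standing in for X86EMUL_OKAY / X86EMUL_UNHANDLEABLE) are all hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in for the per-vcpu buffer supplied by the monitor. */
struct emul_read_data {
    unsigned int size;          /* number of valid bytes in data[] */
    unsigned char data[64];     /* assumed capacity, for the sketch only */
};

/*
 * Fill "buffer" (of "size" bytes) from the replacement data: copy as
 * many bytes as are available, then zero the remainder so the emulator
 * never sees stale bytes.
 *
 * Returns 0 on success, -1 when no replacement data was provided.
 * The caller is assumed to keep src->size within sizeof(src->data).
 */
int fill_from_context(void *buffer, unsigned int size,
                      const struct emul_read_data *src)
{
    unsigned int safe_size;

    if ( !src )
        return -1;

    safe_size = size < src->size ? size : src->size;

    memcpy(buffer, src->data, safe_size);
    memset((unsigned char *)buffer + safe_size, 0, size - safe_size);

    return 0;
}
```

The cast to unsigned char * keeps the pointer arithmetic within standard C; the patch itself does arithmetic on a void pointer, which relies on a GCC extension.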

Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-15 Thread Julien Grall


Hi Ian,

On 15/07/2015 11:32, Ian Campbell wrote:

Why can't we store the event ID in the irq_guest? As said on v3, this is not


Are you referring to irq_desc in above statement?


Yes sorry.


I'm afraid I don't follow your suggestion here, are you suggesting that
the vid field added above should be moved to irq_desc?


Yes,


But the vid _is_ domain specific, it is the virtual event ID which is
per-domain (it's the thing looked up in the ITT to get a vLPI to be
injected). I think it is a pretty direct analogue of the virq field used
for non-LPI irq_guest structs.


No, vid is not specific to a domain but a device. The virtual event ID 
is always the same as the physical event ID (See your design document 
[1]). Furthermore, all the usage of the irq_to_vid in this series are 
for physical command (see lpi_set_config within this patch).




Your proposal on v3 looks to be around moving the its_device pointer to
the irq_desc, which appears to have been done here, along with turning
the virq+vid into a union as requested there too.


On v3 I said: The event ID and
the its_device assigned are known when the device is added to Xen and
hence can be set in irq_desc (with a small memory impact, but we have
plenty of memory on ARM64).

Sorry if it was confusing.




It has been suggested by Ian to move col_id in the its_device in the
previous version [4]. Any reason to not doing it?


In round robin fashion each plpi is attached to col_id. So storing
in its_device is not possible. In linux latest col_id is stored in its_device
structure for which set_affinity is called.


Are you saying that in Linux all Events/LPIs associated with a given ITS
device are routed to the same collection?


You could do round robin on its_device... It would be exactly the same


Routing all LPIs associated with a given its_device to the same
collection is not exactly the same as round robin-ing all LPIs from the
device over the collections.


Yes, sorry, I was a bit lax in my writing. I meant that there is not
much difference between the two approaches.



and save 2 byte if not more with the alignment per irq_desc.


If this is a concern then I would say we would either want a separate
array of per-pLPI information which we do not want in irq_desc because
it is irq specific, or do add a pointer to its_desc which points to an
array of per-event information.


That would be a good solution. Although, as I said, I don't really care 
for Xen 4.6. It's more an optimization for 4.7.


Regards,

[1] http://xenbits.xen.org/people/ianc/vits/draftG.html#event-id-event

--
Julien Grall


[Xen-devel] [PATCH RFC] A script to use with OpenStack instead of vif-bridge

2015-07-15 Thread Anthony PERARD

Hi,

I have submitted a script to be used by OpenStack instead of our vif-bridge
script: https://review.openstack.org/201257/
This is because vif-bridge is calling iptables and OpenStack (nova-network)
is also updating the iptables (via iptables-{save,restore}).

Could you review the patch that I have appended below?

Also, would it be better to have a similar script in the Xen repo instead of
Nova?

The script is based on another already present in nova:
http://git.openstack.org/cgit/openstack/nova/tree/contrib/xen/vif-openstack

Thanks.

The patch:


From cb7daaab757f5f744dc9c3698e67b451db3392fe Mon Sep 17 00:00:00 2001
From: Anthony PERARD anthony.per...@citrix.com
Date: Mon, 13 Jul 2015 16:39:25 +0100
Subject: [PATCH] contrib: Add vif-bridge-nova-network script for Xen.

This script adds a vif created for a Xen guest to the bridge. It is meant
to be called by the Xen toolstack instead of the default script, because
the default one calls iptables in a way that is not compatible with nova's
use of iptables.

To use the script, place it in XEN_SCRIPT_DIR (likely
/etc/xen/scripts) and add the following to nova.conf:
[libvirt]
xen_vif_bridge_script_path = vif-bridge-nova-network

Change-Id: Ief24f0eff85f9b5a5f8cf26c3e08c4d8aeabc789
Partial-Bug: #1461642
Co-Authored-By: Christian Berendt bere...@b1-systems.de
Signed-off-by: Anthony PERARD anthony.per...@citrix.com
---
 contrib/xen/vif-bridge-nova-network | 47 +
 1 file changed, 47 insertions(+)
 create mode 100755 contrib/xen/vif-bridge-nova-network

diff --git a/contrib/xen/vif-bridge-nova-network b/contrib/xen/vif-bridge-nova-network
new file mode 100755
index 0000000..c6a3a6b
--- /dev/null
+++ b/contrib/xen/vif-bridge-nova-network
@@ -0,0 +1,47 @@
+#!/bin/bash
+# copyright: B1 Systems GmbH i...@b1-systems.de, 2012.
+# author: Christian Berendt bere...@b1-systems.de, 2012.
+# Copyright (C) 2015, Citrix Ltd.
+#
+#Licensed under the Apache License, Version 2.0 (the "License"); you may
+#not use this file except in compliance with the License. You may obtain
+#a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+#WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+#License for the specific language governing permissions and limitations
+#under the License.
+#
+# Use this script instead of the default one to avoid iptables call from
+# the script which may conflict with Nova use of iptables.
+#
+# usage:
+#   place the script in $XEN_SCRIPT_DIR (likely to be /etc/xen/scripts)
+#   and set the following in /etc/nova/nova.conf:
+# [libvirt]
+# xen_vif_bridge_script_path = vif-bridge-nova-network
+
+dir=$(dirname "$0")
+. "$dir/vif-common.sh"
+
+bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
+
+case "$command" in
+add|online)
+setup_virtual_bridge_port "$dev"
+add_to_bridge "$bridge" "$dev"
+;;
+
+remove|offline)
+  do_without_error brctl delif "$bridge" "$dev"
+  do_without_error ip link set "$dev" down
+  ;;
+esac
+
+if [ "$type_if" = vif -a "$command" = "online" ]
+then
+  success
+fi

-- 
Anthony PERARD


[Xen-devel] [PATCH v2 3/3] sched/preempt: fix cond_resched_lock() and cond_resched_softirq()

2015-07-15 Thread Konstantin Khlebnikov

These functions check should_resched() before unlocking the spinlock or
re-enabling bottom halves: preempt_count is always non-zero at that point
=> should_resched() always returns false.
cond_resched_lock() worked iff spin_needbreak is set.

This patch adds argument preempt_offset to should_resched().

preempt_count offset constants for that:

PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
PREEMPT_LOCK_OFFSET - offset after spin_lock()
SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_disable()
SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh()

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 arch/x86/include/asm/preempt.h |4 ++--
 include/asm-generic/preempt.h  |5 +++--
 include/linux/preempt.h|   19 ++-
 include/linux/sched.h  |6 --
 kernel/sched/core.c|6 +++---
 5 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index dca71714f860..b12f81022a6b 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -90,9 +90,9 @@ static __always_inline bool __preempt_count_dec_and_test(void)
 /*
  * Returns true when we need to resched and can (barring IRQ state).
  */
-static __always_inline bool should_resched(void)
+static __always_inline bool should_resched(int preempt_offset)
 {
-   return unlikely(!raw_cpu_read_4(__preempt_count));
+   return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
 }
 
 #ifdef CONFIG_PREEMPT
diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
index d0a7a4753db2..0bec580a4885 100644
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -71,9 +71,10 @@ static __always_inline bool __preempt_count_dec_and_test(void)
 /*
  * Returns true when we need to resched and can (barring IRQ state).
  */
-static __always_inline bool should_resched(void)
+static __always_inline bool should_resched(int preempt_offset)
 {
-   return unlikely(!preempt_count() && tif_need_resched());
+   return unlikely(preempt_count() == preempt_offset &&
+   tif_need_resched());
 }
 
 #ifdef CONFIG_PREEMPT
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 84991f185173..bea8dd8ff5e0 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -84,13 +84,21 @@
  */
 #define in_nmi()   (preempt_count() & NMI_MASK)
 
+/*
+ * The preempt_count offset after preempt_disable();
+ */
 #if defined(CONFIG_PREEMPT_COUNT)
-# define PREEMPT_DISABLE_OFFSET 1
+# define PREEMPT_DISABLE_OFFSET PREEMPT_OFFSET
 #else
-# define PREEMPT_DISABLE_OFFSET 0
+# define PREEMPT_DISABLE_OFFSET 0
 #endif
 
 /*
+ * The preempt_count offset after spin_lock()
+ */
+#define PREEMPT_LOCK_OFFSET PREEMPT_DISABLE_OFFSET
+
+/*
  * The preempt_count offset needed for things like:
  *
  *  spin_lock_bh()
@@ -103,7 +111,7 @@
  *
  * Work as expected.
  */
-#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_DISABLE_OFFSET)
+#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_LOCK_OFFSET)
 
 /*
  * Are we running in atomic context?  WARNING: this macro cannot
@@ -124,7 +132,8 @@
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
-#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
+#define preempt_count_dec_and_test() \
+   ({ preempt_count_sub(1); should_resched(0); })
 #else
 #define preempt_count_add(val) __preempt_count_add(val)
 #define preempt_count_sub(val) __preempt_count_sub(val)
@@ -184,7 +193,7 @@ do { \
 
 #define preempt_check_resched() \
 do { \
-   if (should_resched()) \
+   if (should_resched(0)) \
__preempt_schedule(); \
 } while (0)
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ae21f1591615..a8e9b17acdee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2885,12 +2885,6 @@ extern int _cond_resched(void);
 
 extern int __cond_resched_lock(spinlock_t *lock);
 
-#ifdef CONFIG_PREEMPT_COUNT
-#define PREEMPT_LOCK_OFFSET PREEMPT_OFFSET
-#else
-#define PREEMPT_LOCK_OFFSET 0
-#endif
-
 #define cond_resched_lock(lock) ({ \
___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
__cond_resched_lock(lock);  \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..d9a4d93dc879 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4492,7 +4492,7 @@ SYSCALL_DEFINE0(sched_yield)
 
 int __sched _cond_resched(void)
 {
-   if (should_resched()) {
+   if (should_resched(0)) {
preempt_schedule_common();
return 1;
}
@@ -4510,7 +4510,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int __cond_resched_lock(spinlock_t *lock)
 {
-   int resched = should_resched();
+   int resched =
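
The effect of the new preempt_offset argument can be modelled outside the kernel. The sketch below is a simplified userspace model, not kernel code: struct cpu_state and model_should_resched() are hypothetical names, and PREEMPT_LOCK_OFFSET is given the value 1 on the assumption of a CONFIG_PREEMPT_COUNT build.

```c
#include <stdbool.h>

/* Assumed value for a kernel built with CONFIG_PREEMPT_COUNT. */
#define PREEMPT_LOCK_OFFSET 1

/* Hypothetical model of the per-CPU state the real helpers consult. */
struct cpu_state {
    int  preempt_count;   /* nesting depth of preempt-disabled sections */
    bool need_resched;    /* models the TIF_NEED_RESCHED flag */
};

/*
 * Model of should_resched(preempt_offset) after the patch: rescheduling
 * is possible only if the count is exactly what the caller itself
 * contributes (for example one spin_lock()), and a resched is pending.
 */
bool model_should_resched(const struct cpu_state *s, int preempt_offset)
{
    return s->preempt_count == preempt_offset && s->need_resched;
}
```

While a single spinlock is held the count equals PREEMPT_LOCK_OFFSET, so a check against 0 (the old behaviour) can never succeed, whereas a check against PREEMPT_LOCK_OFFSET succeeds as soon as a reschedule is pending.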

[Xen-devel] [PATCH v2 1/3] drivers/xen/preempt: use need_resched() instead of should_resched()

2015-07-15 Thread Konstantin Khlebnikov

This code is used only when CONFIG_PREEMPT=n and only in non-atomic context:
xen_in_preemptible_hcall is set only in privcmd_ioctl_hypercall().
Thus preempt_count is zero and should_resched() is equal to need_resched().

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 drivers/xen/preempt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
index a1800c150839..08cb419eb4e6 100644
--- a/drivers/xen/preempt.c
+++ b/drivers/xen/preempt.c
@@ -31,7 +31,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
 asmlinkage __visible void xen_maybe_preempt_hcall(void)
 {
if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
- && should_resched())) {
+ && need_resched())) {
/*
 * Clear flag as we may be rescheduled on a different
 * cpu.



[Xen-devel] [PATCH v2 2/3] KVM: PPC: Book3S HV: Use need_resched() instead of should_resched()

2015-07-15 Thread Konstantin Khlebnikov

Function should_resched() is equal to (!preempt_count() && need_resched()).
In a preemptive kernel, preempt_count here is non-zero because of vc->lock.

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 arch/powerpc/kvm/book3s_hv.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067ad4222..a9f753fb73a8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2178,7 +2178,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 vc->runner = vcpu;
 if (n_ceded == vc->n_runnable) {
 kvmppc_vcore_blocked(vc);
-   } else if (should_resched()) {
+   } else if (need_resched()) {
 vc->vcore_state = VCORE_PREEMPT;
 /* Let something else run */
 cond_resched_lock(&vc->lock);



Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-15 Thread Ian Campbell

On Wed, 2015-07-15 at 11:49 +0200, Julien Grall wrote:
 Hi Ian,
 
 On 15/07/2015 11:32, Ian Campbell wrote:
  Why can't we store the event ID in the irq_guest? As said on v3, this is 
  not
 
  Are you referring to irq_desc in above statement?
 
  Yes sorry.
 
  I'm afraid I don't follow your suggestion here, are you suggesting that
  the vid field added above should be moved to irq_desc?
 
 Yes,
 
  But the vid _is_ domain specific, it is the virtual event ID which is
  per-domain (it's the thing looked up in the ITT to get a vLPI to be
  injected). I think it is a pretty direct analogue of the virq field used
  for non-LPI irq_guest structs.
 
 No, vid is not specific to a domain but a device. The virtual event ID 
 is always the same as the physical event ID (See your design document 
 [1]). Furthermore, all the usage of the irq_to_vid in this series are 
 for physical command (see lpi_set_config within this patch).
 
 
  Your proposal on v3 looks to be around moving the its_device pointer to
  the irq_desc, which appears to have been done here, along with turning
  the virq+vid into a union as requested there too.
 
 On v3 I said: The event ID and
 the its_device assigned are known when the device is added to Xen and
 hence can be set in irq_desc (with a small memory impact, but we have
 plenty of memory on ARM64).
 
 Sorry if it was confusing.

It was me who was confusing the properties of vid with those of vlpi,
sorry.

Not helped by
http://xenbits.xen.org/people/ianc/vits/draftG.html#virtual-lpi-injection 
confusingly using virq instead of vid.

Ian.

  It has been suggested by Ian to move col_id in the its_device in the
  previous version [4]. Any reason to not doing it?
 
  In round robin fashion each plpi is attached to col_id. So storing
  in its_device is not possible. In linux latest col_id is stored in 
  its_device
  structure for which set_affinity is called.
 
  Are you saying that in Linux all Events/LPIs associated with a given ITS
  device are routed to the same collection?
 
  You could do round robin on its_device... It would be exactly the same
 
  Routing all LPIs associated with a given its_device to the same
  collection is not exactly the same as round robin-ing all LPIs from the
  device over the collections.
 
 Yes, sorry I was a bit lax on the writing. I wanted to meant that there 
 is not much difference to do it.
 
  and save 2 byte if not more with the alignment per irq_desc.
 
  If this is a concern then I would say we would either want a separate
  array of per-pLPI information which we do not want in irq_desc because
  it is irq specific, or do add a pointer to its_desc which points to an
  array of per-event information.
 
 That would be a good solution. Although, as I said, I don't really care 
 for Xen 4.6. It's more an optimization for 4.7.
 
 Regards,
 
 [1] http://xenbits.xen.org/people/ianc/vits/draftG.html#event-id-event
 




Re: [Xen-devel] [PATCH] x86/traps: Misc tweaks to several printk()s

2015-07-15 Thread Jan Beulich

>>> On 15.07.15 at 11:48, andrew.coop...@citrix.com wrote:
> On 15/07/15 10:03, Jan Beulich wrote:
>>>>> On 14.07.15 at 19:54, andrew.coop...@citrix.com wrote:
>>> @@ -626,8 +626,9 @@ static void do_trap(struct cpu_user_regs *regs, int use_error_code)
>>>  
>>>  if ( likely((fixup = search_exception_table(regs->eip)) != 0) )
>>>  {
>>> -dprintk(XENLOG_ERR, "Trap %d: %p -> %p\n",
>>> -trapnr, _p(regs->eip), _p(fixup));
>>> +printk(XENLOG_INFO "Exception [#%d, ec=%04x] (%s): %ps %p -> %p\n",
>>> +   trapnr, use_error_code ? regs->error_code : 0, trapstr(trapnr),
>>> +   _p(regs->eip), _p(regs->eip), _p(fixup));
>> But why the transition dprintk() -> printk()?
>
> The file/line reference here is not useful, but now that you point it
> out I had forgotten to consider that dprintk() now only exists in debug
> builds.
>
> It would be nice to have a variant on printk() which is restricted to
> debug builds, but doesn't have a file/line reference.

But otoh the file/line pair shouldn't cause a lot of confusion - debug
build users can certainly be expected to cope with that. Which isn't
to say that I'd even consider making dprintk() by default not print
file/line, and instead have a dprintk_at() or DPRINTK() doing so for
those who really can't write distinguishable messages.

>>> @@ -2813,10 +2814,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>>>  case MSR_EFER:
>>>   rdmsr_normal:
>>>  /* Everyone can read the MSR space. */
>>> -/* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
>>> -_p(regs->ecx));*/
>>>  if ( rdmsr_safe(regs->ecx, val) )
>>> +{
>>> +gprintk(XENLOG_WARNING, "attempted RDMSR 0x%08x\n", regs->_ecx);
>>>  goto fail;
>>> +}
>> Do you really see this to be useful in production builds?
>
> There is currently an asymmetry between the WRMSR and RDMSR paths, which
> shouldn't exist IMO.

I'm of the opposite opinion: Knowing that (just like we do) guest
kernels may access MSRs being prepared to get a #GP, and this
(naturally) being more likely on RDMSR (why would one try to write
an MSR one can't read?), the asymmetry has a reason.

> Guest warning is rate limited by default, and anecdotally, this path
> doesn't trigger by default on any of my test boxes with a 3.10 pvops kernel.

Which is nice to know, but not nearly enough to assume we won't get
flooded (ignoring the rate limiting) by these for other guest kernels.

Jan



Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-15 Thread Wu, Feng

 -----Original Message-----
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:25 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

  On 15.07.15 at 08:04, feng...@intel.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Friday, July 10, 2015 10:02 PM
  I'm particularly worried by the call to acpi_find_matched_drhd_unit()
  - is it maybe worth storing the iommu pointer in struct msi_desc?

  I think it worth, Like Andrew also mentioned this point before. I tend
  to make this a independent work and do it later, since the 4.6 release
  is coming, I am still try my best to target it. Could you please share
  your concern here, performance? Or other things? Thanks!

 Interrupt latency in particular.

This IRTE update operation is not frequent. It happens only a few times,
mostly during the initialization phase of the guest. And even when the
guest sets the affinity, if the MSI/MSI-X configuration doesn't change,
QEMU will not ask Xen to update it.

   +GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
  iremap_entries, p);
   +new_ire = *p;
   +
   +/* Setup/Update interrupt remapping table entry. */
   +setup_posted_irte(new_ire, pi_desc, gvec);
   +
   +do {
   +old_ire = *(uint128_t *)p;

  This cast suggests that you might want to go beyond what Andrew
  said on cmpxchg16b()'s parameters: Perhaps they'd better be
  void * instead of uint128_t *.

  In that case, I need to do the cast in __cmpxchg16b(), right?

 Where needed, yes. But that would limit casting to just a single place.

   +        ret = cmpxchg16b(p, &old_ire, &new_ire);
   +    } while ( memcmp(&ret, &old_ire, sizeof(old_ire)) );

  Doesn't setup_posted_irte() need to move inside this loop, as it
  tries to preserve certain fields? Or else, what is the cmpxchg16b
  loop guarding against (i.e. why isn't this just a single one)?

  Why do we need to move setup_posted_irte() inside the loop? new_ire
  will not be changed after setup, right? Here we need to make sure
  the 128b IRTE is updated atomically, especially the high part and the
  low part of the posted-interrupt descriptor address.

 There are two possible scenarios:

 1) There are bits that can be updated behind the back of the code
 here. In that case you need to loop, and each iteration of the loop
 needs to re-fetch the current value (not doing so would make the
 loop infinite).

Oh, yes, I think I made a mistake here; I have been too hasty these days.
Sorry for that! I think I need to do it like this:

do {
    new_ire = *p;

    /* Setup/Update interrupt remapping table entry. */
    setup_posted_irte(&new_ire, pi_desc, gvec);

    old_ire = *(uint128_t *)p;
    ret = cmpxchg16b(p, &old_ire, &new_ire);
} while ( memcmp(&ret, &old_ire, sizeof(old_ire)) );

Thanks,
Feng

 2) No racing updates are possible; all you care about is atomicity
 of the update. In that case you don't need a loop around the
 cmpxchg16b().

 Jan
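
The two scenarios described in this message can be sketched with C11 atomics. This is only an illustration under stated assumptions: a 64-bit word stands in for the 128-bit IRTE, build_entry() is a hypothetical stand-in for setup_posted_irte(), and the bit layout (VECTOR_MASK) is invented for the example. In the looping variant a single snapshot is used both as the expected value and as the input for building the new value, and that snapshot is refreshed whenever the compare-and-swap fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the low 8 bits are the field we set; all other
 * bits must be preserved because another party may own them. */
#define VECTOR_MASK UINT64_C(0xff)

/* Hypothetical stand-in for setup_posted_irte(): derive the new value
 * from a snapshot of the current one, preserving the other bits. */
static uint64_t build_entry(uint64_t snapshot, uint8_t vector)
{
    return (snapshot & ~VECTOR_MASK) | vector;
}

/* Scenario 1: racing updates are possible. Loop, rebuilding the new value
 * from the freshly observed contents on every iteration. */
static void update_entry_racy(_Atomic uint64_t *entry, uint8_t vector)
{
    uint64_t old, new;

    assert(entry != NULL);
    old = atomic_load(entry);
    do {
        new = build_entry(old, vector);
        /* On failure, 'old' is overwritten with the current contents. */
    } while ( !atomic_compare_exchange_weak(entry, &old, new) );
}

/* Scenario 2: no racing updates; only atomicity of the write matters.
 * One compare-and-swap is enough, and it is expected to succeed. */
static bool update_entry_exclusive(_Atomic uint64_t *entry, uint8_t vector)
{
    uint64_t old;

    assert(entry != NULL);
    old = atomic_load(entry);
    return atomic_compare_exchange_strong(entry, &old,
                                          build_entry(old, vector));
}
```

If the second variant ever returned false, that would indicate that the assumption of no concurrent writers does not hold.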


Re: [Xen-devel] [xen-unstable test] 59544: regressions - FAIL

2015-07-15 Thread Dario Faggioli

On Wed, 2015-07-15 at 08:48 +, osstest service owner wrote:
 flight 59544 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59544/
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:

  test-armhf-armhf-xl   6 xen-boot  fail REGR. vs. 58965

To me, it looks like the box did actually reboot in Xen. However:

Jul 14 21:04:52.969102 Starting NTP server: ntpd[   76.734015] asix 2-3.2.4:1.0 eth0: link down
Jul 14 21:05:46.565053 [   85.437886] asix 2-3.2.4:1.0 eth0: link down
Jul 14 21:05:55.269159 .

Which is something I've certainly seen already (I'm not sure it was on
arndale, but I think so), and AFAICR, we can't do much about it.

Regards,
Dario
-- 
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-15 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 15, 2015 4:46 PM
 To: Wu, Feng
 Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
 Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

  On 15.07.15 at 10:38, feng...@intel.com wrote:

  -Original Message-
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, July 15, 2015 4:25 PM
  To: Wu, Feng
  Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
  Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
  Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

   On 15.07.15 at 08:04, feng...@intel.com wrote:
   From: Jan Beulich [mailto:jbeul...@suse.com]
   Sent: Friday, July 10, 2015 10:02 PM
   I'm particularly worried by the call to acpi_find_matched_drhd_unit()
   - is it maybe worth storing the iommu pointer in struct msi_desc?

   I think it is worth doing; Andrew also mentioned this point before. I would
   prefer to make this an independent piece of work and do it later: the 4.6
   release is coming and I am still trying my best to target it. Could you
   please share your concern here? Performance, or something else? Thanks!

  Interrupt latency in particular.

  This IRTE update operation is not frequent. It happens only a few times,
  mostly in the initialization phase of the guest. And even when the guest sets
  the affinity, if the MSI/MSI-X configuration doesn't change, QEMU will not
  ask Xen to update it.

 When the guest sets the affinity, the MSI{,-X} configuration is
 rather likely to change (at least for Linux guests).

Yes, it is. But I'd say it is not a frequent operation. In my test, it only
happens in the initialization phase, and some updates don't reach Xen because
the configuration is unchanged (QEMU filters them). I agree that I will change
this; my question is whether we can do it a little later, so that I can focus
on some other critical issues before 4.6 is released, which may give this
patch a better chance of making it into 4.6. Is this okay for you?

Thanks,
Feng

  There are two possible scenarios:

  1) There are bits that can be updated behind the back of the code
  here. In that case you need to loop, and each iteration of the loop
  needs to re-fetch the current value (not doing so would make the
  loop infinite).

  Oh, yes, I think I made a mistake here; I have been too hasty these days.
  Sorry for that! I think I need to do it like this:

  do {
      new_ire = *p;

      /* Setup/Update interrupt remapping table entry. */
      setup_posted_irte(&new_ire, pi_desc, gvec);

      old_ire = *(uint128_t *)p;
      ret = cmpxchg16b(p, &old_ire, &new_ire);
  } while ( memcmp(&ret, &old_ire, sizeof(old_ire)) );

 So since you put this in a loop again, would you mind pointing out
 which bits can get modified behind our back?

 Jan

