[Xen-devel] [ovmf test] 59592: all pass - PUSHED
flight 59592 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59592/

Perfect :-)
All tests in this flight passed
version targeted for testing:
 ovmf 680742607132a7733880407453b5f792699d7143
baseline version:
 ovmf 2ad9cf37a492e69a4e1b7624d92d9a35fce083fc

Last test of basis 59511 2015-07-13 13:47:15 Z 2 days
Testing same since 59592 2015-07-15 22:42:18 Z 0 days 1 attempts

People who touched revisions under test:
 Ard Biesheuvel ard.biesheu...@linaro.org
 Brendan Jackman brendan.jack...@arm.com
 Bruce Cran br...@cran.org.uk
 Chao Zhang chao.b.zh...@intel.com
 fanwang2 fan.w...@intel.com
 Gabriel Somlo so...@cmu.edu
 Hao Wu hao.a...@intel.com
 Jeff Fan jeff@intel.com
 Jiaxin Wu jiaxin...@intel.com
 Laszlo Ersek ler...@redhat.com
 Olivier Martin olivier.mar...@arm.com
 Qiu Shumin shumin@intel.com
 Ronald Cron ronald.c...@arm.com
 Tapan Shah tapands...@hp.com
 Zhang Lubo lubo.zh...@intel.com

jobs:
 build-amd64-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64 pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

+ branch=ovmf + revision=680742607132a7733880407453b5f792699d7143 + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' 
-d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 680742607132a7733880407453b5f792699d7143 + branch=ovmf + revision=680742607132a7733880407453b5f792699d7143 + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . cri-common ++ . cri-getconfig ++ umask 002 + select_xenbranch + case $branch in + tree=ovmf + xenbranch=xen-unstable + '[' xovmf = xlinux ']' + linuxbranch= + '[' x = x ']' + qemuubranch=qemu-upstream-unstable + : tested/2.6.39.x + . 
ap-common ++ : osst...@xenbits.xen.org +++ getconfig OsstestUpstream +++ perl -e ' use Osstest; readglobalconfig(); print $c{OsstestUpstream} or die $!; ' ++ : ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/xen.git ++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://xenbits.xen.org/libvirt.git ++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git +++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src '[fetch=try]' +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local 'options=[fetch=try]' getconfig GitCacheProxy perl -e '
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 05/25] libxl/remus: introduce libxl__remus_setup
On 07/15/2015 07:26 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> Refactoring Remus setup by introducing libxl__remus_setup API.
>> All Remus setup work are done in this function.
>> Also remove the libxl__ prefix for static functions.
>
> There is a subtle behavioural change here, which is that if anything
> which is now done in _setup fails then the result is a call to
> dss->callback(..,..,ERROR_FAIL) rather than _start returning
> AO_CREATE_FAIL(ERROR_FAIL).
>
> I think this is probably a reasonable and correct change, but I think it
> is worth mentioning in the commit log.

Yes, will update the commit log.

> That said, I also wonder if the actual check for netbuffer_enabled (the
> only such failure in practice) ought to be moved up such that it stays
> in _start along with the other similar checks, i.e. _start would do:
>
>     if (libxl_defbool_val(info->netbuf) && !libxl__netbuffer_enabled(gc)) {
>         LOG(ERROR, "Remus: No support for network buffering");
>         rc = ERROR_FAIL;
>         goto out;
>     }

This check is for Remus only. We want to reuse _start for COLO, so
anything related only to Remus should sit in libxl_remus.c.

> while _setup would do:
>
>     if (libxl_defbool_val(info->netbuf)) {
>         // MAYBE : assert(libxl__netbuffer_enabled(gc))
>         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
>     }
>
> Ian.

--
Thanks,
Yang.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
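To make the two shapes Ian Campbell proposes concrete, here is a minimal, self-contained sketch: the feature check fails synchronously in a _start-style function, while the _setup-style function only sets one bit per device kind. Everything in it (struct remus_info, remus_start, remus_setup, the DEVICE_KIND_* enum and the ERROR_FAIL value) is a hypothetical stand-in, not the real libxl API; note also that Yang argues for keeping this Remus-only check in libxl_remus.c instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the real libxl definitions. */
enum device_kind { DEVICE_KIND_VIF, DEVICE_KIND_VBD };
#define ERROR_FAIL (-3)

struct remus_info {
    bool netbuf;                      /* models libxl_defbool_val(info->netbuf) */
};

struct remus_state {
    unsigned int device_kind_flags;   /* one bit per enum device_kind */
};

/* _start: reject an unsupported request synchronously, before any
 * asynchronous work begins, so the caller gets an immediate error. */
static int remus_start(const struct remus_info *info, bool netbuffer_supported)
{
    if (info->netbuf && !netbuffer_supported)
        return ERROR_FAIL;            /* "No support for network buffering" */
    return 0;
}

/* _setup: runs only after _start succeeded, so it just records which
 * device kinds the checkpoint device layer must manage. */
static void remus_setup(const struct remus_info *info, struct remus_state *rs)
{
    rs->device_kind_flags = 0;
    if (info->netbuf)
        rs->device_kind_flags |= (1u << DEVICE_KIND_VIF);
}
```

With this split, a missing network-buffering feature surfaces as a direct return value from the start call rather than as a later dss->callback(..., ERROR_FAIL), which is exactly the behavioural difference discussed above.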
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 11/25] tools/libxc: support to resume uncooperative HVM guests
On 07/15/2015 08:26 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> From: Wen Congyang we...@cn.fujitsu.com
>>
>> 1. suspend
>>  a. PVHVM and PV: we use the same way to suspend the guest (send the
>>     suspend request to the guest). If the guest doesn't support evtchn,
>>     the xenstore variant will be used, suspending the guest via XenBus
>>     control node.
>>  b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to
>>     suspend the guest
>>
>> 2. Resume:
>>  a. fast path
>>     In this case, we don't change the guest's state.
>>     PV: modify the return code to 1, and than call the domctl:
>>         XEN_DOMCTL_resumedomain
>>     PVHVM: same with PV
>>     HVM: do nothing in modify_returncode, and than call the domctl:
>>          XEN_DOMCTL_resumedomain
>>  b. slow
>>     Used when the guest's state have been changed.
>>     PV: update start info, and reset all secondary CPU states. Than call
>>         the domctl: XEN_DOMCTL_resumedomain
>>     PVHVM and HVM can not be resumed.
>>
>> For PVHVM, in my test, only call the domctl: XEN_DOMCTL_resumedomain
>> can work. I am not sure if we should update start info and reset all
>> secondary CPU states.
>>
>> For pure HVM guest, in my test, only call the domctl:
>> XEN_DOMCTL_resumedomain can work.
>>
>> So we can call libxl__domain_resume(..., 1) if we don't change the guest
>> state, otherwise call libxl__domain_resume(..., 0).
>>
>> Under COLO, we will update the guest's state(modify memory, cpu's
>> registers, device status...). In this case, we cannot use the fast path
>> to resume it. Keep the return code 0, and use a slow path to resume the
>> guest. While resuming HVM using slow path is not supported currently,
>> this patch is to make the resume call do not fail.
>
> I'm afraid that the addition of this paragraph has not really addressed
> my comment on v3:
>
>     I'm afraid I think the commit message for this patch (and the
>     associated doc comments) need revisiting almost from scratch, to
>     clearly explain what this patch is doing and why and what the
>     constraints on the new functionality will be.
>
>     At the moment it mostly talks in a confusing way about the old
>     behaviour and adds very specific assumptions to the new function
>     which are not made clear.
>
> It also appears that this has not been addressed:
>
>>> Hrm, so it sounds here like the correctness of this new functionality
>>> requires the caller to have not messed with the domain's state? What
>>> sort of changes are to the guest state are we talking about here?
>>
>> This is used for secondary, at a checkpoint, we do:
>> 1. suspend the guest
>> 2. sync the guest state with primary
>>    == here the guest state has been changed
>> 3. resume the guest
>>
>> The guest state is changed by step 2, then we will resume the guest,
>> since the guest state has been changed, we cannot use the fast path to
>> resume it. For slow path, resume HVM is not supported currently, this
>> patch is to add the support.

While the XEN_DOMCTL_resumedomain hypercall for HVM is a NOP, it occurs
to me that we could do this in a different way: we can modify
libxl__domain_resume so that, if the domain is HVM, we skip the
xc_domain_resume call. What do you think?

>>> Isn't that a new requirement for this call? If so then it should be
>>> documented somewhere, specifically what sorts of changes are and are
>>> not allowed and the types of guests which are affected.
>
> The two usages of "in my test" in the commit message also do not inspire
> confidence that this change is understood to be correct, vs. happening
> to be something which works for you.
>
> Ian.
>
>> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
>> ---
>>  tools/libxc/xc_resume.c | 22 ++++++++++++++++++----
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index e67bebd..bd82334 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -109,6 +109,23 @@ static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
>>      return do_domctl(xch, &domctl);
>>  }
>>
>> +static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
>> +{
>> +    DECLARE_DOMCTL;
>> +
>> +    /*
>> +     * If it is PVHVM, the hypercall return code is 0, because this
>> +     * is not a fast path resume, we do not modify_returncode as in
>> +     * xc_domain_resume_cooperative.
>> +     * (resuming it in a new domain context)
>> +     *
>> +     * If it is a HVM, the hypercall is a NOP.
>> +     */
>> +    domctl.cmd = XEN_DOMCTL_resumedomain;
>> +    domctl.domain = domid;
>> +    return do_domctl(xch, &domctl);
>> +}
>> +
>>  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>  {
>>      DECLARE_DOMCTL;
>> @@ -138,10 +155,7 @@ static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
>>       */
>>  #if defined(__i386__) || defined(__x86_64__)
>>      if ( info.hvm )
>> -    {
>> -        ERROR("Cannot resume uncooperative HVM guests");
>> -        return rc;
>> -    }
>> +        return xc_domain_resume_hvm(xch, domid);
>>
>>      if (
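The commit message above describes a choice between resume paths. As a hedged illustration only, the sketch below models that choice as a pure function; guest_type, resume_action and pick_resume_action are hypothetical names rather than libxc code, and the mapping encodes the commit message's claims (including the behaviour Ian questions), not verified hypervisor semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the paths in the commit message; not libxc code. */
enum guest_type { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

enum resume_action {
    RESUME_FAST,      /* modify return code (PV/PVHVM), then resumedomain */
    RESUME_SLOW_PV,   /* update start info, reset secondary vCPUs, resumedomain */
    RESUME_SLOW_HVM,  /* added by this patch: resumedomain only */
};

/*
 * fast == true corresponds to libxl__domain_resume(..., 1), i.e. the
 * caller did not change the guest state since it was suspended.
 */
static enum resume_action pick_resume_action(enum guest_type type, bool fast)
{
    if (fast)
        return RESUME_FAST;
    if (type == GUEST_PV)
        return RESUME_SLOW_PV;
    /* PVHVM and pure HVM: before this patch, xc_domain_resume_any()
     * failed with "Cannot resume uncooperative HVM guests". */
    return RESUME_SLOW_HVM;
}
```

Writing the decision down this way also makes Ian's point visible: the RESUME_SLOW_HVM row silently assumes that issuing only XEN_DOMCTL_resumedomain is correct after the state sync, and that assumption is what the commit message needs to justify.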
Re: [Xen-devel] [PATCH] libxl: events: Do not abort remus with ERROR_TIMEOUT
On 07/15/2015 09:35 PM, Ian Jackson wrote:
> When the timeout set for prompting the next remus iteration fires, we
> should not treat the ERROR_TIMEDOUT as an error.
>
> Bug in 31c836f4 "libxl: events: Permit timeouts to signal ao abort".
>
> Reported-by: Yang Hongyang yan...@cn.fujitsu.com
> Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
> CC: Yang Hongyang yan...@cn.fujitsu.com
> CC: Wei Liu wei.l...@citrix.com
> CC: Ian Campbell ian.campb...@citrix.com

Acked-by: Yang Hongyang yan...@cn.fujitsu.com

> ---
>  tools/libxl/libxl_dom.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 81adb3d..4cb247a 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -2024,6 +2024,9 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>
>      STATE_AO_GC(dss->ao);
>
> +    if (rc == ERROR_TIMEDOUT) /* As intended */
> +        rc = 0;
> +
>      /*
>       * Time to checkpoint the guest again. We return 1 to libxc
>       * (xc_domain_save.c). in order to continue executing the infinite loop

--
Thanks,
Yang.
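The fix boils down to one rule: in the checkpoint timer callback, ERROR_TIMEDOUT means "the interval elapsed" and must be mapped to success, while any other non-zero code is a real failure. A minimal model of that rule, assuming illustrative values for the libxl error codes and a hypothetical next_checkpoint() in place of remus_next_checkpoint(); its return value plays the role of the last argument passed to libxl__xc_domain_saverestore_async_callback_done().

```c
#include <assert.h>

/* Stand-ins for libxl error codes (illustrative values). */
#define ERROR_FAIL      (-3)
#define ERROR_TIMEDOUT  (-9)

struct save_state {
    int rc;   /* sticky failure code, playing the role of dss->rc */
};

/*
 * Hypothetical model of remus_next_checkpoint(). The timer fires with
 * ERROR_TIMEDOUT when the checkpoint interval elapses, which is the normal
 * case; other codes (for example an abort) are genuine failures. The
 * return value models the flag handed back to libxc:
 * 1 = run another checkpoint, 0 = stop the save loop.
 */
static int next_checkpoint(struct save_state *s, int rc)
{
    if (rc == ERROR_TIMEDOUT)   /* As intended */
        rc = 0;

    if (rc)
        s->rc = rc;
    return !rc;
}
```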
Re: [Xen-devel] [PATCH] libxl/remus: fix the return value of the checkpoint callback
On 07/15/2015 09:37 PM, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH] libxl/remus: fix the return value of the checkpoint callback"):
>> Does that mean it won't apply to current staging?
>
> Indeed it doesn't.
>
>> I think we probably want this fix ASAP rather than waiting for that
>> series?
>
> Yes. Patch just sent. Untested but fairly obvious.
>
> Yang, do you want to test this, or do you want us to apply it as-is?
> I don't have a remus test setup.

Please apply, thanks!

> This is the second rc-handling bug in 31c836f4 "libxl: events: Permit
> timeouts to signal ao abort". I am going to re-read that patch to see if
> I can find any more.
>
> Ian.

--
Thanks,
Yang.
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 04/25] tools/libxl: rename remus checkpoint callbacks
On 07/15/2015 07:17 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> There are 2 remus checkpoint callbacks(save/restore), currently, they
>> both called libxl__remus_domain_checkpoint_callback in diffrent file,
>> so it is ok. But in the following patch, we will move all of the remus
>> callback code into a seperate file, the name should be diffrent.
>
> separate and different (twice).

OK, thanks!

>> So rename them to:
>>   libxl__remus_domain_{save/restore}_checkpoint_callback
>>
>> Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
>
> Acked-by: Ian Campbell ian.campb...@citrix.com
>
>> CC: Ian Jackson ian.jack...@eu.citrix.com
>> CC: Wei Liu wei.l...@citrix.com
>> ---
>>  tools/libxl/libxl_create.c | 4 ++--
>>  tools/libxl/libxl_dom.c    | 4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index 5b4d333..a32e3df 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -677,7 +677,7 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
>>  static void remus_checkpoint_stream_done(
>>      libxl__egc *egc, libxl__stream_read_state *srs, int rc);
>>
>> -static void libxl__remus_domain_checkpoint_callback(void *data)
>> +static void libxl__remus_domain_restore_checkpoint_callback(void *data)
>>  {
>>      libxl__save_helper_state *shs = data;
>>      libxl__domain_create_state *dcs = shs->caller_state;
>> @@ -989,7 +989,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>>      }
>>
>>      /* Restore */
>> -    callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
>> +    callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;
>>
>>      rc = libxl__build_pre(gc, domid, d_config, state);
>>      if (rc)
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 0788309..9c61fa7 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1586,7 +1586,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>>                                    const struct timeval *requested_abs,
>>                                    int rc);
>>
>> -static void libxl__remus_domain_checkpoint_callback(void *data)
>> +static void libxl__remus_domain_save_checkpoint_callback(void *data)
>>  {
>>      libxl__save_helper_state *shs = data;
>>      libxl__domain_suspend_state *dss = shs->caller_state;
>> @@ -1749,7 +1749,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
>>      if (r_info != NULL) {
>>          callbacks->suspend = libxl__remus_domain_suspend_callback;
>>          callbacks->postcopy = libxl__remus_domain_resume_callback;
>> -        callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
>> +        callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
>>          dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
>>      } else
>>          callbacks->suspend = libxl__domain_suspend_callback;

--
Thanks,
Yang.
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 06/25] libxl/remus: introduce libxl__remus_teardown
On 07/15/2015 07:59 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> introduce libxl__remus_teardown to teardown Remus devices.
>>
>> Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
>
> Acked-by: Ian Campbell ian.campb...@citrix.com
>
> If you need to respin then you might consider inverting the if remus
> check in domain_suspend_done and calling this new function if true,
> e.g.
>
>     if (dss->remus) {
>         libxl__remus_teardown(...);
>         return;
>     }
>
>     dss->callback(egc, dss, rc);
>
> I think the control flow would feel more natural then.

Will do, thanks!

--
Thanks,
Yang.
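A small model of the inverted control flow Ian suggests: handle the Remus case first and return early, so the common path ends in a single unconditional callback. It assumes a hypothetical, synchronous struct suspend_state in place of libxl__domain_suspend_state; the real teardown is asynchronous and reports rc through its own completion callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for libxl__domain_suspend_state. */
struct suspend_state {
    bool remus;
    int teardown_calls;
    int callback_calls;
    int callback_rc;
};

/* Stand-in for libxl__remus_teardown(); the real one finishes later and
 * passes rc on to dss->callback itself. */
static void remus_teardown(struct suspend_state *dss, int rc)
{
    (void)rc;
    dss->teardown_calls++;
}

/* Stand-in for dss->callback(egc, dss, rc). */
static void suspend_callback(struct suspend_state *dss, int rc)
{
    dss->callback_calls++;
    dss->callback_rc = rc;
}

static void domain_suspend_done(struct suspend_state *dss, int rc)
{
    if (dss->remus) {
        remus_teardown(dss, rc);
        return;                 /* teardown owns the completion from here */
    }

    suspend_callback(dss, rc);
}
```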
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure
On 07/15/2015 09:28 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> @@ -2921,6 +2911,26 @@ _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
>>                          libxl__checkpoint_devices_state *cds);
>>  _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
>>                          libxl__checkpoint_devices_state *cds);
>> +
>> +/*- Remus related state structure -*/
>> +typedef struct libxl__remus_state libxl__remus_state;
>> +struct libxl__remus_state {
>> +    /* private */
>> +    libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
>> +    int interval; /* checkpoint interval */
>> +
>> +    /* abstract layer */
>> +    libxl__checkpoint_devices_state cds;
>
> This mostly makes sense, I think, but this one field feels like it will
> be wanted by colo too. Does that mean we will end up with dss->rs.cds
> and dss->colo.cds doing effectively the same thing?

Yes. The checkpoint device is an abstract layer used by both Remus and
COLO. The abstract layer is not aware of Remus or COLO; in Remus or COLO,
we can use the container of cds to retrieve the Remus/COLO state.

> Ian.

--
Thanks,
Yang.
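Yang's answer relies on the container-of idiom: the abstract layer only ever sees a pointer to the embedded cds, and each concrete layer converts that pointer back into its own state. A self-contained sketch, with a local container_of macro standing in for libxl's own CONTAINER_OF helper (whose exact form may differ) and hypothetical struct layouts:

```c
#include <assert.h>
#include <stddef.h>

/* Local container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Abstract layer: knows nothing about Remus or COLO. */
struct checkpoint_devices_state {
    int num_devices;
};

/* Concrete layers embed the abstract state (hypothetical layouts). */
struct remus_state {
    int interval;                          /* checkpoint interval */
    struct checkpoint_devices_state cds;
};

struct colo_state {
    int placeholder;                       /* COLO-specific fields would go here */
    struct checkpoint_devices_state cds;
};

/* A callback from the abstract layer receives only &state->cds ... */
static int remus_interval_from_cds(struct checkpoint_devices_state *cds)
{
    /* ... and the Remus layer recovers its enclosing state from it. */
    struct remus_state *rs = container_of(cds, struct remus_state, cds);
    return rs->interval;
}
```

So dss->rs.cds and dss->colo.cds would be two instances of the same abstract type, each reachable from its own concrete wrapper.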
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 12/25] tools/libxl: introduce enum type libxl_checkpointed_stream
On 07/15/2015 08:34 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> introduce enum type libxl_checkpointed_stream in IDL.
>> rename the last argument of migrate_receive from remus to
>> checkpointed since the semantics of this parameter has changed.
>>
>> NOTE: libxl_domain_restore_params isn't changed here, checkpointed_stream
>> is still an int.
>
> It has to change eventually and other callers will have to be updated to
> cope (and there should be LIBXL_HAVE_...). Will this be fixed up later
> in this series? If so please say so.

It's not fixed in this series; I planned to fix it later, but since there
seems to be another round for this series, I can fix it in the next
version. My main concern is that this change is an API change, and it
will affect existing callers.

>> @@ -4282,7 +4282,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
>>  }
>>
>>  static void migrate_receive(int debug, int daemonize, int monitor,
>> -                            int send_fd, int recv_fd, int remus)
>> +                            int send_fd, int recv_fd, int checkpointed)
>
> I think you can start using the new enum type in xl straight away even
> if dom_info.checkpointed_stream remains an int.
>
> So that means here.
>
>> @@ -4489,7 +4489,8 @@ int main_restore(int argc, char **argv)
>>
>>  int main_migrate_receive(int argc, char **argv)
>>  {
>> -    int debug = 0, daemonize = 1, monitor = 1, remus = 0;
>> +    int debug = 0, daemonize = 1, monitor = 1;
>> +    int checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
>
> and here.
>
>> @@ -4318,7 +4318,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
>>
>>      domid = rc;
>>
>> -    if (remus) {
>> +    if (checkpointed) {
>>          /* If we are here, it means that the sender (primary) has crashed.
>>           * TODO: Split-Brain Check.
>>           */
>
> Is it the case that we expect all checkpointing solutions will use the
> same failover code here? If yes then this should be
> if (checkpointed != ...NONE).
>
> If we think they might differ (even if remus and colo happen to be the
> same) then I think a switch where the NONE case does nothing would be
> more structurally appropriate.
>
> Ian.

--
Thanks,
Yang.
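Ian's structural suggestion, a switch where the NONE case does nothing, might look like the sketch below. The enum and the failover_action results are hypothetical simplifications: the real type is the IDL-generated libxl_checkpointed_stream, and a COLO member only arrives later in the series.

```c
#include <assert.h>

/* Simplified stand-in for libxl_checkpointed_stream (COLO is hypothetical here). */
enum checkpointed_stream { STREAM_NONE, STREAM_REMUS, STREAM_COLO };

/* What the receiver should do once the incoming stream ends. */
enum failover_action { FAILOVER_NONE, FAILOVER_REMUS, FAILOVER_COLO };

static enum failover_action on_stream_end(enum checkpointed_stream cs)
{
    switch (cs) {
    case STREAM_NONE:
        /* Ordinary migration: nothing to fail over from. */
        return FAILOVER_NONE;
    case STREAM_REMUS:
        /* The primary has crashed; take over (split-brain check is a TODO). */
        return FAILOVER_REMUS;
    case STREAM_COLO:
        /* May share Remus's code today, but can diverge later without
         * touching the other cases. */
        return FAILOVER_COLO;
    }
    return FAILOVER_NONE;   /* not reached for valid enum values */
}
```

A practical argument for the switch over an if (checkpointed != ...NONE) test: with no default label, GCC's -Wswitch (enabled by -Wall) warns when a new enumerator is added but not handled.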
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 21/25] tools/libxl: rename remus device to checkpoint device
On 07/15/2015 09:15 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> This patch is auto generated by the following commands:
>>  1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
>
> This patch does not appear to have been formatted with git format-patch
> -M as requested last time around.

Sorry, I missed this :( Will do in the next version.

BTW, I have a dumb question... how do I specify -M for only this patch
while it is in a series?

--
Thanks,
Yang.
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 07/25] libxl/remus: init checkpoint_callback in Remus checkpoint callback
On 07/15/2015 08:02 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> init stream {read/write} state checkpoint_callback in Remus checkpoint
>> callback.
>
> Why? Is this earlier or later than previously? Seems later?

There's no functional change; it's just refactoring so that we can move
all the Remus code into one file.

>> Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
>> CC: Andrew Cooper andrew.coop...@citrix.com
>> CC: Ian Campbell ian.campb...@citrix.com
>> CC: Ian Jackson ian.jack...@eu.citrix.com
>> CC: Wei Liu wei.l...@citrix.com
>> ---
>>  tools/libxl/libxl_create.c | 2 +-
>>  tools/libxl/libxl_dom.c    | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index a32e3df..94fe98f 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -684,6 +684,7 @@ static void libxl__remus_domain_restore_checkpoint_callback(void *data)
>>      libxl__egc *egc = shs->egc;
>>      STATE_AO_GC(dcs->ao);
>>
>> +    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
>>      libxl__stream_read_start_checkpoint(egc, &dcs->srs);
>>  }
>>
>> @@ -1000,7 +1001,6 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>>      dcs->srs.fd = restore_fd;
>>      dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
>>      dcs->srs.completion_callback = domcreate_stream_done;
>> -    dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
>>
>>      libxl__stream_read_start(egc, &dcs->srs);
>>      return;
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 77a917c..1740bed 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1593,6 +1593,7 @@ static void libxl__remus_domain_save_checkpoint_callback(void *data)
>>      libxl__egc *egc = shs->egc;
>>      STATE_AO_GC(dss->ao);
>>
>> +    dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
>>      libxl__stream_write_start_checkpoint(egc, &dss->sws);
>>  }
>>
>> @@ -1750,7 +1751,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
>>          callbacks->suspend = libxl__remus_domain_suspend_callback;
>>          callbacks->postcopy = libxl__remus_domain_resume_callback;
>>          callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
>> -        dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
>>      } else
>>          callbacks->suspend = libxl__domain_suspend_callback;

--
Thanks,
Yang.
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 21/25] tools/libxl: rename remus device to checkpoint device
On 07/15/2015 09:32 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>>  tools/libxl/libxl_types.idl | 4 ++--
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index e8d3647..1d676ef 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -61,8 +61,8 @@ libxl_error = Enumeration("error", [
>>      (-15, "LOCK_FAIL"),
>>      (-16, "JSON_CONFIG_EMPTY"),
>>      (-17, "DEVICE_EXISTS"),
>> -    (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
>> -    (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
>> +    (-18, "CHECKPOINT_DEVOPS_DOES_NOT_MATCH"),
>> +    (-19, "CHECKPOINT_DEVICE_NOT_SUPPORTED"),
>>      (-20, "VNUMA_CONFIG_INVALID"),
>>      (-21, "DOMAIN_NOTFOUND"),
>>      (-22, "ABORTED"),
>
> This is an API change, which I think we discussed before.

Also missed this one, sorry.

> In 558bc6ee.60...@cn.fujitsu.com you said you would add an extra patch
> to deal with that, and I think that needs to come before this automatic

Will add the patch before the automatic renaming.

> renaming so that there is no bisect hazard.
>
> I don't see any such patch even after this point though (from grepping
> your colo-v8 branch).
>
> Ian.

--
Thanks,
Yang.
Re: [Xen-devel] [PATCH] libxl/remus: fix the return value of the checkpoint callback
On 07/15/2015 08:13 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 18:32 +0800, Yang Hongyang wrote:
>> In checkpoint callback, we wait for the interval and then start another
>> checkpoint, so the ERROR_TIMEDOUT should be intended and should not
>> treat as error.
>>
>> This patch is based on [PATCH v8 --for 4.6 COLO 00/25] COarse-grain
>> LOck-stepping Virtual Machines for Non-stop Service
>
> Does that mean it won't apply to current staging?

No. It applies on top of the COLO pre series.

> I think we probably want this fix ASAP rather than waiting for that
> series?

I can resubmit the patch so that it applies to staging, but the COLO pre
series will then need to be rebased...

>> Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
>> CC: Ian Jackson ian.jack...@eu.citrix.com
>> CC: Ian Campbell ian.campb...@citrix.com
>> CC: Wei Liu wei.l...@citrix.com
>> ---
>>  tools/libxl/libxl_remus.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
>> index 46dcc3c..ffc92a7 100644
>> --- a/tools/libxl/libxl_remus.c
>> +++ b/tools/libxl/libxl_remus.c
>> @@ -355,11 +355,14 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
>>       * (xc_domain_save.c). in order to continue executing the infinite loop
>>       * (suspend, checkpoint, resume) in xc_domain_save().
>>       */
>> -
>> -    if (rc)
>> +    if (rc == ERROR_TIMEDOUT) {
>> +        /* This is intended, we set the timeout and start another checkpoint */
>> +        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 1);
>
> Please wrap this (slightly) overlong line (and probably the comment too
> which is borderline AFAICT).
>
>> +    } else {
>>          dss->rc = rc;
>> -
>> -    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
>> +        libxl__xc_domain_saverestore_async_callback_done(egc,
>> +                                                         &dss->sws.shs, !rc);
>> +    }
>>  }
>>
>>  /*-- remus callbacks (restore) ---*/

--
Thanks,
Yang.
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure
On 07/15/2015 11:08 PM, Ian Jackson wrote:
> Yang Hongyang writes ("[Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure"):
>> Add a new structure remus state, and move concrete layer's private
>> member to remus state. it is pure refactoring and no functional changes.
>
> Thanks. I don't have much to add to what Ian Campbell has said, but
>
>>      if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
>> -        dss->interval = r_info->interval;
>>          if (libxl_defbool_val(r_info->compression))
>>              dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
>
> In your next version it would be worth mentioning the movement of this
> initialisation in the commit message.

Ok, thanks!

> Ian.

--
Thanks,
Yang.
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/16 0:14, George Dunlap wrote:
> On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
> george.dun...@eu.citrix.com wrote:
>> Would it be possible, on a collision, to have one last stab at
>> allocating the BAR somewhere else, without relocating memory (or
>> relocating any other BARs)? At very least then an administrator could
>> work around this kind of thing by setting the mmio_hole larger in the
>> domain config.
>
> If it's not possible to have this last-ditch relocation effort, then

Could you take a look at the original patch #06? Although Jan thought it
was complicated, that is really the one version I can refine in the
current time slot.

> yes, I'd be OK with just disabling the device for the time being.

Let me send out a new patch series based on this idea. We can continue
the discussion there, but we also need to review the other remaining
comments against a new revision.

Thanks
Tiejun
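George's "one last stab" could be a bounded search inside the existing MMIO hole that skips reserved ranges without moving RAM or any other BAR. The sketch below is a hypothetical illustration, not hvmloader code: it assumes power-of-two BARs aligned to their size, half-open ranges, ignores BARs that are already placed, and uses 0 as the failure value (the caller would then disable the device, the fallback George accepts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open [start, end) */

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical last-ditch placement: return the lowest size-aligned base in
 * [hole_start, hole_end) for a BAR of 'size' bytes that overlaps none of the
 * reserved ranges, or 0 if there is no such slot.
 */
static uint64_t place_bar(uint64_t hole_start, uint64_t hole_end, uint64_t size,
                          const struct range *rsv, size_t nr_rsv)
{
    uint64_t base = align_up(hole_start, size);

    while (base + size <= hole_end) {
        size_t i;

        for (i = 0; i < nr_rsv; i++)
            if (base < rsv[i].end && rsv[i].start < base + size)
                break;                          /* collision with rsv[i] */
        if (i == nr_rsv)
            return base;                        /* free slot found */
        base = align_up(rsv[i].end, size);      /* skip past the conflict */
    }
    return 0;
}
```

Each retry moves base strictly past the end of the conflicting range, so the loop always terminates; enlarging mmio_hole in the domain config simply widens the window it searches.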
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 16/25] tools/libxl: Update libxl_domain_unpause() to support qemu-xen
On 07/15/2015 08:50 PM, Ian Campbell wrote:
> On Wed, 2015-07-15 at 15:45 +0800, Yang Hongyang wrote:
>> Currently, libxl__domain_unpause() only supports
>> qemu-xen-traditional. Update it to support qemu-xen.
>>
>> We use libxl__domain_resume_device_model to unpause guest dm.
>>
>> Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
>> CC: Ian Campbell ian.campb...@citrix.com
>> CC: Ian Jackson ian.jack...@eu.citrix.com
>> CC: Wei Liu wei.l...@citrix.com
>> ---
>>  tools/libxl/libxl.c | 15 +++++----------
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 5b2d045..799aead 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -941,8 +941,6 @@ out:
>>  int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>  {
>>      GC_INIT(ctx);
>> -    char *path;
>> -    char *state;
>>      int ret, rc = 0;
>>
>>      libxl_domain_type type = libxl__domain_type(gc, domid);
>> @@ -952,14 +950,11 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
>>      }
>>
>>      if (type == LIBXL_DOMAIN_TYPE_HVM) {
>> -        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
>> -
>> -        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
>> -        state = libxl__xs_read(gc, XBT_NULL, path);
>> -        if (state != NULL && !strcmp(state, "paused")) {
>> -            libxl__qemu_traditional_cmd(gc, domid, "continue");
>> -            libxl__wait_for_device_model_deprecated(gc, domid, "running",
>> -                                                    NULL, NULL, NULL);
>> +        rc = libxl__domain_resume_device_model(gc, domid);
>> +        if (rc < 0) {
>> +            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to unpause device model"
>> +                       " for domain %u:%d", domid, rc);
>
> Please use the preferred form of LOG(ERROR, "failed to..."), which should
> also hopefully allow you to avoid splitting the line in the middle of a
> string constant which is discouraged.
>
> If you can't use LOG() then please:
>
>     LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
>                "failed to unpause device model for domain %u:%d",
>                domid, rc);
>
> Not splitting string constants means you can grep for an error message.

Sorry, the commit message is wrong: it's libxl_domain_unpause, not
libxl__domain_unpause, so LOG() can't be used. I will update the commit
message and use your later suggestion. Thank you!

> Ian.

--
Thanks,
Yang.
Re: [Xen-devel] [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
> I think I would say:
>
> --
> Now use the hypervisor-supplied memory map to build our final e820 table:
> * Add regions for BIOS ranges and other special mappings not in the
>   hypervisor map
> * Add in the hypervisor regions
> * Adjust the lowmem and highmem regions if we've had to relocate memory
>   (adding a highmem region if necessary)
> * Sort all the ranges so that they appear in memory order.
> --

I'll update this, and thanks a lot.

>> CC: Keir Fraser k...@xen.org
>> CC: Jan Beulich jbeul...@suse.com
>> CC: Andrew Cooper andrew.coop...@citrix.com
>> CC: Ian Jackson ian.jack...@eu.citrix.com
>> CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
>> CC: Ian Campbell ian.campb...@citrix.com
>> CC: Wei Liu wei.l...@citrix.com
>> Signed-off-by: Tiejun Chen tiejun.c...@intel.com
>> ---
> [snip]
>> +    /* Low RAM goes here. Reserve space for special pages. */
>> +    BUG_ON(low_mem_end < (2u << 20));
>
> Won't this BUG if the guest was actually given less than 2GiB of RAM?

2u << 20 = 0x200000, so this is 2M, not 2G :)

>> +
>> +    /*
>> +     * We may need to adjust real lowmem end since we may
>> +     * populate RAM to get enough MMIO previously.
>> +     */
> [snip]
>> +
>> +    /*
>> +     * And then we also need to adjust highmem.
>> +     */
>> +    if ( add_high_mem )
>> +    {
>> +        for ( i = 0; i < memory_map.nr_map; i++ )
>> +        {
>> +            if ( e820[i].type == E820_RAM &&
>> +                 e820[i].addr > (1ull << 32))
>> +                e820[i].size += add_high_mem;
>> +        }
>> +    }
>
> What if there was originally no high memory, but resizing the pci hole
> meant we had to create a high memory region?

You're right. We need to consider this case.

Thanks
Tiejun
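Two points from this exchange can be checked mechanically: 2u << 20 is 0x200000 (2 MiB), and growing highmem needs a fallback that creates a region when none exists. The sketch below is hypothetical: add_highmem and the table layout are simplified stand-ins for hvmloader's e820 code, it identifies the highmem region by its 4 GiB start address, and it leaves sorting to a later pass, as the proposed commit message describes.

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM       1              /* e820 type for usable RAM */
#define HIGHMEM_START  (1ull << 32)   /* highmem begins at 4 GiB */
#define LOWMEM_MIN     (2u << 20)     /* BUG_ON threshold: 0x200000 = 2 MiB */

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: grow the RAM region that starts at 4 GiB by
 * add_high_mem bytes or, if relocation created highmem where there was
 * none, append a new region. nr is the current number of entries and max
 * the table capacity. Returns the new entry count, or -1 if the table is
 * full. Sorting is left to a later pass.
 */
static int add_highmem(struct e820entry *e820, int nr, int max,
                       uint64_t add_high_mem)
{
    int i;

    if (add_high_mem == 0)
        return nr;

    for (i = 0; i < nr; i++) {
        if (e820[i].type == E820_RAM && e820[i].addr == HIGHMEM_START) {
            e820[i].size += add_high_mem;
            return nr;
        }
    }

    if (nr >= max)
        return -1;
    e820[nr].addr = HIGHMEM_START;
    e820[nr].size = add_high_mem;
    e820[nr].type = E820_RAM;
    return nr + 1;
}
```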
[Xen-devel] [linux-3.18 test] 59587: regressions - FAIL
flight 59587 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59587/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail REGR. vs. 58581

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt 6 xen-boot fail REGR. vs. 58581
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate fail baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 14 guest-localmigrate.2 fail baseline untested
 test-armhf-armhf-xl-rtds 14 guest-start.2 fail baseline untested
 test-amd64-i386-libvirt-xsm 11 guest-start fail like 58558
 test-amd64-amd64-libvirt 11 guest-start fail like 58558
 test-amd64-i386-rumpuserxen-i386 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail like 58558
 test-amd64-amd64-libvirt-xsm 11 guest-start fail like 58558
 test-amd64-i386-libvirt 11 guest-start fail like 58581
 test-armhf-armhf-xl 6 xen-boot fail like 58581
 test-armhf-armhf-xl-credit2 6 xen-boot fail like 58581
 test-armhf-armhf-xl-multivcpu 6 xen-boot fail like 58581
 test-armhf-armhf-xl-xsm 6 xen-boot fail like 58581
 test-armhf-armhf-libvirt-xsm 6 xen-boot fail like 58581
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58581
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58581
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58581

Tests which did not succeed, but are not blocking:
 test-amd64-i386-freebsd10-i386 9 freebsd-install fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-i386-freebsd10-amd64 9 freebsd-install fail never pass
 test-armhf-armhf-xl-cubietruck 6 xen-boot fail never pass
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass

version targeted for testing:
 linux 866cebe251f4fb2b435f4ecfe6d3bb4025938533
baseline version:
 linux d048c068d00da7d4cfa5ea7651933b99026958cf

Last test of basis 58581 2015-06-15 09:42:22 Z 30 days
Failing since 58976 2015-06-29 19:43:23 Z 16 days 20 attempts
Testing same since 59412 2015-07-11 00:18:42 Z 5 days 8 attempts

308 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm pass
 build-armhf-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-armhf pass
 build-i386 pass
 build-amd64-libvirt pass
 build-armhf-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-armhf-pvops pass
 build-i386-pvops pass
 build-amd64-rumpuserxen pass
 build-i386-rumpuserxen pass
 test-amd64-amd64-xl pass
 test-armhf-armhf-xl fail
 test-amd64-i386-xl pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm fail
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail
 test-amd64-amd64-libvirt-xsm fail
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm fail
 test-amd64-amd64-xl-xsm pass
 test-armhf-armhf-xl-xsm fail
 test-amd64-i386-xl-xsm pass
 test-amd64-amd64-xl-pvh-amd
Re: [Xen-devel] [PATCH v6] dmar: device scope mem leak fix
elena.ufimts...@oracle.com wrote on 2015-07-07:
From: Elena Ufimtseva elena.ufimts...@oracle.com

Release memory allocated for scope.devices of DMAR units on various failure paths and when disabling DMAR. Set the device count after successful memory allocation, not before, in the device scope parsing function.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com

Acked-by: Yang Zhang yang.z.zh...@intel.com

Best regards,
Yang

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
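The rule in the patch description (publish the device count only after the allocation has succeeded, and release the array on failure paths and on disable) can be illustrated with a small standalone sketch. The structure and function names below (`dev_scope`, `scope_parse`, `scope_free`) are hypothetical stand-ins, not the actual Xen VT-d code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified device scope of a DMAR unit. */
struct dev_scope {
    unsigned int devices_cnt;   /* only meaningful once devices != NULL */
    unsigned short *devices;
};

/* Copy 'n' device IDs into 'scope'. The count is written only after the
 * allocation succeeds, so a failed parse never leaves a non-zero count
 * next to a NULL array. Returns 0 on success, -1 on allocation failure. */
int scope_parse(struct dev_scope *scope, const unsigned short *ids,
                unsigned int n)
{
    unsigned short *devs;
    unsigned int i;

    scope->devices = NULL;
    scope->devices_cnt = 0;
    if (n == 0)
        return 0;

    devs = calloc(n, sizeof(*devs));
    if (!devs)
        return -1;

    for (i = 0; i < n; i++)
        devs[i] = ids[i];

    scope->devices = devs;
    scope->devices_cnt = n;     /* set the count after successful allocation */
    return 0;
}

/* Release the scope, e.g. on a later failure path or when disabling. */
void scope_free(struct dev_scope *scope)
{
    free(scope->devices);
    scope->devices = NULL;
    scope->devices_cnt = 0;
}
```

Resetting both fields in `scope_free()` keeps the scope safe to free twice, which is convenient when several error paths can reach the same cleanup.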
Re: [Xen-devel] [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO
It seems my reply emails from last night were lost; they didn't appear on the list, so I'm going to repost them.

On 07/15/2015 03:45 PM, Yang Hongyang wrote:
This patchset is a prerequisite for the COLO feature. Refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

This patchset is based on Andrew Cooper's Libxl migration v4.1:
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/libxl-migv2-v4.1

In this version, I moved some of the COLO-specific patches down to the COLO main series, so most patches in this series are refactoring and can be applied first.

I've done some simple tests. Both Remus and normal migration work after applying this patchset. The patch to fix Remus on migration v2 will be sent later as a separate patch.

You can also get the patchset from:
https://github.com/macrosheep/xen/tree/colo-v8

v3-v4:
- Rebased to the latest migration v2 branch
- Addressed comments from last round

v2-v3:
- Merge '[PATCH v2 0/6] Misc cleanups for libxl' into this patchset for easy review
- Addressed review comments
- Add back channel to libxc
- Introduce should_checkpoint callback
- Introduce DIRTY_BITMAP record on libxc side
- Introduce COLO_CONTEXT record on libxl side
- Ported to Libxl migration v2

v1-v2:
- Rebased to [PATCH v2 0/6] Misc cleanups for libxl
- Add a bugfix for the error handling of process_record

Wen Congyang (2):
  tools/libxc: support to resume uncooperative HVM guests
  tools/libxl: Add back channel to allow migration target send data back

Yang Hongyang (23):
  tools/libxl: rename libxl__domain_suspend to libxl__domain_save
A tools/libxl: move domain suspend code into libxl_dom_suspend.c
A tools/libxl: move domain resume code into libxl_dom_suspend.c
  tools/libxl: rename remus checkpoint callbacks
  libxl/remus: introduce libxl__remus_setup
  libxl/remus: introduce libxl__remus_teardown
  libxl/remus: init checkpoint_callback in Remus checkpoint callback
  tools/libxl: move remus code into libxl_remus.c
A tools/libxl: move save/restore code into libxl_dom_save.c
  libxl/save: Refactor libxl__domain_suspend_state
  tools/libxl: introduce enum type libxl_checkpointed_stream
  migration/save: pass checkpointed_stream from libxl to libxc
  tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  tools/libxl: check QEMU state before resume dm
  tools/libxl: Update libxl_domain_unpause() to support qemu-xen
A tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
A tools/libxl: export logdirty_init
  tools/libx{l,c}: add back channel to libxc
  tools/libxl: rename remus device to checkpoint device
A tools/libxl: adjust the indentation
  tools/libxl: store remus_ops in checkpoint device state
  tools/libxl: move remus state into a seperate structure
  tools/libxl: seperate device init/cleanup from checkpoint device layer

 tools/libxc/include/xenguest.h | 13 +-
 tools/libxc/xc_domain_restore.c | 4 +-
 tools/libxc/xc_domain_save.c | 6 +-
 tools/libxc/xc_nomigrate.c | 3 +-
 tools/libxc/xc_resume.c | 22 +-
 tools/libxc/xc_sr_common.h | 2 +-
 tools/libxc/xc_sr_restore.c | 2 +-
 tools/libxc/xc_sr_save.c | 5 +-
 tools/libxl/Makefile | 5 +-
 tools/libxl/libxl.c | 119 +---
 tools/libxl/libxl.h | 30 +-
 tools/libxl/libxl_checkpoint_device.c | 282
 tools/libxl/libxl_create.c | 33 +-
 tools/libxl/libxl_dom.c | 1243 -
 tools/libxl/libxl_dom_save.c | 721 +++
 tools/libxl/libxl_dom_suspend.c | 503 +
 tools/libxl/libxl_internal.h | 246 ---
 tools/libxl/libxl_netbuffer.c | 117 ++--
 tools/libxl/libxl_nonetbuffer.c | 10 +-
 tools/libxl/libxl_qmp.c | 10 +
 tools/libxl/libxl_remus.c | 395 +++
 tools/libxl/libxl_remus_device.c | 327 -
 tools/libxl/libxl_remus_disk_drbd.c | 56 +-
 tools/libxl/libxl_save_callout.c | 43 +-
 tools/libxl/libxl_save_helper.c | 9 +-
 tools/libxl/libxl_stream_write.c | 14 +-
 tools/libxl/libxl_types.idl | 10 +-
 tools/libxl/xl_cmdimpl.c | 21 +-
 tools/ocaml/libs/xl/xenlight_stubs.c | 2 +-
 29 files changed, 2321 insertions(+), 1932 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 create mode 100644 tools/libxl/libxl_dom_save.c
 create mode 100644 tools/libxl/libxl_dom_suspend.c
 create mode 100644 tools/libxl/libxl_remus.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

--
Thanks,
Yang.
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 22/25] tools/libxl: adjust the indentation
This is just tidying up after the previous automatic renaming.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl_checkpoint_device.c | 21 +++--
 tools/libxl/libxl_internal.h | 19 +++
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 109cd23..226f159 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -73,9 +73,9 @@ static void devices_teardown_cb(libxl__egc *egc,
 /* checkpoint device setup and teardown */
 static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
- libxl__checkpoint_devices_state *cds,
- libxl__device_kind kind,
- void *libxl_dev)
+libxl__checkpoint_devices_state *cds,
+libxl__device_kind kind,
+void *libxl_dev)
 {
 libxl__checkpoint_device *dev = NULL;
@@ -89,9 +89,10 @@ static libxl__checkpoint_device* checkpoint_device_init(libxl__egc *egc,
 }
 static void checkpoint_devices_setup(libxl__egc *egc,
-libxl__checkpoint_devices_state *cds);
+ libxl__checkpoint_devices_state *cds);
-void libxl__checkpoint_devices_setup(libxl__egc *egc, libxl__checkpoint_devices_state *cds)
+void libxl__checkpoint_devices_setup(libxl__egc *egc,
+ libxl__checkpoint_devices_state *cds)
 {
 int i, rc;
@@ -137,7 +138,7 @@ out:
 }
 static void checkpoint_devices_setup(libxl__egc *egc,
-libxl__checkpoint_devices_state *cds)
+ libxl__checkpoint_devices_state *cds)
 {
 int i, rc;
@@ -285,12 +286,12 @@ static void devices_checkpoint_cb(libxl__egc *egc,
 /* API implementations */
-#define define_checkpoint_api(api)\
-void libxl__checkpoint_devices_##api(libxl__egc *egc,\
-libxl__checkpoint_devices_state *cds)\
+#define define_checkpoint_api(api) \
+void libxl__checkpoint_devices_##api(libxl__egc *egc, \
+libxl__checkpoint_devices_state *cds) \
 { \
 int i; \
-libxl__checkpoint_device *dev; \
+libxl__checkpoint_device *dev; \
 \
 STATE_AO_GC(cds->ao); \
 \
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 901e216..af992fc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2765,7 +2765,8 @@ typedef struct libxl__save_helper_state {
 * Each device type needs to implement the interfaces specified in
 * the libxl__checkpoint_device_instance_ops if it wishes to support Remus.
 *
- * The high-level control flow through the checkpoint device layer is shown below:
+ * The high-level control flow through the checkpoint device layer is shown
+ * below:
 *
 * xl remus
 * |- libxl_domain_remus_start
@@ -2826,7 +2827,8 @@ int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 typedef void libxl__checkpoint_callback(libxl__egc *,
- libxl__checkpoint_devices_state *, int rc);
+libxl__checkpoint_devices_state *,
+int rc);
 /*
 * State associated with a checkpoint invocation, including parameters
@@ -2834,7 +2836,7 @@ typedef void libxl__checkpoint_callback(libxl__egc *,
 * save/restore machinery.
 */
 struct libxl__checkpoint_devices_state {
-/* must be set by caller of libxl__checkpoint_device_(setup|teardown) */
+/*-- must be set by caller of libxl__checkpoint_device_(setup|teardown) --*/
 libxl__ao *ao;
 uint32_t domid;
@@ -2847,7 +2849,8 @@ struct libxl__checkpoint_devices_state {
 /*
 * this array is allocated before setup the checkpoint devices by the
 * checkpoint abstract layer.
- * devs may be NULL, means there's no checkpoint devices that has been set up.
+ * devs may be NULL, means there's no checkpoint devices that has been
+ * set up.
 *
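Patch 22 only realigns whitespace, but the macro it touches, `define_checkpoint_api(api)`, relies on token pasting (`##`) to stamp out one `libxl__checkpoint_devices_<api>()` function per checkpoint operation, with a trailing backslash continuing each line of the macro body. Below is a minimal, synchronous sketch of that pattern; the types and the `define_devices_api` name are hypothetical, and the real libxl macro is asynchronous and does more bookkeeping than this loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the libxl types. */
struct device;

struct device_ops {
    void (*postsuspend)(struct device *dev);
    void (*commit)(struct device *dev);
};

struct device {
    const struct device_ops *ops;
    int counter;                /* bumped by the example op below */
};

struct devices_state {
    struct device **devs;
    int num_devices;
};

/* '##' pastes the argument onto the function name, and the same argument
 * selects the matching member of each device's ops table. */
#define define_devices_api(api)                                 \
void devices_##api(struct devices_state *ds)                    \
{                                                               \
    int i;                                                      \
    for (i = 0; i < ds->num_devices; i++) {                     \
        struct device *dev = ds->devs[i];                       \
        if (dev->ops->api)                                      \
            dev->ops->api(dev);                                 \
    }                                                           \
}

define_devices_api(postsuspend)
define_devices_api(commit)

/* Example op used in the usage below: count how often it runs. */
void count_call(struct device *dev) { dev->counter++; }

const struct device_ops counting_ops = {
    .postsuspend = count_call,
    .commit = NULL,             /* a missing op is simply skipped */
};
```

Each expansion yields an ordinary function, so `devices_postsuspend(&ds)` and `devices_commit(&ds)` can be called directly; adding an operation means one more macro invocation rather than another copy of the loop.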
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 21/25] tools/libxl: rename remus device to checkpoint device
This patch is auto generated by the following commands:

1. git mv tools/libxl/libxl_remus_device.c tools/libxl/libxl_checkpoint_device.c
2. perl -pi -e 's/libxl_remus_device/libxl_checkpoint_device/g' tools/libxl/Makefile
3. perl -pi -e 's/\blibxl__remus_devices/libxl__checkpoint_devices/g' tools/libxl/*.[ch]
4. perl -pi -e 's/\blibxl__remus_device\b/libxl__checkpoint_device/g' tools/libxl/*.[ch]
5. perl -pi -e 's/\blibxl__remus_device_instance_ops\b/libxl__checkpoint_device_instance_ops/g' tools/libxl/*.[ch]
6. perl -pi -e 's/\blibxl__remus_callback\b/libxl__checkpoint_callback/g' tools/libxl/*.[ch]
7. perl -pi -e 's/\bremus_device_init\b/checkpoint_device_init/g' tools/libxl/*.[ch]
8. perl -pi -e 's/\bremus_devices_setup\b/checkpoint_devices_setup/g' tools/libxl/*.[ch]
9. perl -pi -e 's/\bdefine_remus_checkpoint_api\b/define_checkpoint_api/g' tools/libxl/*.[ch]
10. perl -pi -e 's/\brds\b/cds/g' tools/libxl/*.[ch]
11. perl -pi -e 's/REMUS_DEVICE/CHECKPOINT_DEVICE/g' tools/libxl/*.[ch] tools/libxl/*.idl
12. perl -pi -e 's/REMUS_DEVOPS/CHECKPOINT_DEVOPS/g' tools/libxl/*.[ch] tools/libxl/*.idl
13. perl -pi -e 's/\bremus\b/checkpoint/g' tools/libxl/libxl_checkpoint_device.[ch]
14. perl -pi -e 's/\bremus device/checkpoint device/g' tools/libxl/libxl_internal.h
15. perl -pi -e 's/\bRemus device/checkpoint device/g' tools/libxl/libxl_internal.h
16. perl -pi -e 's/\bremus abstract/checkpoint abstract/g' tools/libxl/libxl_internal.h
17. perl -pi -e 's/\bremus invocation/checkpoint invocation/g' tools/libxl/libxl_internal.h
18. perl -pi -e 's/\blibxl__remus_device_\(/libxl__checkpoint_device_(/g' tools/libxl/libxl_internal.h

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/Makefile | 2 +-
 tools/libxl/libxl_checkpoint_device.c | 327 ++
 tools/libxl/libxl_internal.h | 112 ++--
 tools/libxl/libxl_netbuffer.c | 108 +--
 tools/libxl/libxl_nonetbuffer.c | 10 +-
 tools/libxl/libxl_remus.c | 76
 tools/libxl/libxl_remus_device.c | 327 --
 tools/libxl/libxl_remus_disk_drbd.c | 52 +++---
 tools/libxl/libxl_types.idl | 4 +-
 9 files changed, 509 insertions(+), 509 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 2e4c944..3cb3ae9 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -62,7 +62,7 @@ else
 LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
-LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
new file mode 100644
index 000..109cd23
--- /dev/null
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang yan...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+&remus_device_nic,
+&remus_device_drbd_disk,
+NULL,
+};
+
+/*- helper functions -*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* init device subkind-specific state in the libxl ctx */
+int rc;
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc)) {
+rc = init_subkind_nic(cds);
+if (rc) goto out;
+}
+
+rc = init_subkind_drbd_disk(cds);
+if (rc) goto out;
+
+rc = 0;
+out:
+return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* cleanup device subkind-specific state in the libxl ctx */
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc))
+cleanup_subkind_nic(cds);
+
+cleanup_subkind_drbd_disk(cds);
+}
+
+/*- setup() and teardown() -*/
+
+/* callbacks */
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 25/25] tools/libxl: seperate device init/cleanup from checkpoint device layer
We call (init|cleanup)_subkind_nic and (init|cleanup)_subkind_drbd_disk directly in the checkpoint device layer. Move them to libxl_remus.c, and call them before calling libxl__checkpoint_devices_setup() or after calling libxl__checkpoint_devices_teardown(). This is pure refactoring with no functional changes.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl_checkpoint_device.c | 42 ++-
 tools/libxl/libxl_remus.c | 42 +++
 2 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index bbc6dc4..0a16dbb 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,38 +17,6 @@
 #include "libxl_internal.h"
-/*- helper functions -*/
-
-static int init_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-/* init device subkind-specific state in the libxl ctx */
-int rc;
-STATE_AO_GC(cds->ao);
-
-if (libxl__netbuffer_enabled(gc)) {
-rc = init_subkind_nic(cds);
-if (rc) goto out;
-}
-
-rc = init_subkind_drbd_disk(cds);
-if (rc) goto out;
-
-rc = 0;
-out:
-return rc;
-}
-
-static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
-{
-/* cleanup device subkind-specific state in the libxl ctx */
-STATE_AO_GC(cds->ao);
-
-if (libxl__netbuffer_enabled(gc))
-cleanup_subkind_nic(cds);
-
-cleanup_subkind_drbd_disk(cds);
-}
-
 /*- setup() and teardown() -*/
 /* callbacks */
@@ -86,14 +54,10 @@ static void checkpoint_devices_setup(libxl__egc *egc,
 void libxl__checkpoint_devices_setup(libxl__egc *egc,
 libxl__checkpoint_devices_state *cds)
 {
-int i, rc;
+int i;
 STATE_AO_GC(cds->ao);
-rc = init_device_subkind(cds);
-if (rc)
-goto out;
-
 cds->num_devices = 0;
 cds->num_nics = 0;
 cds->num_disks = 0;
@@ -126,7 +90,7 @@ void libxl__checkpoint_devices_setup(libxl__egc *egc,
 return;
 out:
-cds->callback(egc, cds, rc);
+cds->callback(egc, cds, 0);
 }
 static void checkpoint_devices_setup(libxl__egc *egc,
@@ -263,8 +227,6 @@ static void devices_teardown_cb(libxl__egc *egc,
 cds->disks = NULL;
 cds->num_disks = 0;
-cleanup_device_subkind(cds);
-
 cds->callback(egc, cds, rc);
 }
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index 91abf8e..46dcc3c 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -26,6 +26,38 @@ static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
 NULL,
 };
+/*- helper functions -*/
+
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* init device subkind-specific state in the libxl ctx */
+int rc;
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc)) {
+rc = init_subkind_nic(cds);
+if (rc) goto out;
+}
+
+rc = init_subkind_drbd_disk(cds);
+if (rc) goto out;
+
+rc = 0;
+out:
+return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+/* cleanup device subkind-specific state in the libxl ctx */
+STATE_AO_GC(cds->ao);
+
+if (libxl__netbuffer_enabled(gc))
+cleanup_subkind_nic(cds);
+
+cleanup_subkind_drbd_disk(cds);
+}
+
 /* Remus setup and teardown -*/
 static void remus_setup_done(libxl__egc *egc,
@@ -60,6 +92,12 @@ void libxl__remus_setup(libxl__egc *egc, libxl__remus_state *rs)
 cds->ops = remus_ops;
 rs->interval = info->interval;
+if (init_device_subkind(cds)) {
+LOG(ERROR, "Remus: failed to init device subkind for guest %u",
+dss->domid);
+goto out;
+}
+
 libxl__checkpoint_devices_setup(egc, cds);
 return;
@@ -94,6 +132,8 @@ static void remus_setup_failed(libxl__egc *egc,
 LOG(ERROR, "Remus: failed to teardown device after setup failed"
 " for guest with domid %u, rc %d", dss->domid, rc);
+cleanup_device_subkind(cds);
+
 dss->callback(egc, dss, rc);
 }
@@ -123,6 +163,8 @@ static void remus_teardown_done(libxl__egc *egc,
 LOG(ERROR, "Remus: failed to teardown device for guest with domid %u, rc %d", dss->domid, rc);
+cleanup_device_subkind(cds);
+
 dss->callback(egc, dss, rc);
 }
--
1.9.1
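The ordering patch 25 establishes (the Remus layer initializes the device subkinds before calling `libxl__checkpoint_devices_setup()`, and cleans them up only after teardown has finished) can be modelled synchronously as below. All names are hypothetical, and the real libxl code is callback-driven rather than sequential; the sketch only shows who owns which step.

```c
#include <assert.h>

/* Hypothetical, synchronous model of the layering after patch 25. */
struct remus_like {
    int subkind_ready;   /* set by init_subkind(), cleared by cleanup_subkind() */
    int devices_up;      /* set by the abstract layer's setup */
    int fail_init;       /* knob used to simulate a failing subkind init */
};

/* Concrete (Remus) layer: subkind-specific state. */
static int init_subkind(struct remus_like *r)
{
    if (r->fail_init)
        return -1;
    r->subkind_ready = 1;
    return 0;
}

static void cleanup_subkind(struct remus_like *r)
{
    r->subkind_ready = 0;
}

/* Abstract (checkpoint device) layer: no longer touches subkinds itself,
 * it only relies on the caller having initialized them. */
static void devices_setup(struct remus_like *r)
{
    assert(r->subkind_ready);
    r->devices_up = 1;
}

static void devices_teardown(struct remus_like *r)
{
    r->devices_up = 0;
}

/* Caller: initialize subkinds first, then set up devices. */
int remus_setup(struct remus_like *r)
{
    if (init_subkind(r))
        return -1;       /* abstract setup never started, nothing to undo */
    devices_setup(r);
    return 0;
}

/* Caller: tear devices down first, then release subkind state. */
void remus_teardown(struct remus_like *r)
{
    devices_teardown(r);
    cleanup_subkind(r);
}
```

Keeping init and cleanup in the same (concrete) layer means another user of the abstract layer, such as COLO, can supply its own subkind handling without the checkpoint device code knowing about it.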
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 24/25] tools/libxl: move remus state into a seperate structure
Add a new structure, libxl__remus_state, and move the concrete layer's private members into it. This is pure refactoring with no functional changes.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 tools/libxl/libxl.c | 2 +-
 tools/libxl/libxl_dom_save.c | 3 +--
 tools/libxl/libxl_internal.h | 38 ---
 tools/libxl/libxl_netbuffer.c | 51 +
 tools/libxl/libxl_remus.c | 38 ++-
 tools/libxl/libxl_remus_disk_drbd.c | 8 +++---
 6 files changed, 79 insertions(+), 61 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fcf91f1..5502709 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -845,7 +845,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 assert(info);
 /* Point of no return */
-libxl__remus_setup(egc, dss);
+libxl__remus_setup(egc, &dss->rs);
 return AO_INPROGRESS;
 out:
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 9364a1d..9b7159f 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -428,7 +428,6 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
 | (dss->hvm ? XCFLAGS_HVM : 0);
 if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_REMUS) {
-dss->interval = r_info->interval;
 if (libxl_defbool_val(r_info->compression))
 dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
 }
@@ -578,7 +577,7 @@ static void domain_save_done(libxl__egc *egc,
 * from sending checkpoints. Teardown the network buffers and
 * release netlink resources. This is an async op.
 */
-libxl__remus_teardown(egc, dss, rc);
+libxl__remus_teardown(egc, &dss->rs, rc);
 }
 /*= Domain restore */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d92eabc..9c81d8d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2864,16 +2864,6 @@ struct libxl__checkpoint_devices_state {
 int num_disks;
 libxl__multidev multidev;
-
-/*- private for concrete (device-specific) layer only -*/
-
-/* private for nic device subkind ops */
-char *netbufscript;
-struct nl_sock *nlsock;
-struct nl_cache *qdisc_cache;
-
-/* private for drbd disk subkind ops */
-char *drbd_probe_script;
 };
 /*
@@ -2921,6 +2911,26 @@ _hidden void libxl__checkpoint_devices_preresume(libxl__egc *egc,
 libxl__checkpoint_devices_state *cds);
 _hidden void libxl__checkpoint_devices_commit(libxl__egc *egc,
 libxl__checkpoint_devices_state *cds);
+
+/*- Remus related state structure -*/
+typedef struct libxl__remus_state libxl__remus_state;
+struct libxl__remus_state {
+/* private */
+libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
+int interval; /* checkpoint interval */
+
+/* abstract layer */
+libxl__checkpoint_devices_state cds;
+
+/*- private for concrete (device-specific) layer only -*/
+/* private for nic device subkind ops */
+char *netbufscript;
+struct nl_sock *nlsock;
+struct nl_cache *qdisc_cache;
+
+/* private for drbd disk subkind ops */
+char *drbd_probe_script;
+};
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 /*- Legacy conversion helper -*/
@@ -3073,9 +3083,7 @@ struct libxl__domain_save_state {
 int hvm;
 int xcflags;
 libxl__domain_suspend_state dsps;
-libxl__checkpoint_devices_state cds;
-libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
-int interval; /* checkpoint interval (for Remus) */
+libxl__remus_state rs;
 libxl__stream_write_state sws;
 libxl__logdirty_switch logdirty;
 /* private for libxl__domain_save_device_model */
@@ -3490,9 +3498,9 @@ _hidden void libxl__remus_domain_save_checkpoint_callback(void *data);
 _hidden void libxl__remus_domain_restore_checkpoint_callback(void *data);
 /* Remus setup and teardown*/
 _hidden void libxl__remus_setup(libxl__egc *egc,
-libxl__domain_save_state *dss);
+libxl__remus_state *rs);
 _hidden void libxl__remus_teardown(libxl__egc *egc,
- libxl__domain_save_state *dss,
+ libxl__remus_state *rs,
 int rc);
 /*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 33c2a42..f7a8448 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -41,18 +41,19 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
 int init_subkind_nic(libxl__checkpoint_devices_state *cds)
 {
 int rc, ret;
-libxl__domain_save_state *dss
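Because patch 24 embeds `libxl__checkpoint_devices_state cds` inside the new `libxl__remus_state`, code that is handed only a `cds` pointer can recover the enclosing structure by subtracting the member's offset. libxl uses its `CONTAINER_OF` macro for this (see `remus_setup_done()` in patch 08). The sketch below uses a plain `offsetof`-based `container_of` and simplified structures; it illustrates the technique and is not libxl's exact macro definition.

```c
#include <assert.h>
#include <stddef.h>

/* Generic offsetof-based "container of" helper (an assumption for this
 * sketch; libxl's CONTAINER_OF has a different argument convention). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the abstract-layer state. */
struct checkpoint_devices_state {
    int num_devices;
};

/* Simplified stand-in for the Remus state that embeds it. */
struct remus_state {
    int interval;                          /* checkpoint interval */
    struct checkpoint_devices_state cds;   /* abstract layer, embedded by value */
    char *netbufscript;                    /* concrete-layer private data */
};

/* A callback from the abstract layer only receives 'cds'; the concrete
 * layer walks back to its own state to read private fields. */
int interval_from_cds(struct checkpoint_devices_state *cds)
{
    struct remus_state *rs = container_of(cds, struct remus_state, cds);
    return rs->interval;
}
```

This only works because `cds` is embedded by value, not by pointer, which is why the diff passes `&dss->rs` rather than `dss` to `libxl__remus_setup()`.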
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 08/25] tools/libxl: move remus code into libxl_remus.c
After the previous refactoring, we are now able to move all Remus code into a separate file, libxl_remus.c. Export the following functions for internal use:
- Remus callbacks
  * libxl__remus_domain_suspend_callback
  * libxl__remus_domain_resume_callback
  * libxl__remus_domain_save_checkpoint_callback
  * libxl__remus_domain_restore_checkpoint_callback
- setup/teardown Remus:
  * libxl__remus_setup
  * libxl__remus_teardown

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/Makefile | 2 +-
 tools/libxl/libxl.c | 67 -
 tools/libxl/libxl_create.c | 22 ---
 tools/libxl/libxl_dom.c | 223
 tools/libxl/libxl_internal.h | 12 ++
 tools/libxl/libxl_remus.c | 339 +++
 6 files changed, 352 insertions(+), 313 deletions(-)
 create mode 100644 tools/libxl/libxl_remus.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 4a5957e..b10f4e7 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -62,7 +62,7 @@ else
 LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
-LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
+LIBXL_OBJS-y += libxl_remus.o libxl_remus_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index acb5639..f1237d8 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -795,12 +795,6 @@ out:
 return ptr;
 }
-static void libxl__remus_setup(libxl__egc *egc,
- libxl__domain_suspend_state *dss);
-static void remus_setup_done(libxl__egc *egc,
- libxl__remus_devices_state *rds, int rc);
-static void remus_setup_failed(libxl__egc *egc,
- libxl__remus_devices_state *rds, int rc);
 static void remus_failover_cb(libxl__egc *egc,
 libxl__domain_suspend_state *dss, int rc);
@@ -857,67 +851,6 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 return AO_CREATE_FAIL(rc);
 }
-static void libxl__remus_setup(libxl__egc *egc,
- libxl__domain_suspend_state *dss)
-{
-/* Convenience aliases */
-libxl__remus_devices_state *const rds = &dss->rds;
-const libxl_domain_remus_info *const info = dss->remus;
-
-STATE_AO_GC(dss->ao);
-
-if (libxl_defbool_val(info->netbuf)) {
-if (!libxl__netbuffer_enabled(gc)) {
-LOG(ERROR, "Remus: No support for network buffering");
-goto out;
-}
-rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
-}
-
-if (libxl_defbool_val(info->diskbuf))
-rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
-
-rds->ao = ao;
-rds->domid = dss->domid;
-rds->callback = remus_setup_done;
-
-libxl__remus_devices_setup(egc, rds);
-return;
-
-out:
-dss->callback(egc, dss, ERROR_FAIL);
-}
-
-static void remus_setup_done(libxl__egc *egc,
- libxl__remus_devices_state *rds, int rc)
-{
-libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-STATE_AO_GC(dss->ao);
-
-if (!rc) {
-libxl__domain_save(egc, dss);
-return;
-}
-
-LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
-dss->domid, rc);
-rds->callback = remus_setup_failed;
-libxl__remus_devices_teardown(egc, rds);
-}
-
-static void remus_setup_failed(libxl__egc *egc,
- libxl__remus_devices_state *rds, int rc)
-{
-libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
-STATE_AO_GC(dss->ao);
-
-if (rc)
-LOG(ERROR, "Remus: failed to teardown device after setup failed"
- " for guest with domid %u, rc %d", dss->domid, rc);
-
-dss->callback(egc, dss, rc);
-}
-
 static void remus_failover_cb(libxl__egc *egc,
 libxl__domain_suspend_state *dss, int rc)
 {
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 94fe98f..cbd7693 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -672,28 +672,6 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
 libxl_device_model_version_to_string(b_info->device_model_version));
 }
-/*- remus asynchronous checkpoint callback -*/
-
-static void remus_checkpoint_stream_done(
-libxl__egc *egc, libxl__stream_read_state *srs, int rc);
-
-static void libxl__remus_domain_restore_checkpoint_callback(void *data)
-{
-libxl__save_helper_state *shs = data;
-libxl__domain_create_state *dcs = shs->caller_state;
-libxl__egc *egc = shs->egc;
-STATE_AO_GC(dcs->ao);
-
-dcs->srs.checkpoint_callback = remus_checkpoint_stream_done;
-
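The `libxl__remus_setup()` code moved by patch 08 records which device kinds Remus should protect as bits in `device_kind_flags`, shifting 1 left by the device-kind enumerator (`1 << LIBXL__DEVICE_KIND_VIF`, `1 << LIBXL__DEVICE_KIND_VBD`). Below is a minimal sketch of building and testing such a mask; the enum values and helper names are hypothetical and do not match libxl's real enumerators.

```c
#include <assert.h>

/* Hypothetical device kinds; libxl's real enumerator values differ. */
enum device_kind {
    DEVICE_KIND_VIF = 1,   /* network interface */
    DEVICE_KIND_VBD = 3,   /* virtual block device */
};

/* Build the mask of device kinds to protect, one bit per kind,
 * mirroring the netbuf/diskbuf checks in libxl__remus_setup(). */
int kind_flags(int netbuf, int diskbuf)
{
    int flags = 0;

    if (netbuf)
        flags |= (1 << DEVICE_KIND_VIF);
    if (diskbuf)
        flags |= (1 << DEVICE_KIND_VBD);
    return flags;
}

/* Test whether a given kind was requested. */
int kind_enabled(int flags, enum device_kind kind)
{
    return (flags & (1 << kind)) != 0;
}
```

A single integer keeps the "which kinds" decision in one field that the abstract layer can check per device without knowing why each kind was selected.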
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 20/25] tools/libx{l, c}: add back channel to libxc
In COLO mode, both VMs are running, and they are considered in sync if their visible network traffic is identical. After some time, they fall out of sync.

At this point, the two VMs have definitely diverged. Let's call the primary dirty bitmap set A, and the secondary dirty bitmap set B. Sets A and B are different.

Under normal migration, the page data for set A will be sent from the primary to the secondary. However, the set difference B - A (let's call this C) is out of date on the secondary (with respect to the primary) and will not be sent by the primary, as those pages were not dirtied by the primary. The secondary needs the page data for C to reconstruct an exact copy of the primary at the checkpoint.

The secondary cannot calculate C, as it doesn't know A. Instead, the secondary must send B to the primary, at which point the primary calculates the union of A and B (let's call this D), which is all the pages dirtied by either the primary or the secondary, and sends all page data covered by D. In the general case, D is a superset of both A and B. Without the back-channel dirty bitmap, a COLO checkpoint can't reconstruct a valid copy of the primary.

We transfer the dirty bitmap on the libxc side, so we need to introduce a back channel to libxc.
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
commit message: Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/include/xenguest.h | 8
 tools/libxc/xc_domain_restore.c | 4 ++--
 tools/libxc/xc_domain_save.c | 4 ++--
 tools/libxc/xc_sr_restore.c | 2 +-
 tools/libxc/xc_sr_save.c | 2 +-
 tools/libxl/libxl_save_callout.c | 39 ++-
 tools/libxl/libxl_save_helper.c | 8 ++--
 7 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6e24b6c..4056955 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -91,13 +91,13 @@ struct save_callbacks {
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
 uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
 struct save_callbacks* callbacks, int hvm,
- int checkpointed_stream);
+ int checkpointed_stream, int back_fd);
 /* Domain Save v2 */
 int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom,
 uint32_t max_iters, uint32_t max_factor, uint32_t flags,
 struct save_callbacks* callbacks, int hvm,
-int checkpointed_stream);
+int checkpointed_stream, int back_fd);
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
@@ -140,7 +140,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 unsigned long *console_mfn, domid_t console_domid,
 unsigned int hvm, unsigned int pae, int superpages,
 int checkpointed_stream,
- struct restore_callbacks *callbacks);
+ struct restore_callbacks *callbacks, int back_fd);
 /* Domain Restore v2 */
 int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
@@ -149,7 +149,7 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
 unsigned long *console_mfn, domid_t console_domid,
 unsigned int hvm, unsigned int pae, int superpages,
 int checkpointed_stream,
- struct restore_callbacks *callbacks);
+ struct restore_callbacks *callbacks, int back_fd);
 /**
 * xc_domain_restore writes a file to disk that contains the device
 * model saved state.
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 3cd3483..63d1e6b 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1515,7 +1515,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 unsigned long *console_mfn, domid_t console_domid,
 unsigned int hvm, unsigned int pae, int superpages,
 int checkpointed_stream,
- struct restore_callbacks *callbacks)
+ struct restore_callbacks *callbacks, int back_fd)
 {
 DECLARE_DOMCTL;
 xc_dominfo_t info;
@@ -1578,7 +1578,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 return xc_domain_restore2(
 xch, io_fd, dom, store_evtchn, store_mfn,
 store_domid, console_evtchn, console_mfn, console_domid,
-hvm, pae, superpages, checkpointed_stream, callbacks);
+hvm, pae, superpages, checkpointed_stream, callbacks, back_fd);
 }
 DPRINTF("%s: starting
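The set arithmetic in the commit message, combined with the new `back_fd` parameter, can be sketched as follows: the secondary writes its dirty bitmap B to the back channel, and the primary ORs it into its own bitmap A to obtain D = A ∪ B, which necessarily covers C = B - A. The one-bit-per-page `uint64_t` layout, the raw native-endian wire format, the pipe standing in for `back_fd`, and every function name here are assumptions for illustration; this is not the libxc back-channel protocol.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* C = B & ~A: pages the secondary dirtied that the primary would not resend. */
void bitmap_andnot(uint64_t *c, const uint64_t *b, const uint64_t *a,
                   size_t words)
{
    size_t i;
    for (i = 0; i < words; i++)
        c[i] = b[i] & ~a[i];
}

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; a premature EOF counts as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Secondary side: send its dirty bitmap B over the back channel. */
int send_dirty_bitmap(int back_fd, const uint64_t *b, size_t words)
{
    return write_all(back_fd, b, words * sizeof(*b));
}

/* Primary side: receive B and fold it into A, leaving D = A | B in 'a'. */
int merge_peer_bitmap(int back_fd, uint64_t *a, size_t words)
{
    size_t i;
    for (i = 0; i < words; i++) {
        uint64_t w;
        if (read_all(back_fd, &w, sizeof(w)) != 0)
            return -1;
        a[i] |= w;
    }
    return 0;
}
```

For example, if the primary dirtied pages 0 to 2 (A = 0x7) and the secondary dirtied pages 2 and 3 (B = 0xC), then C is page 3 (0x8) and the primary must resend D = 0xF, which includes page 3 even though the primary never touched it.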
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 23/25] tools/libxl: store remus_ops in checkpoint device state
The checkpoint device is an abstract layer for taking checkpoints, and COLO can also use it. However, some code in the checkpoint device layer still touches Remus directly. This patch and the following two separate Remus from the checkpoint device layer.

The checkpoint device layer currently uses remus_ops directly. Store the ops in the checkpoint device state instead, so that the checkpoint device layer no longer needs to know about remus_ops. This is pure refactoring with no functional change.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl_checkpoint_device.c | 10 +-
 tools/libxl/libxl_internal.h          |  2 ++
 tools/libxl/libxl_remus.c             |  9 +
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_checkpoint_device.c b/tools/libxl/libxl_checkpoint_device.c
index 226f159..bbc6dc4 100644
--- a/tools/libxl/libxl_checkpoint_device.c
+++ b/tools/libxl/libxl_checkpoint_device.c
@@ -17,14 +17,6 @@
 
 #include "libxl_internal.h"
 
-extern const libxl__checkpoint_device_instance_ops remus_device_nic;
-extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
-static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
-    &remus_device_nic,
-    &remus_device_drbd_disk,
-    NULL,
-};
-
 /*- helper functions -*/
 
 static int init_device_subkind(libxl__checkpoint_devices_state *cds)
@@ -172,7 +164,7 @@ static void device_setup_iterate(libxl__egc *egc, libxl__ao_device *aodev)
         goto out;
 
     do {
-        dev->ops = remus_ops[++dev->ops_index];
+        dev->ops = dev->cds->ops[++dev->ops_index];
         if (!dev->ops) {
             libxl_device_nic * nic = NULL;
             libxl_device_disk * disk = NULL;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index af992fc..d92eabc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2842,6 +2842,8 @@ struct libxl__checkpoint_devices_state {
     uint32_t domid;
     libxl__checkpoint_callback *callback;
     int device_kind_flags;
+    /* The ops must be pointer array, and the last ops must be NULL */
+    const libxl__checkpoint_device_instance_ops **ops;
 
     /*- private for abstract layer only -*/
 
diff --git a/tools/libxl/libxl_remus.c b/tools/libxl/libxl_remus.c
index fb21b6d..d2e4d42 100644
--- a/tools/libxl/libxl_remus.c
+++ b/tools/libxl/libxl_remus.c
@@ -18,6 +18,14 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__checkpoint_device_instance_ops remus_device_nic;
+extern const libxl__checkpoint_device_instance_ops remus_device_drbd_disk;
+static const libxl__checkpoint_device_instance_ops *remus_ops[] = {
+    &remus_device_nic,
+    &remus_device_drbd_disk,
+    NULL,
+};
+
 /* Remus setup and teardown -*/
 
 static void remus_setup_done(libxl__egc *egc,
@@ -48,6 +56,7 @@ void libxl__remus_setup(libxl__egc *egc,
     cds->ao = ao;
     cds->domid = dss->domid;
     cds->callback = remus_setup_done;
+    cds->ops = remus_ops;
 
     libxl__checkpoint_devices_setup(egc, cds);
     return;
-- 
1.9.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
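After this patch the generic checkpoint layer only walks a caller-supplied, NULL-terminated array of ops pointers, while Remus owns the table and installs it through cds->ops. The standalone sketch below models that pattern; the types device_ops and devices_state and the function find_matching_ops are hypothetical simplifications for illustration, not libxl API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for
 * libxl__checkpoint_device_instance_ops. */
typedef struct device_ops {
    const char *kind;                 /* e.g. "nic" or "disk" */
    int (*match)(const char *kind);   /* returns 1 if this ops handles kind */
} device_ops;

/* Hypothetical stand-in for libxl__checkpoint_devices_state: the caller
 * (Remus, later COLO) installs its own NULL-terminated ops table. */
typedef struct devices_state {
    const device_ops **ops;
} devices_state;

static int match_nic(const char *kind)  { return strcmp(kind, "nic") == 0; }
static int match_disk(const char *kind) { return strcmp(kind, "disk") == 0; }

static const device_ops nic_ops  = { "nic",  match_nic };
static const device_ops disk_ops = { "disk", match_disk };

/* Caller-owned table, mirroring remus_ops[]: the last entry must be NULL. */
static const device_ops *demo_ops[] = {
    &nic_ops,
    &disk_ops,
    NULL,
};

/* Generic layer: walks cds->ops without knowing who supplied it.
 * Returns the index of the first matching ops, or -1 if none matches. */
static int find_matching_ops(const devices_state *cds, const char *kind)
{
    int i;

    for (i = 0; cds->ops[i] != NULL; i++) {
        if (cds->ops[i]->match(kind))
            return i;
    }
    return -1;
}
```

A COLO caller would simply install a different table in the same field, leaving the walking code untouched.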
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 10/25] libxl/save: Refactor libxl__domain_suspend_state
Currently struct libxl__domain_suspend_state contains two types of state: save state and suspend state. This patch separates the two. The motivation is that COLO needs to suspend and resume the guest repeatedly, so a more general suspend state is needed. After this change, dss stands for libxl__domain_save_state and dsps stands for libxl__domain_suspend_state.

Also introduce libxl__domain_suspend_init to initialise the libxl__domain_suspend_state.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Campbell <ian.campb...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
---
 tools/libxl/libxl.c              |  10 +-
 tools/libxl/libxl_dom_save.c     |  69 +
 tools/libxl/libxl_dom_suspend.c  | 217 +--
 tools/libxl/libxl_internal.h     |  60 +++
 tools/libxl/libxl_netbuffer.c    |   2 +-
 tools/libxl/libxl_remus.c        |  37 ---
 tools/libxl/libxl_save_callout.c |   2 +-
 tools/libxl/libxl_stream_write.c |  14 +--
 8 files changed, 234 insertions(+), 177 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index f1237d8..05688cd 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -796,7 +796,7 @@ out:
 }
 
 static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc);
+                              libxl__domain_save_state *dss, int rc);
 
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
@@ -804,7 +804,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              const libxl_asyncop_how *ao_how)
 {
     AO_CREATE(ctx, domid, ao_how);
-    libxl__domain_suspend_state *dss;
+    libxl__domain_save_state *dss;
     int rc;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -852,7 +852,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 }
 
 static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+                              libxl__domain_save_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
     /*
@@ -864,7 +864,7 @@ static void remus_failover_cb(libxl__egc *egc,
 }
 
 static void domain_suspend_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+                              libxl__domain_save_state *dss, int rc)
 {
     STATE_AO_GC(dss->ao);
     libxl__ao_complete(egc,ao,rc);
@@ -883,7 +883,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
         goto out_err;
     }
 
-    libxl__domain_suspend_state *dss;
+    libxl__domain_save_state *dss;
     GCNEW(dss);
 
     dss->ao = ao;
diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index d8383b1..6348cae 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -41,7 +41,7 @@ struct libxl__physmap_info {
 static void stream_done(libxl__egc *egc,
                         libxl__stream_write_state *sws, int rc);
 static void domain_save_done(libxl__egc *egc,
-                             libxl__domain_suspend_state *dss, int rc);
+                             libxl__domain_save_state *dss, int rc);
 
 /*- complicated callback, called by xc_domain_save -*/
 
@@ -59,7 +59,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss, int rc);
+                                 libxl__domain_save_state *dss, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -73,7 +73,7 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
                                libxl__save_helper_state *shs)
 {
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     libxl__logdirty_switch *lds = &dss->logdirty;
     STATE_AO_GC(dss->ao);
     int rc;
@@ -145,7 +145,7 @@ static void domain_suspend_switch_qemu_xen_logdirty
                                libxl__save_helper_state *shs)
 {
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     STATE_AO_GC(dss->ao);
     int rc;
 
@@ -164,7 +164,7 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
 {
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
-    libxl__domain_suspend_state *dss = shs->caller_state;
+    libxl__domain_save_state *dss = shs->caller_state;
     STATE_AO_GC(dss->ao);
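The split above gives the save path (dss) a separate suspend state (dsps) that can be re-initialised for every suspend/resume cycle, which is what COLO's repeated checkpoints need. The sketch below models that idea with hypothetical types save_state and suspend_state and a hypothetical suspend_init(); it is only an analogue of libxl__domain_suspend_init(), whose real field set is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for libxl__domain_suspend_state (dsps):
 * only what a single suspend/resume cycle needs. */
typedef struct suspend_state {
    uint32_t domid;
    int guest_responded;   /* per-cycle progress flag */
    int rc;                /* per-cycle result */
} suspend_state;

/* Hypothetical stand-in for libxl__domain_save_state (dss): migration-wide
 * state that embeds a suspend state instead of duplicating its fields. */
typedef struct save_state {
    int fd;                /* migration stream */
    int live;
    suspend_state dsps;    /* reused for every checkpoint */
} save_state;

/* Analogue of libxl__domain_suspend_init(): resets the per-cycle fields so
 * the same suspend_state can be used again and again. */
static void suspend_init(suspend_state *dsps, uint32_t domid)
{
    dsps->domid = domid;
    dsps->guest_responded = 0;
    dsps->rc = 0;
}

/* One simulated checkpoint: reset, "suspend", then record the outcome.
 * The migration-wide fields in dss are never touched here. */
static int do_checkpoint(save_state *dss, uint32_t domid, int simulated_rc)
{
    suspend_init(&dss->dsps, domid);
    dss->dsps.guest_responded = 1;
    dss->dsps.rc = simulated_rc;
    return dss->dsps.rc;
}
```

A failed cycle leaves no stale result behind, because the next call starts from a freshly initialised suspend state.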
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 18/25] tools/libxl: export logdirty_init
We need to enable logdirty on the secondary, so export logdirty_init for internal use and rename it to libxl__logdirty_init.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campb...@citrix.com>
---
 tools/libxl/libxl_dom_save.c | 4 ++--
 tools/libxl/libxl_internal.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index ba7fc42..9364a1d 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -61,7 +61,7 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
 static void switch_logdirty_done(libxl__egc *egc,
                                  libxl__logdirty_switch *lds, int rc);
 
-static void logdirty_init(libxl__logdirty_switch *lds)
+void libxl__logdirty_init(libxl__logdirty_switch *lds)
 {
     lds->cmd_path = 0;
     libxl__ev_xswatch_init(&lds->watch);
@@ -403,7 +403,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     }
 
     dss->rc = 0;
-    logdirty_init(&dss->logdirty);
+    libxl__logdirty_init(&dss->logdirty);
     dss->logdirty.ao = ao;
 
     dsps->ao = ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 0b792e3..219176e 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3025,6 +3025,8 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+_hidden void libxl__logdirty_init(libxl__logdirty_switch *lds);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend_init */
     libxl__ao *ao;
-- 
1.9.1
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 19/25] tools/libxl: Add back channel to allow migration target send data back
From: Wen Congyang <we...@cn.fujitsu.com>

In COLO mode, the slave needs to send data to the master, but io_fd can only be written by the master and only be read by the slave. Save recv_fd in domain_suspend_state, and send_fd in domain_create_state.

Extend the libxl_domain_create_restore API with a send_fd parameter, and add LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD to indicate the API change.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl.c                  |  2 +-
 tools/libxl/libxl.h                  | 30 --
 tools/libxl/libxl_create.c           |  9 +
 tools/libxl/libxl_internal.h         |  2 ++
 tools/libxl/libxl_types.idl          |  1 +
 tools/libxl/xl_cmdimpl.c             |  8 +++-
 tools/ocaml/libs/xl/xenlight_stubs.c |  2 +-
 7 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 799aead..fcf91f1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -835,7 +835,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     dss->callback = remus_failover_cb;
     dss->domid = domid;
     dss->fd = send_fd;
-    /* TODO do something with recv_fd */
+    dss->recv_fd = recv_fd;
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5a7308d..c492d20 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -617,6 +617,15 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
 
 /*
+ * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1
+ *
+ * If this is defined, libxl_domain_create_restore()'s API has changed to
+ * include a send_fd param which used for libxl migration back channel
+ * during COLO FT.
+ */
+#define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_SEND_FD 1
+
+/*
  * LIBXL_HAVE_CREATEINFO_PVH
  * If this is defined, then libxl supports creation of a PVH guest.
  */
@@ -1089,7 +1098,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
-                                uint32_t *domid, int restore_fd,
+                                uint32_t *domid, int restore_fd, int send_fd,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1110,7 +1119,7 @@ int static inline libxl_domain_create_restore_0x040200(
     libxl_domain_restore_params_init(&params);
 
     ret = libxl_domain_create_restore(
-        ctx, d_config, domid, restore_fd, &params, ao_how, aop_console_how);
+        ctx, d_config, domid, restore_fd, -1, &params, ao_how, aop_console_how);
 
     libxl_domain_restore_params_dispose(&params);
     return ret;
@@ -1118,6 +1127,23 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040200
 
+#elif defined(LIBXL_API_VERSION) && LIBXL_API_VERSION >= 0x040400 \
+      && LIBXL_API_VERSION < 0x040600
+
+int static inline libxl_domain_create_restore_0x040400(
+    libxl_ctx *ctx, libxl_domain_config *d_config,
+    uint32_t *domid, int restore_fd,
+    const libxl_domain_restore_params *params,
+    const libxl_asyncop_how *ao_how,
+    const libxl_asyncprogress_how *aop_console_how)
+    LIBXL_EXTERNAL_CALLERS_ONLY
+{
+    return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
+                                       -1, params, ao_how, aop_console_how);
+}
+
+#define libxl_domain_create_restore libxl_domain_create_restore_0x040400
+
 #endif
 
 /* A progress report will be made via ao_console_how, of type
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index cbd7693..1d4b13b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1498,7 +1498,7 @@ static void domain_create_cb(libxl__egc *egc,
                              int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-                            uint32_t *domid, int restore_fd,
+                            uint32_t *domid, int restore_fd, int send_fd,
                             const libxl_domain_restore_params *params,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
@@ -1512,6 +1512,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
+
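The libxl.h hunk keeps old callers source-compatible: when LIBXL_API_VERSION selects an older API, the old function name is redirected by a macro to a static inline wrapper that passes -1 (no back channel) for the new send_fd argument. The standalone model below shows the same technique; create_restore, create_restore_0x040400 and MY_API_VERSION are hypothetical names, and the body of create_restore is a stub.

```c
#include <assert.h>

/* New-style entry point with the extra send_fd argument; -1 means
 * "no back channel". Hypothetical stub: returns -2 for a bad restore_fd,
 * otherwise echoes send_fd so the caller can see what was passed. */
static int create_restore(int restore_fd, int send_fd)
{
    if (restore_fd < 0)
        return -2;
    return send_fd;
}

/* Hypothetical API version chosen by the application, analogous to
 * defining LIBXL_API_VERSION before including libxl.h. */
#define MY_API_VERSION 0x040400

#if defined(MY_API_VERSION) && MY_API_VERSION >= 0x040400 \
    && MY_API_VERSION < 0x040600
/* Old-style signature: callers pass only restore_fd and the wrapper
 * supplies send_fd = -1. The call below binds to the real function
 * because the redirecting macro is defined only afterwards. */
static inline int create_restore_0x040400(int restore_fd)
{
    return create_restore(restore_fd, -1);
}
#define create_restore create_restore_0x040400
#endif
```

Code compiled against the old API keeps calling create_restore(fd) unchanged, while new code that leaves the version macro undefined sees the two-argument form.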
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 16/25] tools/libxl: Update libxl_domain_unpause() to support qemu-xen
Currently, libxl_domain_unpause() only supports qemu-xen-traditional. Update it to support qemu-xen as well, using libxl__domain_resume_device_model() to unpause the guest's device model.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Campbell <ian.campb...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5b2d045..799aead 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -941,8 +941,6 @@ out:
 int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
 {
     GC_INIT(ctx);
-    char *path;
-    char *state;
     int ret, rc = 0;
 
     libxl_domain_type type = libxl__domain_type(gc, domid);
@@ -952,14 +950,11 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        uint32_t dm_domid = libxl_get_stubdom_id(ctx, domid);
-
-        path = libxl__device_model_xs_path(gc, dm_domid, domid, "/state");
-        state = libxl__xs_read(gc, XBT_NULL, path);
-        if (state != NULL && !strcmp(state, "paused")) {
-            libxl__qemu_traditional_cmd(gc, domid, "continue");
-            libxl__wait_for_device_model_deprecated(gc, domid, "running",
-                                                    NULL, NULL, NULL);
+        rc = libxl__domain_resume_device_model(gc, domid);
+        if (rc < 0) {
+            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to unpause device model"
+                       " for domain %u:%d", domid, rc);
+            goto out;
         }
     }
     ret = xc_domain_unpause(ctx->xch, domid);
-- 
1.9.1
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 17/25] tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
The secondary VM runs in COLO mode, and at each checkpoint we need to send its dirty page information to the master, so qemu logdirty must be enabled on the secondary.

libxl__domain_suspend_common_switch_qemu_logdirty() enables qemu logdirty, but it uses domain_save_state and calls libxl__xc_domain_saverestore_async_callback_done() before it exits, so it cannot be used for the secondary VM.

Refactor libxl__domain_suspend_common_switch_qemu_logdirty() to introduce a new API, libxl__domain_common_switch_qemu_logdirty(). This API only uses libxl__logdirty_switch, and calls lds->callback before it exits.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
Acked-by: Ian Campbell <ian.campb...@citrix.com>
---
 tools/libxl/libxl_dom_save.c | 93
 tools/libxl/libxl_internal.h |  8
 2 files changed, 59 insertions(+), 42 deletions(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 0926b71..ba7fc42 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -59,7 +59,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_save_state *dss, int rc);
+                                 libxl__logdirty_switch *lds, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -69,13 +69,10 @@ static void logdirty_init(libxl__logdirty_switch *lds)
 }
 
 static void domain_suspend_switch_qemu_xen_traditional_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    libxl__logdirty_switch *lds = &dss->logdirty;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
     xs_transaction_t t = 0;
     const char *got;
@@ -137,26 +134,34 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,rc);
+    switch_logdirty_done(egc,lds,rc);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
-                               (int domid, unsigned enable,
-                                libxl__save_helper_state *shs)
+                               (libxl__egc *egc, int domid, unsigned enable,
+                                libxl__logdirty_switch *lds)
 {
-    libxl__egc *egc = shs->egc;
-    libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+    STATE_AO_GC(lds->ao);
     int rc;
 
     rc = libxl__qmp_set_global_dirty_log(gc, domid, enable);
-    if (!rc) {
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
-    } else {
+    if (rc)
         LOG(ERROR,"logdirty switch failed (rc=%d), abandoning suspend",rc);
+
+    lds->callback(egc, lds, rc);
+}
+
+static void domain_suspend_switch_qemu_logdirty_done
+                        (libxl__egc *egc, libxl__logdirty_switch *lds, int rc)
+{
+    libxl__domain_save_state *dss = CONTAINER_OF(lds, *dss, logdirty);
+
+    if (rc) {
         dss->rc = rc;
-        libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
-    }
+        libxl__xc_domain_saverestore_async_callback_done(egc,
+                                                         &dss->sws.shs, -1);
+    } else
+        libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, 0);
 }
 
 void libxl__domain_suspend_common_switch_qemu_logdirty
@@ -165,39 +170,49 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
     libxl__save_helper_state *shs = user;
     libxl__egc *egc = shs->egc;
     libxl__domain_save_state *dss = shs->caller_state;
-    STATE_AO_GC(dss->ao);
+
+    /* convenience aliases */
+    libxl__logdirty_switch *const lds = &dss->logdirty;
+
+    lds->callback = domain_suspend_switch_qemu_logdirty_done;
+    libxl__domain_common_switch_qemu_logdirty(egc, domid, enable, lds);
+}
+
+void libxl__domain_common_switch_qemu_logdirty(libxl__egc *egc,
+                                               int domid, unsigned enable,
+                                               libxl__logdirty_switch *lds)
+{
+    STATE_AO_GC(lds->ao);
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
-        domain_suspend_switch_qemu_xen_traditional_logdirty(domid, enable, shs);
+        domain_suspend_switch_qemu_xen_traditional_logdirty(egc, domid, enable,
+
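The refactoring above decouples the logdirty switch from the save path: the common function only sees the logdirty switch and reports through lds->callback, and the save-specific callback recovers the enclosing save state from the embedded member. The sketch below models this with the classic offsetof-based container_of idiom; libxl's own CONTAINER_OF macro takes its arguments in a different form, and every type and function name here is a hypothetical simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Classic container_of idiom, for illustration only (libxl's
 * CONTAINER_OF(inner_ptr, outer, member) has a different argument form). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct logdirty_switch logdirty_switch;
typedef void logdirty_cb(logdirty_switch *lds, int rc);

/* Simplified stand-in for libxl__logdirty_switch. */
struct logdirty_switch {
    logdirty_cb *callback;
};

/* Simplified stand-in for libxl__domain_save_state. */
typedef struct save_state {
    int rc;
    int helper_result;          /* value that would go back to the helper */
    logdirty_switch logdirty;   /* embedded, like dss->logdirty */
} save_state;

/* Generic operation: knows nothing about save_state, only calls back. */
static void common_switch_logdirty(logdirty_switch *lds, int simulated_rc)
{
    lds->callback(lds, simulated_rc);
}

/* Save-path callback: recovers the enclosing save_state from lds. */
static void save_logdirty_done(logdirty_switch *lds, int rc)
{
    save_state *dss = container_of(lds, save_state, logdirty);

    if (rc) {
        dss->rc = rc;
        dss->helper_result = -1;
    } else {
        dss->helper_result = 0;
    }
}

/* Save-path entry point: installs its callback, then delegates. */
static void save_switch_logdirty(save_state *dss, int simulated_rc)
{
    dss->logdirty.callback = save_logdirty_done;
    common_switch_logdirty(&dss->logdirty, simulated_rc);
}
```

A secondary-side caller would install a different callback on its own logdirty switch and reuse common_switch_logdirty() unchanged.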
Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
On 15.07.15 at 04:40, <feng...@intel.com> wrote:
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, July 10, 2015 9:08 PM
> To: Wu, Feng
> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
> Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>
> > On 24.06.15 at 07:18, <feng...@intel.com> wrote:
> > > @@ -81,8 +81,19 @@ struct vmx_domain {
> > >  struct pi_desc {
> > >      DECLARE_BITMAP(pir, NR_VECTORS);
> > > -    u32 control;
> > > -    u32 rsvd[7];
> > > +    union {
> > > +        struct
> > > +        {
> > > +            u16 on     : 1,  /* bit 256 - Outstanding Notification */
> > > +                sn     : 1,  /* bit 257 - Suppress Notification */
> > > +                rsvd_1 : 14; /* bit 271:258 - Reserved */
> > > +            u8  nv;          /* bit 279:272 - Notification Vector */
> > > +            u8  rsvd_2;      /* bit 287:280 - Reserved */
> > > +            u32 ndst;        /* bit 319:288 - Notification Destination */
> > > +        };
> > > +        u64 control;
> > > +    };
> >
> > So current code, afaics, uses e.g. test_and_set_bit() to set ON. By also declaring this as a bitfield you're opening the structure for non-atomic accesses. If that's correct, why is other code not being changed to _only_ use the bitfield mechanism (likely also eliminating the need for it being a union with the now 64-bit control? If atomic accesses are required, then I'd strongly suggest against making this a bit field.
> >
> > And in no event can I see why ndst needs to be union-ized with control if it doesn't need to be updated atomically with e.g. nv.
>
> When the vCPU is to be blocked, we need to atomically update the nv and ndst, then the wakeup notification event can be delivered to the right destination.

Okay. Your reply made me go through the patches again to check where updates to nv/ndst happen - what's the reason they aren't being updated as a pair in patch 14's RUNSTATE_running handling (or in the replacement draft's vmx_ctxt_switch_to() adjustment)?

Jan
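Feng's point is that nv and ndst must change together, while Jan notes that plain bit-field stores are not atomic. One way the union can serve both needs is to build the new value in a local copy and publish all 64 control bits with a single compare-and-swap. The sketch below uses C11 `<stdatomic.h>` in place of Xen's own cmpxchg() primitive; pi_update_nv_ndst is a hypothetical name, and the bit-field layout is implementation-defined (assumed here to be GCC or Clang on little-endian x86-64, where the struct packs into exactly 64 bits).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Mirrors the 64-bit control part of struct pi_desc from the patch.
 * Bit-field layout is implementation-defined; assumed GCC/Clang on
 * little-endian x86-64. */
union pi_control {
    struct {
        uint16_t on     : 1,   /* bit 256 - Outstanding Notification */
                 sn     : 1,   /* bit 257 - Suppress Notification */
                 rsvd_1 : 14;  /* bits 271:258 - Reserved */
        uint8_t  nv;           /* bits 279:272 - Notification Vector */
        uint8_t  rsvd_2;       /* bits 287:280 - Reserved */
        uint32_t ndst;         /* bits 319:288 - Notification Destination */
    };
    uint64_t control;
};

_Static_assert(sizeof(union pi_control) == 8, "control must be 64 bits");

/* Hypothetical helper: atomically replace nv and ndst as a pair while
 * preserving ON/SN, which other CPUs may set concurrently. On CAS failure
 * 'old' is refreshed with the current value and the update is rebuilt. */
static void pi_update_nv_ndst(_Atomic uint64_t *control,
                              uint8_t nv, uint32_t ndst)
{
    union pi_control old, upd;

    old.control = atomic_load(control);
    do {
        upd = old;
        upd.nv = nv;
        upd.ndst = ndst;
    } while (!atomic_compare_exchange_weak(control, &old.control,
                                           upd.control));
}
```

Because every writer goes through the whole 64-bit word, a concurrent test_and_set_bit() on ON cannot be lost between reading and writing nv/ndst.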
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 16:34, Jan Beulich wrote:
> On 15.07.15 at 06:27, <tiejun.c...@intel.com> wrote:
> > Furthermore, could we have this solution as follows?
>
> Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject

What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
         pci_mem_start <<= 1;
     }
 
+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"

This is very similar to our current policy for [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6, since this is also a rare possibility in the real world. I could also do the same when handling the conflict with [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.

Note it's not necessary to worry about high memory, since we already handle that case in the hv code, and it's also unaffected by memory relocated later, since our existing policy ensures RAM doesn't overlap with RDM.

Thanks
Tiejun

> to co-maintainers overriding me).
>
> Jan
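The proposed hunk relies on check_overlap(start, size, reserved_start, reserved_size), whose body is not shown in the thread. The sketch below is a hypothetical reimplementation treating both ranges as half-open [start, start + size) and ignoring wrap-around at 2^64; hvmloader's actual helper may differ in detail. first_conflict and struct mem_range are also hypothetical, modelling the loop over memory_map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: nonzero if [start, start+size) and
 * [rstart, rstart+rsize) share at least one byte. Assumes neither
 * range wraps past 2^64. */
static int check_overlap(uint64_t start, uint64_t size,
                         uint64_t rstart, uint64_t rsize)
{
    /* Empty ranges never overlap anything. */
    if (size == 0 || rsize == 0)
        return 0;
    /* Overlap iff each range starts before the other one ends. */
    return (start < rstart + rsize) && (rstart < start + size);
}

/* Hypothetical simplified memory-map entry. */
struct mem_range {
    uint64_t addr, size;
};

/* Mirrors the proposed loop: index of the first entry that overlaps the
 * PCI window [start, start+size), or -1 if there is none. */
static int first_conflict(const struct mem_range *map, unsigned int nr,
                          uint64_t start, uint64_t size)
{
    unsigned int i;

    for (i = 0; i < nr; i++)
        if (check_overlap(start, size, map[i].addr, map[i].size))
            return (int)i;
    return -1;
}
```

Adjacent ranges (one ending exactly where the other begins) are reported as non-overlapping, which is the usual convention for half-open intervals.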
[Xen-devel] [PATCH v8 --for 4.6 COLO 25/25] cmdline switches and config vars to control colo-proxy
Add command-line switches to the 'xl migrate-receive' command to specify a domain-specific hotplug script to set up the COLO proxy.

Add a new config variable 'colo.default.proxyscript' to xl.conf that allows the user to override the default global script used to set up the COLO proxy.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5      |  6 ++
 docs/man/xl.pod.1           |  1 -
 tools/libxl/libxl.c         |  6 ++
 tools/libxl/libxl_create.c  | 14 +++--
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl.c            |  3 +++
 tools/libxl/xl.h            |  1 +
 tools/libxl/xl_cmdimpl.c    | 50 ++---
 8 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 8ae19bb..8f7fd28 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -111,6 +111,12 @@ Configures the default script used by Remus to setup network buffering.
 
 Default: C</etc/xen/scripts/remus-netbuf-setup>
 
+=item B<colo.default.proxyscript="PATH">
+
+Configures the default script used by COLO to setup colo-proxy.
+
+Default: C</etc/xen/scripts/colo-proxy-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 1effce7..a7ac32f 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -454,7 +454,6 @@ N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
 Disk replication support is limited to DRBD disks.
 
 COLO support in xl is still in experimental (proof-of-concept) phase.
-There is no support for network at the moment.
 
 B<OPTIONS>
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c6cc5aa..75372ea 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3305,6 +3305,11 @@ void libxl__device_nic_add(libxl__egc *egc, uint32_t domid,
         flexarray_append(back, nic->ifname);
     }
 
+    if (nic->forwarddev) {
+        flexarray_append(back, "forwarddev");
+        flexarray_append(back, nic->forwarddev);
+    }
+
     flexarray_append(back, "mac");
     flexarray_append(back,libxl__sprintf(gc,
                                     LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nic->mac)));
@@ -3428,6 +3433,7 @@ static int libxl__device_nic_from_xs_be(libxl__gc *gc,
     nic->ip = READ_BACKEND(NOGC, "ip");
     nic->bridge = READ_BACKEND(NOGC, "bridge");
     nic->script = READ_BACKEND(NOGC, "script");
+    nic->forwarddev = READ_BACKEND(NOGC, "forwarddev");
 
     /* vif_ioemu nics use the same xenstore entries as vif interfaces */
     tmp = READ_BACKEND(gc, "type");
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d99d5ef..7de2e89 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1089,6 +1089,11 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         crs->recv_fd = restore_fd;
         crs->hvm = (info->type == LIBXL_DOMAIN_TYPE_HVM);
         crs->callback = libxl__colo_restore_setup_done;
+        if (dcs->colo_proxy_script)
+            crs->colo_proxy_script = libxl__strdup(gc, dcs->colo_proxy_script);
+        else
+            crs->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                               libxl__xen_script_dir_path());
         libxl__colo_restore_setup(egc, crs);
     } else
         libxl__stream_read_start(egc, &dcs->srs);
@@ -1612,6 +1617,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_fd,
                             const libxl_domain_restore_params *params,
+                            const char *colo_proxy_script,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1628,6 +1634,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     if (restore_fd > -1)
         cdcs->dcs.restore_params = *params;
     cdcs->dcs.callback = domain_create_cb;
+    cdcs->dcs.colo_proxy_script = colo_proxy_script;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
@@ -1670,7 +1677,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, NULL, NULL,
                             ao_how, aop_console_how);
 }
 
@@ -1680,14 +1687,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
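In domcreate_bootloader_done() a per-domain script (dcs->colo_proxy_script) takes precedence; otherwise libxl falls back to colo-proxy-setup inside the Xen script directory. The sketch below shows the same precedence rule with snprintf into a caller-supplied buffer; pick_proxy_script is a hypothetical name, and the script directory is passed in as a parameter where libxl would call libxl__xen_script_dir_path().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a per-domain override wins, otherwise fall back to
 * "<script_dir>/colo-proxy-setup". Writes the chosen path into buf and
 * returns 0, or -1 if the path would not fit. */
static int pick_proxy_script(char *buf, size_t len,
                             const char *override, const char *script_dir)
{
    int n;

    if (override)
        n = snprintf(buf, len, "%s", override);
    else
        n = snprintf(buf, len, "%s/colo-proxy-setup", script_dir);

    /* snprintf returns the length it wanted to write; >= len means truncated. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

libxl itself allocates the result with GCSPRINTF from the garbage-collected pool, so it has no fixed-size buffer to overflow; the length check here exists only because the sketch uses one.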
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 05/25] libxl/remus: introduce libxl__remus_setup
Refactor Remus setup by introducing the libxl__remus_setup API, which does all of the Remus setup work. Also remove the libxl__ prefix from static functions.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Campbell <ian.campb...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl.c | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 69a6937..acb5639 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -795,10 +795,12 @@ out:
     return ptr;
 }
 
-static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds, int rc);
-static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_devices_state *rds, int rc);
+static void libxl__remus_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss);
+static void remus_setup_done(libxl__egc *egc,
+                             libxl__remus_devices_state *rds, int rc);
+static void remus_setup_failed(libxl__egc *egc,
+                               libxl__remus_devices_state *rds, int rc);
 
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
@@ -847,13 +849,26 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
+    /* Point of no return */
+    libxl__remus_setup(egc, dss);
+    return AO_INPROGRESS;
+
+ out:
+    return AO_CREATE_FAIL(rc);
+}
+
+static void libxl__remus_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
     /* Convenience aliases */
     libxl__remus_devices_state *const rds = &dss->rds;
+    const libxl_domain_remus_info *const info = dss->remus;
+
+    STATE_AO_GC(dss->ao);
 
     if (libxl_defbool_val(info->netbuf)) {
         if (!libxl__netbuffer_enabled(gc)) {
             LOG(ERROR, "Remus: No support for network buffering");
-            rc = ERROR_FAIL;
             goto out;
         }
         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VIF);
@@ -863,19 +878,18 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         rds->device_kind_flags |= (1 << LIBXL__DEVICE_KIND_VBD);
 
     rds->ao = ao;
-    rds->domid = domid;
-    rds->callback = libxl__remus_setup_done;
+    rds->domid = dss->domid;
+    rds->callback = remus_setup_done;
 
-    /* Point of no return */
     libxl__remus_devices_setup(egc, rds);
-    return AO_INPROGRESS;
+    return;
 
- out:
-    return AO_CREATE_FAIL(rc);
+out:
+    dss->callback(egc, dss, ERROR_FAIL);
 }
 
-static void libxl__remus_setup_done(libxl__egc *egc,
-                                    libxl__remus_devices_state *rds, int rc)
+static void remus_setup_done(libxl__egc *egc,
+                             libxl__remus_devices_state *rds, int rc)
 {
     libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
@@ -887,12 +901,12 @@ static void libxl__remus_setup_done(libxl__egc *egc,
 
     LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
         dss->domid, rc);
-    rds->callback = libxl__remus_setup_failed;
+    rds->callback = remus_setup_failed;
     libxl__remus_devices_teardown(egc, rds);
 }
 
-static void libxl__remus_setup_failed(libxl__egc *egc,
-                                      libxl__remus_devices_state *rds, int rc)
+static void remus_setup_failed(libxl__egc *egc,
+                               libxl__remus_devices_state *rds, int rc)
 {
     libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
-- 
1.9.1
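libxl__remus_setup() records which device kinds to checkpoint as one bit per kind in rds->device_kind_flags, and fails setup if network buffering is requested but unavailable. The sketch below models that logic; the enum values are hypothetical stand-ins for LIBXL__DEVICE_KIND_VIF and LIBXL__DEVICE_KIND_VBD, and the VBD bit is assumed to be guarded by a disk-buffering option, since that condition falls outside the hunk shown.

```c
#include <assert.h>

/* Hypothetical stand-ins for LIBXL__DEVICE_KIND_VIF / _VBD; the real
 * enum values come from libxl and may differ. */
enum device_kind { DEVICE_KIND_VIF = 1, DEVICE_KIND_VBD = 3 };

/* Build the flag word as libxl__remus_setup does: one bit per enabled kind.
 * Returns -1 when network buffering is requested but not supported, the
 * case where the patch logs an error and fails setup. */
static int remus_kind_flags(int netbuf, int netbuf_supported, int diskbuf)
{
    int flags = 0;

    if (netbuf) {
        if (!netbuf_supported)
            return -1;
        flags |= (1 << DEVICE_KIND_VIF);
    }
    if (diskbuf)
        flags |= (1 << DEVICE_KIND_VBD);
    return flags;
}

/* Test whether a given kind's bit is set in the flag word. */
static int kind_enabled(int flags, enum device_kind kind)
{
    return (flags & (1 << kind)) != 0;
}
```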
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 02/25] tools/libxl: move domain suspend code into libxl_dom_suspend.c
Move domain suspend code into a separate file, libxl_dom_suspend.c.

Add an API libxl__domain_suspend() which wraps the static function
domain_suspend_callback_common() for internal use.

Export the existing API libxl__domain_suspend_callback() used by libxc
to suspend the guest during migration.

Note that the newly added file libxl_dom_suspend.c is used for the
suspend/resume code.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
Acked-by: Ian Campbell <ian.campb...@citrix.com>
---
 tools/libxl/Makefile            |   3 +-
 tools/libxl/libxl_dom.c         | 342 +---
 tools/libxl/libxl_dom_suspend.c | 380
 tools/libxl/libxl_internal.h    |   6 +
 4 files changed, 389 insertions(+), 342 deletions(-)
 create mode 100644 tools/libxl/libxl_dom_suspend.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 0150ec7..4a5957e 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -102,7 +102,8 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
 			libxl_stream_read.o libxl_stream_write.o \
 			libxl_save_callout.o _libxl_save_msgs_callout.o \
-			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
+			libxl_qmp.o libxl_event.o libxl_fork.o \
+			libxl_dom_suspend.o $(LIBXL_OBJS-y)
 
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 3bbec99..e21e110 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1157,8 +1157,6 @@ static void stream_done(libxl__egc *egc,
                         libxl__stream_write_state *sws, int rc);
 static void domain_save_done(libxl__egc *egc,
                              libxl__domain_suspend_state *dss, int rc);
-static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int rc);
 
 /*- complicated callback, called by xc_domain_save -*/
 
@@ -1386,35 +1384,6 @@ static void switch_logdirty_done(libxl__egc *egc,
 
 /*- callbacks, called by xc_domain_save -*/
 
-int libxl__domain_suspend_device_model(libxl__gc *gc,
-                                       libxl__domain_suspend_state *dss)
-{
-    int ret = 0;
-    uint32_t const domid = dss->domid;
-    const char *const filename = dss->dm_savefile;
-
-    switch (libxl__device_model_version_running(gc, domid)) {
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
-        LOG(DEBUG, "Saving device model state to %s", filename);
-        libxl__qemu_traditional_cmd(gc, domid, "save");
-        libxl__wait_for_device_model_deprecated(gc, domid, "paused", NULL, NULL, NULL);
-        break;
-    }
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        if (libxl__qmp_stop(gc, domid))
-            return ERROR_FAIL;
-        /* Save DM state into filename */
-        ret = libxl__qmp_save(gc, domid, filename);
-        if (ret)
-            unlink(filename);
-        break;
-    default:
-        return ERROR_INVAL;
-    }
-
-    return ret;
-}
-
 int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 {
 
@@ -1435,298 +1404,6 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
     return 0;
 }
 
-static void domain_suspend_common_wait_guest(libxl__egc *egc,
-                                             libxl__domain_suspend_state *dss);
-static void domain_suspend_common_guest_suspended(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
-
-static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
-      libxl__xswait_state *xswa, int rc, const char *state);
-static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
-        libxl__ev_evtchn *evev);
-static void suspend_common_wait_guest_watch(libxl__egc *egc,
-      libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path);
-static void suspend_common_wait_guest_check(libxl__egc *egc,
-        libxl__domain_suspend_state *dss);
-static void suspend_common_wait_guest_timeout(libxl__egc *egc,
-      libxl__ev_time *ev, const struct timeval *requested_abs, int rc);
-
-static void domain_suspend_common_done(libxl__egc *egc,
-                                       libxl__domain_suspend_state *dss,
-                                       int rc);
-
-static bool domain_suspend_pvcontrol_acked(const char *state) {
-    /* any value other than "suspend", including ENOENT (i.e. !state), is OK */
-    if (!state) return 1;
-    return strcmp(state,"suspend");
-}
-
-/* calls dss->callback_common_done when done */
-static void domain_suspend_callback_common(libxl__egc *egc,
-
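The acknowledgement test performed by the removed domain_suspend_pvcontrol_acked() can be illustrated on its own. This is a minimal sketch and not libxl code. The name pvcontrol_acked() is hypothetical. The sketch assumes, as the diff's comment implies, that libxl has already written "suspend" to the guest's PV control node. In that case a missing node (modelled as NULL) or any other value means the guest has acted on the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical standalone version of the check: the guest acknowledges a
 * suspend request by removing or changing the value libxl wrote, so
 * anything other than the literal "suspend" counts as acked.
 */
static bool pvcontrol_acked(const char *state)
{
    if (state == NULL)
        return true;                      /* node gone (ENOENT): acked */
    return strcmp(state, "suspend") != 0; /* still "suspend": not yet */
}
```

A caller would poll or watch the node and treat a `true` result as the guest having picked up the request.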
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 03/25] tools/libxl: move domain resume code into libxl_dom_suspend.c
Move domain resume code into libxl_dom_suspend.c. This is a pure code
move. libxl__domain_resume_device_model() will be used later by COLO,
so we do not make this function static.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
Acked-by: Ian Campbell <ian.campb...@citrix.com>
---
 tools/libxl/libxl.c             | 33 -
 tools/libxl/libxl_dom.c         | 20 ---
 tools/libxl/libxl_dom_suspend.c | 55 +
 3 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index fa42c1c..69a6937 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -513,39 +513,6 @@ int libxl_domain_rename(libxl_ctx *ctx, uint32_t domid,
     return rc;
 }
 
-int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
-{
-    int rc = 0;
-
-    if (xc_domain_resume(CTX->xch, domid, suspend_cancel)) {
-        LOGE(ERROR, "xc_domain_resume failed for domain %u", domid);
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    libxl_domain_type type = libxl__domain_type(gc, domid);
-    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_resume_device_model(gc, domid);
-        if (rc) {
-            LOG(ERROR, "failed to resume device model for domain %u:%d",
-                domid, rc);
-            goto out;
-        }
-    }
-
-    if (!xs_resume_domain(CTX->xsh, domid)) {
-        LOGE(ERROR, "xs_resume_domain failed for domain %u", domid);
-        rc = ERROR_FAIL;
-    }
-out:
-    return rc;
-}
-
 int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel,
                         const libxl_asyncop_how *ao_how)
 {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e21e110..0788309 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1384,26 +1384,6 @@ static void switch_logdirty_done(libxl__egc *egc,
 
 /*- callbacks, called by xc_domain_save -*/
 
-int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
-{
-
-    switch (libxl__device_model_version_running(gc, domid)) {
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
-        libxl__qemu_traditional_cmd(gc, domid, "continue");
-        libxl__wait_for_device_model_deprecated(gc, domid, "running", NULL, NULL, NULL);
-        break;
-    }
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        if (libxl__qmp_resume(gc, domid))
-            return ERROR_FAIL;
-        break;
-    default:
-        return ERROR_INVAL;
-    }
-
-    return 0;
-}
-
 static inline char *physmap_path(libxl__gc *gc, uint32_t dm_domid,
                                  uint32_t domid,
                                  char *phys_offset, char *node)
diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c
index 5146402..a90800d 100644
--- a/tools/libxl/libxl_dom_suspend.c
+++ b/tools/libxl/libxl_dom_suspend.c
@@ -371,6 +371,61 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->sws.shs, !rc);
 }
 
+/*=== Domain resume */
+
+int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
+{
+
+    switch (libxl__device_model_version_running(gc, domid)) {
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
+        libxl__qemu_traditional_cmd(gc, domid, "continue");
+        libxl__wait_for_device_model_deprecated(gc, domid, "running", NULL, NULL, NULL);
+        break;
+    }
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+        if (libxl__qmp_resume(gc, domid))
+            return ERROR_FAIL;
+        break;
+    default:
+        return ERROR_INVAL;
+    }
+
+    return 0;
+}
+
+int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
+{
+    int rc = 0;
+
+    if (xc_domain_resume(CTX->xch, domid, suspend_cancel)) {
+        LOGE(ERROR, "xc_domain_resume failed for domain %u", domid);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_resume_device_model(gc, domid);
+        if (rc) {
+            LOG(ERROR, "failed to resume device model for domain %u:%d",
+                domid, rc);
+            goto out;
+        }
+    }
+
+    if (!xs_resume_domain(CTX->xsh, domid)) {
+        LOGE(ERROR, "xs_resume_domain failed for domain %u", domid);
+        rc = ERROR_FAIL;
+    }
+out:
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO
This patchset is a prerequisite for the COLO feature. Refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

This patchset is based on Andrew Cooper's libxl migration v4.1:
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/libxl-migv2-v4.1

In this version, I moved some of the COLO-specific patches down to the
COLO main series, so most patches in this series are refactoring and
can be applied first.

I've done some simple tests: both Remus and normal migration work after
applying this patchset. The patch to fix Remus on migration v2 will be
sent later as a separate patch.

You can also get the patchset from:
https://github.com/macrosheep/xen/tree/colo-v8

v3->v4:
 - Rebased to the latest migration v2 branch
 - Addressed comments from last round

v2->v3:
 - Merge '[PATCH v2 0/6] Misc cleanups for libxl' into this patchset for easy review
 - Addressed review comments
 - Add back channel to libxc
 - Introduce should_checkpoint callback
 - Introduce DIRTY_BITMAP record on libxc side
 - Introduce COLO_CONTEXT record on libxl side
 - Ported to libxl migration v2

v1->v2:
 - Rebased to [PATCH v2 0/6] Misc cleanups for libxl
 - Add a bugfix for the error handling of process_record

Wen Congyang (2):
  tools/libxc: support to resume uncooperative HVM guests
  tools/libxl: Add back channel to allow migration target send data back

Yang Hongyang (23):
  tools/libxl: rename libxl__domain_suspend to libxl__domain_save
A tools/libxl: move domain suspend code into libxl_dom_suspend.c
A tools/libxl: move domain resume code into libxl_dom_suspend.c
  tools/libxl: rename remus checkpoint callbacks
  libxl/remus: introduce libxl__remus_setup
  libxl/remus: introduce libxl__remus_teardown
  libxl/remus: init checkpoint_callback in Remus checkpoint callback
  tools/libxl: move remus code into libxl_remus.c
A tools/libxl: move save/restore code into libxl_dom_save.c
  libxl/save: Refactor libxl__domain_suspend_state
  tools/libxl: introduce enum type libxl_checkpointed_stream
  migration/save: pass checkpointed_stream from libxl to libxc
  tools/libxl: introduce libxl__domain_restore_device_model to load qemu state
  tools/libxl: check QEMU state before resume dm
  tools/libxl: Update libxl_domain_unpause() to support qemu-xen
A tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
A tools/libxl: export logdirty_init
  tools/libx{l,c}: add back channel to libxc
  tools/libxl: rename remus device to checkpoint device
A tools/libxl: adjust the indentation
  tools/libxl: store remus_ops in checkpoint device state
  tools/libxl: move remus state into a seperate structure
  tools/libxl: seperate device init/cleanup from checkpoint device layer

 tools/libxc/include/xenguest.h        |   13 +-
 tools/libxc/xc_domain_restore.c       |    4 +-
 tools/libxc/xc_domain_save.c          |    6 +-
 tools/libxc/xc_nomigrate.c            |    3 +-
 tools/libxc/xc_resume.c               |   22 +-
 tools/libxc/xc_sr_common.h            |    2 +-
 tools/libxc/xc_sr_restore.c           |    2 +-
 tools/libxc/xc_sr_save.c              |    5 +-
 tools/libxl/Makefile                  |    5 +-
 tools/libxl/libxl.c                   |  119 +---
 tools/libxl/libxl.h                   |   30 +-
 tools/libxl/libxl_checkpoint_device.c |  282
 tools/libxl/libxl_create.c            |   33 +-
 tools/libxl/libxl_dom.c               | 1243 -
 tools/libxl/libxl_dom_save.c          |  721 +++
 tools/libxl/libxl_dom_suspend.c       |  503 +
 tools/libxl/libxl_internal.h          |  246 ---
 tools/libxl/libxl_netbuffer.c         |  117 ++--
 tools/libxl/libxl_nonetbuffer.c       |   10 +-
 tools/libxl/libxl_qmp.c               |   10 +
 tools/libxl/libxl_remus.c             |  395 +++
 tools/libxl/libxl_remus_device.c      |  327 -
 tools/libxl/libxl_remus_disk_drbd.c   |   56 +-
 tools/libxl/libxl_save_callout.c      |   43 +-
 tools/libxl/libxl_save_helper.c       |    9 +-
 tools/libxl/libxl_stream_write.c      |   14 +-
 tools/libxl/libxl_types.idl           |   10 +-
 tools/libxl/xl_cmdimpl.c              |   21 +-
 tools/ocaml/libs/xl/xenlight_stubs.c  |    2 +-
 29 files changed, 2321 insertions(+), 1932 deletions(-)
 create mode 100644 tools/libxl/libxl_checkpoint_device.c
 create mode 100644 tools/libxl/libxl_dom_save.c
 create mode 100644 tools/libxl/libxl_dom_suspend.c
 create mode 100644 tools/libxl/libxl_remus.c
 delete mode 100644 tools/libxl/libxl_remus_device.c

-- 
1.9.1
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 06/25] libxl/remus: introduce libxl__remus_teardown
Introduce libxl__remus_teardown to tear down Remus devices.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Campbell <ian.campb...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl_dom.c | 12
 1 file changed, 12 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 9c61fa7..77a917c 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1865,6 +1865,9 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void libxl__remus_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss,
+                                  int rc);
 static void remus_teardown_done(libxl__egc *egc,
                                 libxl__remus_devices_state *rds,
                                 int rc);
@@ -1894,6 +1897,15 @@ static void domain_save_done(libxl__egc *egc,
      * from sending checkpoints. Teardown the network buffers and
      * release netlink resources.  This is an async op.
      */
+    libxl__remus_teardown(egc, dss, rc);
+}
+
+static void libxl__remus_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss,
+                                  int rc)
+{
+    EGC_GC;
+
     LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
         " teardown Remus devices...", rc);
     dss->rds.callback = remus_teardown_done;
-- 
1.9.1
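The new helper follows libxl's usual asynchronous idiom. The caller stores a completion callback in the state structure and then starts the operation, and the operation reports back through that callback. The sketch below shows that idiom. All names in it are hypothetical, and its teardown completes synchronously for simplicity. The pointer arithmetic in teardown_done() plays the role that a container-of macro plays in libxl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of these names are libxl's. */
struct devices_state;
typedef void devices_callback(struct devices_state *ds, int rc);

struct devices_state {
    devices_callback *callback;   /* called once when teardown completes */
};

struct save_state {
    struct devices_state ds;      /* embedded, like dss->rds */
    int suspend_rc;               /* why the save ended */
    int teardown_rc;              /* result reported by the teardown */
    int done;
};

/* Stand-in for an async teardown; this one completes immediately. */
static void devices_teardown(struct devices_state *ds)
{
    ds->callback(ds, 0);
}

static void teardown_done(struct devices_state *ds, int rc)
{
    /* Recover the enclosing state from the embedded member. */
    struct save_state *ss =
        (struct save_state *)((char *)ds - offsetof(struct save_state, ds));

    ss->teardown_rc = rc;
    ss->done = 1;
}

/* The factored-out helper: record context, arm the callback, start the op. */
static void remus_teardown(struct save_state *ss, int rc)
{
    ss->suspend_rc = rc;
    ss->ds.callback = teardown_done;
    devices_teardown(&ss->ds);
}
```

Splitting the teardown into its own function, as the patch does, lets other paths start the same asynchronous operation without going through domain_save_done().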
[Xen-devel] [PATCH v4 --for 4.6 COLOPre 04/25] tools/libxl: rename remus checkpoint callbacks
There are two Remus checkpoint callbacks (save/restore). Currently both
are named libxl__remus_domain_checkpoint_callback, which is fine because
they live in different files. A following patch will move all of the
Remus callback code into a separate file, so the names must differ.
Rename them to:

  libxl__remus_domain_{save/restore}_checkpoint_callback

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Ian Campbell <ian.campb...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
---
 tools/libxl/libxl_create.c | 4 ++--
 tools/libxl/libxl_dom.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 5b4d333..a32e3df 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -677,7 +677,7 @@ static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
 static void remus_checkpoint_stream_done(
     libxl__egc *egc, libxl__stream_read_state *srs, int rc);
 
-static void libxl__remus_domain_checkpoint_callback(void *data)
+static void libxl__remus_domain_restore_checkpoint_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__domain_create_state *dcs = shs->caller_state;
@@ -989,7 +989,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     }
 
     /* Restore */
-    callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+    callbacks->checkpoint = libxl__remus_domain_restore_checkpoint_callback;
 
     rc = libxl__build_pre(gc, domid, d_config, state);
     if (rc)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0788309..9c61fa7 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1586,7 +1586,7 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
                                   const struct timeval *requested_abs,
                                   int rc);
 
-static void libxl__remus_domain_checkpoint_callback(void *data)
+static void libxl__remus_domain_save_checkpoint_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
     libxl__domain_suspend_state *dss = shs->caller_state;
@@ -1749,7 +1749,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
     if (r_info != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
-        callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
+        callbacks->checkpoint = libxl__remus_domain_save_checkpoint_callback;
         dss->sws.checkpoint_callback = remus_checkpoint_stream_written;
     } else
         callbacks->suspend = libxl__domain_suspend_callback;
-- 
1.9.1
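After the rename, libxl__domain_save() still selects its hooks the same way. Remus installs suspend, postcopy and checkpoint callbacks, while a plain save installs only suspend. Below is a hypothetical, cut-down sketch of that selection. The struct and function names are illustrative and do not come from libxl or libxc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down callback table modelled on the assignments in the diff. */
struct save_callbacks {
    int (*suspend)(void *data);
    void (*postcopy)(void *data);
    void (*checkpoint)(void *data);
};

static int plain_suspend(void *data) { (void)data; return 1; }
static int remus_suspend(void *data) { (void)data; return 2; }
static void remus_resume(void *data) { (void)data; }
static void remus_save_checkpoint(void *data) { (void)data; }

/* Same branch as libxl__domain_save(): only Remus installs postcopy/checkpoint. */
static void setup_save_callbacks(struct save_callbacks *cb, const void *remus_info)
{
    if (remus_info != NULL) {
        cb->suspend = remus_suspend;
        cb->postcopy = remus_resume;
        cb->checkpoint = remus_save_checkpoint;
    } else {
        cb->suspend = plain_suspend;
    }
}
```

Because the save and restore checkpoint callbacks are only ever referenced through such tables, giving them distinct names is enough for them to coexist in one source file later.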
Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs
Hi Vijay,

On 15/07/2015 09:16, Vijay Kilari wrote:
> On Tue, Jul 14, 2015 at 2:48 AM, Julien Grall <julien.gr...@citrix.com> wrote:
>> Hi,
>>
>> On 10/07/2015 09:42, vijay.kil...@gmail.com wrote:
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index c41e82e..4f3801b 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> +static inline hw_irq_controller *get_host_hw_irq_controller(unsigned int irq)
>>> +{
>>> +    if ( is_lpi(irq) )
>>> +        return its_hw_ops->lpi_host_irq_type;
>>> +    else
>>> +        return gic_hw_ops->gic_host_irq_type;
>>> +}
>>
>> This is not what I asked on v3 [1]. The ITS hardware controller shouldn't
>> be exposed to the common GIC. We have to keep a clean and comprehensible
>> interface.
>>
>> What I asked is to replace the gic_host_irq_type variable by a new
>> callback which will return the correct hw_irq_controller.
>>
>> For GICv2, it will return the same hw_irq_controller as today. For GICv3,
>> it will check if the IRQ is an LPI and return the correct controller.
>>
>> FWIW, it was acked by Ian [2].
>
> If we don't want to expose any ITS interfaces to common gic code,
> then we have to register callbacks to GICv3 driver.

Why? In the end, the ITS is an integral part of the GICv3, so you could
directly call the ITS code within the GICv3 without any callback.
Actually, you already do that in some places. So I don't see why you
can't do it there...
>>> @@ -149,7 +173,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
>>>           test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
>>>          goto out;
>>>
>>> -    desc->handler = gic_hw_ops->gic_guest_irq_type;
>>> +    desc->handler = get_guest_hw_irq_controller(desc->irq);
>>>      set_bit(_IRQ_GUEST, &desc->status);
>>>      gic_set_irq_properties(desc, cpumask_of(v_target->processor), priority);
>>>
>>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
>>> index 2dd43ee..ba8528a 100644
>>> --- a/xen/arch/arm/irq.c
>>> +++ b/xen/arch/arm/irq.c
>>> @@ -35,7 +35,13 @@ static DEFINE_SPINLOCK(local_irqs_type_lock);
>>>  struct irq_guest
>>>  {
>>>      struct domain *d;
>>> -    unsigned int virq;
>>> +    union
>>> +    {
>>> +        /* virq refer to virtual irq in case of spi */
>>> +        unsigned int virq;
>>> +        /* virq refer to event ID in case of lpi */
>>> +        unsigned int vid;
>>
>> Why can't we store the event ID in the irq_guest? As said on v3, this is not
>
> Are you referring to irq_desc in above statement?

Yes, sorry.

>> domain specific [3].
>>
>> Furthermore, you add support to route LPI in Xen (see
>> gic_route_irq_to_xen) where you will clearly need the event ID.
>>
>>>  void irq_set_affinity(struct irq_desc *desc, const cpumask_t *cpu_mask)
>>>  {
>>>      if ( desc != NULL )
>>> diff --git a/xen/include/asm-arm/gic-its.h b/xen/include/asm-arm/gic-its.h
>>> index b5e09bd..e8d244f 100644
>>> --- a/xen/include/asm-arm/gic-its.h
>>> +++ b/xen/include/asm-arm/gic-its.h
>>> @@ -161,6 +161,10 @@ typedef union {
>>>   * The ITS view of a device.
>>>   */
>>>  struct its_device {
>>> +    /* Physical ITS */
>>> +    struct its_node *its;
>>> +    /* Number of Physical LPIs assigned */
>>> +    int nr_lpis;
>>
>> Why didn't you add this field directly in patch #4? It would be more
>> logical.
>>
>>>      /*
>>>       * ITS registers, offsets from ITS_base
>>> diff --git a/xen/include/asm-arm/irq.h b/xen/include/asm-arm/irq.h
>>> index 34b492b..55e219f 100644
>>> --- a/xen/include/asm-arm/irq.h
>>> +++ b/xen/include/asm-arm/irq.h
>>> @@ -17,6 +17,8 @@ struct arch_pirq
>>>  struct arch_irq_desc {
>>>      int eoi_cpu;
>>>      unsigned int type;
>>> +    struct its_device *dev;
>>> +    u16 col_id;
>>
>> It has been suggested by Ian to move col_id into the its_device in the
>> previous version [4]. Any reason for not doing it?
>
> In round robin fashion each plpi is attached to col_id. So storing in
> its_device is not possible. In latest Linux, col_id is stored in the
> its_device structure for which set_affinity is called.

You could do round robin on its_device... It would be exactly the same
and save 2 bytes per irq_desc, if not more with the alignment. Don't
forget that 1 byte in the irq_desc means 1KB added to the Xen binary.
These saved bytes could be used to store the event ID.

That reminds me: these 2 new fields should only be defined when GICv3
is used (#ifdef HAS_GICV3). I would be fine if you skip the former for
4.6, but the latter is mandatory. ITS code shouldn't be compiled on
arm32.

Regards,

-- 
Julien Grall
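The interface Julien asks for can be sketched with heavily simplified, hypothetical types. Each GIC driver provides a callback that returns the hw_irq_controller for a given IRQ. GICv2 ignores the IRQ number, while GICv3 performs the LPI check itself, so common code never touches ITS structures. The 8192 LPI base is the architectural start of the GICv3 LPI INTID range; every other name below is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified types; Xen's hw_irq_controller has many hooks. */
typedef struct { const char *name; } hw_irq_controller;

static const hw_irq_controller gicv2_host_type     = { "gic-v2" };
static const hw_irq_controller gicv3_host_type     = { "gic-v3" };
static const hw_irq_controller gicv3_lpi_host_type = { "gic-v3-lpi" };

#define LPI_BASE 8192u   /* first LPI INTID in the GICv3 architecture */
static bool is_lpi(unsigned int irq) { return irq >= LPI_BASE; }

/* A callback replaces the old gic_host_irq_type variable. */
struct gic_hw_operations {
    const hw_irq_controller *(*get_host_irq_type)(unsigned int irq);
};

static const hw_irq_controller *gicv2_get_host_irq_type(unsigned int irq)
{
    (void)irq;
    return &gicv2_host_type;           /* GICv2: same controller as today */
}

static const hw_irq_controller *gicv3_get_host_irq_type(unsigned int irq)
{
    /* The GICv3 driver consults its own ITS code; common code never does. */
    return is_lpi(irq) ? &gicv3_lpi_host_type : &gicv3_host_type;
}

static const struct gic_hw_operations gicv2_ops = { gicv2_get_host_irq_type };
static const struct gic_hw_operations gicv3_ops = { gicv3_get_host_irq_type };

/* Common GIC code only ever calls through the ops table. */
static const hw_irq_controller *
get_host_hw_irq_controller(const struct gic_hw_operations *ops, unsigned int irq)
{
    return ops->get_host_irq_type(irq);
}
```

In Xen the ops table would be the global gic_hw_ops rather than a parameter. It is passed explicitly here only so that both drivers can be exercised side by side.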
Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 15, 2015 4:20 PM
> To: Wu, Feng
> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
> Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>
> On 15.07.15 at 04:40, <feng...@intel.com> wrote:
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>> Sent: Friday, July 10, 2015 9:08 PM
>>> To: Wu, Feng
>>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
>>> Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>>> Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>>>
>>> On 24.06.15 at 07:18, <feng...@intel.com> wrote:
>>>> @@ -81,8 +81,19 @@ struct vmx_domain {
>>>>  struct pi_desc {
>>>>      DECLARE_BITMAP(pir, NR_VECTORS);
>>>> -    u32 control;
>>>> -    u32 rsvd[7];
>>>> +    union {
>>>> +        struct
>>>> +        {
>>>> +        u16 on     : 1,  /* bit 256 - Outstanding Notification */
>>>> +            sn     : 1,  /* bit 257 - Suppress Notification */
>>>> +            rsvd_1 : 14; /* bit 271:258 - Reserved */
>>>> +        u8  nv;          /* bit 279:272 - Notification Vector */
>>>> +        u8  rsvd_2;      /* bit 287:280 - Reserved */
>>>> +        u32 ndst;        /* bit 319:288 - Notification Destination */
>>>> +        };
>>>> +        u64 control;
>>>> +    };
>>>
>>> So current code, afaics, uses e.g. test_and_set_bit() to set ON. By
>>> also declaring this as a bitfield you're opening the structure for
>>> non-atomic accesses. If that's correct, why is other code not being
>>> changed to _only_ use the bitfield mechanism (likely also eliminating
>>> the need for it being a union with the now 64-bit control)? If atomic
>>> accesses are required, then I'd strongly suggest against making this
>>> a bit field. And in no event can I see why ndst needs to be union-ized
>>> with control if it doesn't need to be updated atomically with e.g. nv.
>>
>> When the vCPU is to be blocked, we need to atomically update the nv and
>> ndst, so that the wakeup notification event can be delivered to the
>> right destination.
>
> Okay.
>
> Your reply made me go through the patches again to check where updates
> to nv/ndst happen - what's the reason they aren't being updated as a
> pair in patch 14's RUNSTATE_running handling (or in the replacement
> draft's vmx_ctxt_switch_to() adjustment)?

This is because we can only enter the running state from runnable, and
by then the NV field has already been changed back to
'posted_intr_vector', so we don't need to do it here again.

Thanks,
Feng

> Jan
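The atomicity concern in this thread can be illustrated with a portable sketch. It uses C11 atomics rather than Xen's own primitives. It models only the 64-bit control word (descriptor bits 256..319) from the quoted patch: ON at bit 0, SN at bit 1, NV at bits 16..23 and NDST at bits 32..63. A single compare-and-swap replaces NV and NDST together, so ON/SN changes made concurrently by atomic bit operations are preserved. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Field positions inside the 64-bit control word (descriptor bits 256..319). */
#define PI_ON_BIT     0
#define PI_SN_BIT     1
#define PI_NV_SHIFT   16
#define PI_NV_MASK    (0xffULL << PI_NV_SHIFT)
#define PI_NDST_SHIFT 32
#define PI_NDST_MASK  (0xffffffffULL << PI_NDST_SHIFT)

/*
 * Replace NV and NDST as one unit. If another CPU changes ON/SN between
 * the load and the swap, the swap fails, 'old' is refreshed with the
 * current value, and the new word is rebuilt from it.
 */
static void pi_set_nv_ndst(_Atomic uint64_t *control, uint8_t nv, uint32_t ndst)
{
    uint64_t old = atomic_load(control);
    uint64_t desired;

    do {
        desired = old & ~(PI_NV_MASK | PI_NDST_MASK);
        desired |= (uint64_t)nv << PI_NV_SHIFT;
        desired |= (uint64_t)ndst << PI_NDST_SHIFT;
    } while (!atomic_compare_exchange_weak(control, &old, desired));
}

static uint8_t pi_get_nv(uint64_t control)
{
    return (uint8_t)((control & PI_NV_MASK) >> PI_NV_SHIFT);
}

static uint32_t pi_get_ndst(uint64_t control)
{
    return (uint32_t)(control >> PI_NDST_SHIFT);
}
```

With this approach a bitfield view is not needed for correctness, which is one reading of Jan's objection to mixing bitfields with atomic bit operations.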
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 15.07.15 at 06:27, <tiejun.c...@intel.com> wrote:
> Furthermore, could we have this solution as follows?

Yet more special-casing code you want to add. I said no to this model,
and unless you can address the issue _without_ adding a lot of
special-casing code, the answer will remain no (subject to
co-maintainers overriding me).

Jan
Re: [Xen-devel] [PATCH v8 01/11] xen: introduce SHUTDOWN_soft_reset shutdown reason
Ian Jackson <ian.jack...@eu.citrix.com> writes:

> Konrad Rzeszutek Wilk writes ("Re: [PATCH v8 01/11] xen: introduce SHUTDOWN_soft_reset shutdown reason"):
>> On Tue, Jun 23, 2015 at 06:11:43PM +0200, Vitaly Kuznetsov wrote:
>>> This special type of shutdown is supposed to be used by PVHVM guests when
>>> they want to perform some sort of kexec/kdump.
> ...
>>> +#define SHUTDOWN_soft_reset 5  /* Domain did soft reset. Clean up and resume.*/
>
> I would like more documentation about the semantics of this new
> request.  (The semantics of the existing shutdown requests are fairly
> well understood because they generally map to real hardware.)

Sure. Would you like me to expand the comment here, or should I write
something somewhere else?

Thanks,

-- 
  Vitaly
Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
On 15.07.15 at 10:38, <feng...@intel.com> wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, July 15, 2015 4:25 PM
>> To: Wu, Feng
>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
>> Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>> Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
>>
>> On 15.07.15 at 08:04, <feng...@intel.com> wrote:
>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>> Sent: Friday, July 10, 2015 10:02 PM
>>>>
>>>> I'm particularly worried by the call to acpi_find_matched_drhd_unit() -
>>>> is it maybe worth storing the iommu pointer in struct msi_desc?
>>>
>>> I think it is worth it; Andrew also mentioned this point before. I
>>> intend to make this independent work and do it later, since the 4.6
>>> release is coming and I am still trying my best to target it. Could
>>> you please share your concern here: performance, or other things?
>>> Thanks!
>>
>> Interrupt latency in particular.
>>
>>> This IRTE update operation is not frequent. It only happens a few
>>> times, mainly in the initialization phase of the guest. And even when
>>> the guest sets the affinity, if the MSI/MSI-X configuration doesn't
>>> change, QEMU will not ask Xen to update it.
>>
>> When the guest sets the affinity, the MSI{,-X} configuration is rather
>> likely to change (at least for Linux guests).
>>
>> There are two possible scenarios:
>> 1) There are bits that can be updated behind the back of the code here.
>> In that case you need to loop, and each iteration of the loop needs to
>> re-fetch the current value (not doing so would make the loop infinite).
>
> Oh, yes, I think I made a mistake here; I have been too hasty these
> days, sorry for that! I think I need to do it like this:
>
>     do {
>         new_ire = *p;
>
>         /* Setup/Update interrupt remapping table entry. */
>         setup_posted_irte(&new_ire, pi_desc, gvec);
>
>         old_ire = *(uint128_t *)p;
>         ret = cmpxchg16b(p, &old_ire, &new_ire);
>     } while ( memcmp(&ret, &old_ire, sizeof(old_ire)) );

So since you put this in a loop again, would you mind pointing out which
bits can get modified behind our back?

Jan
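Jan's point about retry loops can be shown with a small sketch. Each iteration must derive the new value from the same snapshot it then compares against, and a failed compare-and-swap must refresh that snapshot. The sketch uses C11 atomics on a hypothetical 64-bit entry standing in for the 128-bit IRTE, because portable 16-byte atomics usually need extra library support. It therefore illustrates the pattern only, not the VT-d code, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical entry: the low byte is the "posted" vector; bits 8..63 model
 * state that other code may change concurrently. */
#define VEC_MASK 0xffULL

static uint64_t setup_posted(uint64_t entry, uint8_t gvec)
{
    return (entry & ~VEC_MASK) | gvec;   /* keep the other bits as found */
}

/* Returns the number of compare-and-swap attempts, for illustration. */
static int update_entry(_Atomic uint64_t *p, uint8_t gvec)
{
    uint64_t old = atomic_load(p);       /* one snapshot... */
    int attempts = 0;

    for (;;) {
        uint64_t desired = setup_posted(old, gvec);  /* ...derive from it... */

        attempts++;
        /* ...and publish only if the entry still equals that snapshot.
         * On failure 'old' is reloaded, so the next pass starts fresh. */
        if (atomic_compare_exchange_strong(p, &old, desired))
            return attempts;
    }
}
```

If nothing can change the entry concurrently, which is the question Jan asks, the first attempt always succeeds and a plain write would suffice.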
[Xen-devel] [PATCH V6 1/3] xen/mem_access: Support for memory-content hiding
This patch adds support for memory-content hiding, by modifying the
value returned by emulated instructions that read certain memory
addresses that contain sensitive data. The patch only applies to cases
where VM_FLAG_ACCESS_EMULATE has been set in a vm_event response.

Signed-off-by: Razvan Cojocaru <rcojoc...@bitdefender.com>
Acked-by: Tamas K Lengyel <tleng...@novetta.com>

---
Changes since V5:
 - Renamed set_context_data()'s bytes parameter to size.
 - Inverted if() condition in set_context_data().
 - Removed memcpy() conditional from set_context_data().
 - Removed label from hvmemul_rep_outs_set_context().
 - Now bypassing hvm_copy_from_guest_phys() in hvmemul_rep_movs()
   if hvmemul_ctxt->set_context is true.
 - Fixed for_each_vcpu() coding style (blank before the opening
   parenthesis).
 - Added comments about the serialization status of
   vm_event_init_domain() and vm_event_cleanup_domain().
 - Setting v->arch.vm_event.emul_read_data to NULL after xfree() in
   vcpu_destroy() for safety.
---
 tools/tests/xen-access/xen-access.c |    2 +-
 xen/arch/x86/domain.c               |    3 +
 xen/arch/x86/hvm/emulate.c          |  117 ---
 xen/arch/x86/hvm/event.c            |   50 +++
 xen/arch/x86/mm/p2m.c               |   92 +++
 xen/arch/x86/vm_event.c             |   35 +++
 xen/common/vm_event.c               |    8 +++
 xen/include/asm-arm/vm_event.h      |   13
 xen/include/asm-x86/domain.h        |    1 +
 xen/include/asm-x86/hvm/emulate.h   |   10 ++-
 xen/include/asm-x86/vm_event.h      |    4 ++
 xen/include/public/vm_event.h       |   35 ---
 12 files changed, 287 insertions(+), 83 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c
index 12ab921..e6ca9ba 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -530,7 +530,7 @@ int main(int argc, char *argv[])
                 break;
             case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
                 printf("Breakpoint: rip=%016"PRIx64", gfn=%"PRIx64" (vcpu %d)\n",
-                       req.regs.x86.rip,
+                       req.data.regs.x86.rip,
                        req.u.software_breakpoint.gfn,
                        req.vcpu_id);
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 34ecd7c..1ef9fad 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -511,6 +511,9 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
+    xfree(v->arch.vm_event.emul_read_data);
+    v->arch.vm_event.emul_read_data = NULL;
+
     if ( is_pv_32bit_vcpu(v) )
     {
         free_compat_arg_xlat(v);
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 795321c..2766919 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -67,6 +67,24 @@ static int null_write(const struct hvm_io_handler *handler,
     return X86EMUL_OKAY;
 }
 
+static int set_context_data(void *buffer, unsigned int size)
+{
+    struct vcpu *curr = current;
+
+    if ( curr->arch.vm_event.emul_read_data )
+    {
+        unsigned int safe_size =
+            min(size, curr->arch.vm_event.emul_read_data->size);
+
+        memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
+        memset(buffer + safe_size, 0, size - safe_size);
+    }
+    else
+        return X86EMUL_UNHANDLEABLE;
+
+    return X86EMUL_OKAY;
+}
+
 static const struct hvm_io_ops null_ops = {
     .read = null_read,
     .write = null_write
@@ -771,6 +789,12 @@ static int hvmemul_read(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    if ( unlikely(hvmemul_ctxt->set_context) )
+        return set_context_data(p_data, bytes);
+
     return __hvmemul_read(
         seg, offset, p_data, bytes, hvm_access_read,
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
@@ -963,6 +987,17 @@ static int hvmemul_cmpxchg(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    if ( unlikely(hvmemul_ctxt->set_context) )
+    {
+        int rc = set_context_data(p_new, bytes);
+
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+    }
+
     /* Fix this in case the guest is really relying on r-m-w atomicity. */
     return hvmemul_write(seg, offset, p_new, bytes, ctxt);
 }
@@ -1005,6 +1040,38 @@ static int hvmemul_rep_ins(
                                !!(ctxt->regs->eflags & X86_EFLAGS_DF), gpa);
 }
 
+static int hvmemul_rep_outs_set_context(
+    enum x86_segment src_seg,
+    unsigned long src_offset,
+    uint16_t dst_port,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int bytes = *reps * bytes_per_rep;
+    char *buf;
+    int rc;
+
+
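The clamp-and-zero-fill behaviour of set_context_data() can be sketched in isolation. The `struct read_data` type and the `fill_from_override()` name are hypothetical. The sketch casts to `unsigned char *` before doing pointer arithmetic, because arithmetic on `void *`, as in the patch, relies on a GCC extension.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for replacement bytes supplied by a vm_event reply. */
struct read_data {
    unsigned int size;          /* valid bytes in data[], assumed <= 16 */
    unsigned char data[16];
};

/*
 * Fill 'buffer' (the emulated read's destination, 'size' bytes): copy what
 * the override provides, zero the remainder, never read past rd->size.
 * Returns 0 on success, -1 when there is nothing to inject.
 */
static int fill_from_override(void *buffer, unsigned int size,
                              const struct read_data *rd)
{
    unsigned int safe_size;

    if (rd == NULL)
        return -1;              /* the patch returns X86EMUL_UNHANDLEABLE here */

    safe_size = size < rd->size ? size : rd->size;
    memcpy(buffer, rd->data, safe_size);
    memset((unsigned char *)buffer + safe_size, 0, size - safe_size);
    return 0;
}
```

Zero-filling the tail means a read wider than the supplied override never leaks the real memory contents, which is the point of content hiding.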
[Xen-devel] [PATCH V6 2/3] xen/vm_event: Support for guest-requested events
Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST, sent via HVMOP_request_vm_event. The guest can request that a generic vm_event (containing only the vm_event-filled guest registers as information) be sent to userspace by setting up the correct registers and doing a VMCALL. For example, for a 32-bit guest, this means: EAX = 34 (hvmop), EBX = 24 (HVMOP_guest_request_vm_event), ECX = 0 (NULL required for the hypercall parameter, reserved). Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com Acked-by: Tamas K Lengyel tleng...@novetta.com Acked-by: Wei Liu wei.l...@citrix.com Acked-by: Jan Beulich jbeul...@suse.com Acked-by: George Dunlap george.dun...@eu.citrix.com --- Changes since V5: - None. --- tools/libxc/include/xenctrl.h |2 ++ tools/libxc/xc_monitor.c| 15 +++ xen/arch/x86/hvm/event.c| 16 xen/arch/x86/hvm/hvm.c |8 +++- xen/arch/x86/monitor.c | 19 ++- xen/include/asm-x86/domain.h| 16 +--- xen/include/asm-x86/hvm/event.h |1 + xen/include/public/domctl.h |6 ++ xen/include/public/hvm/hvm_op.h |2 ++ xen/include/public/vm_event.h |2 ++ 10 files changed, 78 insertions(+), 9 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 0bbae2a..ce9029c 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2378,6 +2378,8 @@ int xc_monitor_mov_to_msr(xc_interface *xch, domid_t domain_id, bool enable, int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable); int xc_monitor_software_breakpoint(xc_interface *xch, domid_t domain_id, bool enable); +int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, + bool enable, bool sync); /*** * Memory sharing operations. 
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index b64bce3..d5f87da 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -129,3 +129,18 @@ int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id,
     return do_domctl(xch, &domctl);
 }
+
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, bool enable,
+                             bool sync)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_monitor_op;
+    domctl.domain = domain_id;
+    domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+                                    : XEN_DOMCTL_MONITOR_OP_DISABLE;
+    domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST;
+    domctl.u.monitor_op.u.guest_request.sync = sync;
+
+    return do_domctl(xch, &domctl);
+}
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 5341937..17638ea 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -126,6 +126,22 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
         hvm_event_traps(1, &req);
 }

+void hvm_event_guest_request(void)
+{
+    struct vcpu *curr = current;
+    struct arch_domain *currad = &curr->domain->arch;
+
+    if ( currad->monitor.guest_request_enabled )
+    {
+        vm_event_request_t req = {
+            .reason = VM_EVENT_REASON_GUEST_REQUEST,
+            .vcpu_id = curr->vcpu_id,
+        };
+
+        hvm_event_traps(currad->monitor.guest_request_sync, &req);
+    }
+}
+
 int hvm_event_int3(unsigned long gla)
 {
     int rc = 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8a10111..22dbab1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5999,7 +5999,6 @@ static int hvmop_get_param(
 #define HVMOP_op_mask 0xff

 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
     unsigned long start_iter, mask;
     long rc = 0;
@@ -6413,6 +6412,13 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }

+    case HVMOP_guest_request_vm_event:
+        if ( guest_handle_is_null(arg) )
+            hvm_event_guest_request();
+        else
+            rc = -EINVAL;
+        break;
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 0da855e..d35907b 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -55,7 +55,8 @@ static inline uint32_t get_capabilities(struct domain *d)

     capabilities = (1 << XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG) |
                    (1 << XEN_DOMCTL_MONITOR_EVENT_MOV_TO_MSR) |
-                   (1 << XEN_DOMCTL_MONITOR_EVENT_SOFTWARE_BREAKPOINT);
+                   (1 << XEN_DOMCTL_MONITOR_EVENT_SOFTWARE_BREAKPOINT) |
+                   (1 << XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST);

     /* Since we know this is on VMX, we can just call the hvm func */
     if ( hvm_is_singlestep_supported() )
@@ -184,6 +185,22 @@ int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop)
         break;
     }
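To make the control flow of this patch concrete, here is a small user-space model of the guest-request path: the new do_hvm_op() case rejects a non-NULL argument with -EINVAL, and hvm_event_guest_request() only emits an event when the monitor has enabled it. This is a sketch, not Xen code; every model_* name and the reason value are hypothetical, and only the op number 24 comes from the commit message.

```c
/* User-space model of the HVMOP_guest_request_vm_event path.
 * All names here are hypothetical stand-ins, not Xen definitions. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HVMOP_GUEST_REQUEST_VM_EVENT 24  /* EBX value from the commit message */
#define MODEL_REASON_GUEST_REQUEST         1   /* hypothetical reason code */

/* Per-domain monitor settings, as xc_monitor_guest_request() would set them. */
struct model_monitor {
    bool guest_request_enabled;
    bool guest_request_sync;      /* pause the vCPU until a reply arrives */
};

struct model_event {
    int reason;
    unsigned int vcpu_id;
    bool sync;
};

/* The last event "sent" to the monitor ring, and how many were sent. */
struct model_event last_event;
int events_sent;

/* Counterpart of hvm_event_guest_request(): a no-op unless enabled. */
void model_event_guest_request(const struct model_monitor *m,
                               unsigned int vcpu_id)
{
    if ( !m->guest_request_enabled )
        return;

    last_event.reason = MODEL_REASON_GUEST_REQUEST;
    last_event.vcpu_id = vcpu_id;
    last_event.sync = m->guest_request_sync;
    events_sent++;
}

/* Counterpart of the new do_hvm_op() case: the argument must be NULL. */
long model_do_hvm_op(const struct model_monitor *m, unsigned long op,
                     const void *arg, unsigned int vcpu_id)
{
    if ( op != MODEL_HVMOP_GUEST_REQUEST_VM_EVENT )
        return -ENOSYS;           /* other hypercall ops are not modeled */

    if ( arg != NULL )
        return -EINVAL;

    model_event_guest_request(m, vcpu_id);
    return 0;
}
```

Note that a request from a guest whose domain is not being monitored succeeds silently; the guest cannot tell whether anyone is listening.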
[Xen-devel] [PATCH V6 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
Deny register writes if a vm_client subscribed to mov_to_msr or control register write events forbids them. Currently supported for MSR, CR0, CR3 and CR4 events.

Signed-off-by: Razvan Cojocaru <rcojoc...@bitdefender.com>
Acked-by: George Dunlap <george.dun...@eu.citrix.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
Acked-by: Tamas K Lengyel <tleng...@novetta.com>

---
Changes since V5:
 - Now using vzalloc() / vfree() for d->arch.event_write_data, and setting it to NULL after releasing it in arch_domain_destroy() for safety.
---
 xen/arch/x86/domain.c             |   3 +
 xen/arch/x86/hvm/emulate.c        |   8 +--
 xen/arch/x86/hvm/event.c          |   5 +-
 xen/arch/x86/hvm/hvm.c            | 118 -
 xen/arch/x86/hvm/svm/nestedsvm.c  |  14 ++---
 xen/arch/x86/hvm/svm/svm.c        |   2 +-
 xen/arch/x86/hvm/vmx/vmx.c        |  15 +++--
 xen/arch/x86/hvm/vmx/vvmx.c       |  18 +++---
 xen/arch/x86/vm_event.c           |  43 ++
 xen/common/vm_event.c             |   4 ++
 xen/include/asm-arm/vm_event.h    |   7 +++
 xen/include/asm-x86/domain.h      |  18 +-
 xen/include/asm-x86/hvm/event.h   |   9 ++-
 xen/include/asm-x86/hvm/support.h |   9 +--
 xen/include/asm-x86/vm_event.h    |   3 +
 xen/include/public/vm_event.h     |   5 ++
 16 files changed, 235 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1ef9fad..045f6ff 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -668,6 +668,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,

 void arch_domain_destroy(struct domain *d)
 {
+    vfree(d->arch.event_write_data);
+    d->arch.event_write_data = NULL;
+
     if ( has_hvm_container_domain(d) )
         hvm_domain_destroy(d);
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2766919..bc7514a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1428,14 +1428,14 @@ static int hvmemul_write_cr(
     switch ( reg )
     {
     case 0:
-        return hvm_set_cr0(val);
+        return hvm_set_cr0(val, 1);
     case 2:
         current->arch.hvm_vcpu.guest_cr[2] = val;
         return X86EMUL_OKAY;
     case 3:
-        return hvm_set_cr3(val);
+        return hvm_set_cr3(val, 1);
     case 4:
-        return hvm_set_cr4(val);
+        return hvm_set_cr4(val, 1);
     default:
         break;
     }
@@ -1456,7 +1456,7 @@ static int hvmemul_write_msr(
     uint64_t val,
     struct x86_emulate_ctxt *ctxt)
 {
-    return hvm_msr_write_intercept(reg, val);
+    return hvm_msr_write_intercept(reg, val, 1);
 }

 static int hvmemul_wbinvd(
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 17638ea..042e583 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t *req)
     return 1;
 }

-void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
+bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 {
     struct arch_domain *currad = &current->domain->arch;
     unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
@@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)

         hvm_event_traps(currad->monitor.write_ctrlreg_sync & ctrlreg_bitmask,
                         &req);
+        return 1;
     }
+
+    return 0;
 }

 void hvm_event_msr(unsigned int msr, uint64_t value)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 22dbab1..c07e3ef 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -52,6 +52,7 @@
 #include <asm/traps.h>
 #include <asm/mc146818rtc.h>
 #include <asm/mce.h>
+#include <asm/monitor.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/support.h>
@@ -519,6 +520,35 @@ void hvm_do_resume(struct vcpu *v)
         break;
     }

+    if ( unlikely(d->arch.event_write_data) )
+    {
+        struct monitor_write_data *w = &d->arch.event_write_data[v->vcpu_id];
+
+        if ( w->do_write.msr )
+        {
+            hvm_msr_write_intercept(w->msr, w->value, 0);
+            w->do_write.msr = 0;
+        }
+
+        if ( w->do_write.cr0 )
+        {
+            hvm_set_cr0(w->cr0, 0);
+            w->do_write.cr0 = 0;
+        }
+
+        if ( w->do_write.cr4 )
+        {
+            hvm_set_cr4(w->cr4, 0);
+            w->do_write.cr4 = 0;
+        }
+
+        if ( w->do_write.cr3 )
+        {
+            hvm_set_cr3(w->cr3, 0);
+            w->do_write.cr3 = 0;
+        }
+    }
+
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
     {
@@ -3123,13 +3153,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr)
     switch ( cr )
     {
     case 0:
-        return hvm_set_cr0(val);
+        return hvm_set_cr0(val, 1);
     case 3:
-        return hvm_set_cr3(val);
+        return hvm_set_cr3(val, 1);
     case 4:
-
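The deny mechanism can be summarized as a deferred write: with the new may_defer argument set, a monitored CR0/CR3/CR4/MSR write only records the value and raises a vm_event, and the write is committed later in hvm_do_resume() unless the monitor's reply vetoed it. The sketch below models only CR0, and its reply handling is a plain boolean standing in for the xen/arch/x86/vm_event.c changes, which are not shown in this excerpt; all model_* names are hypothetical.

```c
/* Deferred-write model (CR0 only). Hypothetical names, not Xen code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct monitor_write_data. */
struct model_write_data {
    struct { bool cr0; } do_write;
    uint64_t cr0;
};

struct model_vcpu {
    uint64_t cr0;                 /* current guest CR0 */
    bool monitor_cr0;             /* a monitor subscribed to CR0 writes */
    struct model_write_data w;    /* pending (postponed) write */
};

/* Like hvm_set_cr0(value, may_defer): defer and notify when monitored. */
void model_set_cr0(struct model_vcpu *v, uint64_t value, bool may_defer)
{
    if ( may_defer && v->monitor_cr0 )
    {
        v->w.do_write.cr0 = true;     /* allowed unless the reply denies */
        v->w.cr0 = value;
        return;                       /* vm_event sent; write postponed */
    }
    v->cr0 = value;
}

/* Reply handling: a "deny" reply cancels the postponed write. */
void model_vm_event_reply(struct model_vcpu *v, bool deny)
{
    if ( deny )
        v->w.do_write.cr0 = false;
}

/* Like the new hvm_do_resume() block: commit writes still permitted,
 * passing may_defer == false so no second event is raised. */
void model_do_resume(struct model_vcpu *v)
{
    if ( v->w.do_write.cr0 )
    {
        model_set_cr0(v, v->w.cr0, false);
        v->w.do_write.cr0 = false;
    }
}
```

Passing 0 as the extra argument on the resume path is what prevents the committed write from being reported to the monitor a second time.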
Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 15, 2015 4:36 PM
> To: Wu, Feng
> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>
> On 15.07.15 at 10:26, <feng...@intel.com> wrote:
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>> Sent: Wednesday, July 15, 2015 4:20 PM
>>> To: Wu, Feng
>>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>>> Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>>>
>>> On 15.07.15 at 04:40, <feng...@intel.com> wrote:
>>>>> -----Original Message-----
>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>> Sent: Friday, July 10, 2015 9:08 PM
>>>>> To: Wu, Feng
>>>>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>>>>> Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>>>>>
>>>>> On 24.06.15 at 07:18, <feng...@intel.com> wrote:
>>>>>> @@ -81,8 +81,19 @@ struct vmx_domain {
>>>>>>  struct pi_desc {
>>>>>>      DECLARE_BITMAP(pir, NR_VECTORS);
>>>>>> -    u32 control;
>>>>>> -    u32 rsvd[7];
>>>>>> +    union {
>>>>>> +        struct
>>>>>> +        {
>>>>>> +        u16     on     : 1,  /* bit 256 - Outstanding Notification */
>>>>>> +                sn     : 1,  /* bit 257 - Suppress Notification */
>>>>>> +                rsvd_1 : 14; /* bit 271:258 - Reserved */
>>>>>> +        u8      nv;          /* bit 279:272 - Notification Vector */
>>>>>> +        u8      rsvd_2;      /* bit 287:280 - Reserved */
>>>>>> +        u32     ndst;        /* bit 319:288 - Notification Destination */
>>>>>> +        };
>>>>>> +        u64 control;
>>>>>> +    };
>>>>>
>>>>> So current code, afaics, uses e.g. test_and_set_bit() to set ON. By
>>>>> also declaring this as a bitfield you're opening the structure for
>>>>> non-atomic accesses. If that's correct, why is other code not being
>>>>> changed to _only_ use the bitfield mechanism (likely also eliminating
>>>>> the need for it being a union with the now 64-bit control)? If atomic
>>>>> accesses are required, then I'd strongly suggest against making this
>>>>> a bit field.
>>>>>
>>>>> And in no event can I see why ndst needs to be union-ized with
>>>>> control if it doesn't need to be updated atomically with e.g. nv.
>>>>
>>>> When the vCPU is to be blocked, we need to atomically update the nv
>>>> and ndst, so that the wakeup notification event can be delivered to
>>>> the right destination.
>>>
>>> Okay. Your reply made me go through the patches again to check where
>>> updates to nv/ndst happen - what's the reason they aren't being updated
>>> as a pair in patch 14's RUNSTATE_running handling (or in the
>>> replacement draft's vmx_ctxt_switch_to() adjustment)?
>>
>> It is because we can only enter the running state from runnable, in
>> which the NV field has already been changed back to 'posted_intr_vector',
>> so we don't need to do it here again.
>
> Without sitting in the runstate update path anymore, I can't see how
> you would get to see all transitions to runnable.

Sorry, I cannot understand the above comment well. Do you mean that after using the new method (arch hooks) to update the posted-interrupt descriptor, I cannot track all the state transitions to runnable?

Thanks,
Feng

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
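Jan's point is that a bit-field store such as pi.nv = v compiles to a plain read-modify-write of the containing word, which can race with test_and_set_bit() on ON. One way to keep the bit-field layout for reading while still updating nv and ndst as a pair is a compare-and-exchange loop on the 64-bit control word. This is a sketch of that idea, not what the patch does: it uses the GCC/Clang __atomic builtins instead of Xen's cmpxchg(), and the union mirrors the proposed layout with <stdint.h> types.

```c
/* Atomic nv/ndst update over the proposed control-word layout (sketch). */
#include <assert.h>
#include <stdint.h>

union pi_control {
    struct {
        uint16_t on     : 1,   /* Outstanding Notification */
                 sn     : 1,   /* Suppress Notification */
                 rsvd_1 : 14;
        uint8_t  nv;           /* Notification Vector */
        uint8_t  rsvd_2;
        uint32_t ndst;         /* Notification Destination */
    };
    uint64_t control;
};

_Static_assert(sizeof(union pi_control) == 8, "control word must be 64 bits");

/*
 * Replace nv and ndst in one 64-bit compare-and-exchange. If another CPU
 * changed the word meanwhile (e.g. set ON), the exchange fails, "old" is
 * refreshed with the current value, and the update is recomputed from it.
 */
void pi_update_nv_ndst(union pi_control *pi, uint8_t nv, uint32_t ndst)
{
    union pi_control old, upd;

    old.control = __atomic_load_n(&pi->control, __ATOMIC_SEQ_CST);
    do {
        upd.control = old.control;
        upd.nv = nv;
        upd.ndst = ndst;
    } while ( !__atomic_compare_exchange_n(&pi->control, &old.control,
                                           upd.control, 0,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST) );
}
```

Because the exchange is retried whenever control changed underneath, a concurrently set ON bit is preserved rather than overwritten. The packed bit positions assume a little-endian target with low-to-high bit-field allocation, which is what GCC does on x86.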
[Xen-devel] [xen-unstable test] 59544: regressions - FAIL
flight 59544 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59544/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 14 guest-localmigrate.2 fail REGR. vs. 58958
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965
 test-armhf-armhf-xl 6 xen-boot fail REGR. vs. 58965
 test-amd64-amd64-xl-qemuu-win7-amd64 9 windows-install fail REGR. vs. 58965

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 58965
 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-install fail like 58958
 test-armhf-armhf-xl-rtds 11 guest-start fail like 58965
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58965

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass

version targeted for testing:
 xen d924ddbf59f54f432f5fb6907d1262ddb9a9070a
baseline version:
 xen c40317f11b3f05e7c06a2213560c8471081f2662

Last test of basis 58965 2015-06-29 02:08:30 Z 16 days
Failing since 58974 2015-06-29 15:11:59 Z 15 days 17 attempts
Testing same since 59544 2015-07-14 13:41:02 Z 0 days 1 attempts

People who touched revisions under test:
  Andrew Cooper <andrew.coop...@citrix.com>
  Anthony PERARD <anthony.per...@citrix.com>
  Ard Biesheuvel <a...@linaro.org>
  Ben Catterall <ben.catter...@citrix.com>
  Boris Ostrovsky <boris.ostrov...@oracle.com>
  Chao Peng <chao.p.p...@linux.intel.com>
  Chen Baozi <baoz...@gmail.com>
  Daniel De Graaf <dgde...@tycho.nsa.gov>
  Dario Faggioli <dario.faggi...@citrix.com>
  David Scott <dave.sc...@citrix.com>
  David Vrabel <david.vra...@citrix.com>
  Dietmar Hahn <dietmar.h...@ts.fujitsu.com>
  Euan Harris <euan.har...@citrix.com>
  Fabio Fantoni <fabio.fant...@m2r.biz>
  Feng Wu <feng...@intel.com>
  George Dunlap <george.dun...@eu.citrix.com>
  Ian Campbell <ian,campb...@citrix.com>
  Ian Campbell <ian.campb...@citrix.com>
  Ian Jackson <ian.jack...@eu.citrix.com>
  Jan Beulich <jbeul...@suse.com>
  Jennifer Herbert <jennifer.herb...@citrix.com>
  Juergen Gross <jgr...@suse.com>
  Julien Grall <julien.gr...@citrix.com>
  Julien Grall <julien.gr...@linaro.org>
  Kevin Tian <kevin.t...@intel.com>
  Liang Li <liang.z...@intel.com>
  Paul Durrant <paul.durr...@citrix.com>
  Razvan Cojocaru <rcojoc...@bitdefender.com>
  Rob Hoes <rob.h...@citrix.com>
  Roger Pau Monné <roger@citrix.com>
  Samuel Thibault <samuel.thiba...@ens-lyon.org>
  Sander Eikelenboom <li...@eikelenboom.it>
  Tamas K Lengyel <tleng...@novetta.com>
  Thomas Leonard <tal...@gmail.com>
  Tiejun Chen <tiejun.c...@intel.com>
  Tim Deegan <t...@xen.org>
  Vitaly Kuznetsov <vkuzn...@redhat.com>
  Wei Liu <wei.l...@citrix.com>
  Wen Congyang <we...@cn.fujitsu.com>
  Yang Zhang <yang.z.zh...@intel.com>

jobs:
 build-amd64-xsm pass
 build-armhf-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-armhf pass
 build-i386 pass
 build-amd64-libvirt pass
 build-armhf-libvirt pass
 build-i386-libvirt pass
 build-amd64-oldkern pass
 build-i386-oldkern pass
 build-amd64-pvops pass
 build-armhf-pvops
Re: [Xen-devel] [PATCH v4 15/17] xen/arm: ITS: Map ITS translation space
Hi Vijay,

On 10/07/2015 09:42, vijay.kil...@gmail.com wrote:
> From: Vijaya Kumar K <vijaya.ku...@caviumnetworks.com>
>
> ITS translation space contains GITS_TRANSLATOR register which is written
> by device to raise LPI. This space needs to be mapped to every domain
> address space for all physical ITS available, so that device can access
> GITS_TRANSLATOR register using SMMU.
>
> Signed-off-by: Vijaya Kumar K <vijaya.ku...@caviumnetworks.com>
> ---
>  xen/arch/arm/vgic-v3-its.c | 31 ++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 74e6ee7..301f065 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -1082,6 +1082,35 @@ static const struct mmio_handler_ops vgic_gits_mmio_handler = {
>      .write_handler = vgic_v3_gits_mmio_write,
>  };
>
> +/*
> + * Map the 64K ITS translation space in guest.
> + * This is required purely for device smmu writes.
> +*/
> +
> +static int vits_map_translation_space(struct domain *d)
> +{
> +    uint64_t addr, size;
> +    int ret;
> +
> +    addr = d->arch.vits->gits_base + SZ_64K;
> +    size = SZ_64K;
> +
> +    ret = map_mmio_regions(d,
> +                           paddr_to_pfn(addr & PAGE_MASK),
> +                           DIV_ROUND_UP(size, PAGE_SIZE),
> +                           paddr_to_pfn(addr & PAGE_MASK));

You are assuming a direct mapping in the guest memory for the ITS translation space. While this may be true for dom0, it won't work for guests.

I'm fine if you don't handle this case for 4.6. However, I'd like to at least see a comment stating that we are using a 1:1 mapping, and an assert to check that the domain is using a direct mapping (i.e. is_domain_direct_mapped(d)).

Regards,

--
Julien Grall
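For reference, the arguments the patch passes to map_mmio_regions() can be computed as below, with the direct-mapping assertion requested in the review folded in. This assumes 4K pages and takes the second 64K frame of the ITS as the translation space, as the patch does; the helper name and the direct_mapped flag (standing in for is_domain_direct_mapped(d)) are illustrative only.

```c
/* Frame arithmetic for mapping the ITS translation space (sketch). */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants: 4K pages, SZ_64K as used in the patch. */
#define PAGE_SHIFT          12
#define PAGE_SIZE           (1ULL << PAGE_SHIFT)
#define PAGE_MASK           (~(PAGE_SIZE - 1))
#define SZ_64K              0x10000ULL
#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
#define paddr_to_pfn(pa)    ((uint64_t)(pa) >> PAGE_SHIFT)

struct its_mapping {
    uint64_t gfn;   /* first guest frame */
    uint64_t mfn;   /* first machine frame */
    uint64_t nr;    /* number of 4K frames */
};

/* Hypothetical helper; direct_mapped stands in for is_domain_direct_mapped(d). */
struct its_mapping vits_translation_frames(uint64_t gits_base, bool direct_mapped)
{
    /* The translation space is the second 64K frame of the ITS. */
    uint64_t addr = gits_base + SZ_64K;
    struct its_mapping m;

    /* gfn == mfn is only valid for a 1:1 (direct-mapped) domain. */
    assert(direct_mapped);

    m.gfn = paddr_to_pfn(addr & PAGE_MASK);
    m.mfn = paddr_to_pfn(addr & PAGE_MASK);
    m.nr  = DIV_ROUND_UP(SZ_64K, PAGE_SIZE);
    return m;
}
```

For an ITS at 0x8080000, this yields 16 frames starting at gfn == mfn == 0x8090.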
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
> This is very similar to our current policy to
> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6,
> since this is actually also another rare possibility in the real world.
> Even I can do this as well when we handle that conflict with
> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.

Sorry, there is one typo here: s/#6/#5

Thanks
Tiejun

> Note it's not necessary to be concerned about high memory, since we
> already handle this case in the hv code previously, and it's also not
> affected by the relocated memory later, since our previous policy can
> make sure RAM isn't overlapping with RDM.
>
> Thanks
> Tiejun
>
>> to co-maintainers overriding me).
>>
>> Jan
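The underlying requirement, that nothing hvmloader places may collide with a reserved device memory (RDM) range, comes down to an interval-overlap check. Here is a minimal sketch that places a naturally aligned BAR at or above a base address while skipping reserved ranges. It is not the hvmloader implementation, and struct rsvd_range and place_bar() are hypothetical names.

```c
/* Placing a BAR around reserved ranges (hypothetical sketch). */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A reserved range such as an RDM entry, as a half-open interval [start, end). */
struct rsvd_range {
    uint64_t start, end;
};

/* Does [a, a + len) overlap [r->start, r->end)? */
static int overlaps(uint64_t a, uint64_t len, const struct rsvd_range *r)
{
    return a < r->end && r->start < a + len;
}

/*
 * Return the lowest address >= base, aligned to size (a power of two),
 * where a BAR of that size fits below limit without touching any
 * reserved range; UINT64_MAX if there is no such address.
 */
uint64_t place_bar(uint64_t base, uint64_t size, uint64_t limit,
                   const struct rsvd_range *rsvd, size_t nr)
{
    uint64_t addr = (base + size - 1) & ~(size - 1);   /* align up */

    while ( addr + size <= limit )
    {
        size_t i;
        int clash = 0;

        for ( i = 0; i < nr; i++ )
        {
            if ( overlaps(addr, size, &rsvd[i]) )
            {
                /* Jump past the reserved range and realign. */
                addr = (rsvd[i].end + size - 1) & ~(size - 1);
                clash = 1;
                break;
            }
        }
        if ( !clash )
            return addr;
    }
    return UINT64_MAX;
}
```

Each clash moves addr strictly past the end of the conflicting range, so the loop always terminates.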
[Xen-devel] [PATCH v8 --for 4.6 COLO 07/25] tools/libxl: add back channel support to read stream
This is used by primary to read records sent by secondary.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl_create.c      |  1 +
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_stream_read.c | 17 +
 3 files changed, 19 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1d4b13b..1af7103 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -978,6 +978,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.dcs = dcs;
     dcs->srs.fd = restore_fd;
     dcs->srs.legacy = (dcs->restore_params.stream_version == 1);
+    dcs->srs.back_channel = false;
     dcs->srs.completion_callback = domcreate_stream_done;

     libxl__stream_read_start(egc, &dcs->srs);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2634836..05cee04 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3358,6 +3358,7 @@ struct libxl__stream_read_state {
     libxl__domain_create_state *dcs;
     int fd;
     bool legacy;
+    bool back_channel;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_read_state *srs,
                                 int rc);
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 2d17403..b924f05 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -104,6 +104,15 @@
  * Depending on the contents of the stream, there are likely to be several
  * parallel tasks being managed.  check_all_finished() is used to join all
  * tasks in both success and error cases.
+ *
+ * For back channel stream:
+ * - libxl__stream_read_start()
+ *    - Set up the stream to running state
+ *
+ * - libxl__stream_read_continue()
+ *     - Set up reading the next record from a started stream.
+ *       Add some codes to process_record() to handle the record.
+ *       Then call stream->checkpoint_callback() to return.
  */

 /* Success/error/cleanup handling. */
@@ -200,6 +209,9 @@ void libxl__stream_read_start(libxl__egc *egc,
     stream->running = true;
     stream->phase   = SRS_PHASE_NORMAL;

+    if (stream->back_channel)
+        return;
+
     if (stream->legacy) {
         /* Convert the legacy stream. */
         libxl__conversion_helper_state *chs = &stream->chs;
@@ -700,6 +712,11 @@ static void stream_done(libxl__egc *egc,
     assert(!stream->in_checkpoint);
     stream->running = false;

+    if (stream->back_channel) {
+        stream->completion_callback(egc, stream, stream->rc);
+        return;
+    }
+
     if (stream->incoming_record)
         free_record(stream->incoming_record);

--
1.9.1
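The effect of the new back_channel flag fits in a few lines: for a back-channel stream, libxl__stream_read_start() only marks the stream as running (no header read, no legacy conversion), and stream_done() hands control straight to the completion callback instead of running the normal teardown. This is a simplified model with hypothetical field names, not libxl code.

```c
/* Simplified model of the back_channel paths (hypothetical names). */
#include <assert.h>
#include <stdbool.h>

struct model_stream {
    bool running;
    bool back_channel;
    bool legacy;
    bool conversion_started;    /* would spawn the legacy conversion helper */
    bool header_read_started;   /* would start reading the stream header */
    bool resources_freed;       /* normal teardown path was taken */
    int completions;            /* times the completion callback fired */
};

void model_stream_read_start(struct model_stream *s)
{
    s->running = true;

    if ( s->back_channel )
        return;                 /* back channel: no header, no conversion */

    if ( s->legacy )
        s->conversion_started = true;
    s->header_read_started = true;
}

void model_stream_done(struct model_stream *s)
{
    s->running = false;

    if ( s->back_channel )
    {
        s->completions++;       /* report straight back to the caller */
        return;
    }

    s->resources_freed = true;  /* free records, tear down helpers, ... */
    s->completions++;
}
```

In this model the back-channel stream never owns an incoming record or a conversion helper, which is why skipping the teardown is safe.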
[Xen-devel] [PATCH v8 --for 4.6 COLO 10/25] tools/libx{l, c}: add postcopy/suspend callback to restore side
The secondary (restore side) is running under COLO, so we also need postcopy/suspend callbacks there.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     | 10 ++
 tools/libxl/libxl_save_msgs_gen.pl |  4 ++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index fa06d9b..1e7e1bb 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -114,6 +114,16 @@ struct restore_callbacks {
     int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);

+    /* Called after a new checkpoint to suspend the guest.
+     */
+    int (*suspend)(void* data);
+
+    /* Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     * returns to xc_domain_restore.
+     */
+    int (*postcopy)(void* data);
+
     /* A checkpoint record has been found in the stream.
      * returns: */
 #define XGR_CHECKPOINT_ERROR    0 /* Terminate processing */
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 9107a86..7c9859b 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -23,8 +23,8 @@ our @msgs = (
                                        STRING doing_what),
                                       'unsigned long', 'done',
                                       'unsigned long', 'total'] ],
-    [  3, 'scxA',   "suspend", [] ],
-    [  4, 'scxA',   "postcopy", [] ],
+    [  3, 'srcxA',  "suspend", [] ],
+    [  4, 'srcxA',  "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
     [  6, 'srcxA',  "should_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid

--
1.9.1
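A sketch of how a restore-side caller might drive the two new hooks once per checkpoint. The ordering (suspend, load the checkpoint, then postcopy) follows the comments added to xenguest.h, but the driver function, the struct name and the 0-means-success convention are assumptions made for this example only.

```c
/* Driving suspend/postcopy per checkpoint (hypothetical sketch). */
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of struct restore_callbacks. */
struct colo_restore_hooks {
    int (*suspend)(void *data);   /* quiesce the secondary */
    int (*postcopy)(void *data);  /* resume guest and device model */
    void *data;
};

/* Assumes 0 means success (a convention of this sketch only). */
int colo_secondary_checkpoint(const struct colo_restore_hooks *h)
{
    int rc;

    if ( !h->suspend || !h->postcopy )
        return -1;                /* both hooks are required under COLO */

    rc = h->suspend(h->data);
    if ( rc )
        return rc;

    /* ... the new checkpoint's state would be loaded here ... */

    return h->postcopy(h->data);
}

/* Example hooks that record the call order. */
char trace[8];
size_t ntrace;

int demo_suspend(void *data)  { (void)data; trace[ntrace++] = 's'; return 0; }
int demo_postcopy(void *data) { (void)data; trace[ntrace++] = 'p'; return 0; }
```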
[Xen-devel] [PATCH v8 --for 4.6 COLO 04/25] libxc/migration: export read_record for common use
read_record() could be used by the primary to read the dirty bitmap record sent by the secondary under COLO. When used by the save side, we need to pass the backchannel fd instead of ctx->fd to read_record(), so we added an fd param to it.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
---
 tools/libxc/xc_sr_common.c  | 49 +++
 tools/libxc/xc_sr_common.h  | 14 ++
 tools/libxc/xc_sr_restore.c | 63 +
 3 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index becc0f4..0ee607c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -89,6 +89,55 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
     return -1;
 }

+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rhdr rhdr;
+    size_t datasz;
+
+    if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+    {
+        PERROR("Failed to read Record Header from stream");
+        return -1;
+    }
+    else if ( rhdr.length > REC_LENGTH_MAX )
+    {
+        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
+              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+        return -1;
+    }
+
+    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+
+    if ( datasz )
+    {
+        rec->data = malloc(datasz);
+
+        if ( !rec->data )
+        {
+            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
+                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+
+        if ( read_exact(fd, rec->data, datasz) )
+        {
+            free(rec->data);
+            rec->data = NULL;
+            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
+                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+            return -1;
+        }
+    }
+    else
+        rec->data = NULL;
+
+    rec->type   = rhdr.type;
+    rec->length = rhdr.length;
+
+    return 0;
+};
+
 static void __attribute__((unused)) build_assertions(void)
 {
     XC_BUILD_BUG_ON(sizeof(struct xc_sr_ihdr) != 24);
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 28755ac..632160e 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -356,6 +356,20 @@ static inline int write_record(struct xc_sr_context *ctx,
 }

 /*
+ * Reads a record from the stream, and fills in the record structure.
+ *
+ * Returns 0 on success and non-0 on failure.
+ *
+ * On success, the records type and size shall be valid.
+ * - If size is 0, data shall be NULL.
+ * - If size is non-0, data shall be a buffer allocated by malloc() which must
+ *   be passed to free() by the caller.
+ *
+ * On failure, the contents of the record structure are undefined.
+ */
+int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+
+/*
  * This would ideally be private in restore.c, but is needed by
  * x86_pv_localise_page() if we receive pagetables frames ahead of the
  * contents of the frames they point at.
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 504463e..d53694b 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -69,67 +69,6 @@ static int read_headers(struct xc_sr_context *ctx)
 }

 /*
- * Reads a record from the stream, and fills in the record structure.
- *
- * Returns 0 on success and non-0 on failure.
- *
- * On success, the records type and size shall be valid.
- * - If size is 0, data shall be NULL.
- * - If size is non-0, data shall be a buffer allocated by malloc() which must
- *   be passed to free() by the caller.
- *
- * On failure, the contents of the record structure are undefined.
- */
-static int read_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
-{
-    xc_interface *xch = ctx->xch;
-    struct xc_sr_rhdr rhdr;
-    size_t datasz;
-
-    if ( read_exact(ctx->fd, &rhdr, sizeof(rhdr)) )
-    {
-        PERROR("Failed to read Record Header from stream");
-        return -1;
-    }
-    else if ( rhdr.length > REC_LENGTH_MAX )
-    {
-        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
-        return -1;
-    }
-
-    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
-
-    if ( datasz )
-    {
-        rec->data = malloc(datasz);
-
-        if ( !rec->data )
-        {
-            ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
-            return -1;
-        }
-
-        if ( read_exact(ctx->fd, rec->data, datasz) )
-        {
-            free(rec->data);
-            rec->data = NULL;
-            PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-                   datasz, rhdr.type,
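The exported read_record() reads a fixed header, bounds-checks the length, and then reads a body padded to 1 << REC_ALIGN_ORDER bytes from whichever fd the caller supplies. The self-contained sketch below follows that logic over a plain file descriptor, so it can be exercised with a pipe. The constant values, the struct names and read_exact_fd() are assumptions for illustration; the real function also takes ctx for logging.

```c
/* fd-parameterised record reader, modeled on read_record() (sketch). */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed values for this sketch; check the libxc headers for the real ones. */
#define REC_ALIGN_ORDER 3                    /* bodies padded to 8 bytes */
#define REC_LENGTH_MAX  (128U << 20)
#define ROUNDUP(x, order) (((x) + (1UL << (order)) - 1) & ~((1UL << (order)) - 1))

struct rhdr   { uint32_t type, length; };
struct record { uint32_t type, length; void *data; };

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int read_exact_fd(int fd, void *buf, size_t len)
{
    char *p = buf;

    while ( len )
    {
        ssize_t n = read(fd, p, len);

        if ( n < 0 && errno == EINTR )
            continue;
        if ( n <= 0 )
            return -1;               /* error or unexpected EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* The caller picks the fd: the main stream, or a COLO back channel. */
int read_record_fd(int fd, struct record *rec)
{
    struct rhdr hdr;
    size_t datasz;

    if ( read_exact_fd(fd, &hdr, sizeof(hdr)) )
        return -1;
    if ( hdr.length > REC_LENGTH_MAX )
        return -1;

    datasz = ROUNDUP(hdr.length, REC_ALIGN_ORDER);
    rec->data = NULL;

    if ( datasz )
    {
        rec->data = malloc(datasz);
        if ( !rec->data )
            return -1;
        if ( read_exact_fd(fd, rec->data, datasz) )
        {
            free(rec->data);
            rec->data = NULL;
            return -1;
        }
    }

    rec->type = hdr.type;
    rec->length = hdr.length;
    return 0;
}
```

A 5-byte record, for example, occupies an 8-byte header plus an 8-byte padded body on the wire, while rec->length still reports 5.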
Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs
On Wed, 2015-07-15 at 10:26 +0200, Julien Grall wrote:
>>>> @@ -149,7 +173,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
>>>>           test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
>>>>          goto out;
>>>>
>>>> -    desc->handler = gic_hw_ops->gic_guest_irq_type;
>>>> +    desc->handler = get_guest_hw_irq_controller(desc->irq);
>>>>      set_bit(_IRQ_GUEST, &desc->status);
>>>>
>>>>      gic_set_irq_properties(desc, cpumask_of(v_target->processor), priority);
>>>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
>>>> index 2dd43ee..ba8528a 100644
>>>> --- a/xen/arch/arm/irq.c
>>>> +++ b/xen/arch/arm/irq.c
>>>> @@ -35,7 +35,13 @@ static DEFINE_SPINLOCK(local_irqs_type_lock);
>>>>  struct irq_guest {
>>>>      struct domain *d;
>>>> -    unsigned int virq;
>>>> +    union
>>>> +    {
>>>> +        /* virq refer to virtual irq in case of spi */
>>>> +        unsigned int virq;
>>>> +        /* virq refer to event ID in case of lpi */
>>>> +        unsigned int vid;
>>>
>>> Why can't we store the event ID in the irq_guest? As said on v3, this is not
>>
>> Are you referring to irq_desc in above statement?
>
> Yes sorry.

I'm afraid I don't follow your suggestion here: are you suggesting that the vid field added above should be moved to irq_desc?

But the vid _is_ domain specific: it is the virtual event ID, which is per-domain (it's the thing looked up in the ITT to get a vLPI to be injected). I think it is a pretty direct analogue of the virq field used for non-LPI irq_guest structs.

If we had need for the physical event id, then that would likely belong in the irq_desc.

Your proposal on v3 looks to be around moving the its_device pointer to the irq_desc, which appears to have been done here, along with turning the virq+vid into a union as requested there too.

>>> It has been suggested by Ian to move col_id in the its_device in the
>>> previous version [4]. Any reason to not doing it?
>>
>> In round robin fashion each plpi is attached to col_id. So storing in
>> its_device is not possible. In linux latest col_id is stored in
>> its_device structure for which set_affinity is called.

Are you saying that in Linux all Events/LPIs associated with a given ITS device are routed to the same collection?

> You could do round robin on its_device... It would be exactly the same

Routing all LPIs associated with a given its_device to the same collection is not exactly the same as round robin-ing all LPIs from the device over the collections.

> and save 2 byte if not more with the alignment per irq_desc.

If this is a concern, then I would say we would either want a separate array of per-pLPI information, which we do not want in irq_desc because it is irq specific, or to add a pointer to its_desc which points to an array of per-event information.

Ian.
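The two routing policies being contrasted can be written down directly: per-LPI round robin picks a collection each time an LPI is allocated, while per-device assignment picks a collection once and stores it in the its_device. The latter is what saves the per-irq_desc field, but it also routes every LPI of a device to the same collection. The names below are hypothetical.

```c
/* Per-LPI vs per-device collection assignment (hypothetical sketch). */
#include <assert.h>

/* Per-LPI round robin: every new physical LPI gets the next collection. */
unsigned int rr_next_collection(unsigned int *next, unsigned int nr_collections)
{
    unsigned int col = *next;

    *next = (*next + 1) % nr_collections;
    return col;
}

struct model_its_device {
    unsigned int col_id;   /* one collection shared by all LPIs of the device */
};

/* Per-device: round robin happens once, at device setup, and every LPI
 * of that device then reuses the stored col_id. */
void model_its_device_init(struct model_its_device *dev, unsigned int *next,
                           unsigned int nr_collections)
{
    dev->col_id = rr_next_collection(next, nr_collections);
}
```

With four collections, a single device with eight LPIs spreads them over all four collections under the first policy, but places all eight on one collection under the second.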
Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
On 15.07.15 at 10:55, <feng...@intel.com> wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, July 15, 2015 4:46 PM
>> On 15.07.15 at 10:38, <feng...@intel.com> wrote:
>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>> Sent: Wednesday, July 15, 2015 4:25 PM
>>>> On 15.07.15 at 08:04, <feng...@intel.com> wrote:
>>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>>> Sent: Friday, July 10, 2015 10:02 PM
>>>>>>
>>>>>> I'm particularly worried by the call to acpi_find_matched_drhd_unit()
>>>>>> - is it maybe worth storing the iommu pointer in struct msi_desc?
>>>>>
>>>>> I think it is worth it; Andrew also mentioned this point before. I
>>>>> tend to make this an independent piece of work and do it later, since
>>>>> the 4.6 release is coming and I am still trying my best to target it.
>>>>> Could you please share your concern here: performance, or other
>>>>> things? Thanks!
>>>>
>>>> Interrupt latency in particular.
>>>
>>> This IRTE update operation is not very frequent. It only happens a few
>>> times, especially in the initialization phase of the guest. And even
>>> when the guest sets the affinity, if the MSI/MSI-X configuration doesn't
>>> change, QEMU will not ask Xen to update it.
>>
>> When the guest sets the affinity, the MSI{,-X} configuration is rather
>> likely to change (at least for Linux guests).
>
> Yes, it is. But I'd say it is not a frequent operation. In my test, it
> only happens in the initialization phase, and some updates don't go to
> Xen since the configuration is the same (QEMU filters them).

Can I please ask you to move away from this way of thinking? What you see in experiments is useful from a functionality pov, but pretty meaningless from a security perspective. For that, you'd rather start thinking about what a _malicious_ guest might be doing.

> And I agree I will change this. My question is whether we can do it a
> little later, so that I can focus on some other critical issues before
> 4.6 is released, which may give this patch a better chance to make 4.6.
> Is this okay for you?

As long as the feature (due to the other issue) remains experimental, is off by default, and the code has a prominent comment outlining the intended improvement, I'd be fine, yes.

Jan
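Jan's suggestion amounts to doing the DRHD lookup once, when the MSI is set up, and caching the result in struct msi_desc so that the IRTE update path does no table walk. Below is a hypothetical sketch in which a counter stands in for acpi_find_matched_drhd_unit(); none of these names are the Xen ones.

```c
/* Caching the IOMMU pointer at MSI setup time (hypothetical sketch). */
#include <assert.h>
#include <stddef.h>

struct model_iommu { int id; };

/* Stand-in for the slow lookup; counts how often it runs. */
int slow_lookups;
struct model_iommu the_iommu = { 1 };

struct model_iommu *model_find_drhd_iommu(unsigned int bdf)
{
    (void)bdf;                /* a real lookup would match the device */
    slow_lookups++;
    return &the_iommu;
}

/* msi_desc extended with a cached IOMMU pointer. */
struct model_msi_desc {
    unsigned int bdf;
    struct model_iommu *iommu;
};

/* Setup path: the one place where the lookup runs. */
void model_msi_setup(struct model_msi_desc *msi, unsigned int bdf)
{
    msi->bdf = bdf;
    msi->iommu = model_find_drhd_iommu(bdf);
}

/* Update path (e.g. on every affinity change): no table walk. */
int model_update_irte(const struct model_msi_desc *msi)
{
    return msi->iommu ? msi->iommu->id : -1;
}
```

However often a guest triggers IRTE updates, the cost of the lookup is paid once per MSI, which addresses the concern about a malicious guest driving the slow path.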
Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 15, 2015 5:28 PM
> To: Wu, Feng
> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>
> On 15.07.15 at 10:43, <feng...@intel.com> wrote:
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>> Sent: Wednesday, July 15, 2015 4:36 PM
>>> To: Wu, Feng
>>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>>> Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>>>
>>> On 15.07.15 at 10:26, <feng...@intel.com> wrote:
>>>>> -----Original Message-----
>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>> Sent: Wednesday, July 15, 2015 4:20 PM
>>>>> To: Wu, Feng
>>>>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>>>>> Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>>>>>
>>>>> On 15.07.15 at 04:40, <feng...@intel.com> wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>>>> Sent: Friday, July 10, 2015 9:08 PM
>>>>>>> To: Wu, Feng
>>>>>>> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
>>>>>>> Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>>>>>>>
>>>>>>> On 24.06.15 at 07:18, <feng...@intel.com> wrote:
>>>>>>>> @@ -81,8 +81,19 @@ struct vmx_domain {
>>>>>>>>  struct pi_desc {
>>>>>>>>      DECLARE_BITMAP(pir, NR_VECTORS);
>>>>>>>> -    u32 control;
>>>>>>>> -    u32 rsvd[7];
>>>>>>>> +    union {
>>>>>>>> +        struct
>>>>>>>> +        {
>>>>>>>> +        u16     on     : 1,  /* bit 256 - Outstanding Notification */
>>>>>>>> +                sn     : 1,  /* bit 257 - Suppress Notification */
>>>>>>>> +                rsvd_1 : 14; /* bit 271:258 - Reserved */
>>>>>>>> +        u8      nv;          /* bit 279:272 - Notification Vector */
>>>>>>>> +        u8      rsvd_2;      /* bit 287:280 - Reserved */
>>>>>>>> +        u32     ndst;        /* bit 319:288 - Notification Destination */
>>>>>>>> +        };
>>>>>>>> +        u64 control;
>>>>>>>> +    };
>>>>>>>
>>>>>>> So current code, afaics, uses e.g. test_and_set_bit() to set ON. By
>>>>>>> also declaring this as a bitfield you're opening the structure for
>>>>>>> non-atomic accesses. If that's correct, why is other code not being
>>>>>>> changed to _only_ use the bitfield mechanism (likely also eliminating
>>>>>>> the need for it being a union with the now 64-bit control)? If atomic
>>>>>>> accesses are required, then I'd strongly suggest against making this
>>>>>>> a bit field.
>>>>>>>
>>>>>>> And in no event can I see why ndst needs to be union-ized with
>>>>>>> control if it doesn't need to be updated atomically with e.g. nv.
>>>>>>
>>>>>> When the vCPU is to be blocked, we need to atomically update the nv
>>>>>> and ndst, so that the wakeup notification event can be delivered to
>>>>>> the right destination.
>>>>>
>>>>> Okay. Your reply made me go through the patches again to check where
>>>>> updates to nv/ndst happen - what's the reason they aren't being updated
>>>>> as a pair in patch 14's RUNSTATE_running handling (or in the
>>>>> replacement draft's vmx_ctxt_switch_to() adjustment)?
>>>>
>>>> It is because we can only enter the running state from runnable, in
>>>> which the NV field has already been changed back to
>>>> 'posted_intr_vector', so we don't need to do it here again.
>>>
>>> Without sitting in the runstate update path anymore, I can't see how
>>> you would get to see all transitions to runnable.
>>
>> Sorry, I cannot understand the above comment well. Do you mean that
>> after using the new method (arch hooks) to update the posted-interrupt
>> descriptor, I cannot track all the state transitions to runnable?
>
> Not sure if track is the right word here, but yes.

The new method is still in development; let's see how it turns out. :)

Thanks,
Feng

> Jan
Re: [Xen-devel] Question about mapping between domains
Hi, Ian. Thank you for the response. Look at how the balloon driver does it, the hypercalls you want are XENMEM_(increase|decrease)_reservation. I'll try to use those hypercalls.
Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
On 15.07.15 at 10:26, feng...@intel.com wrote: -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, July 15, 2015 4:20 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: RE: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts On 15.07.15 at 04:40, feng...@intel.com wrote: -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, July 10, 2015 9:08 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: Re: [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts On 24.06.15 at 07:18, feng...@intel.com wrote: @@ -81,8 +81,19 @@ struct vmx_domain { struct pi_desc { DECLARE_BITMAP(pir, NR_VECTORS); -u32 control; -u32 rsvd[7]; +union { +struct +{ +u16 on : 1, /* bit 256 - Outstanding Notification */ +sn : 1, /* bit 257 - Suppress Notification */ +rsvd_1 : 14; /* bit 271:258 - Reserved */ +u8 nv; /* bit 279:272 - Notification Vector */ +u8 rsvd_2; /* bit 287:280 - Reserved */ +u32 ndst;/* bit 319:288 - Notification Destination */ +}; +u64 control; +}; So current code, afaics, uses e.g. test_and_set_bit() to set ON. By also declaring this as a bitfield you're opening the structure for non-atomic accesses. If that's correct, why is other code not being changed to _only_ use the bitfield mechanism (likely also eliminating the need for it being a union with the now 64-bit control? If atomic accesses are required, then I'd strongly suggest against making this a bit field. And in no event can I see why ndst needs to be union-ized with control if it doesn't need to be updated atomically with e.g. nv. When the vCPU is to be blocked, we need to atomically update the nv and ndst, then the wakeup notification event can be delivered to the right destination. Okay. 
Your reply made me go through the patches again to check where updates to nv/ndst happen - what's the reason they aren't being updated as a pair in patch 14's RUNSTATE_running handling (or in the replacement draft's vmx_ctxt_switch_to() adjustment)? It is because, we can only enter running state from runnable, in which, the NV field has been already changed back to ' posted_intr_vector ', we don't need to do it here again. Without sitting in the runstate update path anymore, I can't see how you would get to see all transitions to runnable. Jan
[Xen-devel] [PATCH V6 0/3] Vm_event memory introspection helpers
This series addresses the review comments on V5. All patches have at least one ack, and the modifications are minor. Patch 2/3 has not been modified at all. The only modification in patch 3/3 is that it now uses vzalloc() / vfree() instead of xzalloc_array() / xfree(); both patch 3/3 and patch 1/3 now also set the allocated data to NULL after freeing it on domain destruction paths. Patch 1/3 has slightly more modifications, but they are mostly cosmetic (the only non-cosmetic one is that the patch now bypasses a hvm_copy_from_guest_phys() call that did no harm but was unnecessary).

As discussed, I've kept the better-safe-than-sorry approach of freeing allocated data both on the domain destruction paths and in vm_event_cleanup(). This is based on the comments in shadow_final_teardown(), which imply that it is theoretically possible to end up on a domain destruction path without domain_kill() being called (and domain_kill() is what calls vm_event_cleanup()).

[PATCH V6 1/3] xen/mem_access: Support for memory-content hiding
[PATCH V6 2/3] xen/vm_event: Support for guest-requested events
[PATCH V6 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

Thanks in advance for your reviews,
Razvan
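Freeing the same data from both vm_event_cleanup() and the domain destruction paths is safe only if each free also clears the pointer. The sketch below shows the pattern in userspace C: free(NULL) is a no-op, so whichever path runs second does nothing. It uses calloc()/free() as stand-ins for Xen's vzalloc()/vfree(), and the struct and function names are hypothetical rather than taken from the series.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-domain state holding a lazily allocated buffer. */
struct introspection_state {
    unsigned char *emul_data;
};

static int introspection_alloc(struct introspection_state *s, size_t size)
{
    assert(s != NULL);
    s->emul_data = calloc(1, size);        /* zeroed, like vzalloc() */
    return s->emul_data ? 0 : -1;
}

/*
 * Idempotent release: after the first call the pointer is NULL, and
 * free(NULL) is a no-op, so every teardown path may call this.
 */
static void introspection_free(struct introspection_state *s)
{
    assert(s != NULL);
    free(s->emul_data);
    s->emul_data = NULL;
}

/* Two hypothetical teardown paths that may both run for one domain. */
static void on_vm_event_cleanup(struct introspection_state *s)
{
    introspection_free(s);
}

static void on_domain_destroy(struct introspection_state *s)
{
    introspection_free(s);
}
```

Without the `= NULL` assignment, running both paths for the same domain would be a double free.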
Re: [Xen-devel] [PATCH v8 11/11] (lib)xl: soft reset support
Ian Jackson ian.jack...@eu.citrix.com writes: Vitaly Kuznetsov writes ([PATCH v8 11/11] (lib)xl: soft reset support): Use existing create/restore path to perform 'soft reset' for HVM domains. Tear everything down, e.g. destroy domain's device model, remove the domain from xenstore, save toolstack record and start over. This patch has a number of long lines (eg in the documentation and comments) which make it hard to review. Can you please keep it to 70 columns, or 75 if you absolutely must ? No problem, will do in v9. BTW, libxl/CODING_STYLE states that 'Lines are limited to 75-80 characters'. I'd suggest we update that if 70-75 is preferred. I'm not sure that this description: +=item B<soft-reset> + +cleanup the domain without destroying it, restart the device +model. This action is supported for HVM guests only. is really accurate from a user point of view. Yeah, I'm trying hard to avoid mentioning Linux and kexec while describing soft reset. I will try to come up with something. Ian. -- Vitaly
Re: [Xen-devel] [PATCH] x86/traps: Dump instruction stream in show_execution_state()
On 14.07.15 at 18:15, andrew.coop...@citrix.com wrote:
> Currently limited to just hypervisor context, but it could be extended to vcpus as well.

Considering this ...

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -115,6 +115,31 @@
>  #define stack_words_per_line 4
>  #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp)
>
> +static void show_code(const struct cpu_user_regs *regs)
> +{
> +    char insns[24];
> +    unsigned int i, not_copied;
> +    void *__user start_ip = (void *)regs->rip - 8;
> +
> +    if ( guest_mode(regs) )
> +        return;
> +
> +    not_copied = __copy_from_user(insns, start_ip, ARRAY_SIZE(insns));
> +
> +    printk("Xen code around %04x:%p (%ps)%s:\n",

... I'd prefer the Xen here to be dropped.

> +           regs->cs, _p(regs->rip), _p(regs->rip),
> +           !!not_copied ? " [fault on access]" : "");

Pointless !!.

> +    for ( i = 0; i < ARRAY_SIZE(insns) - not_copied; ++i )
> +    {
> +        if ( (unsigned long)(start_ip + i) == regs->rip )
> +            printk(" <%02x>", (unsigned char)insns[i]);
> +        else
> +            printk(" %02x", (unsigned char)insns[i]);

Why not have insns[] be unsigned char right away?

Also I think you should avoid the subtraction from regs->rip to wrap through zero, or even bail when RIP doesn't point into Xen space.

Jan
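Jan's two review points are to use an unsigned char buffer and to keep `rip - 8` from wrapping through zero. The userspace sketch below covers both:

- The buffer type is unsigned char, so no per-byte cast is needed.
- The window start is clamped so it cannot wrap below address 0.
- The byte at the instruction pointer is marked as `<xx>`. The exact marker format is an assumption, since the archived patch lost its angle brackets.

It formats into a caller-supplied string with snprintf() instead of calling printk(), and it does not model the faulting `__copy_from_user()`. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CODE_BYTES_BEFORE 8   /* bytes shown before the IP, as in the patch */

/* First address of the dump window, clamped so 'ip - 8' cannot wrap. */
static uintptr_t code_window_start(uintptr_t ip)
{
    return ip >= CODE_BYTES_BEFORE ? ip - CODE_BYTES_BEFORE : 0;
}

/*
 * Format 'len' bytes copied from address 'start' as " xx" each, marking the
 * byte at 'ip' as " <xx>".  Returns the string length, or -1 if 'out' is
 * too small.
 */
static int format_code(char *out, size_t outsz, uintptr_t start,
                       const unsigned char *bytes, size_t len, uintptr_t ip)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for ( size_t i = 0; i < len; i++ )
    {
        int n = snprintf(out + used, outsz - used,
                         start + i == ip ? " <%02x>" : " %02x", bytes[i]);

        if ( n < 0 || (size_t)n >= outsz - used )
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With `bytes = {0x90, 0x0f, 0x0b, 0xcc}`, `start = 0x1000` and `ip = 0x1001`, the output is `" 90 <0f> 0b cc"`.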
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
>> Certainly appreciate your time. I didn't mean it's wasting time at this point. I just want to express that it's hard to implement that solution in one or two weeks and get it into 4.6 as an exception. Note I know this feature is still not accepted as an exception to 4.6 right now, so I'm making an assumption.
>
> After all this is a bug fix (and would have been allowed into 4.5 had it been ready in time), so doesn't necessarily need a freeze exception (but of course the bar raises the later it gets).

Yes, this is not a bug fix again into 4.6.

> Rather than rushing in something that's cumbersome to maintain, I'd much prefer this to be done properly.

Indeed, we'd like to finalize this properly as you said. But apparently there is not enough time for that to happen. So I just suggest we continue looking for the best solution in the next phase.

Thanks
Tiejun
Re: [Xen-devel] [PATCH] x86/traps: Misc tweaks to several printk()s
On 14.07.15 at 19:54, andrew.coop...@citrix.com wrote: @@ -626,8 +626,9 @@ static void do_trap(struct cpu_user_regs *regs, int use_error_code) if ( likely((fixup = search_exception_table(regs->eip)) != 0) ) { -dprintk(XENLOG_ERR, Trap %d: %p - %p\n, -trapnr, _p(regs->eip), _p(fixup)); +printk(XENLOG_INFO Exception [#%d, ec=%04x] (%s): %ps %p - %p\n, + trapnr, use_error_code ? regs->error_code : 0, trapstr(trapnr), + _p(regs->eip), _p(regs->eip), _p(fixup)); But why the transition dprintk() -> printk()? @@ -2677,9 +2678,9 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) if ( (rdmsr_safe(regs->ecx, val) != 0) || (msr_content != val) ) invalid: -gdprintk(XENLOG_WARNING, Domain attempted WRMSR %p from -0x%016PRIx64 to 0x%016PRIx64.\n, -_p(regs->ecx), val, msr_content); +gprintk(XENLOG_WARNING, +attempted WRMSR 0x%08x: 0x%016PRIx64 - 0x%016PRIx64\n, +regs->_ecx, val, msr_content); In cases where the values can't usefully be taken to be decimal I'd prefer the 0x prefixes to be omitted. @@ -2813,10 +2814,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) case MSR_EFER: rdmsr_normal: /* Everyone can read the MSR space. */ -/* gdprintk(XENLOG_WARNING,Domain attempted RDMSR %p.\n, -_p(regs->ecx));*/ if ( rdmsr_safe(regs->ecx, val) ) +{ +gprintk(XENLOG_WARNING, attempted RDMSR 0x%08x\n, regs->_ecx); goto fail; +} Do you really see this to be useful in production builds? Jan
[Xen-devel] [PATCH v8 --for 4.6 COLO 18/25] Support colo mode for qemu disk
From: Wen Congyang we...@cn.fujitsu.com Usage: disk = ['...,colo,colo-params=xxx,active-disk=xxx,hidden-disk=xxx...'] The format of colo-params: host:port:exportname=xx For QEMU block replication details: http://wiki.qemu.org/Features/BlockReplication Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- docs/man/xl.pod.1 | 2 +- docs/misc/xl-disk-configuration.txt | 38 ++ tools/libxl/libxl.c | 42 +- tools/libxl/libxl_create.c | 25 +++- tools/libxl/libxl_device.c | 38 ++ tools/libxl/libxl_dm.c | 257 +++- tools/libxl/libxl_types.idl | 5 + tools/libxl/libxlu_disk_l.l | 5 + 8 files changed, 403 insertions(+), 9 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index 2cd34bb..1effce7 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -454,7 +454,7 @@ N.B: Remus support in xl is still in experimental (proof-of-concept) phase. Disk replication support is limited to DRBD disks. COLO support in xl is still in experimental (proof-of-concept) phase. - There is no support for network or disk at the moment. + There is no support for network at the moment. BOPTIONS diff --git a/docs/misc/xl-disk-configuration.txt b/docs/misc/xl-disk-configuration.txt index 6a2118d..e366e8d 100644 --- a/docs/misc/xl-disk-configuration.txt +++ b/docs/misc/xl-disk-configuration.txt @@ -234,6 +234,44 @@ were intentionally created non-sparse to avoid fragmentation of the file. +=== +COLO PARAMETERS +=== + + +colo + + +Enable COLO HA for disk. For better understanding block replication on +QEMU, please refer to: +http://wiki.qemu.org/Features/BlockReplication + + +colo-params=host:port:exportname=name +--- + +Description: Secondary host's address and port information, + We will run a nbd server on secondary host, + exportname is the nbd server's disk export name. +Mandatory: Yes when COLO enabled + + +active-disk +--- + +Description: This is used by secondary. Secondary guest's write + will be buffered in this disk. 
+Mandatory: Yes when COLO enabled + + +hidden-disk +--- + +Description: This is used by secondary. It buffers the original + content that is modified by the primary VM. +Mandatory: Yes when COLO enabled + + DEPRECATED PARAMETERS, PREFIXES AND SYNTAXES diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 791f364..c6cc5aa 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2256,6 +2256,8 @@ int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk) int rc; libxl_defbool_setdefault(disk-discard_enable, !!disk-readwrite); +libxl_defbool_setdefault(disk-colo_enable, false); +libxl_defbool_setdefault(disk-colo_restore_enable, false); rc = libxl__resolve_domid(gc, disk-backend_domname, disk-backend_domid); if (rc 0) return rc; @@ -2456,6 +2458,14 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid, flexarray_append(back, params); flexarray_append(back, libxl__sprintf(gc, %s:%s, libxl__device_disk_string_of_format(disk-format), disk-pdev_path)); +if (libxl_defbool_val(disk-colo_enable)) { +flexarray_append(back, colo-params); +flexarray_append(back, libxl__sprintf(gc, %s, disk-colo_params)); +flexarray_append(back, active-disk); +flexarray_append(back, libxl__sprintf(gc, %s, disk-active_disk)); +flexarray_append(back, hidden-disk); +flexarray_append(back, libxl__sprintf(gc, %s, disk-hidden_disk)); +} assert(device-backend_kind == LIBXL__DEVICE_KIND_QDISK); break; default: @@ -2570,7 +2580,10 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc, goto cleanup; } -/* params may not be present; but everything else must be. */ +/* + * params and colo-params may not be present; but everything + * else must be. 
+ */ tmp = xs_read(ctx-xsh, XBT_NULL, libxl__sprintf(gc, %s/params, be_path), len); if (tmp strchr(tmp, ':')) { @@ -2580,6 +2593,33 @@ static int libxl__device_disk_from_xs_be(libxl__gc *gc, disk-pdev_path = tmp; } +tmp = xs_read(ctx-xsh, XBT_NULL, + libxl__sprintf(gc, %s/colo-params, be_path), len); +if (tmp) { +libxl_defbool_set(disk-colo_enable, true); +disk-colo_params = tmp; +} else { +
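The disk option `colo-params=host:port:exportname=name` packs three fields into one string. colo_qdisk_setup() in patch 19/25 of this series locates `":exportname="` with strstr(), rejects an empty export name, and treats the prefix as the NBD server address. Below is a minimal standalone parser along the same lines. It assumes fixed-size output buffers and does not handle IPv6 literals (the host is everything before the first `:`). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parsed form; libxl itself keeps the raw string. */
struct colo_params {
    char host[256];
    char port[16];
    char export_name[256];
};

/*
 * Parse "host:port:exportname=name".  Returns 0 on success, -1 if the
 * string is malformed, a field is empty, or a field does not fit.
 */
static int parse_colo_params(const char *s, struct colo_params *p)
{
    static const char key[] = ":exportname=";
    const char *key_pos, *colon, *name;
    size_t host_len, port_len, name_len;

    assert(s != NULL && p != NULL);
    key_pos = strstr(s, key);
    colon = strchr(s, ':');
    if ( !key_pos || !colon || colon >= key_pos )
        return -1;                         /* need "host:port" before key */

    name = key_pos + strlen(key);
    host_len = (size_t)(colon - s);
    port_len = (size_t)(key_pos - colon - 1);
    name_len = strlen(name);

    if ( host_len == 0 || port_len == 0 || name_len == 0 ||
         host_len >= sizeof(p->host) || port_len >= sizeof(p->port) ||
         name_len >= sizeof(p->export_name) )
        return -1;

    memcpy(p->host, s, host_len);
    p->host[host_len] = '\0';
    memcpy(p->port, colon + 1, port_len);
    p->port[port_len] = '\0';
    memcpy(p->export_name, name, name_len + 1);  /* includes the NUL */
    return 0;
}
```

For example, `"192.168.0.2:8888:exportname=colo1"` yields host `192.168.0.2`, port `8888`, and export name `colo1`. A string with nothing after `exportname=` is rejected, matching the empty-name check in the patch.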
[Xen-devel] [PATCH v8 --for 4.6 COLO 06/25] tools/libxl: write colo_context records into the stream
write colo_context records into the stream, used by both primary and secondary to send colo context. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- tools/libxl/libxl_internal.h | 5 +++ tools/libxl/libxl_stream_write.c | 87 2 files changed, 92 insertions(+) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index a83d6a5..2634836 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3000,6 +3000,7 @@ struct libxl__stream_write_state { int rc; bool running; bool in_checkpoint; +bool in_colo_context; libxl__save_helper_state shs; /* Main stream-writing data. */ @@ -3019,6 +3020,10 @@ _hidden void libxl__stream_write_start(libxl__egc *egc, _hidden void libxl__stream_write_start_checkpoint(libxl__egc *egc, libxl__stream_write_state *stream); +_hidden void +libxl__stream_write_colo_context(libxl__egc *egc, + libxl__stream_write_state *stream, + libxl_sr_colo_context *colo_context); _hidden void libxl__stream_write_abort(libxl__egc *egc, libxl__stream_write_state *stream, int rc); diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c index df55277..e7a32c4 100644 --- a/tools/libxl/libxl_stream_write.c +++ b/tools/libxl/libxl_stream_write.c @@ -96,6 +96,16 @@ static void write_checkpoint_end_record(libxl__egc *egc, static void checkpoint_end_record_done(libxl__egc *egc, libxl__stream_write_state *stream); +/* COLO context */ +static void write_colo_context(libxl__egc *egc, + libxl__stream_write_state *stream, + libxl_sr_colo_context *colo_context); +static void write_colo_context_done(libxl__egc *egc, +libxl__datacopier_state *dc, +int rc, int onwrite, int errnoval); +static void colo_context_done(libxl__egc *egc, + libxl__stream_write_state *stream, int rc); + /*- Helpers -*/ static void write_done(libxl__egc *egc, @@ -500,6 +510,11 @@ static void stream_complete(libxl__egc *egc, return; } +if (stream-in_colo_context) { 
+colo_context_done(egc, stream, rc); +return; +} + if (!stream-rc) stream-rc = rc; stream_done(egc, stream); @@ -555,6 +570,78 @@ static void check_all_finished(libxl__egc *egc, stream-completion_callback(egc, stream, stream-rc); } +/*- COLO context -*/ +void libxl__stream_write_colo_context(libxl__egc *egc, + libxl__stream_write_state *stream, + libxl_sr_colo_context *colo_context) +{ +assert(stream-running); +assert(!stream-in_checkpoint); +assert(!stream-in_colo_context); +stream-in_colo_context = true; + +write_colo_context(egc, stream, colo_context); +} + +static void write_colo_context(libxl__egc *egc, + libxl__stream_write_state *stream, + libxl_sr_colo_context *colo_context) +{ +static const uint8_t zero_padding[1U REC_ALIGN_ORDER] = { 0 }; +libxl__datacopier_state *dc = stream-dc; +STATE_AO_GC(stream-ao); +struct libxl__sr_rec_hdr rec = { REC_TYPE_COLO_CONTEXT, 0 }; +int rc = 0; +uint32_t padding_len; + +dc-copywhat = colo context record; +dc-writewhat = save/migration stream; +dc-callback = write_colo_context_done; + +rc = libxl__datacopier_start(dc); +if (rc) +goto err; + +rec.length = sizeof(*colo_context); + +libxl__datacopier_prefixdata(egc, dc, rec, sizeof(rec)); +libxl__datacopier_prefixdata(egc, dc, colo_context, rec.length); + +padding_len = ROUNDUP(rec.length, REC_ALIGN_ORDER) - rec.length; +if (padding_len) +libxl__datacopier_prefixdata(egc, dc, zero_padding, padding_len); + +return; + + err: +assert(rc); +stream_complete(egc, stream, rc); +} + +static void write_colo_context_done(libxl__egc *egc, +libxl__datacopier_state *dc, +int rc, int onwrite, int errnoval) +{ +libxl__stream_write_state *stream = CONTAINER_OF(dc, *stream, dc); +STATE_AO_GC(stream-ao); + +if (rc || onwrite || errnoval) { +stream_complete(egc, stream, rc ?: ERROR_FAIL); +return; +} + +colo_context_done(egc, stream, rc); +return; +} + +static void colo_context_done(libxl__egc *egc, + libxl__stream_write_state *stream, int rc) +{ +assert(stream-in_colo_context); 
+stream-in_colo_context = false; +stream-checkpoint_callback(egc, stream, rc);
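write_colo_context() frames the context as a record:

- a header holding the type and the body length,
- the body itself,
- zero padding up to `1 << REC_ALIGN_ORDER` bytes.

The patch queues these pieces with libxl__datacopier_prefixdata(). The sketch below writes the same framing into a flat buffer instead. It assumes REC_ALIGN_ORDER is 3 (8-byte alignment), a header of two 32-bit fields in host byte order, and an arbitrary record type value. `frame_record` and `ROUNDUP_ORDER` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REC_ALIGN_ORDER 3   /* assumed: records padded to 8 bytes */
#define ROUNDUP_ORDER(v, o) (((v) + (1u << (o)) - 1) & ~((1u << (o)) - 1))

/* Assumed header layout: 32-bit type, 32-bit body length. */
struct rec_hdr {
    uint32_t type;
    uint32_t length;   /* body length, not counting header or padding */
};

/*
 * Serialise header, body and zero padding into 'out'.  Returns the total
 * number of bytes produced, or 0 if 'outsz' is too small.
 */
static size_t frame_record(unsigned char *out, size_t outsz, uint32_t type,
                           const void *body, uint32_t body_len)
{
    struct rec_hdr hdr = { type, body_len };
    size_t padded = ROUNDUP_ORDER(body_len, REC_ALIGN_ORDER);
    size_t total = sizeof(hdr) + padded;

    assert(out != NULL && (body != NULL || body_len == 0));
    if ( total > outsz )
        return 0;

    memcpy(out, &hdr, sizeof(hdr));
    if ( body_len )
        memcpy(out + sizeof(hdr), body, body_len);
    memset(out + sizeof(hdr) + body_len, 0, padded - body_len);  /* padding */
    return total;
}
```

The header's length field records the unpadded body size, as `rec.length = sizeof(*colo_context)` does in the patch. A reader therefore derives the padding from the length rather than storing it.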
[Xen-devel] [PATCH v8 --for 4.6 COLO 17/25] implement the cmdline for COLO
From: Wen Congyang we...@cn.fujitsu.com Add a new option -c to the command 'xl remus'. If you want to use COLO HA instead of Remus HA, please use -c option. Update man pages to reflect the addition of a new option to 'xl remus' command. Also add a new option -c to the internal command 'xl migrate-receive'. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- docs/man/xl.pod.1 | 12 -- tools/libxl/libxl.c | 23 -- tools/libxl/xl_cmdimpl.c | 61 --- tools/libxl/xl_cmdtable.c | 4 +++- 4 files changed, 81 insertions(+), 19 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index f22c3f3..2cd34bb 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -447,12 +447,15 @@ Print huge (!) amount of debug during the migration process. =item Bremus [IOPTIONS] Idomain-id Ihost -Enable Remus HA for domain. By default Bxl relies on ssh as a transport -mechanism between the two hosts. +Enable Remus HA or COLO HA for domain. By default Bxl relies on ssh as a +transport mechanism between the two hosts. N.B: Remus support in xl is still in experimental (proof-of-concept) phase. Disk replication support is limited to DRBD disks. + COLO support in xl is still in experimental (proof-of-concept) phase. + There is no support for network or disk at the moment. + BOPTIONS =over 4 @@ -498,6 +501,11 @@ Disable network output buffering. Requires enabling unsafe mode. Disable disk replication. Requires enabling unsafe mode. +=item B-c + +Enable COLO HA. This conflicts with B-i and B-b, and memory +checkpoint compression must be disabled. 
+ =back =item Bpause Idomain-id diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index c040909..791f364 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -814,12 +814,28 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info, goto out; } +/* The caller must set this defbool */ +if (libxl_defbool_is_default(info-colo)) { +LOG(ERROR, colo mode must be enabled/disabled); +rc = ERROR_FAIL; +goto out; +} + libxl_defbool_setdefault(info-allow_unsafe, false); libxl_defbool_setdefault(info-blackhole, false); -libxl_defbool_setdefault(info-compression, true); +libxl_defbool_setdefault(info-compression, + !libxl_defbool_val(info-colo)); libxl_defbool_setdefault(info-netbuf, true); libxl_defbool_setdefault(info-diskbuf, true); +if (libxl_defbool_val(info-colo)) { +if (libxl_defbool_val(info-compression)) { +LOG(ERROR, cannot use memory checkpoint compression in COLO mode); +rc = ERROR_FAIL; +goto out; +} +} + if (!libxl_defbool_val(info-allow_unsafe) (libxl_defbool_val(info-blackhole) || !libxl_defbool_val(info-netbuf) || @@ -841,7 +857,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info, dss-live = 1; dss-debug = 0; dss-remus = info; -dss-checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS; +if (libxl_defbool_val(info-colo)) +dss-checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_COLO; +else +dss-checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_REMUS; assert(info); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index ace4a65..45ec435 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -4292,6 +4292,8 @@ static void migrate_receive(int debug, int daemonize, int monitor, char rc_buf; char *migration_domname; struct domain_create dom_info; +const char *ha = checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO ? 
+ COLO : Remus; signal(SIGPIPE, SIG_IGN); /* if we get SIGPIPE we'd rather just have it as an error */ @@ -4312,6 +4314,9 @@ static void migrate_receive(int debug, int daemonize, int monitor, dom_info.send_fd = send_fd; dom_info.migration_domname_r = migration_domname; dom_info.checkpointed_stream = checkpointed; +if (checkpointed == LIBXL_CHECKPOINTED_STREAM_COLO) +/* COLO uses stdout to send control message to master */ +dom_info.quiet = 1; rc = create_domain(dom_info); if (rc 0) { @@ -4326,8 +4331,8 @@ static void migrate_receive(int debug, int daemonize, int monitor, /* If we are here, it means that the sender (primary) has crashed. * TODO: Split-Brain Check. */ -fprintf(stderr, migration target: Remus Failover for domain %u\n, -domid); +fprintf(stderr, migration target: %s Failover for domain %u\n, +ha, domid); /* * If domain renaming fails, lets just continue (as we need the domain @@ -4343,16 +4348,20 @@ static void migrate_receive(int debug, int daemonize, int
[Xen-devel] [PATCH v8 --for 4.6 COLO 19/25] COLO: use qemu block replication
From: Wen Congyang we...@cn.fujitsu.com Use qemu block replication as our block replication solution. Note that guest must be paused before starting COLO, otherwise, the disk won't be consistent between primary and secondary. Signed-off-by: Wen Congyang we...@cn.fujitsu.com for commit message, Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- tools/libxl/Makefile | 1 + tools/libxl/libxl_colo_qdisk.c | 209 +++ tools/libxl/libxl_colo_restore.c | 20 +++- tools/libxl/libxl_colo_save.c| 36 ++- tools/libxl/libxl_internal.h | 18 tools/libxl/libxl_qmp.c | 31 ++ 6 files changed, 311 insertions(+), 4 deletions(-) create mode 100644 tools/libxl/libxl_colo_qdisk.c diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index 71bf7a2..e91ae79 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -64,6 +64,7 @@ endif LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o +LIBXL_OBJS-y += libxl_colo_qdisk.o LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o diff --git a/tools/libxl/libxl_colo_qdisk.c b/tools/libxl/libxl_colo_qdisk.c new file mode 100644 index 000..d73572e --- /dev/null +++ b/tools/libxl/libxl_colo_qdisk.c @@ -0,0 +1,209 @@ +/* + * Copyright (C) 2015 FUJITSU LIMITED + * Author: Wen Congyang we...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ */ + +#include libxl_osdeps.h /* must come before any other headers */ + +#include libxl_internal.h + +typedef struct libxl__colo_qdisk { +libxl__checkpoint_device *dev; +} libxl__colo_qdisk; + +/* == init() and cleanup() == */ +int init_subkind_qdisk(libxl__checkpoint_devices_state *cds) +{ +/* + * We don't know if we use qemu block replication, so + * we cannot start block replication here. + */ +return 0; +} + +void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds) +{ +} + +/* == setup() and teardown() == */ +static void colo_qdisk_setup(libxl__egc *egc, libxl__checkpoint_device *dev, + bool primary) +{ +const libxl_device_disk *disk = dev-backend_dev; +const char *addr = NULL; +const char *export_name; +int ret, rc = 0; + +/* Convenience aliases */ +libxl__checkpoint_devices_state *const cds = dev-cds; +const char *colo_params = disk-colo_params; +const int domid = cds-domid; + +EGC_GC; + +if (disk-backend != LIBXL_DISK_BACKEND_QDISK || +!libxl_defbool_val(disk-colo_enable)) { +rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH; +goto out; +} + +export_name = strstr(colo_params, :exportname=); +if (!export_name) { +rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH; +goto out; +} +export_name += strlen(:exportname=); +if (export_name[0] == 0) { +rc = ERROR_CHECKPOINT_DEVOPS_DOES_NOT_MATCH; +goto out; +} + +dev-matched = 1; + +if (primary) { +/* NBD server is not ready, so we cannot start block replication now */ +goto out; +} else { +libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds); +int len; + +if (crs-qdisk_setuped) +goto out; + +crs-qdisk_setuped = true; + +len = export_name - strlen(:exportname=) - colo_params; +addr = libxl__strndup(gc, colo_params, len); +} + +ret = libxl__qmp_block_start_replication(gc, domid, primary, addr); +if (ret) +rc = ERROR_FAIL; + +out: +dev-aodev.rc = rc; +dev-aodev.callback(egc, dev-aodev); +} + +static void colo_qdisk_teardown(libxl__egc *egc, libxl__checkpoint_device *dev, +bool primary) +{ +int ret, rc = 0; + +/* 
Convenience aliases */ +libxl__checkpoint_devices_state *const cds = dev-cds; +const int domid = cds-domid; + +EGC_GC; + +if (primary) { +libxl__colo_save_state *css = CONTAINER_OF(cds, *css, cds); + +if (!css-qdisk_setuped) +goto out; + +css-qdisk_setuped = false; +} else { +libxl__colo_restore_state *crs = CONTAINER_OF(cds, *crs, cds); + +if (!crs-qdisk_setuped) +goto out; + +crs-qdisk_setuped = false; +} + +ret = libxl__qmp_block_stop_replication(gc, domid,
[Xen-devel] [PATCH v8 --for 4.6 COLO 11/25] secondary vm suspend/resume/checkpoint code
From: Wen Congyang we...@cn.fujitsu.com Secondary vm is running in colo mode. So we will do the following things again and again: 1. Resume secondary vm a. Send LIBXL_COLO_SVM_READY to master. b. If it is not the first resume, call libxl__checkpoint_devices_preresume(). c. If it is the first resume(resume right after live migration), - call libxl__xc_domain_restore_done() to build the secondary vm. - enable secondary vm's logdirty. - call libxl__domain_resume() to resume secondary vm. - call libxl__checkpoint_devices_setup() to setup checkpoint devices. d. Send LIBXL_COLO_SVM_RESUMED to master. 2. Wait a new checkpoint a. Call libxl__checkpoint_devices_commit(). b. Read LIBXL_COLO_NEW_CHECKPOINT from master. 3. Suspend secondary vm a. Suspend secondary vm. b. Call libxl__checkpoint_devices_postsuspend(). c. Send LIBXL_COLO_SVM_SUSPENDED to master. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- tools/libxl/Makefile | 1 + tools/libxl/libxl_colo.h | 27 ++ tools/libxl/libxl_colo_restore.c | 991 +++ tools/libxl/libxl_create.c | 111 - tools/libxl/libxl_internal.h | 19 + tools/libxl/libxl_save_callout.c | 7 +- 6 files changed, 1154 insertions(+), 2 deletions(-) create mode 100644 tools/libxl/libxl_colo.h create mode 100644 tools/libxl/libxl_colo_restore.c diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index 3cb3ae9..97b3753 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -63,6 +63,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o endif LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o +LIBXL_OBJS-y += libxl_colo_restore.o LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h new file mode 100644 index 000..54dc835 --- /dev/null +++ b/tools/libxl/libxl_colo.h @@ -0,0 +1,27 @@ +/* + * Copyright (C) 2014 FUJITSU 
LIMITED + * Author: Wen Congyang we...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ + +#ifndef LIBXL_COLO_H +#define LIBXL_COLO_H + +extern void libxl__colo_restore_done(libxl__egc *egc, void *dcs_void, + int ret, int retval, int errnoval); +extern void libxl__colo_restore_setup(libxl__egc *egc, + libxl__colo_restore_state *crs); +extern void libxl__colo_restore_teardown(libxl__egc *egc, + libxl__colo_restore_state *crs, + int rc); + +#endif diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c new file mode 100644 index 000..5cda0b2 --- /dev/null +++ b/tools/libxl/libxl_colo_restore.c @@ -0,0 +1,991 @@ +/* + * Copyright (C) 2014 FUJITSU LIMITED + * Author: Wen Congyang we...@cn.fujitsu.com + * Yang Hongyang yan...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ */ + +#include libxl_osdeps.h /* must come before any other headers */ + +#include libxl_internal.h +#include libxl_colo.h +#include libxl_sr_stream_format.h + +enum { +LIBXL_COLO_SETUPED, +LIBXL_COLO_SUSPENDED, +LIBXL_COLO_RESUMED, +}; + +typedef struct libxl__colo_restore_checkpoint_state libxl__colo_restore_checkpoint_state; +struct libxl__colo_restore_checkpoint_state { +libxl__domain_suspend_state dsps; +libxl__logdirty_switch lds; +libxl__colo_restore_state *crs; +libxl__stream_write_state sws; +int status; +bool preresume; +/* used for teardown */ +int teardown_devices; +int saved_rc; + +void (*callback)(libxl__egc *, + libxl__colo_restore_checkpoint_state *, + int); +}; + + +static void libxl__colo_restore_domain_resume_callback(void *data); +static
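The commit message describes the secondary's cycle as three steps repeated forever: resume, wait for a new checkpoint, suspend. The only special case is the first resume, which also builds the domain, enables log-dirty, and sets up devices. The sketch below is a synchronous simulation of that ordering; it records each action into a trace array. In libxl this work is a chain of asynchronous callbacks, and the action codes here are hypothetical labels for the LIBXL_COLO_* messages and libxl__checkpoint_devices_*() calls named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps listed in the commit message. */
enum colo_action {
    SEND_SVM_READY,
    FIRST_RESUME,          /* build domain, enable logdirty, resume, setup */
    DEVICES_PRERESUME,
    SEND_SVM_RESUMED,
    DEVICES_COMMIT,
    READ_NEW_CHECKPOINT,
    SUSPEND_SVM,
    DEVICES_POSTSUSPEND,
    SEND_SVM_SUSPENDED,
};

/* Simulate 'rounds' checkpoints, recording actions in 'trace'.
 * Returns the number of actions recorded. */
static size_t colo_secondary_run(int rounds, enum colo_action *trace,
                                 size_t max)
{
    size_t n = 0;
    bool first = true;

#define DO(a) do { assert(n < max); trace[n++] = (a); } while (0)
    for ( int r = 0; r < rounds; r++ )
    {
        /* 1. Resume the secondary VM. */
        DO(SEND_SVM_READY);
        DO(first ? FIRST_RESUME : DEVICES_PRERESUME);
        first = false;
        DO(SEND_SVM_RESUMED);

        /* 2. Wait for a new checkpoint. */
        DO(DEVICES_COMMIT);
        DO(READ_NEW_CHECKPOINT);

        /* 3. Suspend the secondary VM. */
        DO(SUSPEND_SVM);
        DO(DEVICES_POSTSUSPEND);
        DO(SEND_SVM_SUSPENDED);
    }
#undef DO
    return n;
}
```

Each round records eight actions. Only the first round takes the FIRST_RESUME branch; every later round calls the device preresume hook instead.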
[Xen-devel] [PATCH v8 --for 4.6 COLO 14/25] libxc/restore: send dirty bitmap to primary when checkpoint under colo
Send dirty bitmap to primary when checkpoint under colo.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxc/xc_sr_common.h  |   4 ++
 tools/libxc/xc_sr_restore.c | 120 +++-
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index c5603ff..7fc2021 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -213,6 +213,10 @@ struct xc_sr_context
             struct xc_sr_restore_ops ops;
             struct restore_callbacks *callbacks;
 
+            int send_fd;
+            unsigned long p2m_size;
+            xc_hypercall_buffer_t dirty_bitmap_hbuf;
+
             /* From Image Header. */
             uint32_t format_version;
 
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 696bf30..8b13d8d 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -409,6 +409,92 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return rc;
 }
 
+/*
+ * Send dirty_bitmap to primary.
+ */
+static int send_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    int rc = -1;
+    unsigned count, written;
+    uint64_t i, *pfns = NULL;
+    struct iovec *iov = NULL;
+    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    struct xc_sr_record rec =
+    {
+        .type = REC_TYPE_DIRTY_BITMAP,
+    };
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    if ( xc_shadow_control(
+             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+             HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
+             NULL, 0, &stats) != ctx->restore.p2m_size )
+    {
+        PERROR("Failed to retrieve logdirty bitmap");
+        goto err;
+    }
+
+    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    {
+        if ( test_bit(i, dirty_bitmap) )
+            count++;
+    }
+
+
+    pfns = malloc(count * sizeof(*pfns));
+    if ( !pfns )
+    {
+        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
+              count * sizeof(*pfns));
+        goto err;
+    }
+
+    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    {
+        if ( !test_bit(i, dirty_bitmap) )
+            continue;
+
+        if ( written > count )
+        {
+            ERROR("Dirty pfn list exceed");
+            goto err;
+        }
+
+        pfns[written++] = i;
+    }
+
+    /* iovec[] for writev(). */
+    iov = malloc(3 * sizeof(*iov));
+    if ( !iov )
+    {
+        ERROR("Unable to allocate memory for sending dirty bitmap");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+
+    iov[0].iov_base = &rec.type;
+    iov[0].iov_len = sizeof(rec.type);
+
+    iov[1].iov_base = &rec.length;
+    iov[1].iov_len = sizeof(rec.length);
+
+    iov[2].iov_base = pfns;
+    iov[2].iov_len = count * sizeof(*pfns);
+
+    if ( writev_exact(ctx->restore.send_fd, iov, 3) )
+    {
+        PERROR("Failed to write dirty bitmap to stream");
+        goto err;
+    }
+
+    rc = 0;
+ err:
+    return rc;
+}
+
 static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
@@ -494,7 +580,9 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 
 #undef HANDLE_CALLBACK_RETURN_VALUE
 
-        /* TODO: send dirty bitmap to primary */
+        rc = send_dirty_bitmap(ctx);
+        if ( rc )
+            goto err;
     }
 
  err:
@@ -566,6 +654,21 @@ static int setup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->restore.dirty_bitmap_hbuf);
+
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+    {
+        dirty_bitmap = xc_hypercall_buffer_alloc_pages(xch, dirty_bitmap,
+                           NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+
+        if ( !dirty_bitmap )
+        {
+            ERROR("Unable to allocate memory for dirty bitmap");
+            rc = -1;
+            goto err;
+        }
+    }
 
     rc = ctx->restore.ops.setup(ctx);
     if ( rc )
@@ -599,10 +702,15 @@ static void cleanup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
     unsigned i;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
 
     for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
         free(ctx->restore.buffered_records[i].data);
 
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+        xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
+                                   NRPAGES(bitmap_size(ctx->save.p2m_size)));
     free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
     if ( ctx->restore.ops.cleanup(ctx) )
@@ -713,6 +821,7 @@
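send_dirty_bitmap() scans the log-dirty bitmap twice, once to count the set bits and once to collect their indices, and then frames the pfn list as a type/length header followed by the payload. The sketch below shows that two-pass pattern on a plain unsigned long bitmap. It assumes 32-bit type and length fields, and its frame_record() helper is hypothetical: it copies into a memory buffer instead of calling writev_exact() on a file descriptor.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Test bit nr in a bitmap stored as an array of unsigned long. */
static int bit_is_set(const unsigned long *bitmap, uint64_t nr)
{
    return (bitmap[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1UL;
}

/*
 * Collect the indices of all set bits below nbits into a new array.
 * *count receives the number of entries; returns NULL (with *count 0)
 * when no bit is set or allocation fails.
 */
static uint64_t *collect_dirty_pfns(const unsigned long *bitmap,
                                    uint64_t nbits, unsigned *count)
{
    uint64_t i, *pfns;
    unsigned n = 0, written = 0;

    *count = 0;
    for ( i = 0; i < nbits; i++ )       /* first pass: count */
        if ( bit_is_set(bitmap, i) )
            n++;

    if ( n == 0 )
        return NULL;

    pfns = malloc(n * sizeof(*pfns));
    if ( !pfns )
        return NULL;

    for ( i = 0; i < nbits; i++ )       /* second pass: collect */
    {
        if ( !bit_is_set(bitmap, i) )
            continue;
        if ( written >= n )             /* bound check before the store */
            break;
        pfns[written++] = i;
    }

    *count = written;
    return pfns;
}

/*
 * Hypothetical framing helper: uint32_t type, uint32_t length (bytes),
 * then the pfn array. Returns the number of bytes copied into out.
 */
static size_t frame_record(uint8_t *out, uint32_t type,
                           const uint64_t *pfns, unsigned count)
{
    uint32_t length = count * sizeof(*pfns);

    memcpy(out, &type, sizeof(type));
    memcpy(out + sizeof(type), &length, sizeof(length));
    if ( length )
        memcpy(out + sizeof(type) + sizeof(length), pfns, length);
    return sizeof(type) + sizeof(length) + length;
}
```

Counting first lets the sender allocate the pfn array in one go and put the exact payload length in the header before anything is written.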
[Xen-devel] [PATCH v8 --for 4.6 COLO 16/25] libxc/save: support COLO save
After suspending the primary VM, get the dirty bitmap from the secondary VM
and send the pages that are dirty on either the primary or the secondary to
the secondary.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
---
 tools/libxc/xc_sr_common.h |   2 +
 tools/libxc/xc_sr_save.c   | 104 +++--
 2 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 7fc2021..5f2d99b 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -183,6 +183,8 @@ struct xc_sr_context
     {
         struct /* Save data. */
         {
+            int recv_fd;
+
             struct xc_sr_save_ops ops;
             struct save_callbacks *callbacks;
 
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index d12e5b1..6f13706 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -515,6 +515,58 @@ static int send_memory_live(struct xc_sr_context *ctx)
     return rc;
 }
 
+static int merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_record rec;
+    uint64_t *pfns = NULL;
+    uint64_t pfn;
+    unsigned count, i;
+    int rc;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
+
+    rc = read_record(ctx, ctx->save.recv_fd, &rec);
+    if ( rc )
+        goto err;
+
+    if ( rec.type != REC_TYPE_DIRTY_BITMAP )
+    {
+        PERROR("Expect dirty bitmap record, but received %u", rec.type );
+        rc = -1;
+        goto err;
+    }
+
+    if ( rec.length % sizeof(*pfns) )
+    {
+        PERROR("Invalid dirty bitmap record length %u", rec.length );
+        rc = -1;
+        goto err;
+    }
+
+    count = rec.length / sizeof(*pfns);
+    pfns = rec.data;
+
+    for ( i = 0; i < count; i++ )
+    {
+        pfn = pfns[i];
+        if (pfn > ctx->save.p2m_size)
+        {
+            PERROR("Invalid pfn %#lx", pfn );
+            rc = -1;
+            goto err;
+        }
+
+        set_bit(pfn, dirty_bitmap);
+    }
+
+    rc = 0;
+
+ err:
+    free(rec.data);
+    return rc;
+}
+
 /*
  * Suspend the domain and send dirty memory.
  * This is the last iteration of the live migration and the
@@ -555,6 +607,16 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
     bitmap_or(dirty_bitmap, ctx->save.deferred_pages,
               ctx->save.p2m_size);
 
+    if ( !ctx->save.live && ctx->save.checkpointed == MIG_STREAM_COLO )
+    {
+        rc = merge_secondary_dirty_bitmap(ctx);
+        if ( rc )
+        {
+            PERROR("Failed to get secondary vm's dirty pages");
+            goto out;
+        }
+    }
+
     rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
     if ( rc )
         goto out;
@@ -784,11 +846,42 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
             if ( rc )
                 goto err;
 
-            ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            if ( ctx->save.checkpointed == MIG_STREAM_COLO )
+            {
+                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+                if ( !rc )
+                {
+                    rc = -1;
+                    goto err;
+                }
+            }
 
-            rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
-            if ( rc <= 0 )
-                ctx->save.checkpointed = false;
+            rc = ctx->save.callbacks->postcopy(ctx->save.callbacks->data);
+            if ( !rc )
+            {
+                rc = -1;
+                goto err;
+            }
+
+            if ( ctx->save.checkpointed == MIG_STREAM_COLO )
+            {
+                rc = ctx->save.callbacks->should_checkpoint(
+                    ctx->save.callbacks->data);
+                if ( rc <= 0 )
+                    ctx->save.checkpointed = false;
+            }
+            else if ( ctx->save.checkpointed == MIG_STREAM_REMUS )
+            {
+                rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
+                if ( rc <= 0 )
+                    ctx->save.checkpointed = false;
+            }
+            else
+            {
+                ERROR("Unknown checkpointed stream");
+                rc = -1;
+                goto err;
+            }
         }
     } while ( ctx->save.checkpointed );
 
@@ -835,6 +928,7 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom,
     ctx.save.live  = !!(flags & XCFLAGS_LIVE);
     ctx.save.debug = !!(flags & XCFLAGS_DEBUG);
     ctx.save.checkpointed = checkpointed_stream;
+    ctx.save.recv_fd = back_fd;
 
     /*
      * TODO: Find some time to better tweak the live migration algorithm.
@@ -850,6 +944,8 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom,
         assert(callbacks->switch_qemu_logdirty);
     if (
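merge_secondary_dirty_bitmap() validates the record sent back by the secondary (its type, a length that is a whole number of 64-bit pfns, and each pfn against p2m_size) and then sets the corresponding bits in the primary's dirty bitmap, so the final send covers pages dirtied on either side. The sketch below isolates that merge step. It assumes the pfns are already decoded into a uint64_t array; merge_pfns() is a hypothetical helper, and this version treats any pfn at or above p2m_size as out of range, since valid indices run from 0 to p2m_size - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

static void set_bit_ul(unsigned long *bitmap, uint64_t nr)
{
    bitmap[nr / BITS_PER_UL] |= 1UL << (nr % BITS_PER_UL);
}

/*
 * Hypothetical helper: OR a received pfn list into the local dirty
 * bitmap. rec_length is the record payload length in bytes.
 * Returns 0 on success, -1 if the record is malformed; on failure the
 * bitmap may already contain the pfns that preceded the bad entry.
 */
static int merge_pfns(unsigned long *bitmap, uint64_t p2m_size,
                      const uint64_t *pfns, uint32_t rec_length)
{
    uint32_t count, i;

    if ( rec_length % sizeof(*pfns) )
        return -1;                      /* not a whole number of pfns */

    count = rec_length / sizeof(*pfns);
    for ( i = 0; i < count; i++ )
    {
        if ( pfns[i] >= p2m_size )      /* valid pfns: 0 .. p2m_size-1 */
            return -1;
        set_bit_ul(bitmap, pfns[i]);
    }
    return 0;
}
```

Because the merge is a bitwise OR, a page that is dirty on both sides is still sent only once.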
[Xen-devel] [PATCH v8 --for 4.6 COLO 01/25] docs: add colo readme
add colo readme, refer to
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campb...@citrix.com>
---
 docs/README.colo | 9 +
 1 file changed, 9 insertions(+)
 create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 000..466eb72
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,9 @@
+COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
+project is a high availability solution. Both primary VM (PVM) and secondary VM
+(SVM) run in parallel. They receive the same request from client, and generate
+response in parallel too. If the response packets from PVM and SVM are
+identical, they are released immediately. Otherwise, a VM checkpoint (on demand)
+is conducted.
+
+See the website at http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
+for details.
-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
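The README's core rule is simple: if the PVM and SVM produce identical responses, the response is released immediately; otherwise a checkpoint is taken on demand. The toy sketch below shows only that decision. It assumes whole responses are compared byte for byte; colo_compare() and the enum values are hypothetical. In the real series the comparison happens in the colo-proxy kernel module at the packet and connection level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum colo_action {
    COLO_RELEASE,        /* outputs match: release the PVM response now */
    COLO_DO_CHECKPOINT,  /* outputs diverge: resync the SVM on demand */
};

/* Hypothetical decision function comparing the two VMs' responses. */
static enum colo_action colo_compare(const void *pvm, size_t pvm_len,
                                     const void *svm, size_t svm_len)
{
    if ( pvm_len == svm_len && memcmp(pvm, svm, pvm_len) == 0 )
        return COLO_RELEASE;
    return COLO_DO_CHECKPOINT;
}
```

As long as the outputs keep matching, no checkpoint is needed. This is how COLO avoids the fixed-interval checkpointing that Remus performs.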
[Xen-devel] [PATCH v8 --for 4.6 COLO 08/25] tools/libxl: handle colo_context records in a libxl migration v2 read stream
Read a colo_context and call stream->checkpoint_callback to handle it.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 tools/libxl/libxl_internal.h    |  3 +++
 tools/libxl/libxl_stream_read.c | 51 +
 2 files changed, 54 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 05cee04..1be2a4a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3369,6 +3369,7 @@ struct libxl__stream_read_state {
     int rc;
     bool running;
     bool in_checkpoint;
+    bool in_colo_context;
     libxl__save_helper_state shs;
     libxl__conversion_helper_state chs;
 
@@ -3396,6 +3397,8 @@ _hidden void libxl__stream_read_start(libxl__egc *egc,
                                       libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_start_checkpoint(libxl__egc *egc,
                                                  libxl__stream_read_state *stream);
+_hidden void libxl__stream_read_colo_context(libxl__egc *egc,
+                                             libxl__stream_read_state *stream);
 _hidden void libxl__stream_read_abort(libxl__egc *egc,
                                       libxl__stream_read_state *stream, int rc);
 static inline bool
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index b924f05..ab47251 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -152,6 +152,13 @@ static void write_emulator_done(libxl__egc *egc,
                                 libxl__datacopier_state *dc,
                                 int rc, int onwrite, int errnoval);
 
+/* Handlers for colo context mini-loop */
+static void handle_colo_context(libxl__egc *egc,
+                                libxl__stream_read_state *stream,
+                                libxl__sr_record_buf *rec);
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_read_state *stream, int rc);
+
 /*- Helpers -*/
 
 /* Helper to set up reading some data from the stream. */
@@ -569,6 +576,15 @@ static bool process_record(libxl__egc *egc,
         checkpoint_done(egc, stream, 0);
         break;
 
+    case REC_TYPE_COLO_CONTEXT:
+        if (!stream->in_colo_context) {
+            LOG(ERROR, "Unexpected COLO_CONTEXT record in stream");
+            rc = ERROR_FAIL;
+            goto err;
+        }
+        handle_colo_context(egc, stream, rec);
+        break;
+
     default:
         LOG(ERROR, "Unrecognised record 0x%08x", rec->hdr.type);
         rc = ERROR_FAIL;
@@ -678,6 +694,11 @@ static void stream_complete(libxl__egc *egc,
         return;
     }
 
+    if (stream->in_colo_context) {
+        colo_context_done(egc, stream, rc);
+        return;
+    }
+
     if (!stream->rc)
         stream->rc = rc;
     stream_done(egc, stream);
@@ -794,6 +815,36 @@ static void check_all_finished(libxl__egc *egc,
     stream->completion_callback(egc, stream, stream->rc);
 }
 
+/*- COLO context handlers -*/
+
+void libxl__stream_read_colo_context(libxl__egc *egc,
+                                     libxl__stream_read_state *stream)
+{
+    assert(stream->running);
+    assert(!stream->in_checkpoint);
+    assert(!stream->in_colo_context);
+    stream->in_colo_context = true;
+
+    setup_read_record(egc, stream);
+}
+
+static void handle_colo_context(libxl__egc *egc,
+                                libxl__stream_read_state *stream,
+                                libxl__sr_record_buf *rec)
+{
+    libxl_sr_colo_context *colo_context = rec->body;
+
+    colo_context_done(egc, stream, colo_context->id);
+}
+
+static void colo_context_done(libxl__egc *egc,
+                              libxl__stream_read_state *stream, int rc)
+{
+    assert(stream->in_colo_context);
+    stream->in_colo_context = false;
+    stream->checkpoint_callback(egc, stream, rc);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
1.9.1
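The read-stream change accepts a COLO_CONTEXT record only while in_colo_context is set. It clears that flag before invoking the checkpoint callback, passing the record's id as the callback's rc. The sketch below models that gate as a tiny state machine. All names are hypothetical (hyp_ prefix), and the record type value is a placeholder, not libxl's REC_TYPE_COLO_CONTEXT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HYP_REC_TYPE_COLO_CONTEXT 0x1u   /* placeholder, not libxl's value */

struct hyp_stream {
    bool in_colo_context;
    int last_callback_rc;   /* value the checkpoint callback received */
    int callbacks;          /* how many times the callback ran */
};

static void hyp_checkpoint_callback(struct hyp_stream *s, int rc)
{
    s->last_callback_rc = rc;
    s->callbacks++;
}

/* Arm the stream to expect exactly one COLO context record. */
static void hyp_read_colo_context(struct hyp_stream *s)
{
    assert(!s->in_colo_context);
    s->in_colo_context = true;
}

/* Returns 0 if the record was handled, -1 if it was unexpected. */
static int hyp_process_record(struct hyp_stream *s, uint32_t type, int id)
{
    if ( type != HYP_REC_TYPE_COLO_CONTEXT || !s->in_colo_context )
        return -1;
    s->in_colo_context = false;          /* disarm before calling back */
    hyp_checkpoint_callback(s, id);      /* id is passed through as rc */
    return 0;
}
```

Clearing the flag before the callback lets the callback re-arm the stream for the next context record without tripping the assertion.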
[Xen-devel] [PATCH v8 --for 4.6 COLO 05/25] tools/libxl: add back channel support to write stream
Add back channel support to the write stream. A back channel write stream
is used by the Secondary to send records back to the Primary.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl_dom_save.c     |  1 +
 tools/libxl/libxl_internal.h     |  1 +
 tools/libxl/libxl_stream_write.c | 16
 3 files changed, 18 insertions(+)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 9b7159f..25813ce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -445,6 +445,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
     dss->sws.ao  = dss->ao;
     dss->sws.dss = dss;
     dss->sws.fd  = dss->fd;
+    dss->sws.back_channel = false;
     dss->sws.completion_callback = stream_done;
 
     libxl__stream_write_start(egc, &dss->sws);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9c81d8d..a83d6a5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2989,6 +2989,7 @@ struct libxl__stream_write_state {
     libxl__ao *ao;
     libxl__domain_save_state *dss;
     int fd;
+    bool back_channel;
     void (*completion_callback)(libxl__egc *egc,
                                 libxl__stream_write_state *sws,
                                 int rc);
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 16f667a..df55277 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -47,6 +47,13 @@
  *  - Toolstack record
  *  - if (hvm), Qemu record
  *  - Checkpoint end record
+ *
+ * For back channel stream:
+ * - libxl__stream_write_start()
+ *    - Set up the stream to running state
+ *
+ * - Add a new API to write the record. When the record is written
+ *   out, call stream->checkpoint_callback() to return.
  */
 
 /* Success/error/cleanup handling. */
@@ -178,6 +185,9 @@ void libxl__stream_write_start(libxl__egc *egc,
 
     stream->running = true;
 
+    if (stream->back_channel)
+        return;
+
     dc->ao        = ao;
     dc->readfd    = -1;
     dc->writewhat = "save/migration stream";
@@ -207,6 +217,7 @@ void libxl__stream_write_start_checkpoint(libxl__egc *egc,
 {
     assert(stream->running);
     assert(!stream->in_checkpoint);
+    assert(!stream->back_channel);
     stream->in_checkpoint = true;
 
     write_toolstack_record(egc, stream);
@@ -500,6 +511,11 @@ static void stream_done(libxl__egc *egc,
     assert(stream->running);
     stream->running = false;
 
+    if (stream->back_channel) {
+        stream->completion_callback(egc, stream, stream->rc);
+        return;
+    }
+
     if (stream->emu_carefd)
         libxl__carefd_close(stream->emu_carefd);
     free(stream->emu_body);
-- 
1.9.1
[Xen-devel] [PATCH v8 --for 4.6 COLO 20/25] COLO proxy: implement setup/teardown of COLO proxy module
Set up and tear down the COLO proxy module. We use netlink to communicate
with the proxy module.

About colo-proxy module:
https://lkml.org/lkml/2015/6/18/32
How to use:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/Makefile           |   1 +
 tools/libxl/libxl_colo.h       |   2 +
 tools/libxl/libxl_colo_proxy.c | 210 +
 tools/libxl/libxl_internal.h   |  12 +++
 4 files changed, 225 insertions(+)
 create mode 100644 tools/libxl/libxl_colo_proxy.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index e91ae79..d7a3540 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -65,6 +65,7 @@ endif
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
+LIBXL_OBJS-y += libxl_colo_proxy.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 49a430b..46ca4cf 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -34,4 +34,6 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
                                       libxl__colo_save_state *css,
                                       int rc);
 
+extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
+extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
new file mode 100644
index 000..9f1243e
--- /dev/null
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (C) 2015 FUJITSU LIMITED
+ * Author: Yang Hongyang <yan...@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+#include <linux/netlink.h>
+
+#define NETLINK_COLO 28
+
+enum colo_netlink_op {
+    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
+    COLO_CHECKPOINT,
+    COLO_FAILOVER,
+    COLO_PROXY_INIT,
+    COLO_PROXY_RESET, /* UNUSED, will be used for continuous FT */
+};
+
+/* = colo-proxy: helper functions == */
+
+static int colo_proxy_send(libxl__colo_proxy_state *cps, uint8_t *buff, uint64_t size, int type)
+{
+    struct sockaddr_nl sa;
+    struct nlmsghdr msg;
+    struct iovec iov;
+    struct msghdr mh;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+
+    memset(&sa, 0, sizeof(sa));
+    sa.nl_family = AF_NETLINK;
+    sa.nl_pid = 0;
+    sa.nl_groups = 0;
+
+    msg.nlmsg_len = NLMSG_SPACE(0);
+    msg.nlmsg_flags = NLM_F_REQUEST;
+    if (type == COLO_PROXY_INIT) {
+        msg.nlmsg_flags |= NLM_F_ACK;
+    }
+    msg.nlmsg_seq = 0;
+    /* This is untrusty */
+    msg.nlmsg_pid = cps->index;
+    msg.nlmsg_type = type;
+
+    iov.iov_base = &msg;
+    iov.iov_len = msg.nlmsg_len;
+
+    mh.msg_name = &sa;
+    mh.msg_namelen = sizeof(sa);
+    mh.msg_iov = &iov;
+    mh.msg_iovlen = 1;
+    mh.msg_control = NULL;
+    mh.msg_controllen = 0;
+    mh.msg_flags = 0;
+
+    ret = sendmsg(cps->sock_fd, &mh, 0);
+    if (ret <= 0) {
+        LOG(ERROR, "can't send msg to kernel by netlink: %s",
+            strerror(errno));
+    }
+
+    return ret;
+}
+
+/* error: return -1, otherwise return 0 */
+static int64_t colo_proxy_recv(libxl__colo_proxy_state *cps, uint8_t **buff, int flags)
+{
+    struct sockaddr_nl sa;
+    struct iovec iov;
+    struct msghdr mh = {
+        .msg_name = &sa,
+        .msg_namelen = sizeof(sa),
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+    };
+    uint32_t size = 16384;
+    int64_t len = 0;
+    int ret;
+
+    STATE_AO_GC(cps->ao);
+    uint8_t *tmp = libxl__malloc(NOGC, size);
+
+    iov.iov_base = tmp;
+    iov.iov_len = size;
+next:
+    ret = recvmsg(cps->sock_fd, &mh, flags);
+    if (ret <= 0) {
+        goto out;
+    }
+
+    len += ret;
+    if (mh.msg_flags & MSG_TRUNC) {
+        size += 16384;
+        tmp = libxl__realloc(NOGC, tmp, size);
+        iov.iov_base = tmp + len;
+        iov.iov_len = size - len;
+        goto next;
+    }
+
+    *buff = tmp;
+    return len;
+
+out:
+    free(tmp);
+    *buff = NULL;
+    return ret;
+}
+
+/* = colo-proxy: setup and teardown == */
+
+int colo_proxy_setup(libxl__colo_proxy_state *cps)
+{
+    int skfd = 0;
+    struct sockaddr_nl
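colo_proxy_send() sends a header-only netlink request to the kernel. The header has nlmsg_len = NLMSG_SPACE(0), NLM_F_REQUEST (plus NLM_F_ACK for COLO_PROXY_INIT) and the operation in nlmsg_type, and the destination sockaddr_nl has nl_pid 0, which addresses the kernel. The sketch below fills in the same header, iovec and msghdr without sending anything, so it runs without the colo-proxy module loaded. The op values come from the enum in the patch; build_colo_request() and struct colo_request are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Op values as defined by the enum in the patch above. */
enum colo_netlink_op {
    COLO_QUERY_CHECKPOINT = (NLMSG_MIN_TYPE + 1),
    COLO_CHECKPOINT,
    COLO_FAILOVER,
    COLO_PROXY_INIT,
    COLO_PROXY_RESET,
};

/* Everything sendmsg() needs for one header-only request. */
struct colo_request {
    struct sockaddr_nl sa;
    struct nlmsghdr hdr;
    struct iovec iov;
    struct msghdr mh;
};

/*
 * Hypothetical helper: fill req so that sendmsg(fd, &req->mh, 0) would
 * send a bare nlmsghdr of the given type to the kernel. req must not be
 * copied afterwards, because mh and iov point into it.
 */
static void build_colo_request(struct colo_request *req, int type,
                               unsigned int pid)
{
    memset(req, 0, sizeof(*req));
    req->sa.nl_family = AF_NETLINK;          /* nl_pid 0: the kernel */

    req->hdr.nlmsg_len = NLMSG_SPACE(0);     /* header only, no payload */
    req->hdr.nlmsg_type = type;
    req->hdr.nlmsg_flags = NLM_F_REQUEST;
    if (type == COLO_PROXY_INIT)
        req->hdr.nlmsg_flags |= NLM_F_ACK;   /* ask the kernel to ack */
    req->hdr.nlmsg_pid = pid;

    req->iov.iov_base = &req->hdr;
    req->iov.iov_len = req->hdr.nlmsg_len;

    req->mh.msg_name = &req->sa;
    req->mh.msg_namelen = sizeof(req->sa);
    req->mh.msg_iov = &req->iov;
    req->mh.msg_iovlen = 1;
}
```

Keeping all four structures in one object makes the internal pointers easy to audit. It also means the request stays valid for as long as its owner does.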
[Xen-devel] [PATCH v8 --for 4.6 COLO 24/25] setup and control colo proxy on secondary side
setup and control colo proxy on secondary side

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_restore.c | 28 +---
 tools/libxl/libxl_internal.h     |  3 +++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 96ea0b9..da546f9 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -49,9 +49,11 @@ static void libxl__colo_restore_domain_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_should_checkpoint_callback(void *data);
 static void libxl__colo_restore_domain_suspend_callback(void *data);
 
+extern const libxl__checkpoint_device_instance_ops colo_restore_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_restore_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_restore_ops[] = {
+    &colo_restore_device_nic,
     &colo_restore_device_qdisk,
     NULL,
 };
@@ -151,8 +153,14 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
+    rc = init_subkind_colo_nic(cds);
+    if (rc) goto out;
+
     rc = init_subkind_qdisk(cds);
-    if (rc) goto out;
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
 
     rc = 0;
 out:
@@ -164,6 +172,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -351,6 +360,8 @@ static void colo_restore_teardown_done(libxl__egc *egc,
     if (crcs->teardown_devices)
         cleanup_device_subkind(cds);
 
+    colo_proxy_teardown(&crs->cps);
+
     rc = crcs->saved_rc;
     if (!rc) {
         crcs->callback = do_failover_done;
@@ -535,6 +546,8 @@ static void colo_restore_preresume_cb(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&crs->cps);
+
     colo_restore_resume_vm(egc, crcs);
     return;
 
@@ -571,6 +584,8 @@ static void colo_resume_vm_done(libxl__egc *egc,
 
     crcs->status = LIBXL_COLO_RESUMED;
 
+    colo_proxy_postresume(&crs->cps);
+
     /* avoid calling libxl__xc_domain_restore_done() more than once */
     if (crs->saved_cb) {
         dcs->callback = crs->saved_cb;
@@ -690,13 +705,20 @@ static void colo_setup_checkpoint_devices(libxl__egc *egc,
 
     STATE_AO_GC(crs->ao);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->callback = colo_restore_setup_cds_done;
     cds->ao = ao;
     cds->domid = crs->domid;
     cds->ops = colo_restore_ops;
 
+    crs->cps.ao = ao;
+    if (colo_proxy_setup(&crs->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
+
     if (init_device_subkind(cds))
         goto out;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d12297d..33a93a1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3476,6 +3476,9 @@ struct libxl__colo_restore_state {
 
     /* private, used by qdisk block replication */
     bool qdisk_setuped;
+
+    /* private, used by colo proxy */
+    libxl__colo_proxy_state cps;
 };
 
 struct libxl__domain_create_state {
-- 
1.9.1
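In this patch, init_device_subkind() initialises the COLO nic subkind first and the qdisk subkind second. If qdisk initialisation fails, it cleans up the nic subkind before returning, so a failed setup leaves nothing half-initialised. The sketch below shows that unwinding pattern in isolation. The init/cleanup hooks are hypothetical stand-ins that only flip counters.

```c
#include <assert.h>

/* Hypothetical subkind hooks; the flags stand in for real resources. */
static int nic_live, qdisk_live;
static int fail_qdisk;          /* test knob: make qdisk init fail */

static int init_nic(void)        { nic_live = 1; return 0; }
static void cleanup_nic(void)    { nic_live = 0; }
static int init_qdisk(void)
{
    if (fail_qdisk)
        return -1;
    qdisk_live = 1;
    return 0;
}
static void cleanup_qdisk(void)  { qdisk_live = 0; }

/* Initialise in order; on failure undo only what already succeeded. */
static int init_device_subkinds(void)
{
    int rc;

    rc = init_nic();
    if (rc) goto out;

    rc = init_qdisk();
    if (rc) {
        cleanup_nic();          /* nic succeeded, so unwind it */
        goto out;
    }

    rc = 0;
out:
    return rc;
}

/* Full teardown after a successful init, in the same order as the patch. */
static void cleanup_device_subkinds(void)
{
    cleanup_nic();
    cleanup_qdisk();
}
```

Each later step undoes only the steps before it, so adding a third subkind means adding one more cleanup call to its failure branch.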
[Xen-devel] [PATCH v8 --for 4.6 COLO 15/25] send store mfn and console mfn to xl before resuming secondary vm
From: Wen Congyang <we...@cn.fujitsu.com>

We will call libxl__xc_domain_restore_done() to rebuild the secondary VM,
but rebuilding it requires the store mfn and console mfn. So make
restore_results a function pointer in the callback struct and struct
{save,restore}_callbacks, and use this callback to send the store mfn and
console mfn to xl.

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
---
 tools/libxc/include/xenguest.h     | 8
 tools/libxc/xc_sr_restore.c        | 7 +--
 tools/libxl/libxl_colo_restore.c   | 5 -
 tools/libxl/libxl_create.c         | 2 ++
 tools/libxl/libxl_save_msgs_gen.pl | 2 +-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 1e7e1bb..d7bdfb5 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -140,6 +140,14 @@ struct restore_callbacks {
      */
     int (*should_checkpoint)(void* data);
 
+    /*
+     * callback to send store mfn and console mfn to xl
+     * if we want to resume vm before xc_domain_save()
+     * exits.
+     */
+    void (*restore_results)(unsigned long store_mfn, unsigned long console_mfn,
+                            void *data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8b13d8d..fe81acb 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -563,7 +563,9 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
         if ( rc )
             goto err;
 
-        /* TODO: call restore_results */
+        ctx->restore.callbacks->restore_results(ctx->restore.xenstore_gfn,
+                                                ctx->restore.console_gfn,
+                                                ctx->restore.callbacks->data);
 
         /* Resume secondary vm */
         ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
@@ -846,7 +848,8 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
         /* this is COLO restore */
         assert(callbacks->suspend &&
                callbacks->postcopy &&
-               callbacks->should_checkpoint);
+               callbacks->should_checkpoint &&
+               callbacks->restore_results);
     }
 
     IPRINTF("In experimental %s", __func__);
diff --git a/tools/libxl/libxl_colo_restore.c b/tools/libxl/libxl_colo_restore.c
index 5cda0b2..99f06ab 100644
--- a/tools/libxl/libxl_colo_restore.c
+++ b/tools/libxl/libxl_colo_restore.c
@@ -137,11 +137,6 @@ static void colo_resume_vm(libxl__egc *egc,
         return;
     }
 
-    /*
-     * TODO: get store mfn and console mfn
-     *     We should call the callback restore_results in
-     *     xc_domain_restore() before resuming the guest.
-     */
     libxl__xc_domain_restore_done(egc, dcs, 0, 0, 0);
 
     return;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bf4b55d..34e9362 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1080,6 +1080,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->srs.completion_callback = domcreate_stream_done;
 
     /* colo restore setup */
+    callbacks->restore_results = libxl__srm_callout_callback_restore_results;
+
     if (checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) {
         crs->ao = ao;
         crs->domid = domid;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index 7c9859b..e8943b9 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -29,7 +29,7 @@ our @msgs = (
     [  6, 'srcxA',  "should_checkpoint", [] ],
     [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'rcx',    "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
     [  9, 'srW',    "complete",              [qw(int retval
                                                  int errnoval)] ],
-- 
1.9.1
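Patch 15 adds restore_results to struct restore_callbacks. For a COLO restore, it asserts that suspend, postcopy, should_checkpoint and restore_results are all provided. Every callback receives the opaque data pointer as its last argument, which is how libxl gets the store and console mfns back. The sketch below uses a trimmed, hypothetical stand-in struct (hyp_ prefix), not the real xenguest.h definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed, hypothetical stand-in for struct restore_callbacks. */
struct hyp_restore_callbacks {
    int (*suspend)(void *data);
    int (*postcopy)(void *data);
    int (*should_checkpoint)(void *data);
    void (*restore_results)(unsigned long store_mfn,
                            unsigned long console_mfn, void *data);
    void *data;   /* passed as the last argument to every callback */
};

/* The COLO path requires every hook to be present. */
static bool hyp_colo_callbacks_complete(const struct hyp_restore_callbacks *cb)
{
    return cb->suspend && cb->postcopy && cb->should_checkpoint &&
           cb->restore_results;
}

/* Example consumer: store the mfns in caller-owned memory via data. */
struct hyp_results { unsigned long store_mfn, console_mfn; };

static void hyp_save_results(unsigned long store_mfn,
                             unsigned long console_mfn, void *data)
{
    struct hyp_results *r = data;

    r->store_mfn = store_mfn;
    r->console_mfn = console_mfn;
}

static int hyp_ok(void *data) { (void)data; return 1; }
```

The data pointer keeps the callback interface free of libxl types, so libxc never needs to know what the caller stores there.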
[Xen-devel] [PATCH v8 --for 4.6 COLO 21/25] COLO proxy: preresume, postresume and checkpoint
preresume, postresume and checkpoint

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl_colo.h       |  3 +++
 tools/libxl/libxl_colo_proxy.c | 57 ++
 2 files changed, 60 insertions(+)

diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 46ca4cf..4e5f02a 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -36,4 +36,7 @@ extern void libxl__colo_save_teardown(libxl__egc *egc,
 
 extern int colo_proxy_setup(libxl__colo_proxy_state *cps);
 extern void colo_proxy_teardown(libxl__colo_proxy_state *cps);
+extern void colo_proxy_preresume(libxl__colo_proxy_state *cps);
+extern void colo_proxy_postresume(libxl__colo_proxy_state *cps);
+extern int colo_proxy_checkpoint(libxl__colo_proxy_state *cps);
 #endif
diff --git a/tools/libxl/libxl_colo_proxy.c b/tools/libxl/libxl_colo_proxy.c
index 9f1243e..c8ff722 100644
--- a/tools/libxl/libxl_colo_proxy.c
+++ b/tools/libxl/libxl_colo_proxy.c
@@ -208,3 +208,60 @@ void colo_proxy_teardown(libxl__colo_proxy_state *cps)
         cps->sock_fd = -1;
     }
 }
+
+/* = colo-proxy: preresume, postresume and checkpoint == */
+
+void colo_proxy_preresume(libxl__colo_proxy_state *cps)
+{
+    colo_proxy_send(cps, NULL, 0, COLO_CHECKPOINT);
+    /* TODO: need to handle if the call fails... */
+}
+
+void colo_proxy_postresume(libxl__colo_proxy_state *cps)
+{
+    /* nothing to do... */
+}
+
+
+typedef struct colo_msg {
+    bool is_checkpoint;
+} colo_msg;
+
+/*
+do checkpoint: return 1
+error: return -1
+do not checkpoint: return 0
+*/
+int colo_proxy_checkpoint(libxl__colo_proxy_state *cps)
+{
+    uint8_t *buff;
+    int64_t size;
+    struct nlmsghdr *h;
+    struct colo_msg *m;
+    int ret = -1;
+
+    size = colo_proxy_recv(cps, &buff, MSG_DONTWAIT);
+
+    /* timeout, return no checkpoint message. */
+    if (size <= 0) {
+        return 0;
+    }
+
+    h = (struct nlmsghdr *) buff;
+
+    if (h->nlmsg_type == NLMSG_ERROR) {
+        goto out;
+    }
+
+    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m))) {
+        goto out;
+    }
+
+    m = NLMSG_DATA(h);
+
+    ret = m->is_checkpoint ? 1 : 0;
+
+out:
+    free(buff);
+    return ret;
+}
-- 
1.9.1
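colo_proxy_checkpoint() treats an empty or failed non-blocking read as "no checkpoint". It treats an NLMSG_ERROR reply, or a message shorter than NLMSG_LENGTH(sizeof(struct colo_msg)), as an error, and otherwise reads the is_checkpoint flag via NLMSG_DATA(). The sketch below applies the same parsing to a caller-supplied buffer so it can run without the kernel module. The payload layout is the one in the patch. parse_colo_reply() is hypothetical, and unlike the patch it also checks that the received size covers the header and nlmsg_len.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Payload layout as in the patch. */
typedef struct colo_msg {
    bool is_checkpoint;
} colo_msg;

/*
 * Hypothetical parser using the patch's return convention:
 * 1 = do checkpoint, 0 = no checkpoint (nothing received), -1 = error.
 */
static int parse_colo_reply(const uint8_t *buff, int64_t size)
{
    const struct nlmsghdr *h;
    const colo_msg *m;

    if (size <= 0)
        return 0;                              /* nothing to read */
    if ((uint64_t)size < NLMSG_HDRLEN)
        return -1;                             /* truncated header */

    h = (const struct nlmsghdr *)buff;
    if (h->nlmsg_type == NLMSG_ERROR)
        return -1;
    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*m)) ||
        (uint64_t)size < h->nlmsg_len)
        return -1;                             /* payload missing */

    m = NLMSG_DATA(h);
    return m->is_checkpoint ? 1 : 0;
}
```

Validating the lengths before calling NLMSG_DATA() keeps a short or corrupted reply from being read past its end.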
[Xen-devel] [PATCH v8 --for 4.6 COLO 12/25] primary vm suspend/resume/checkpoint code
From: Wen Congyang <we...@cn.fujitsu.com>

We will do the following things again and again:
1. Suspend primary vm
   a. Suspend primary vm
   b. Do postsuspend
   c. Read LIBXL_COLO_SVM_SUSPENDED sent by secondary
2. Resume primary vm
   a. Read LIBXL_COLO_SVM_READY from slave
   b. Do preresume
   c. Resume primary vm
   d. Read LIBXL_COLO_SVM_RESUMED from slave
3. Wait for a new checkpoint
   a. Wait for a new checkpoint (not implemented)
   b. Send LIBXL_COLO_NEW_CHECKPOINT to slave

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/Makefile          |   2 +-
 tools/libxl/libxl.c           |   6 +-
 tools/libxl/libxl_colo.h      |  10 +
 tools/libxl/libxl_colo_save.c | 569 ++
 tools/libxl/libxl_dom_save.c  |  13 +-
 tools/libxl/libxl_internal.h  | 167 +++--
 tools/libxl/libxl_types.idl   |   1 +
 7 files changed, 689 insertions(+), 79 deletions(-)
 create mode 100644 tools/libxl/libxl_colo_save.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 97b3753..71bf7a2 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -63,7 +63,7 @@ LIBXL_OBJS-y += libxl_no_convert_callout.o
 endif
 
 LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
-LIBXL_OBJS-y += libxl_colo_restore.o
+LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 5502709..c040909 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_colo.h"
 
 #define PAGE_TO_MEMKB(pages) ((pages) * 4)
 #define BACKEND_STRING_SIZE 5
@@ -845,7 +846,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     assert(info);
 
     /* Point of no return */
-    libxl__remus_setup(egc, &dss->rs);
+    if (libxl_defbool_val(info->colo))
+        libxl__colo_save_setup(egc, &dss->css);
+    else
+        libxl__remus_setup(egc, &dss->rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl_colo.h b/tools/libxl/libxl_colo.h
index 54dc835..49a430b 100644
--- a/tools/libxl/libxl_colo.h
+++ b/tools/libxl/libxl_colo.h
@@ -24,4 +24,14 @@ extern void libxl__colo_restore_teardown(libxl__egc *egc,
                                          libxl__colo_restore_state *crs,
                                          int rc);
 
+extern void libxl__colo_save_domain_suspend_callback(void *data);
+extern void libxl__colo_save_domain_checkpoint_callback(void *data);
+extern void libxl__colo_save_domain_resume_callback(void *data);
+extern void libxl__colo_save_domain_should_checkpoint_callback(void *data);
+extern void libxl__colo_save_setup(libxl__egc *egc,
+                                   libxl__colo_save_state *css);
+extern void libxl__colo_save_teardown(libxl__egc *egc,
+                                      libxl__colo_save_state *css,
+                                      int rc);
+
 #endif
diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
new file mode 100644
index 000..f0ab565
--- /dev/null
+++ b/tools/libxl/libxl_colo_save.c
@@ -0,0 +1,569 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang <we...@cn.fujitsu.com>
+ *         Yang Hongyang <yan...@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+#include "libxl_colo.h"
+
+static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    NULL,
+};
+
+/* = helper functions = */
+static int init_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* init device subkind-specific state in the libxl ctx */
+    int rc;
+    STATE_AO_GC(cds->ao);
+
+    rc = 0;
+    return rc;
+}
+
+static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
+{
+    /* cleanup device subkind-specific state in the libxl ctx */
+    STATE_AO_GC(cds->ao);
+}
+
+/* = colo: setup save environment = */
+static void colo_save_setup_done(libxl__egc *egc,
+                                 libxl__checkpoint_devices_state *cds,
+                                 int rc);
+static void colo_save_setup_failed(libxl__egc *egc,
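The commit message of patch 12 lists the primary's repeating cycle: suspend (then postsuspend, then read SVM_SUSPENDED), resume (read SVM_READY, preresume, resume, read SVM_RESUMED), then wait for a new checkpoint and send NEW_CHECKPOINT. The sketch below encodes the order in which the primary expects the secondary's messages as a small state machine that rejects out-of-order input. The state and message names are hypothetical mirrors of the LIBXL_COLO_* names, not their real values.

```c
#include <assert.h>

/* Messages the primary expects from the secondary, in order. */
enum hyp_svm_msg { SVM_SUSPENDED, SVM_READY, SVM_RESUMED };

/* Where the primary is within one checkpoint round. */
enum hyp_pvm_state {
    WAIT_SVM_SUSPENDED,   /* after suspend + postsuspend */
    WAIT_SVM_READY,       /* before preresume */
    WAIT_SVM_RESUMED,     /* after resuming the primary */
    WAIT_NEW_CHECKPOINT,  /* running; next round starts with NEW_CHECKPOINT */
    PVM_ERROR,
};

/* Advance on a message from the secondary; out-of-order input is an error. */
static enum hyp_pvm_state hyp_on_svm_msg(enum hyp_pvm_state s,
                                         enum hyp_svm_msg m)
{
    switch (s) {
    case WAIT_SVM_SUSPENDED:
        return m == SVM_SUSPENDED ? WAIT_SVM_READY : PVM_ERROR;
    case WAIT_SVM_READY:
        return m == SVM_READY ? WAIT_SVM_RESUMED : PVM_ERROR;
    case WAIT_SVM_RESUMED:
        return m == SVM_RESUMED ? WAIT_NEW_CHECKPOINT : PVM_ERROR;
    default:
        return PVM_ERROR;
    }
}

/* Start the next round: send NEW_CHECKPOINT, then suspend again. */
static enum hyp_pvm_state hyp_start_round(enum hyp_pvm_state s)
{
    return s == WAIT_NEW_CHECKPOINT ? WAIT_SVM_SUSPENDED : PVM_ERROR;
}
```

Making the expected order explicit means a lost or duplicated message from the secondary fails immediately instead of desynchronising the two sides.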
[Xen-devel] [PATCH v8 --for 4.6 COLO 23/25] setup and control colo proxy on primary side
setup and control colo proxy on primary side

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxl/libxl_colo_save.c | 124 +++---
 tools/libxl/libxl_internal.h  |   1 +
 2 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_colo_save.c b/tools/libxl/libxl_colo_save.c
index 1245da7..50a880b 100644
--- a/tools/libxl/libxl_colo_save.c
+++ b/tools/libxl/libxl_colo_save.c
@@ -19,9 +19,11 @@
 #include "libxl_internal.h"
 #include "libxl_colo.h"
 
+extern const libxl__checkpoint_device_instance_ops colo_save_device_nic;
 extern const libxl__checkpoint_device_instance_ops colo_save_device_qdisk;
 
 static const libxl__checkpoint_device_instance_ops *colo_ops[] = {
+    &colo_save_device_nic,
     &colo_save_device_qdisk,
     NULL,
 };
@@ -33,9 +35,15 @@ static int init_device_subkind(libxl__checkpoint_devices_state *cds)
     int rc;
     STATE_AO_GC(cds->ao);
 
-    rc = init_subkind_qdisk(cds);
+    rc = init_subkind_colo_nic(cds);
     if (rc) goto out;
 
+    rc = init_subkind_qdisk(cds);
+    if (rc) {
+        cleanup_subkind_colo_nic(cds);
+        goto out;
+    }
+
     rc = 0;
 out:
     return rc;
@@ -46,6 +54,7 @@ static void cleanup_device_subkind(libxl__checkpoint_devices_state *cds)
     /* cleanup device subkind-specific state in the libxl ctx */
     STATE_AO_GC(cds->ao);
 
+    cleanup_subkind_colo_nic(cds);
     cleanup_subkind_qdisk(cds);
 }
 
@@ -76,9 +85,16 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->svm_running = false;
     css->paused = true;
     css->qdisk_setuped = false;
+    libxl__ev_child_init(&css->child);
 
-    /* TODO: nic support */
-    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VBD);
+    if (dss->remus->netbufscript)
+        css->colo_proxy_script = libxl__strdup(gc, dss->remus->netbufscript);
+    else
+        css->colo_proxy_script = GCSPRINTF("%s/colo-proxy-setup",
+                                           libxl__xen_script_dir_path());
+
+    cds->device_kind_flags = (1 << LIBXL__DEVICE_KIND_VIF) |
+                             (1 << LIBXL__DEVICE_KIND_VBD);
     cds->ops = colo_ops;
     cds->callback = colo_save_setup_done;
     cds->ao = ao;
@@ -88,6 +104,12 @@ void libxl__colo_save_setup(libxl__egc *egc, libxl__colo_save_state *css)
     css->srs.fd = css->recv_fd;
     css->srs.back_channel = true;
     libxl__stream_read_start(egc, &css->srs);
+    css->cps.ao = ao;
+    if (colo_proxy_setup(&css->cps)) {
+        LOG(ERROR, "COLO: failed to setup colo proxy for guest with domid %u",
+            cds->domid);
+        goto out;
+    }
 
     if (init_device_subkind(cds))
         goto out;
@@ -162,6 +184,7 @@ static void colo_teardown_done(libxl__egc *egc,
     libxl__domain_save_state *dss = CONTAINER_OF(css, *dss, css);
 
     cleanup_device_subkind(cds);
+    colo_proxy_teardown(&css->cps);
     dss->callback(egc, dss, rc);
 }
 
@@ -378,6 +401,8 @@ static void colo_read_svm_ready_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_preresume(&css->cps);
+
     css->svm_running = true;
     css->cds.callback = colo_preresume_cb;
     libxl__checkpoint_devices_preresume(egc, &css->cds);
@@ -454,6 +479,8 @@ static void colo_read_svm_resumed_done(libxl__egc *egc,
         goto out;
     }
 
+    colo_proxy_postresume(&css->cps);
+
     ok = 1;
 
 out:
@@ -462,6 +489,91 @@ out:
 
 /* = colo: wait new checkpoint = */
+
+static void colo_start_new_checkpoint(libxl__egc *egc,
+                                      libxl__checkpoint_devices_state *cds,
+                                      int rc);
+static void colo_proxy_async_wait_for_checkpoint(libxl__colo_save_state *css);
+static void colo_proxy_async_call_done(libxl__egc *egc,
+                                       libxl__ev_child *child,
+                                       int pid,
+                                       int status);
+
+static void colo_proxy_async_call(libxl__egc *egc,
+                                  libxl__colo_save_state *css,
+                                  void func(libxl__colo_save_state *),
+                                  libxl__ev_child_callback callback)
+{
+    int pid = -1, rc;
+
+    STATE_AO_GC(css->cds.ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &css->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(css);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    callback(egc, &css->child, -1, 1);
+}
+
+static void colo_proxy_wait_for_checkpoint(libxl__egc *egc,
+                                           libxl__colo_save_state *css)
+{
+    colo_proxy_async_call(egc, css,
+                          colo_proxy_async_wait_for_checkpoint,
+                          colo_proxy_async_call_done);
+}
+
+static void
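colo_proxy_async_call() above forks a child to run a blocking function and reports the outcome through a libxl__ev_child callback. If the fork fails, it calls that callback directly with status 1. The sketch below shows the same fork-then-callback shape using plain POSIX fork() and waitpid(). The names async_call(), child_done_fn and the example callbacks are hypothetical. Unlike libxl, this version blocks in waitpid() instead of returning to an event loop.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Completion callback: pid is -1 if the child could not be started or
 * reaped; otherwise status is the raw wait status from waitpid(). */
typedef void (*child_done_fn)(pid_t pid, int status, void *opaque);

/* Run func(arg) in a forked child and report the result through callback.
 * If fork() fails, the callback is invoked immediately with pid -1 and a
 * non-zero status, like the error path of colo_proxy_async_call(). */
static void async_call(int (*func)(void *), void *arg,
                       child_done_fn callback, void *opaque)
{
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        callback(-1, 1, opaque);
        return;
    }

    if (pid == 0) {
        /* Child: run the blocking work and exit with its result. _exit()
         * avoids flushing stdio buffers inherited from the parent. */
        _exit(func(arg));
    }

    /* Parent: this sketch blocks in waitpid(); libxl instead returns to
     * its event loop and the callback fires when the child is reaped. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        callback(-1, 1, opaque);
        return;
    }
    callback(pid, status, opaque);
}

/* Example callback and work functions. */
static int last_status = -1;

static void record_status(pid_t pid, int status, void *opaque)
{
    (void)opaque;
    last_status = (pid == -1) ? -1 : status;
}

static int work_ok(void *arg)   { (void)arg; return 0; }
static int work_fail(void *arg) { (void)arg; return 3; }
```

For example, after `async_call(work_fail, NULL, record_status, NULL)`, last_status holds a wait status for which WEXITSTATUS() is 3.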
[Xen-devel] [PATCH v8 --for 4.6 COLO 22/25] COLO nic: implement COLO nic subkind
implement COLO nic subkind.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/colo-proxy-setup | 131 ++
 tools/libxl/Makefile                 |   1 +
 tools/libxl/libxl_colo_nic.c         | 320 +++
 tools/libxl/libxl_internal.h         |   5 +
 tools/libxl/libxl_types.idl          |   1 +
 6 files changed, 459 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index bc8ee5e..71b6475 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -26,6 +26,7 @@ XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-tap
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..3096a9c
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,131 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+. "$dir/xen-network-ft.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+    u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+    u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+    12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+    dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    do_without_error iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+    $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+    $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+    0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+    0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+    0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -F
+    do_without_error ip6tables -t mangle -F
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error modprobe xt_SECCOLO
+
+    do_without_error iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+    $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+    $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -F
+    do_without_error ip6tables -t mangle -F
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug Successful colo-proxy-setup
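The checks at the top of colo-proxy-setup accept only setup or teardown as the command, primary or secondary as the mode, and an index between 0 and 100. Below is a hypothetical C helper that applies the same three checks, for example before a toolstack invokes the script. The function name is invented. It is also slightly stricter than the script, because it rejects a non-numeric index outright.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libxl: returns true if command, mode and
 * index would pass the corresponding checks in colo-proxy-setup. */
static bool colo_proxy_args_valid(const char *command, const char *mode,
                                  const char *index_str)
{
    /* Treat missing or empty arguments as invalid (the script's
     * ': ${var:?}' lines do this for mode and index). */
    if (!command || !*command || !mode || !*mode || !index_str || !*index_str)
        return false;

    if (strcmp(command, "setup") != 0 && strcmp(command, "teardown") != 0)
        return false;

    if (strcmp(mode, "primary") != 0 && strcmp(mode, "secondary") != 0)
        return false;

    /* Stricter than the script: reject anything that is not a plain
     * decimal integer instead of letting '[ ... -lt 0 ]' report an error. */
    char *end;
    errno = 0;
    long index = strtol(index_str, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;

    /* Same bounds as the "index overflow" check. */
    return index >= 0 && index <= 100;
}
```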
[Xen-devel] [PATCH v8 --for 4.6 COLO 00/25] COarse-grain LOck-stepping Virtual Machines for Non-stop Service
This patchset implements the COLO feature for Xen. For details on how to
install and use COLO, refer to:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

In this series, we've rebased onto the latest libxl migration v2.

This patchset is based on:
  [PATCH v4 --for 4.6 COLOPre 00/25] Prerequisite patches for COLO

Only HVM guests are supported for now.

The code is also hosted on GitHub:
https://github.com/macrosheep/xen/tree/colo-v8

Changelog from v7 to v8:
 1. Rebased to the latest libxl migration v2.

Changelog from v6 to v7:
 1. Ported to libxl migration v2
 2. Send dirty bitmap from secondary to primary on libxc side
 3. Address review comments

Changelog from v5 to v6:
 1. based on migration v2 (libxc)
 2. split the patchset into a prerequisite patchset and this main patchset.

Changelog from v4 to v5:
 1. rebase to the latest xen upstream
 2. disk replication: blktap2->qdisk
 3. nic replication: colo-agent->colo-proxy

Changelog from v3 to v4:
 1. rebase to newest xen
 2. bug fix

Changelog from v2 to v3:
 1. rebase to newest remus
 2. add nic replication support

Changelog from v1 to v2:
 1. rebase to newest remus
 2. add disk replication support

Wen Congyang (7):
  docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
  secondary vm suspend/resume/checkpoint code
  primary vm suspend/resume/checkpoint code
  send store mfn and console mfn to xl before resuming secondary vm
  implement the cmdline for COLO
  Support colo mode for qemu disk
  COLO: use qemu block replication

Yang Hongyang (18):
  A docs: add colo readme
  libxc/migration: Specification update for DIRTY_BITMAP records
  libxc/migration: export read_record for common use
  tools/libxl: add back channel support to write stream
  tools/libxl: write colo_context records into the stream
  tools/libxl: add back channel support to read stream
  tools/libxl: handle colo_context records in a libxl migration v2 read stream
  tools/libx{l,c}: introduce should_checkpoint callback
  tools/libx{l,c}: add postcopy/suspend callback to restore side
  libxc/restore: support COLO restore
  libxc/restore: send dirty bitmap to primary when checkpoint under colo
  libxc/save: support COLO save
  COLO proxy: implement setup/teardown of COLO proxy module
  COLO proxy: preresume, postresume and checkpoint
  COLO nic: implement COLO nic subkind
  setup and control colo proxy on primary side
  setup and control colo proxy on secondary side
  cmdline switches and config vars to control colo-proxy

 docs/README.colo                         |    9 +
 docs/man/xl.conf.pod.5                   |    6 +
 docs/man/xl.pod.1                        |   11 +-
 docs/misc/xl-disk-configuration.txt      |   38 ++
 docs/specs/libxc-migration-stream.pandoc |   24 +-
 docs/specs/libxl-migration-stream.pandoc |   22 +-
 tools/hotplug/Linux/Makefile             |    1 +
 tools/hotplug/Linux/colo-proxy-setup     |  131
 tools/libxc/include/xenguest.h           |   36 ++
 tools/libxc/xc_sr_common.c               |   50 ++
 tools/libxc/xc_sr_common.h               |   36 +-
 tools/libxc/xc_sr_restore.c              |  244 +--
 tools/libxc/xc_sr_save.c                 |  104 ++-
 tools/libxc/xc_sr_stream_format.h        |    1 +
 tools/libxl/Makefile                     |    4 +
 tools/libxl/libxl.c                      |   77 ++-
 tools/libxl/libxl_colo.h                 |   42 ++
 tools/libxl/libxl_colo_nic.c             |  320 ++
 tools/libxl/libxl_colo_proxy.c           |  267
 tools/libxl/libxl_colo_qdisk.c           |  209 ++
 tools/libxl/libxl_colo_restore.c         | 1024 ++
 tools/libxl/libxl_colo_save.c            |  709 +
 tools/libxl/libxl_create.c               |  153 -
 tools/libxl/libxl_device.c               |   38 ++
 tools/libxl/libxl_dm.c                   |  257 +++
 tools/libxl/libxl_dom_save.c             |   14 +-
 tools/libxl/libxl_internal.h             |  217 +--
 tools/libxl/libxl_qmp.c                  |   31 +
 tools/libxl/libxl_save_callout.c         |    7 +-
 tools/libxl/libxl_save_msgs_gen.pl       |   11 +-
 tools/libxl/libxl_sr_stream_format.h     |   11 +
 tools/libxl/libxl_stream_read.c          |   68 ++
 tools/libxl/libxl_stream_write.c         |  103 +++
 tools/libxl/libxl_types.idl              |    8 +
 tools/libxl/libxlu_disk_l.l              |    5 +
 tools/libxl/xl.c                         |    3 +
 tools/libxl/xl.h                         |    1 +
 tools/libxl/xl_cmdimpl.c                 |  101 ++-
 tools/libxl/xl_cmdtable.c                |    4 +-
 tools/python/xen/migration/libxl.py      |    9 +
 40 files changed, 4224 insertions(+), 182 deletions(-)
 create mode 100644 docs/README.colo
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo.h
 create mode 100644 tools/libxl/libxl_colo_nic.c
 create mode 100644 tools/libxl/libxl_colo_proxy.c
 create mode 100644
[Xen-devel] [PATCH v8 --for 4.6 COLO 02/25] docs/libxl: Introduce COLO_CONTEXT to support migration v2 colo streams
From: Wen Congyang <we...@cn.fujitsu.com>

It is the negotiation record for COLO.
Primary->Secondary:
  control_id 0x00000000: Secondary VM is out of sync, start a new checkpoint
Secondary->Primary:
  0x00000001: Secondary VM is suspended
  0x00000002: Secondary VM is ready
  0x00000003: Secondary VM is resumed

Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 docs/specs/libxl-migration-stream.pandoc | 22 +-
 tools/libxl/libxl_sr_stream_format.h     | 11 +++
 tools/python/xen/migration/libxl.py      |  9 +
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
index c24a434..5986273 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -121,7 +121,9 @@ type             0x00000000: END
 
                  0x00000004: CHECKPOINT_END
 
-                 0x00000005 - 0x7FFFFFFF: Reserved for future _mandatory_
+                 0x00000005: COLO_CONTEXT
+
+                 0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
                  records.
 
                  0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -215,3 +217,21 @@ A checkpoint end record marks the end of a checkpoint in the image.
     +-------------------------------------------------+
 
 The end record contains no fields; its body_length is 0.
+
+COLO\_CONTEXT
+-------------
+
+A COLO context record contains the control information for COLO.
+
+     0     1     2     3     4     5     6     7 octet
+    +------------------------+------------------------+
+    | control_id             | padding                |
+    +------------------------+------------------------+
+
+--------------------------------------------------------------------
+Field            Description
+-----------      ---------------------------------------------------
+control_id       0x00000000: Secondary VM is out of sync, start a new checkpoint
+                 0x00000001: Secondary VM is suspended
+                 0x00000002: Secondary VM is ready
+                 0x00000003: Secondary VM is resumed
diff --git a/tools/libxl/libxl_sr_stream_format.h b/tools/libxl/libxl_sr_stream_format.h
index 3f3c497..1dd2ac4 100644
--- a/tools/libxl/libxl_sr_stream_format.h
+++ b/tools/libxl/libxl_sr_stream_format.h
@@ -36,6 +36,7 @@ typedef struct libxl__sr_rec_hdr
 #define REC_TYPE_XENSTORE_DATA    0x00000002U
 #define REC_TYPE_EMULATOR_CONTEXT 0x00000003U
 #define REC_TYPE_CHECKPOINT_END   0x00000004U
+#define REC_TYPE_COLO_CONTEXT     0x00000005U
 
 typedef struct libxl__sr_emulator_hdr
 {
@@ -47,6 +48,16 @@ typedef struct libxl__sr_emulator_hdr
 #define EMULATOR_QEMU_TRADITIONAL 0x00000001U
 #define EMULATOR_QEMU_UPSTREAM    0x00000002U
 
+typedef struct libxl_sr_colo_context
+{
+    uint32_t id;
+} libxl_sr_colo_context;
+
+#define COLO_NEW_CHECKPOINT 0x00000000U
+#define COLO_SVM_SUSPENDED  0x00000001U
+#define COLO_SVM_READY      0x00000002U
+#define COLO_SVM_RESUMED    0x00000003U
+
 #endif /* LIBXL__SR_STREAM_FORMAT_H */
 
 /*
diff --git a/tools/python/xen/migration/libxl.py b/tools/python/xen/migration/libxl.py
index 415502e..57031c6 100644
--- a/tools/python/xen/migration/libxl.py
+++ b/tools/python/xen/migration/libxl.py
@@ -37,6 +37,7 @@ REC_TYPE_libxc_context    = 0x00000001
 REC_TYPE_xenstore_data    = 0x00000002
 REC_TYPE_emulator_context = 0x00000003
 REC_TYPE_checkpoint_end   = 0x00000004
+REC_TYPE_colo_context     = 0x00000005
 
 rec_type_to_str = {
     REC_TYPE_end              : "End",
@@ -44,6 +45,7 @@ rec_type_to_str = {
     REC_TYPE_xenstore_data    : "Xenstore data",
     REC_TYPE_emulator_context : "Emulator context",
     REC_TYPE_checkpoint_end   : "Checkpoint end",
+    REC_TYPE_colo_context     : "COLO context"
 }
 
 # emulator_context
@@ -184,6 +186,11 @@ class VerifyLibxl(VerifyBase):
         if len(content) != 0:
             raise RecordError("Checkpoint end record with non-zero length")
 
+    def verify_record_colo_context(self, content):
+        """ COLO context """
+        if len(content) == 0:
+            raise RecordError("COLO context record with zero length")
+
 
 record_verifiers = {
     REC_TYPE_end:
@@ -196,4 +203,6 @@ record_verifiers = {
         VerifyLibxl.verify_record_emulator_context,
     REC_TYPE_checkpoint_end:
         VerifyLibxl.verify_record_checkpoint_end,
+    REC_TYPE_colo_context:
+        VerifyLibxl.verify_record_colo_context,
 }
-- 
1.9.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
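The spec text above gives COLO_CONTEXT an 8-octet body: a 4-byte control_id followed by 4 bytes of padding. The sketch below encodes and checks such a body. It assumes little-endian fields and uses hypothetical helper names. The patch's libxl_sr_colo_context struct carries only the uint32_t id, and the Python verifier rejects only empty bodies, so the exact-length check here follows the spec table rather than the patch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* control_id values as defined by the patch. */
#define COLO_NEW_CHECKPOINT 0x00000000U
#define COLO_SVM_SUSPENDED  0x00000001U
#define COLO_SVM_READY      0x00000002U
#define COLO_SVM_RESUMED    0x00000003U

/* Body layout from the spec table: 4-byte control_id, 4 bytes padding. */
#define COLO_CONTEXT_BODY_LEN 8

/* Hypothetical encoder; little-endian byte order is an assumption. */
static void colo_context_encode(uint32_t control_id,
                                uint8_t body[COLO_CONTEXT_BODY_LEN])
{
    memset(body, 0, COLO_CONTEXT_BODY_LEN);   /* zero the padding */
    for (int i = 0; i < 4; i++)
        body[i] = (uint8_t)(control_id >> (8 * i));
}

/* Hypothetical decoder: returns 0 and stores the control_id on success,
 * -1 if the body has the wrong length or an unknown control_id. */
static int colo_context_decode(const uint8_t *body, size_t len,
                               uint32_t *control_id)
{
    if (len != COLO_CONTEXT_BODY_LEN)
        return -1;

    uint32_t id = 0;
    for (int i = 0; i < 4; i++)
        id |= (uint32_t)body[i] << (8 * i);

    if (id > COLO_SVM_RESUMED)
        return -1;

    *control_id = id;
    return 0;
}
```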
[Xen-devel] [PATCH v8 --for 4.6 COLO 09/25] tools/libx{l, c}: introduce should_checkpoint callback
Under COLO, checkpoints are taken on demand: if this callback returns 1, we
take another checkpoint; 0 indicates an unexpected error.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 tools/libxc/include/xenguest.h     | 18 ++
 tools/libxl/libxl_save_msgs_gen.pl |  7 ---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 4056955..fa06d9b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -63,6 +63,15 @@ struct save_callbacks {
      * 1: take another checkpoint */
     int (*checkpoint)(void* data);
 
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*should_checkpoint)(void* data);
+
     /* Enable qemu-dm logging dirty pages to xen */
     int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */
 
@@ -112,6 +121,15 @@ struct restore_callbacks {
 #define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
     int (*checkpoint)(void* data);
 
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*should_checkpoint)(void* data);
+
     /* to be provided as the last argument to each callback function */
     void* data;
 };
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index d6d2967..9107a86 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -26,11 +26,12 @@ our @msgs = (
     [  3, 'scxA',   "suspend", [] ],
     [  4, 'scxA',   "postcopy", [] ],
     [  5, 'srcxA',  "checkpoint", [] ],
-    [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
+    [  6, 'srcxA',  "should_checkpoint", [] ],
+    [  7, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
-    [  7, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
                                               'unsigned long', 'console_mfn'] ],
-    [  8, 'srW',    "complete",              [qw(int retval
+    [  9, 'srW',    "complete",              [qw(int retval
                                               int errnoval)] ],
 );
 
-- 
1.9.1
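With should_checkpoint in place, a checkpointing save loop can ask after every checkpoint whether another one is wanted. The loop below is a hypothetical illustration of that control flow using a cut-down callback struct. It is not the libxc save code, and it treats any return value other than 1 as a reason to stop.

```c
#include <assert.h>

/* Cut-down mirror of the two save callbacks discussed above; the real
 * struct save_callbacks in xenguest.h has many more members. */
struct demo_callbacks {
    int (*checkpoint)(void *data);
    int (*should_checkpoint)(void *data);
    void *data;
};

/* Hypothetical driver loop: take a checkpoint, then ask should_checkpoint()
 * whether to take another. Returns the number of checkpoints taken. */
static int run_checkpoints(const struct demo_callbacks *cb)
{
    int taken = 0;

    for (;;) {
        /* ... send dirty memory and device state for this epoch ... */
        taken++;
        if (cb->checkpoint(cb->data) != 1)
            break;
        if (cb->should_checkpoint(cb->data) != 1)
            break;          /* 0: terminate checkpointing gracefully */
    }
    return taken;
}

/* Example callbacks: permit a fixed number of additional checkpoints. */
static int demo_checkpoint(void *data) { (void)data; return 1; }

static int demo_should_checkpoint(void *data)
{
    int *remaining = data;
    return (*remaining)-- > 0 ? 1 : 0;
}
```

With `remaining` initialised to 2, the loop takes an initial checkpoint plus two more before should_checkpoint returns 0.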
[Xen-devel] [PATCH v8 --for 4.6 COLO 13/25] libxc/restore: support COLO restore
Call the resume/checkpoint/suspend callbacks while the secondary VM's state
is consistent with the primary's.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
---
 tools/libxc/xc_sr_common.h  | 16 ++--
 tools/libxc/xc_sr_restore.c | 60 +
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 632160e..c5603ff 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -167,6 +167,18 @@ struct xc_sr_context
 
     xc_dominfo_t dominfo;
 
+    /*
+     * migration stream
+     * 0: Plain VM
+     * 1: Remus
+     * 2: COLO
+     */
+    enum {
+        MIG_STREAM_PLAIN,
+        MIG_STREAM_REMUS,
+        MIG_STREAM_COLO,
+    } migration_stream;
+
     union /* Common save or restore data. */
     {
         struct /* Save data. */
@@ -209,13 +221,13 @@ struct xc_sr_context
             uint32_t guest_page_size;
 
             /* Plain VM, or checkpoints over time. */
-            bool checkpointed;
+            int checkpointed;
 
             /* Currently buffering records between a checkpoint */
             bool buffer_all_records;
 
             /*
-             * With Remus, we buffer the records sent by the primary at checkpoint,
+             * With Remus/COLO, we buffer the records sent by the primary at checkpoint,
              * in case the primary will fail, we can recover from the last
              * checkpoint state.
              * This should be enough for most of the cases because primary only send
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53694b..696bf30 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -454,6 +454,49 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
     else
         ctx->restore.buffer_all_records = true;
 
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+    {
+#define HANDLE_CALLBACK_RETURN_VALUE(ret)                   \
+    do {                                                    \
+        if ( ret == 1 )                                     \
+            rc = 0; /* Success */                           \
+        else                                                \
+        {                                                   \
+            if ( ret == 2 )                                 \
+                rc = BROKEN_CHANNEL;                        \
+            else                                            \
+                rc = -1; /* Some unspecified error */       \
+            goto err;                                       \
+        }                                                   \
+    } while (0)
+
+        /* COLO */
+
+        /* We need to resume guest */
+        rc = ctx->restore.ops.stream_complete(ctx);
+        if ( rc )
+            goto err;
+
+        /* TODO: call restore_results */
+
+        /* Resume secondary vm */
+        ret = ctx->restore.callbacks->postcopy(ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+        /* Wait for a new checkpoint */
+        ret = ctx->restore.callbacks->should_checkpoint(
+                                                ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+        /* suspend secondary vm */
+        ret = ctx->restore.callbacks->suspend(ctx->restore.callbacks->data);
+        HANDLE_CALLBACK_RETURN_VALUE(ret);
+
+#undef HANDLE_CALLBACK_RETURN_VALUE
+
+        /* TODO: send dirty bitmap to primary */
+    }
+
  err:
     return rc;
 }
@@ -625,6 +668,15 @@ static int restore(struct xc_sr_context *ctx)
     } while ( rec.type != REC_TYPE_END );
 
  remus_failover:
+
+    if ( ctx->restore.checkpointed == MIG_STREAM_COLO )
+    {
+        /* With COLO, we have already called stream_complete */
+        rc = 0;
+        IPRINTF("COLO Failover");
+        goto done;
+    }
+
     /*
      * With Remus, if we reach here, there must be some error on primary,
      * failover from the last checkpoint state.
@@ -679,6 +731,14 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
     if ( checkpointed_stream )
         assert(callbacks->checkpoint);
 
+    if ( ctx.restore.checkpointed == MIG_STREAM_COLO )
+    {
+        /* this is COLO restore */
+        assert(callbacks->suspend &&
+               callbacks->postcopy &&
+               callbacks->should_checkpoint);
+    }
+
     IPRINTF("In experimental %s", __func__);
     DPRINTF("fd %d, dom %u, hvm %u, pae %u, superpages %d"
             ", checkpointed_stream %d", io_fd, dom, hvm, pae,
-- 
1.9.1
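HANDLE_CALLBACK_RETURN_VALUE maps a callback result of 1 to success, 2 to BROKEN_CHANNEL and anything else to -1, jumping to err on failure. The sketch below writes the same mapping as a function and uses it to drive the postcopy, should_checkpoint and suspend steps in order. This patch does not show BROKEN_CHANNEL's real value, so DEMO_BROKEN_CHANNEL is a placeholder. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder: BROKEN_CHANNEL is defined elsewhere in the series. */
#define DEMO_BROKEN_CHANNEL (-2)

/* Function form of HANDLE_CALLBACK_RETURN_VALUE:
 * 1 -> success (0), 2 -> broken channel, anything else -> -1. */
static int colo_callback_rc(int ret)
{
    switch (ret) {
    case 1:  return 0;
    case 2:  return DEMO_BROKEN_CHANNEL;
    default: return -1;
    }
}

/* Hypothetical sequence mirroring the COLO branch of handle_checkpoint():
 * resume, wait for a new checkpoint, suspend. Stops at the first callback
 * that does not return 1, like the macro's 'goto err'. */
static int colo_restore_step(int (*postcopy)(void *),
                             int (*should_checkpoint)(void *),
                             int (*suspend)(void *), void *data)
{
    int (*steps[])(void *) = { postcopy, should_checkpoint, suspend };

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        int rc = colo_callback_rc(steps[i](data));
        if (rc)
            return rc;
    }
    return 0;
}

/* Example callbacks. */
static int cb_ok(void *d)     { (void)d; return 1; }
static int cb_broken(void *d) { (void)d; return 2; }
static int cb_err(void *d)    { (void)d; return 0; }
```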
[Xen-devel] [PATCH v8 --for 4.6 COLO 03/25] libxc/migration: Specification update for DIRTY_BITMAP records
Used by the secondary to send its dirty bitmap to the primary under COLO.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 docs/specs/libxc-migration-stream.pandoc | 24 +++-
 tools/libxc/xc_sr_common.c               |  1 +
 tools/libxc/xc_sr_stream_format.h        |  1 +
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 68fa513..480d357 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -227,7 +227,9 @@ type             0x00000000: END
 
                  0x0000000E: CHECKPOINT
 
-                 0x0000000F - 0x7FFFFFFF: Reserved for future _mandatory_
+                 0x0000000F: DIRTY_BITMAP
+
+                 0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
                  records.
 
                  0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
@@ -601,6 +603,26 @@ CHECKPOINT record or an END record.
 
 \clearpage
 
+DIRTY_BITMAP
+------------
+
+A dirty_bitmap record is used by the secondary to send its dirty bitmap
+to the primary while doing a checkpoint under COLO. This record only exists
+in the back channel.
+
+     0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | pfn[0]                                          |
+    +-------------------------------------------------+
+    ...
+    +-------------------------------------------------+
+    | pfn[C-1]                                        |
+    +-------------------------------------------------+
+
+The count of the pfn is: record->length/sizeof(uint64_t).
+
+\clearpage
+
 Layout
 ======
 
diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 945cfa6..becc0f4 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -35,6 +35,7 @@ static const char *mandatory_rec_types[] =
     [REC_TYPE_X86_PV_VCPU_MSRS]             = "x86 PV vcpu msrs",
     [REC_TYPE_VERIFY]                       = "Verify",
     [REC_TYPE_CHECKPOINT]                   = "Checkpoint",
+    [REC_TYPE_DIRTY_BITMAP]                 = "Dirty bitmap",
 };
 
 const char *rec_type_to_str(uint32_t type)
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 6d0f8fd..43a0209 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -75,6 +75,7 @@ struct xc_sr_rhdr
 #define REC_TYPE_X86_PV_VCPU_MSRS           0x0000000cU
 #define REC_TYPE_VERIFY                     0x0000000dU
 #define REC_TYPE_CHECKPOINT                 0x0000000eU
+#define REC_TYPE_DIRTY_BITMAP               0x0000000fU
 
 #define REC_TYPE_OPTIONAL             0x80000000U
 
-- 
1.9.1
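Per the spec text above, a DIRTY_BITMAP body is a flat array of 64-bit pfns whose count is record->length / sizeof(uint64_t). Below is a minimal sketch of reading such a body. The helper names are hypothetical, and the sketch assumes the pfns are in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of pfns in a DIRTY_BITMAP body of the given length, or -1 if the
 * length is not a whole number of 64-bit entries. */
static long dirty_bitmap_count(size_t length)
{
    if (length % sizeof(uint64_t) != 0)
        return -1;
    return (long)(length / sizeof(uint64_t));
}

/* Read pfn[i] from the body. memcpy() avoids assuming the record body is
 * 8-byte aligned; host byte order is assumed. */
static uint64_t dirty_bitmap_pfn(const uint8_t *body, size_t i)
{
    uint64_t pfn;
    memcpy(&pfn, body + i * sizeof(uint64_t), sizeof(pfn));
    return pfn;
}
```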
Re: [Xen-devel] [PATCH] x86/traps: Dump instruction stream in show_execution_state()
>>> On 15.07.15 at 11:26, <andrew.coop...@citrix.com> wrote:
> On 15/07/15 09:53, Jan Beulich wrote:
>> Also I think you should avoid the subtraction from regs->rip to wrap
>> through zero, or even bail when RIP doesn't point into Xen space.
> If the instruction stream under eip is accessible, it should be printed,
> even if it doesn't point into Xen space. Bear in mind that anything
> could have gone wrong by the point we get here; we may have accidentally
> jumped into userspace or jumped into some data.

In which case that fact (seen by RIP itself being off) is enough to
know what happened. What exact instruction caused the fault is then
of no interest anymore.

> The wrapping through zero will be caught by the error handling in
> __copy_from_user(), but I admit that it is not very obvious. The
> information will be available based on the numeric value of eip.

No, by passing the wrapped pointer to __copy_from_user() you will get
the non-interesting bytes (if any) printed, but not the one RIP actually
points to.

Jan
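Jan's point is that computing the dump start as rip minus a fixed number of bytes can wrap through zero when rip is small. The copy then starts near the top of the address space and never reaches the byte at rip. One way to avoid the wrap is to clamp the start address, as in the hypothetical helper below. Jan also suggests bailing out when RIP is outside Xen space, which this sketch does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: first address of a dump window that should show up
 * to 'before' bytes preceding ip. If ip - before would wrap through zero,
 * clamp the start to 0 so the window still ends at ip. */
static uint64_t dump_window_start(uint64_t ip, uint64_t before)
{
    return ip >= before ? ip - before : 0;
}

/* Number of bytes in the window that precede ip. */
static uint64_t dump_bytes_before_ip(uint64_t ip, uint64_t before)
{
    return ip - dump_window_start(ip, before);
}
```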
Re: [Xen-devel] [PATCH V6 1/3] xen/mem_access: Support for memory-content hiding
On 07/15/2015 09:45 AM, Razvan Cojocaru wrote:
> This patch adds support for memory-content hiding, by modifying the
> value returned by emulated instructions that read certain memory
> addresses that contain sensitive data. The patch only applies to
> cases where VM_FLAG_ACCESS_EMULATE has been set to a vm_event
> response.
>
> Signed-off-by: Razvan Cojocaru <rcojoc...@bitdefender.com>
> Acked-by: Tamas K Lengyel <tleng...@novetta.com>

BTW I've looked at an earlier version of this and acked it, and I
haven't seen any changes I want to review; so when the rest of it is
acked/reviewed I'll take another look through and send my ack.

 -George

> ---
> Changes since V5:
>  - Renamed set_context_data()'s bytes parameter to size.
>  - Inverted if() condition in set_context_data().
>  - Removed memcpy() conditional from set_context_data().
>  - Removed label from hvmemul_rep_outs_set_context().
>  - Now bypassing hvm_copy_from_guest_phys() in hvmemul_rep_movs()
>    if hvmemul_ctxt->set_context is true.
>  - Fixed for_each_vcpu() coding style (blank before the opening
>    parenthesis).
>  - Added comments about the serialization status of
>    vm_event_init_domain() and vm_event_cleanup_domain().
>  - Setting v->arch.vm_event.emul_read_data to NULL after xfree() in
>    vcpu_destroy() for safety.
> ---
>  tools/tests/xen-access/xen-access.c |   2 +-
>  xen/arch/x86/domain.c               |   3 +
>  xen/arch/x86/hvm/emulate.c          | 117 ---
>  xen/arch/x86/hvm/event.c            |  50 +++
>  xen/arch/x86/mm/p2m.c               |  92 +++
>  xen/arch/x86/vm_event.c             |  35 +++
>  xen/common/vm_event.c               |   8 +++
>  xen/include/asm-arm/vm_event.h      |  13
>  xen/include/asm-x86/domain.h        |   1 +
>  xen/include/asm-x86/hvm/emulate.h   |  10 ++-
>  xen/include/asm-x86/vm_event.h      |   4 ++
>  xen/include/public/vm_event.h       |  35 ---
>  12 files changed, 287 insertions(+), 83 deletions(-)
>
> diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c
> index 12ab921..e6ca9ba 100644
> --- a/tools/tests/xen-access/xen-access.c
> +++ b/tools/tests/xen-access/xen-access.c
> @@ -530,7 +530,7 @@ int main(int argc, char *argv[])
>              break;
>          case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
>              printf("Breakpoint: rip=%016"PRIx64", gfn=%"PRIx64" (vcpu %d)\n",
> -                   req.regs.x86.rip,
> +                   req.data.regs.x86.rip,
>                     req.u.software_breakpoint.gfn,
>                     req.vcpu_id);
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 34ecd7c..1ef9fad 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -511,6 +511,9 @@ int vcpu_initialise(struct vcpu *v)
>
>  void vcpu_destroy(struct vcpu *v)
>  {
> +    xfree(v->arch.vm_event.emul_read_data);
> +    v->arch.vm_event.emul_read_data = NULL;
> +
>      if ( is_pv_32bit_vcpu(v) )
>      {
>          free_compat_arg_xlat(v);
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 795321c..2766919 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -67,6 +67,24 @@ static int null_write(const struct hvm_io_handler *handler,
>      return X86EMUL_OKAY;
>  }
>
> +static int set_context_data(void *buffer, unsigned int size)
> +{
> +    struct vcpu *curr = current;
> +
> +    if ( curr->arch.vm_event.emul_read_data )
> +    {
> +        unsigned int safe_size =
> +            min(size, curr->arch.vm_event.emul_read_data->size);
> +
> +        memcpy(buffer, curr->arch.vm_event.emul_read_data->data, safe_size);
> +        memset(buffer + safe_size, 0, size - safe_size);
> +    }
> +    else
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    return X86EMUL_OKAY;
> +}
> +
>  static const struct hvm_io_ops null_ops = {
>      .read = null_read,
>      .write = null_write
> @@ -771,6 +789,12 @@ static int hvmemul_read(
>      unsigned int bytes,
>      struct x86_emulate_ctxt *ctxt)
>  {
> +    struct hvm_emulate_ctxt *hvmemul_ctxt =
> +        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
> +
> +    if ( unlikely(hvmemul_ctxt->set_context) )
> +        return set_context_data(p_data, bytes);
> +
>      return __hvmemul_read(
>          seg, offset, p_data, bytes, hvm_access_read,
>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
> @@ -963,6 +987,17 @@ static int hvmemul_cmpxchg(
>      unsigned int bytes,
>      struct x86_emulate_ctxt *ctxt)
>  {
> +    struct hvm_emulate_ctxt *hvmemul_ctxt =
> +        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
> +
> +    if ( unlikely(hvmemul_ctxt->set_context) )
> +    {
> +        int rc = set_context_data(p_new, bytes);
> +
> +        if ( rc != X86EMUL_OKAY )
> +            return rc;
> +    }
> +
>      /* Fix this in case the guest is really relying on r-m-w atomicity. */
>      return hvmemul_write(seg, offset, p_new, bytes, ctxt);
>  }
> @@ -1005,6 +1040,38 @@ static int hvmemul_rep_ins(
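set_context_data() copies at most emul_read_data->size bytes of the replacement data into the emulator's buffer and zero-fills the rest. It returns X86EMUL_UNHANDLEABLE when no data was supplied. The standalone sketch below follows the same copy policy, using hypothetical stand-in types instead of the per-vCPU state. It adds an extra clamp to the demo buffer's capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for X86EMUL_OKAY / X86EMUL_UNHANDLEABLE. */
enum { DEMO_OKAY = 0, DEMO_UNHANDLEABLE = 1 };

/* Stand-in for the per-vCPU emul_read_data buffer (hypothetical layout). */
struct demo_read_data {
    unsigned int size;              /* number of valid bytes in data[] */
    unsigned char data[16];
};

static int demo_set_context_data(const struct demo_read_data *rd,
                                 void *buffer, unsigned int size)
{
    if (!rd)
        return DEMO_UNHANDLEABLE;   /* no replacement data supplied */

    /* Never read past the end of data[], whatever rd->size claims. */
    unsigned int avail = rd->size < sizeof(rd->data)
                         ? rd->size : (unsigned int)sizeof(rd->data);
    unsigned int safe_size = size < avail ? size : avail;

    memcpy(buffer, rd->data, safe_size);
    /* Zero the part of the destination that the supplied data does not cover. */
    memset((unsigned char *)buffer + safe_size, 0, size - safe_size);
    return DEMO_OKAY;
}
```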
Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs
Hi Ian,

On 15/07/2015 11:32, Ian Campbell wrote:
>>>> Why can't we store the event ID in the irq_guest? As said on v3, this is not
>>>
>>> Are you referring to irq_desc in above statement?
>>
>> Yes sorry.
>
> I'm afraid I don't follow your suggestion here, are you suggesting that
> the vid field added above should be moved to irq_desc?

Yes,

> But the vid _is_ domain specific, it is the virtual event ID which is
> per-domain (it's the thing looked up in the ITT to get a vLPI to be
> injected). I think it is a pretty direct analogue of the virq field used
> for non-LPI irq_guest structs.

No, vid is not specific to a domain but to a device. The virtual event ID
is always the same as the physical event ID (see your design document [1]).

Furthermore, all the uses of irq_to_vid in this series are for physical
commands (see lpi_set_config within this patch).

> Your proposal on v3 looks to be around moving the its_device pointer to
> the irq_desc, which appears to have been done here, along with turning
> the virq+vid into a union as requested there too.

On v3 I said:

"The event ID and the its_device assigned are known when the device is
added to Xen and hence can be set in irq_desc (with a small memory
impact, but we have plenty of memory on ARM64)."

Sorry if it was confusing.

>>>> It has been suggested by Ian to move col_id in the its_device in the
>>>> previous version [4]. Any reason to not doing it?
>>>
>>> In round robin fashion each plpi is attached to col_id. So storing in
>>> its_device is not possible.
>>> In linux latest col_id is stored in its_device structure for which
>>> set_affinity is called.
>>
>> Are you saying that in Linux all Events/LPIs associated with a given
>> ITS device are routed to the same collection?
>>
>> You could do round robin on its_device... It would be exactly the same
>
> Routing all LPIs associated with a given its_device to the same
> collection is not exactly the same as round robin-ing all LPIs from the
> device over the collections.

Yes, sorry, I was a bit lax in the writing. I meant that there is not
much difference in doing it.

>> and save 2 byte if not more with the alignment per irq_desc.
>
> If this is a concern then I would say we would either want a separate
> array of per-pLPI information which we do not want in irq_desc because
> it is irq specific, or do add a pointer to its_desc which points to an
> array of per-event information.

That would be a good solution. Although, as I said, I don't really care
for Xen 4.6. It's more an optimization for 4.7.

Regards,

[1] http://xenbits.xen.org/people/ianc/vits/draftG.html#event-id-event

-- 
Julien Grall
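The per-event array that Ian suggests might look roughly like this. The ITS device holds one pointer to per-event data, and each event gets its own collection ID, assigned round-robin. All names are hypothetical, and this is not the vITS code from the series. Routing every event of a device to one collection would instead need only a single col_id field in the device structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-event information kept outside irq_desc. */
struct demo_event_info {
    uint16_t col_id;            /* collection the event is routed to */
};

/* Hypothetical ITS device: a single pointer to per-event data instead of
 * a collection ID stored in every irq_desc. */
struct demo_its_device {
    uint32_t nr_events;
    struct demo_event_info *events;
};

/* Allocate per-event info and spread the device's events over nr_cols
 * collections round-robin, continuing from *next_col so that successive
 * devices keep rotating. Returns 0 on success, -1 on allocation failure. */
static int demo_assign_collections(struct demo_its_device *dev,
                                   uint32_t nr_events, uint16_t nr_cols,
                                   uint16_t *next_col)
{
    assert(nr_cols > 0);

    dev->nr_events = 0;
    dev->events = NULL;
    if (nr_events == 0)
        return 0;

    dev->events = calloc(nr_events, sizeof(*dev->events));
    if (!dev->events)
        return -1;
    dev->nr_events = nr_events;

    for (uint32_t i = 0; i < nr_events; i++) {
        dev->events[i].col_id = *next_col;
        *next_col = (uint16_t)((*next_col + 1) % nr_cols);
    }
    return 0;
}
```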
[Xen-devel] [PATCH RFC] A script to use with OpenStack instead of vif-bridge
Hi, I have submitted a script to be used by OpenStack instead of our vif-bridge script: https://review.openstack.org/201257/ This is because vif-bridge is calling iptables and OpenStack (nova-network) is also updating the iptables (via iptables-{save,restore}). Could you review this patch that I have append bellow? Also, would it be better to have a similair script in Xen repo instead of Nova? The script is based on another already present in nova: http://git.openstack.org/cgit/openstack/nova/tree/contrib/xen/vif-openstack Thanks. The patch: From cb7daaab757f5f744dc9c3698e67b451db3392fe Mon Sep 17 00:00:00 2001 From: Anthony PERARD anthony.per...@citrix.com Date: Mon, 13 Jul 2015 16:39:25 +0100 Subject: [PATCH] contrib: Add vif-bridge-nova-network script for Xen. This script adds a vif created for a Xen guest to the bridge. This script is to be called by the Xen toolstack instead of the default one as the default will make call to iptables in a way that is not compatible with nova uses of iptables. To make use of the script, it is to be placed in XEN_SCRIPT_DIR (likely to be /etc/xen/scripts) and adds the following in nova.conf: [libvirt] xen_vif_bridge_script_path = vif-bridge-nova-network Change-Id: Ief24f0eff85f9b5a5f8cf26c3e08c4d8aeabc789 Partial-Bug: #1461642 Co-Authored-By: Christian Berendt bere...@b1-systems.de Signed-off-by: Anthony PERARD anthony.per...@citrix.com --- contrib/xen/vif-bridge-nova-network | 47 + 1 file changed, 47 insertions(+) create mode 100755 contrib/xen/vif-bridge-nova-network diff --git a/contrib/xen/vif-bridge-nova-network b/contrib/xen/vif-bridge-nova-network new file mode 100755 index 000..c6a3a6b --- /dev/null +++ b/contrib/xen/vif-bridge-nova-network @@ -0,0 +1,47 @@ +#!/bin/bash +# copyright: B1 Systems GmbH i...@b1-systems.de, 2012. +# author: Christian Berendt bere...@b1-systems.de, 2012. +# Copyright (C) 2015, Citrix Ltd. 
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may
+# not use this file except in compliance with the License. You may obtain
+# a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations
+# under the License.
+#
+# Use this script instead of the default one to avoid iptables calls from
+# the script, which may conflict with Nova's use of iptables.
+#
+# usage:
+#  place the script in $XEN_SCRIPT_DIR (likely to be /etc/xen/scripts)
+#  and set the following in /etc/nova/nova.conf:
+#    [libvirt]
+#    xen_vif_bridge_script_path = vif-bridge-nova-network
+
+dir=$(dirname "$0")
+. "$dir/vif-common.sh"
+
+bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
+
+case "$command" in
+    add|online)
+        setup_virtual_bridge_port "$dev"
+        add_to_bridge "$bridge" "$dev"
+        ;;
+
+    remove|offline)
+        do_without_error brctl delif "$bridge" "$dev"
+        do_without_error ip link set "$dev" down
+        ;;
+esac
+
+if [ "$type_if" = vif -a "$command" = online ]
+then
+    success
+fi

--
Anthony PERARD
[Xen-devel] [PATCH v2 3/3] sched/preempt: fix cond_resched_lock() and cond_resched_softirq()
These functions check should_resched() before unlocking spinlock/bh-enable:
preempt_count is always non-zero => should_resched() always returns false.
cond_resched_lock() worked iff spin_needbreak is set.

This patch adds argument "preempt_offset" to should_resched().

preempt_count offset constants for that:

PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
PREEMPT_LOCK_OFFSET     - offset after spin_lock()
SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_disable()
SOFTIRQ_LOCK_OFFSET     - offset after spin_lock_bh()

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 arch/x86/include/asm/preempt.h |  4 ++--
 include/asm-generic/preempt.h  |  5 +++--
 include/linux/preempt.h        | 19 ++-
 include/linux/sched.h          |  6 --
 kernel/sched/core.c            |  6 +++---
 5 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index dca71714f860..b12f81022a6b 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -90,9 +90,9 @@ static __always_inline bool __preempt_count_dec_and_test(void)
 /*
  * Returns true when we need to resched and can (barring IRQ state).
  */
-static __always_inline bool should_resched(void)
+static __always_inline bool should_resched(int preempt_offset)
 {
-	return unlikely(!raw_cpu_read_4(__preempt_count));
+	return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
 }
 
 #ifdef CONFIG_PREEMPT
diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
index d0a7a4753db2..0bec580a4885 100644
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -71,9 +71,10 @@ static __always_inline bool __preempt_count_dec_and_test(void)
 /*
  * Returns true when we need to resched and can (barring IRQ state).
  */
-static __always_inline bool should_resched(void)
+static __always_inline bool should_resched(int preempt_offset)
 {
-	return unlikely(!preempt_count() && tif_need_resched());
+	return unlikely(preempt_count() == preempt_offset &&
+			tif_need_resched());
 }
 
 #ifdef CONFIG_PREEMPT
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 84991f185173..bea8dd8ff5e0 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -84,13 +84,21 @@
  */
 #define in_nmi()	(preempt_count() & NMI_MASK)
 
+/*
+ * The preempt_count offset after preempt_disable();
+ */
 #if defined(CONFIG_PREEMPT_COUNT)
-# define PREEMPT_DISABLE_OFFSET 1
+# define PREEMPT_DISABLE_OFFSET	PREEMPT_OFFSET
 #else
-# define PREEMPT_DISABLE_OFFSET 0
+# define PREEMPT_DISABLE_OFFSET	0
 #endif
 
 /*
+ * The preempt_count offset after spin_lock()
+ */
+#define PREEMPT_LOCK_OFFSET	PREEMPT_DISABLE_OFFSET
+
+/*
  * The preempt_count offset needed for things like:
  *
  *  spin_lock_bh()
@@ -103,7 +111,7 @@
  *
  * Work as expected.
  */
-#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_DISABLE_OFFSET)
+#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_LOCK_OFFSET)
 
 /*
  * Are we running in atomic context?
  * WARNING: this macro cannot
@@ -124,7 +132,8 @@
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
-#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
+#define preempt_count_dec_and_test() \
+	({ preempt_count_sub(1); should_resched(0); })
 #else
 #define preempt_count_add(val)	__preempt_count_add(val)
 #define preempt_count_sub(val)	__preempt_count_sub(val)
@@ -184,7 +193,7 @@ do { \
 
 #define preempt_check_resched() \
 do { \
-	if (should_resched()) \
+	if (should_resched(0)) \
 		__preempt_schedule(); \
 } while (0)
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ae21f1591615..a8e9b17acdee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2885,12 +2885,6 @@ extern int _cond_resched(void);
 
 extern int __cond_resched_lock(spinlock_t *lock);
 
-#ifdef CONFIG_PREEMPT_COUNT
-#define PREEMPT_LOCK_OFFSET	PREEMPT_OFFSET
-#else
-#define PREEMPT_LOCK_OFFSET	0
-#endif
-
 #define cond_resched_lock(lock) ({				\
 	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
 	__cond_resched_lock(lock);				\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..d9a4d93dc879 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4492,7 +4492,7 @@ SYSCALL_DEFINE0(sched_yield)
 
 int __sched _cond_resched(void)
 {
-	if (should_resched()) {
+	if (should_resched(0)) {
 		preempt_schedule_common();
 		return 1;
 	}
@@ -4510,7 +4510,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int __cond_resched_lock(spinlock_t *lock)
 {
-	int resched = should_resched();
+	int resched =
[Xen-devel] [PATCH v2 1/3] drivers/xen/preempt: use need_resched() instead of should_resched()
This code is used only when CONFIG_PREEMPT=n and only in non-atomic
context: xen_in_preemptible_hcall is set only in privcmd_ioctl_hypercall().
Thus preempt_count is zero and should_resched() is equal to need_resched().

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 drivers/xen/preempt.c |  2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
index a1800c150839..08cb419eb4e6 100644
--- a/drivers/xen/preempt.c
+++ b/drivers/xen/preempt.c
@@ -31,7 +31,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
 asmlinkage __visible void xen_maybe_preempt_hcall(void)
 {
 	if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
-		     && should_resched())) {
+		     && need_resched())) {
 		/*
 		 * Clear flag as we may be rescheduled on a different
 		 * cpu.
[Xen-devel] [PATCH v2 2/3] KVM: PPC: Book3S HV: Use need_resched() instead of should_resched()
Function should_resched() is equal to (!preempt_count() && need_resched()).
In a preemptible kernel, preempt_count here is non-zero because of vc->lock.

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 arch/powerpc/kvm/book3s_hv.c |  2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067ad4222..a9f753fb73a8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2178,7 +2178,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		vc->runner = vcpu;
 		if (n_ceded == vc->n_runnable) {
 			kvmppc_vcore_blocked(vc);
-		} else if (should_resched()) {
+		} else if (need_resched()) {
 			vc->vcore_state = VCORE_PREEMPT;
 			/* Let something else run */
 			cond_resched_lock(&vc->lock);
Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs
On Wed, 2015-07-15 at 11:49 +0200, Julien Grall wrote:
Hi Ian,

On 15/07/2015 11:32, Ian Campbell wrote:
Why can't we store the event ID in the irq_guest? As said on v3, this is not

Are you referring to irq_desc in the above statement?

Yes, sorry.

I'm afraid I don't follow your suggestion here; are you suggesting that the vid field added above should be moved to irq_desc?

Yes,

But the vid _is_ domain specific: it is the virtual event ID, which is per-domain (it's the thing looked up in the ITT to get a vLPI to be injected). I think it is a pretty direct analogue of the virq field used for non-LPI irq_guest structs.

No, vid is not specific to a domain but to a device. The virtual event ID is always the same as the physical event ID (see your design document [1]). Furthermore, all the uses of irq_to_vid in this series are for physical commands (see lpi_set_config within this patch).

Your proposal on v3 looks to be around moving the its_device pointer to the irq_desc, which appears to have been done here, along with turning the virq+vid into a union as requested there too.

On v3 I said: The event ID and the its_device assigned are known when the device is added to Xen and hence can be set in irq_desc (with a small memory impact, but we have plenty of memory on ARM64). Sorry if it was confusing.

It was me who was confusing the properties of vid with those of vlpi, sorry. Not helped by http://xenbits.xen.org/people/ianc/vits/draftG.html#virtual-lpi-injection confusingly using virq instead of vid.

Ian.

It has been suggested by Ian to move col_id into the its_device in the previous version [4]. Any reason for not doing it?

In round-robin fashion each pLPI is attached to a col_id, so storing it in its_device is not possible. In the latest Linux, col_id is stored in the its_device structure, for which set_affinity is called.

Are you saying that in Linux all Events/LPIs associated with a given ITS device are routed to the same collection? You could do round robin on its_device...
It would be exactly the same

Routing all LPIs associated with a given its_device to the same collection is not exactly the same as round-robining all LPIs from the device over the collections.

Yes, sorry, I was a bit lax in the writing. I meant that there is not much difference in doing it, and it saves 2 bytes (if not more, with alignment) per irq_desc.

If this is a concern then I would say we would either want a separate array of per-pLPI information (which we do not want in irq_desc because it is IRQ specific), or add a pointer to its_desc which points to an array of per-event information.

That would be a good solution. Although, as I said, I don't really care for Xen 4.6. It's more an optimization for 4.7.

Regards,

[1] http://xenbits.xen.org/people/ianc/vits/draftG.html#event-id-event
Re: [Xen-devel] [PATCH] x86/traps: Misc tweaks to several printk()s
On 15.07.15 at 11:48, andrew.coop...@citrix.com wrote:
On 15/07/15 10:03, Jan Beulich wrote:
On 14.07.15 at 19:54, andrew.coop...@citrix.com wrote:
@@ -626,8 +626,9 @@ static void do_trap(struct cpu_user_regs *regs, int use_error_code)
     if ( likely((fixup = search_exception_table(regs->eip)) != 0) )
     {
-        dprintk(XENLOG_ERR, "Trap %d: %p -> %p\n",
-                trapnr, _p(regs->eip), _p(fixup));
+        printk(XENLOG_INFO "Exception [#%d, ec=%04x] (%s): %ps %p -> %p\n",
+               trapnr, use_error_code ? regs->error_code : 0, trapstr(trapnr),
+               _p(regs->eip), _p(regs->eip), _p(fixup));

But why the transition dprintk() -> printk()?

The file/line reference here is not useful, but now that you point it out, I had forgotten to consider that dprintk() now only exists in debug builds. It would be nice to have a variant of printk() which is restricted to debug builds but doesn't have a file/line reference.

But otoh the file/line pair shouldn't cause a lot of confusion - debug build users can certainly be expected to cope with that. Which isn't to say that I'd even consider making dprintk() by default not print file/line, and instead have a dprintk_at() or DPRINTK() doing so for those who really can't write distinguishable messages.

@@ -2813,10 +2814,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_EFER:
         rdmsr_normal:
             /* Everyone can read the MSR space. */
-            /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
-                        _p(regs->ecx));*/
             if ( rdmsr_safe(regs->ecx, val) )
+            {
+                gprintk(XENLOG_WARNING, "attempted RDMSR 0x%08x\n", regs->_ecx);
                 goto fail;
+            }

Do you really see this to be useful in production builds?

There is currently an asymmetry between the WRMSR and RDMSR paths, which shouldn't exist IMO.

I'm of the opposite opinion: knowing that (just like we do) guest kernels may access MSRs being prepared to get a #GP, and this (naturally) being more likely on RDMSR (why would one try to write an MSR one can't read?), the asymmetry has a reason.

Guest warning is rate limited by default, and anecdotally, this path doesn't trigger by default on any of my test boxes with a 3.10 pvops kernel.

Which is nice to know, but not nearly enough to assume we won't get flooded (ignoring the rate limiting) by these for other guest kernels.

Jan
Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Wednesday, July 15, 2015 4:25 PM
To: Wu, Feng
Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

On 15.07.15 at 08:04, feng...@intel.com wrote:
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Friday, July 10, 2015 10:02 PM

I'm particularly worried by the call to acpi_find_matched_drhd_unit() - is it maybe worth storing the iommu pointer in struct msi_desc?

I think it is worth it; Andrew also mentioned this point before. I would prefer to make this an independent piece of work and do it later, since the 4.6 release is coming and I am still trying my best to target it. Could you please share your concern here: performance, or other things? Thanks!

Interrupt latency in particular.

This IRTE update operation is not frequent. It happens only a few times, especially in the initialization phase of the guest. And even when the guest sets the affinity, if the MSI/MSI-X configuration doesn't change, QEMU will not ask Xen to update it.

+GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
+new_ire = *p;
+
+/* Setup/Update interrupt remapping table entry. */
+setup_posted_irte(new_ire, pi_desc, gvec);
+
+do {
+old_ire = *(uint128_t *)p;

This cast suggests that you might want to go beyond what Andrew said on cmpxchg16b()'s parameters: perhaps they'd better be void * instead of uint128_t *.

In that case, I need to do the cast in __cmpxchg16b(), right?

Where needed, yes. But that would limit casting to just a single place.

+ret = cmpxchg16b(p, old_ire, new_ire);
+} while ( memcmp(ret, old_ire, sizeof(old_ire)) );

Doesn't setup_posted_irte() need to move inside this loop, as it tries to preserve certain fields? Or else, what is the cmpxchg16b loop guarding against (i.e. why isn't this just a single one)?

Why do we need to move setup_posted_irte() inside the loop? new_ire will not be changed after setup, right? Here we need to make sure the 128-bit IRTE is updated atomically, especially the high and low parts of the posted-interrupt descriptor address.

There are two possible scenarios:
1) There are bits that can be updated behind the back of the code here. In that case you need to loop, and each iteration of the loop needs to re-fetch the current value (not doing so would make the loop infinite).

Oh, yes, I think I made a mistake here; I have been too hasty these days, sorry for that! I think I need to do it like this:

do {
    new_ire = *p;
    /* Setup/Update interrupt remapping table entry. */
    setup_posted_irte(new_ire, pi_desc, gvec);
    old_ire = *(uint128_t *)p;
    ret = cmpxchg16b(p, old_ire, new_ire);
} while ( memcmp(ret, old_ire, sizeof(old_ire)) );

Thanks,
Feng

2) No racing updates are possible; all you care about is atomicity of the update. In that case you don't need a loop around the cmpxchg16b().

Jan
Re: [Xen-devel] [xen-unstable test] 59544: regressions - FAIL
On Wed, 2015-07-15 at 08:48 +, osstest service owner wrote:
flight 59544 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59544/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl  6 xen-boot  fail REGR. vs. 58965

To me, it looks like the box did actually reboot in Xen. However:

Jul 14 21:04:52.969102 Starting NTP server: ntpd[ 76.734015] asix 2-3.2.4:1.0 eth0: link down
Jul 14 21:05:46.565053 [ 85.437886] asix 2-3.2.4:1.0 eth0: link down
Jul 14 21:05:55.269159 .

This is something I've certainly seen already (I'm not sure it was on arndale, but I think so), and AFAICR we can't do much about it.

Regards,
Dario
--
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Wednesday, July 15, 2015 4:46 PM
To: Wu, Feng
Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

On 15.07.15 at 10:38, feng...@intel.com wrote:
-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Wednesday, July 15, 2015 4:25 PM
To: Wu, Feng
Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

On 15.07.15 at 08:04, feng...@intel.com wrote:
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Friday, July 10, 2015 10:02 PM

I'm particularly worried by the call to acpi_find_matched_drhd_unit() - is it maybe worth storing the iommu pointer in struct msi_desc?

I think it is worth it; Andrew also mentioned this point before. I would prefer to make this an independent piece of work and do it later, since the 4.6 release is coming and I am still trying my best to target it. Could you please share your concern here: performance, or other things? Thanks!

Interrupt latency in particular.

This IRTE update operation is not frequent. It happens only a few times, especially in the initialization phase of the guest. And even when the guest sets the affinity, if the MSI/MSI-X configuration doesn't change, QEMU will not ask Xen to update it.

When the guest sets the affinity, the MSI{,-X} configuration is rather likely to change (at least for Linux guests).

Yes, it is. But I'd say it is not a frequent operation. In my test, it only happens in the initialization phase, and some updates don't reach Xen since the configuration is the same (QEMU filters them).

And I agree I will change this. My question is whether we can defer it a little, so that I can focus on some other critical issues before 4.6 is released, which may give this patch a better chance of making it into 4.6. Is this okay with you?

Thanks,
Feng

There are two possible scenarios:
1) There are bits that can be updated behind the back of the code here. In that case you need to loop, and each iteration of the loop needs to re-fetch the current value (not doing so would make the loop infinite).

Oh, yes, I think I made a mistake here; I have been too hasty these days, sorry for that! I think I need to do it like this:

do {
    new_ire = *p;
    /* Setup/Update interrupt remapping table entry. */
    setup_posted_irte(new_ire, pi_desc, gvec);
    old_ire = *(uint128_t *)p;
    ret = cmpxchg16b(p, old_ire, new_ire);
} while ( memcmp(ret, old_ire, sizeof(old_ire)) );

So since you put this in a loop again, would you mind pointing out which bits can get modified behind our back?

Jan