[Xen-devel] [qemu-mainline test] 59832: regressions - trouble: broken/fail/pass
flight 59832 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/59832/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-freebsd10-i386 12 guest-saverestore fail REGR. vs. 59059 test-amd64-amd64-xl 20 guest-start/debian.repeat fail REGR. vs. 59059 test-amd64-i386-xl-qemuu-ovmf-amd64 11 guest-saverestore fail REGR. vs. 59059 test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 59059 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 11 guest-saverestore fail REGR. vs. 59059 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 11 guest-saverestore fail REGR. vs. 59059 test-amd64-amd64-xl-qemuu-debianhvm-amd64 11 guest-saverestore fail REGR. vs. 59059 test-amd64-amd64-xl-qemuu-ovmf-amd64 11 guest-saverestore fail REGR. vs. 59059 test-amd64-amd64-xl-qemuu-winxpsp3 11 guest-saverestore fail REGR. vs. 59059 test-amd64-i386-freebsd10-amd64 12 guest-saverestore fail REGR. vs. 59059 test-amd64-i386-xl-qemuu-winxpsp3 11 guest-saverestorefail REGR. vs. 59059 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 11 guest-saverestore fail REGR. vs. 59059 test-amd64-amd64-xl-qemuu-win7-amd64 11 guest-saverestore fail REGR. vs. 59059 test-amd64-i386-xl-qemuu-win7-amd64 11 guest-saverestore fail REGR. vs. 59059 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 8 leak-check/basis(8) fail REGR. vs. 59059 Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass version targeted for testing: qemuub69b30532e0a80e25449244c01b0cbed000c99a3 baseline version: qemuu35360642d043c2a5366e8a04a10e5545e7353bd5 Last test of basis59059 2015-07-05 10:39:20 Z 18 days Failing since 59109 2015-07-06 14:58:21 Z 17 days 24 attempts Testing same since59832 2015-07-23 10:53:08 Z0 days1 attempts People who touched revisions under test: Alberto Garcia be...@igalia.com Alex Williamson alex.william...@redhat.com Alexander Graf ag...@suse.de Alexey Kardashevskiy a...@ozlabs.ru Alvise Rigo a.r...@virtualopensystems.com Amit Shah amit.s...@redhat.com Andreas Färber afaer...@suse.de Andrew Bennett andrew.benn...@imgtec.com Andrew Jones drjo...@redhat.com Artyom Tarasenko atar4q...@gmail.com Aurelien Jarno aurel...@aurel32.net Benjamin Herrenschmidt b...@kernel.crashing.org Bharata B Rao bhar...@linux.vnet.com Bharata B Rao bhar...@linux.vnet.ibm.com Brian Kress kre...@moose.net Chen Hanxiao chenhanx...@cn.fujitsu.com Christian Borntraeger borntrae...@de.ibm.com Christoffer Dall christoffer.d...@linaro.org Christoph Hellwig h...@lst.de Christophe Fergeau cferg...@redhat.com Claudio Fontana claudio.font...@huawei.com Cormac O'Brien i.am.cormac.obr...@gmail.com Cornelia Huck cornelia.h...@de.ibm.com Dana Rubin dana.ru...@ravellosystems.com Daniel P. Berrange berra...@redhat.com David Gibson da...@gibson.dropbear.id.au Denis V. Lunev d...@openvz.org Dmitry Osipenko dig...@gmail.com Dr. David Alan Gilbert dgilb...@redhat.com Eduardo Habkost ehabk...@redhat.com Eric Auger eric.au...@linaro.org Fam Zheng f...@redhat.com Frediano Ziglio fzig...@redhat.com Gabriel Laupre glau...@chelsio.com Gavin Shan gws...@linux.vnet.ibm.com Gerd Hoffmann kra...@redhat.com Gonglei arei.gong...@huawei.com Greg Kurz gk...@linux.vnet.ibm.com Hannes Reinecke h...@suse.de Hervé Poussineau hpous...@reactos.org Igor Mammedov imamm...@redhat.com James Hogan james.ho...@imgtec.com Jan Kiszka jan.kis...@siemens.com Jason Wang jasow...@redhat.com Jeff Cody jc...@redhat.com Johannes Schlatow schla...@ida.ing.tu-bs.de John Snow js...@redhat.com Josh Durgin jdur...@redhat.com Juan
Re: [Xen-devel] [PATCH v3 2/3] x86/ldt: Make modify_ldt optional
On Thu, Jul 23, 2015 at 04:40:14PM -0700, Andy Lutomirski wrote: On Thu, Jul 23, 2015 at 4:36 PM, Kees Cook keesc...@chromium.org wrote: I've been pondering something like this that is even MORE generic, for any syscall. Something like a syscalls directory under /proc/sys/kernel, with 1 entry per syscall. 0 is available, 1 is disabled, and -1 disabled until next boot. It might want to be /proc/sys/kernel/syscalls/[abi]/[name], possibly with more than just those options. We might want disabled, returns ENOSYS, disabled, returns EPERM, and a lock bit. On x86 at least, the implementation's easy -- we can just poke the syscall table. I wouldn't do it these days. Around 2000-2001, with a friend we designed a module with its userland counterpart which was called overloader. The principle was to intercept syscalls in order to enforce some form of policies, log values, or remap paths, etc. The first use was to log all file creations during a make install to more easily build packages. It was at the era where it was easy to modify the syscall table from a module, in kernel 2.2. We quickly found that beyond logging/rewriting syscall arguments, it had limited use cases when used as a syscall firewall because many syscalls are still too coarse to decide whether you want to enable/disable them. I remember that socketcall() and ioctl() were among the annoying ones. Either you totally enable or totally disable. In the end, the only valid use cases we found for enabling/disabling a syscall were limited to a very small set for debugging purposes, in order to force some application code to detect a missing implementation and switch to an alternative (eg: these days if you suspect a bug in epoll you could disable it and force the app to use poll instead). It was still useful to disable module loading and FS mounting but that was about all by then. All this to say that probably only a handful of tricky syscalls would need an on/off switch but clearly not all of them at all, so I'd rather add a few entries just for the relevant ones, mainly to fix compatibility issues and nothing more. Eg: what's the point of disabling exit(), wait(), kill(), fork() or getpid()... It would only increase the difficulty to sort out bug reports. Just my opinion, Willy ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/3] x86/ldt: Make modify_ldt optional
On Thu, Jul 23, 2015 at 4:58 PM, Willy Tarreau w...@1wt.eu wrote: On Thu, Jul 23, 2015 at 04:40:14PM -0700, Andy Lutomirski wrote: On Thu, Jul 23, 2015 at 4:36 PM, Kees Cook keesc...@chromium.org wrote: I've been pondering something like this that is even MORE generic, for any syscall. Something like a syscalls directory under /proc/sys/kernel, with 1 entry per syscall. 0 is available, 1 is disabled, and -1 disabled until next boot. It might want to be /proc/sys/kernel/syscalls/[abi]/[name], possibly with more than just those options. We might want disabled, returns ENOSYS, disabled, returns EPERM, and a lock bit. On x86 at least, the implementation's easy -- we can just poke the syscall table. I wouldn't do it these days. Around 2000-2001, with a friend we designed a module with its userland counterpart which was called overloader. The principle was to intercept syscalls in order to enforce some form of policies, log values, or remap paths, etc. The first use was to log all file creations during a make install to more easily build packages. It was at the era where it was easy to modify the syscall table from a module, in kernel 2.2. We quickly found that beyond logging/rewriting syscall arguments, it had limited use cases when used as a syscall firewall because many syscalls are still too coarse to decide whether you want to enable/disable them. I remember that socketcall() and ioctl() were among the annoying ones. Either you totally enable or totally disable. In the end, the only valid use cases we found for enabling/disabling a syscall were limited to a very small set for debugging purposes, in order to force some application code to detect a missing implementation and switch to an alternative (eg: these days if you suspect a bug in epoll you could disable it and force the app to use poll instead). It was still useful to disable module loading and FS mounting but that was about all by then. All this to say that probably only a handful of tricky syscalls would need an on/off switch but clearly not all of them at all, so I'd rather add a few entries just for the relevant ones, mainly to fix compatibility issues and nothing more. Eg: what's the point of disabling exit(), wait(), kill(), fork() or getpid()... It would only increase the difficulty to sort out bug reports. Just my opinion, Well, I would really like to have something like this around so that I can trivially globally disable syscalls when they have security risks. My hack[1] to disable kexec_load, for example, was terrible while I waited for a kernel that supported the disable_kexec_load sysctl. -Kees [1] https://outflux.net/blog/archives/2013/12/10/live-patching-the-kernel/ -- Kees Cook Chrome OS Security ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/3] x86/ldt: Make modify_ldt optional
On Thu, Jul 23, 2015 at 4:36 PM, Kees Cook keesc...@chromium.org wrote: On Thu, Jul 23, 2015 at 3:24 AM, Willy Tarreau w...@1wt.eu wrote: #ifdef CONFIG_SMP static void flush_ldt(void *current_mm) { @@ -254,6 +260,9 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (!sysctl_modify_ldt) + return ret; + switch (func) { case 0: ret = read_ldt(ptr, bytecount); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 2082b1a..60270c6 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max; #ifndef CONFIG_MMU extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_X86 +extern int sysctl_modify_ldt; +#endif /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR @@ -962,6 +965,13 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = modify_ldt, + .data = sysctl_modify_ldt, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, #endif #if defined(CONFIG_MMU) { I've been pondering something like this that is even MORE generic, for any syscall. Something like a syscalls directory under /proc/sys/kernel, with 1 entry per syscall. 0 is available, 1 is disabled, and -1 disabled until next boot. It might want to be /proc/sys/kernel/syscalls/[abi]/[name], possibly with more than just those options. We might want disabled, returns ENOSYS, disabled, returns EPERM, and a lock bit. On x86 at least, the implementation's easy -- we can just poke the syscall table. --Andy ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [MINUTES] Monthly Xen.org Technical Call (2015-07-22)
On 23/07/15 12:17, Roger Pau Monné wrote: El 23/07/15 a les 12.59, Konrad Rzeszutek Wilk ha escrit: [...] We forgot to speak about dom0. This work outlined will lay out how to do it - but the pieces for dom0 are not implemented and would need work (which actually may be following most of the is_pvh in the hypervisor). Let's do that once we have DomU nailed down and the guest ABI marked as stable. Some work will indeed be needed for Dom0, for example I would like to avoid using the current Dom0 builder (construct_dom0, that is suited for PV, but all the PVH hacks just make it almost impossible to understand) and start from scratch with a more sane approach. +1 on both points. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
On 23.07.15 at 13:45, ian.jack...@eu.citrix.com wrote: Ian Jackson writes ([PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy): From: Tiejun Chen tiejun.c...@intel.com This patch extends the existing hypercall to support rdm reservation policy. We return error or just throw out a warning message depending on whether the policy is strict or relaxed when reserving RDM regions in pfn space. Note in some special cases, e.g. add a device to hwdomain, and remove a device from user domain, 'relaxed' is fine enough since this is always safe to hwdomain. This patch breaks the build on ARM: gcc -O1 -fno-omit-frame-pointer -marm -g -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes -Wdeclaration-after-statement -Wno-unused-but-set-variable -Wno-unused-local-typedefs -I/local/scratch/ianj/xen.git/xen/include -fno-stack-protector -fno-exceptions -Wnested-externs -msoft-float -mcpu=cortex-a15 -DGCC_HAS_VISIBILITY_ATTRIBUTE -fno-builtin -fno-common -Werror -Wredundant-decls -Wno-pointer-arith -pipe -g -D__XEN__ -include /local/scratch/ianj/xen.git/xen/include/xen/config.h -nostdinc -fno-optimize-sibling-calls -DVERBOSE -DHAS_PASSTHROUGH -DHAS_DEVICE_TREE -DHAS_MEM_ACCESS -DHAS_PDX -fno-omit-frame-pointer -DCONFIG_FRAME_POINTER -MMD -MF .smmu.o.d -c smmu.c -o smmu.o smmu.c: In function 'arm_smmu_reassign_dev': smmu.c:2712:9: error: too few arguments to function 'arm_smmu_assign_dev' ret = arm_smmu_assign_dev(t, devfn, dev); ^ smmu.c:2607:12: note: declared here static int arm_smmu_assign_dev(struct domain *d, u8 devfn, ^ /local/scratch/ianj/xen.git/xen/Rules.mk:168: recipe for target 'smmu.o' failed make[6]: *** [smmu.o] Error 1 I had a quick look but it's not a simple matter of plumbing through an additional flags parameter becuase the reassign_device method apparently doesn't take flags. Considering that the parameter is ignored anyway, I'd suggest following what was done for various of the rmrr_identity_mapping() callers - just pass zero. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxl: Fixes to stream v2 task joining logic
On Thu, Jul 23, 2015 at 12:42:25PM +0100, Andrew Cooper wrote: On 23/07/15 12:38, Wei Liu wrote: On Thu, Jul 23, 2015 at 12:09:38PM +0100, Andrew Cooper wrote: During review of the libxl migration v2 series, I changes the task joining logic, but clearly didn't think the result through properly. This patch fixes several errors. 1) Do not call check_all_finished() in the success cases of libxl__xc_domain_{save,restore}_done(). It serves no specific purpose as the save helper state will report itself as inactive by this point, and avoids triggering a second stream-completion_callback() in the case that write_toolstack_record()/stream_continue() record errors synchronously themselves. 2) Only ever set stream-rc in stream_complete(). The first version of the migration v2 series had separate rc and joined_rc parameters, where this logic worked. However when combining the two, the teardown path fails to trigger if stream_done() records stream-rc itself. A side effect of this is that stream_done() needs to take an rc parameter. 3) Avoid stacking of check_all_finished() via synchronous teardown of tasks. If the _abort() functions call back synchronously, stream-completion_callback() ends up getting called twice, as first and last check_all_finished() frames observe each task being finished. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- I found this while working to fix the toolstack record issue, but am posting this bugfix ahead of the other work as OSSTest has tripped over the issue. This change itself doesn't seem to have anything to do with libxc. In OSSTest the error that triggers this knock on effect is the failure of xc_map_foreign_bulk. Does that mean this patch only fix half of the problem seen in OSSTest? Saving to image new xl format (info 0x3/0x0/1797) xc: info: In experimental xc_domain_save2 xc: info: Saving domain 2, type x86 HVM xc: progress: Memory: 67584/13445135% xc: progress: Memory: 135168/1344513 10% xc: progress: Memory: 201728/1344513 15% xc: progress: Memory: 269312/1344513 20% xc: progress: Memory: 336896/1344513 25% xc: progress: Memory: 403456/1344513 30% xc: progress: Memory: 471040/1344513 35% xc: progress: Memory: 538624/1344513 40% xc: progress: Memory: 605184/1344513 45% xc: progress: Memory: 672768/1344513 50% xc: progress: Memory: 740352/1344513 55% xc: error: xc_map_foreign_bulk: mmap failed (12 = Cannot allocate memory): Internal error xc: error: Failed to map guest pages (12 = Cannot allocate memory): Internal error xc: error: Save failed (12 = Cannot allocate memory): Internal error libxl: error: libxl_stream_write.c:267:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Cannot allocate memory xl: libxl_event.c:1866: libxl__ao_inprogress_gc: Assertion `!ao-complete' failed. I don't know why xc_map_foreign_bulk() failed, but at a guess I would day dom0 is out of RAM. I don't think so. There is no OOM error in serial log and dmesg. A similar failure is discovered in QEMU mainline test. That one is using OVMF + QEMU upstream. So this failure is not related to firmware or device model. http://logs.test-lab.xenproject.org/osstest/logs/59819/test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm/info.html It should be a real bug manifests only on 32bit toolstack + guest with large amount of ram (5000 in this case). Looks like we have one more bug to hunt down. Wei. This patch fixes the libxl assertion. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
On Thu, 2015-07-23 at 19:35 +0800, Feng Wu wrote: This patch adds the following arch hooks in scheduler: So, preliminary question: does this mean that you have identified (and fixed) the differences in behavior wrt the runstate based model, which was causing performance issue? I've been sidetracked a bit in looking at your previous patch, but did found a couple of differences, which I was about to report... But I guess that's no longer necessary, I guess, is it? Dario - vmx_pre_ctx_switch_pi(): It is called before context switch, we update the posted interrupt descriptor when the vCPU is preempted, go to sleep, or is blocked. - vmx_post_ctx_switch_pi() It is called after context switch, we update the posted interrupt descriptor when the vCPU is going to run. - arch_vcpu_wake() It will be called when waking up the vCPU, we update the posted interrupt descriptor when the vCPU is unblocked. -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/HVM: atomically access pointers in bufioreq handling
On 23.07.15 at 12:04, stefano.stabell...@eu.citrix.com wrote: On Thu, 23 Jul 2015, Jan Beulich wrote: On 22.07.15 at 19:24, stefano.stabell...@eu.citrix.com wrote: I'll queue this change up for the next QEMU release cycle. Thanks - v2 (with the adjusted description) just sent. It would however be nice for our variant in 4.6 to also gain this, perhaps independent of upstream's schedule. Is the hypervisor side going to go in 4.6? Do we need a freeze exception or do we consider this a bug fix? As said in the updated description in v2, the hypervisor side went in already some time ago. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [MINUTES] Monthly Xen.org Technical Call (2015-07-22)
El 23/07/15 a les 12.59, Konrad Rzeszutek Wilk ha escrit: [...] We forgot to speak about dom0. This work outlined will lay out how to do it - but the pieces for dom0 are not implemented and would need work (which actually may be following most of the is_pvh in the hypervisor). Let's do that once we have DomU nailed down and the guest ABI marked as stable. Some work will indeed be needed for Dom0, for example I would like to avoid using the current Dom0 builder (construct_dom0, that is suited for PV, but all the PVH hacks just make it almost impossible to understand) and start from scratch with a more sane approach. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] hvmloader: don't build with __XEN_TOOLS__ defined
On 22/07/15 15:38, Jan Beulich wrote: This being an in-guest component, it shouldn't get to see (and even less so use) tools-only public interfaces. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/tools/firmware/hvmloader/Makefile +++ b/tools/firmware/hvmloader/Makefile @@ -31,6 +31,9 @@ SMBIOS_REL_DATE ?= $(shell date +%m/%d/% CFLAGS += $(CFLAGS_xeninclude) +# We mustn't use tools-only public interfaces. +CFLAGS += -U__XEN_TOOLS__ -D__XEN_INTERFACE_VERSION__=__XEN_LATEST_INTERFACE_VERSION__ + OBJS = hvmloader.o mp_tables.o util.o smbios.o OBJS += smp.o cacheattr.o xenbus.o vnuma.o OBJS += e820.o pci.o pir.o ctype.o ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] libxl: Remove linux udev rules
On Thu, 2015-07-23 at 13:55 +0200, Roger Pau Monné wrote: El 14/07/15 a les 18.48, Ian Campbell ha escrit: On Tue, 2015-07-14 at 17:40 +0100, Wei Liu wrote: On Tue, Jul 14, 2015 at 05:35:23PM +0100, George Dunlap wrote: On 07/14/2015 05:21 PM, Ian Campbell wrote: On Tue, 2015-07-14 at 12:13 -0400, Konrad Rzeszutek Wilk wrote: On Tue, Jul 07, 2015 at 12:39:52PM +0100, Wei Liu wrote: On Mon, Jul 06, 2015 at 11:51:39AM +0100, George Dunlap wrote: They are no longer needed, having been replaced by a daemon for driverdomains which will run scripts as necessary. This introduces an regression. The 'daemon for driverdomains' is xenbackendd - which only gets built for NetBSD. It's not (any more), it's xl devd, which should be built everywhere. Aha! I knew I'd seen it somewhere. In that case, do we actually still need xenbackendd? Or does xenbackendd do something slightly different? That's NetBSD-only thing as far as I remember. Not sure if what it does is NetBSD specific. My understanding was it had been subsumed by xl devd, but I don't know why it hasn't been removed. Probably best to wait until Roger is back... Some time ago I proposed to the NetBSD folks to remove xenbackendd, because I think it's useless now, but the reply was to leave it as is: http://mail-index.netbsd.org/port-xen/2015/01/29/msg008483.html It's redundant with xl devd, I don't see any reason for use to continue carrying it in xen.git. If xl devd has some short coming relative to xenbackendd then we should address that. But otherwise if folks want to keep xenbackend despite it having been replaced then they can carry the patch IMHO. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
Ian Jackson writes ([PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy): From: Tiejun Chen tiejun.c...@intel.com This patch extends the existing hypercall to support rdm reservation policy. We return error or just throw out a warning message depending on whether the policy is strict or relaxed when reserving RDM regions in pfn space. Note in some special cases, e.g. add a device to hwdomain, and remove a device from user domain, 'relaxed' is fine enough since this is always safe to hwdomain. This patch breaks the build on ARM: gcc -O1 -fno-omit-frame-pointer -marm -g -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes -Wdeclaration-after-statement -Wno-unused-but-set-variable -Wno-unused-local-typedefs -I/local/scratch/ianj/xen.git/xen/include -fno-stack-protector -fno-exceptions -Wnested-externs -msoft-float -mcpu=cortex-a15 -DGCC_HAS_VISIBILITY_ATTRIBUTE -fno-builtin -fno-common -Werror -Wredundant-decls -Wno-pointer-arith -pipe -g -D__XEN__ -include /local/scratch/ianj/xen.git/xen/include/xen/config.h -nostdinc -fno-optimize-sibling-calls -DVERBOSE -DHAS_PASSTHROUGH -DHAS_DEVICE_TREE -DHAS_MEM_ACCESS -DHAS_PDX -fno-omit-frame-pointer -DCONFIG_FRAME_POINTER -MMD -MF .smmu.o.d -c smmu.c -o smmu.o smmu.c: In function 'arm_smmu_reassign_dev': smmu.c:2712:9: error: too few arguments to function 'arm_smmu_assign_dev' ret = arm_smmu_assign_dev(t, devfn, dev); ^ smmu.c:2607:12: note: declared here static int arm_smmu_assign_dev(struct domain *d, u8 devfn, ^ /local/scratch/ianj/xen.git/xen/Rules.mk:168: recipe for target 'smmu.o' failed make[6]: *** [smmu.o] Error 1 I had a quick look but it's not a simple matter of plumbing through an additional flags parameter becuase the reassign_device method apparently doesn't take flags. Tiejun, please can you send a patch to fix this up. Please send just a revised version of this patch. I think the rest of the series will rebase just fine on top of it. (If I'm wrong then we will need to do something more complex.) Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] x86/MSI: drop bogus NULL check from pci_restore_msi_state()
Commit 372900faf8 (x86/MSI-X: reduce fiddling with control register during restore) introduced de-references of pdev before it gets checked against NULL. Instead of deferring the de-references, drop the pointless check - both call sites do that check already. Reported-by: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1354,9 +1354,6 @@ int pci_restore_msi_state(struct pci_dev if ( !use_msi ) return -EOPNOTSUPP; -if ( !pdev ) -return -EINVAL; - ret = xsm_resource_setup_pci(XSM_PRIV, (pdev-seg 16) | (pdev-bus 8) | pdev-devfn); x86/MSI: drop bogus NULL check from pci_restore_msi_state() Commit 372900faf8 (x86/MSI-X: reduce fiddling with control register during restore) introduced de-references of pdev before it gets checked against NULL. Instead of deferring the de-references, drop the pointless check - both call sites do that check already. Reported-by: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1354,9 +1354,6 @@ int pci_restore_msi_state(struct pci_dev if ( !use_msi ) return -EOPNOTSUPP; -if ( !pdev ) -return -EINVAL; - ret = xsm_resource_setup_pci(XSM_PRIV, (pdev-seg 16) | (pdev-bus 8) | pdev-devfn); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 15/17] arm: add a dummy arch hooks for scheduler
Hi, On 23/07/15 12:35, Feng Wu wrote: Add a dummy arch hooks for scheduler to make the build successful in ARM. I would make more sense if you introduce the dummy hooks in the patch where you effectively use it (i.e #16). Actually you also introduce arch_vcpu_wake for x86 there too. CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com CC: Tim Deegan t...@xen.org Signed-off-by: Feng Wu feng...@intel.com --- v4: - Newly added xen/include/asm-arm/domain.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index f1a087e..7f25fd5 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -265,6 +265,8 @@ static inline unsigned int domain_max_vcpus(const struct domain *d) return MAX_VIRT_CPUS; } +static void arch_vcpu_wake(struct vcpu *v) {} + Please use static inline here. #endif /* __ASM_DOMAIN_H__ */ /* Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/6] libxl: Remove linux udev rules
El 14/07/15 a les 18.48, Ian Campbell ha escrit: On Tue, 2015-07-14 at 17:40 +0100, Wei Liu wrote: On Tue, Jul 14, 2015 at 05:35:23PM +0100, George Dunlap wrote: On 07/14/2015 05:21 PM, Ian Campbell wrote: On Tue, 2015-07-14 at 12:13 -0400, Konrad Rzeszutek Wilk wrote: On Tue, Jul 07, 2015 at 12:39:52PM +0100, Wei Liu wrote: On Mon, Jul 06, 2015 at 11:51:39AM +0100, George Dunlap wrote: They are no longer needed, having been replaced by a daemon for driverdomains which will run scripts as necessary. This introduces an regression. The 'daemon for driverdomains' is xenbackendd - which only gets built for NetBSD. It's not (any more), it's xl devd, which should be built everywhere. Aha! I knew I'd seen it somewhere. In that case, do we actually still need xenbackendd? Or does xenbackendd do something slightly different? That's NetBSD-only thing as far as I remember. Not sure if what it does is NetBSD specific. My understanding was it had been subsumed by xl devd, but I don't know why it hasn't been removed. Probably best to wait until Roger is back... Some time ago I proposed to the NetBSD folks to remove xenbackendd, because I think it's useless now, but the reply was to leave it as is: http://mail-index.netbsd.org/port-xen/2015/01/29/msg008483.html Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
On Thu, 2015-07-23 at 12:45 +0100, Ian Jackson wrote: I had a quick look but it's not a simple matter of plumbing through an additional flags parameter becuase the reassign_device method apparently doesn't take flags. The vtd version (reassign_device_ownership) does: /* * Any RMRR flag is always ignored when remove a device, * but its always safe and strict to set 0. */ ret = rmrr_identity_mapping(source, 0, rmrr, 0); Where that second 0 there was the flag when it was called from (intel_iommu_assign_device). Given that arm_smmu_assign_dev currently ignores the flag, and that ARM doesn't currently have RDM/RMRR at all I think just passing 0 in arm_smmu_reassign_dev would be tolerable. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxl: Fixes to stream v2 task joining logic
On 23/07/15 12:38, Wei Liu wrote: On Thu, Jul 23, 2015 at 12:09:38PM +0100, Andrew Cooper wrote: During review of the libxl migration v2 series, I changes the task joining logic, but clearly didn't think the result through properly. This patch fixes several errors. 1) Do not call check_all_finished() in the success cases of libxl__xc_domain_{save,restore}_done(). It serves no specific purpose as the save helper state will report itself as inactive by this point, and avoids triggering a second stream-completion_callback() in the case that write_toolstack_record()/stream_continue() record errors synchronously themselves. 2) Only ever set stream-rc in stream_complete(). The first version of the migration v2 series had separate rc and joined_rc parameters, where this logic worked. However when combining the two, the teardown path fails to trigger if stream_done() records stream-rc itself. A side effect of this is that stream_done() needs to take an rc parameter. 3) Avoid stacking of check_all_finished() via synchronous teardown of tasks. If the _abort() functions call back synchronously, stream-completion_callback() ends up getting called twice, as first and last check_all_finished() frames observe each task being finished. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- I found this while working to fix the toolstack record issue, but am posting this bugfix ahead of the other work as OSSTest has tripped over the issue. This change itself doesn't seem to have anything to do with libxc. In OSSTest the error that triggers this knock on effect is the failure of xc_map_foreign_bulk. Does that mean this patch only fix half of the problem seen in OSSTest? Saving to image new xl format (info 0x3/0x0/1797) xc: info: In experimental xc_domain_save2 xc: info: Saving domain 2, type x86 HVM xc: progress: Memory: 67584/13445135% xc: progress: Memory: 135168/1344513 10% xc: progress: Memory: 201728/1344513 15% xc: progress: Memory: 269312/1344513 20% xc: progress: Memory: 336896/1344513 25% xc: progress: Memory: 403456/1344513 30% xc: progress: Memory: 471040/1344513 35% xc: progress: Memory: 538624/1344513 40% xc: progress: Memory: 605184/1344513 45% xc: progress: Memory: 672768/1344513 50% xc: progress: Memory: 740352/1344513 55% xc: error: xc_map_foreign_bulk: mmap failed (12 = Cannot allocate memory): Internal error xc: error: Failed to map guest pages (12 = Cannot allocate memory): Internal error xc: error: Save failed (12 = Cannot allocate memory): Internal error libxl: error: libxl_stream_write.c:267:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Cannot allocate memory xl: libxl_event.c:1866: libxl__ao_inprogress_gc: Assertion `!ao-complete' failed. I don't know why xc_map_foreign_bulk() failed, but at a guess I would day dom0 is out of RAM. This patch fixes the libxl assertion. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On Thu, 2015-07-23 at 05:29 -0600, Jan Beulich wrote: On 23.07.15 at 12:25, roger@citrix.com wrote: El 13/07/15 a les 16.01, Jan Beulich ha escrit: On 03.07.15 at 13:34, roger@citrix.com wrote: --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -795,6 +795,15 @@ int arch_set_info_guest( c.nat-fs_base || c.nat-gs_base_user)) ) return -EINVAL; } +else if ( is_hvm_domain(d) ) +{ +if ( c(ctrlreg[0]) || c(ctrlreg[1]) || c(ctrlreg[2]) || + c(ctrlreg[3]) || c(ctrlreg[4]) || c(ctrlreg[5]) || + c(ctrlreg[6]) || c(ctrlreg[7]) || c(ldt_base) || + c(ldt_ents) || c(kernel_ss) || c(kernel_sp) || + c(gdt_ents) ) +return -EINVAL; +} In addition to what Andrew said - is the use of c() here really correct considering compat = is_pv_32bit_domain(d); #define c(fld) (compat ? (c.cmp-fld) : (c.nat-fld)) near the beginning of the function? I've been thinking about this. Since HVM vCPUs are always started in 32bit mode, I would ideally like to use the vcpu_guest_context_x86_32_t struct to set the vCPU context. This means that for HVM guests compat should always be true. The problem is that inside of the guest there's no notion of vcpu_guest_context_x86_32_t or vcpu_guest_context_x86_64_t, there's only vcpu_guest_context, which defaults to the native bitness of the guest. So if your guest is running in 64bit mode vcpu_guest_context is aliased to vcpu_guest_context_x86_64_t by default. TBH I'm not sure about the best way to solve this. Should vcpu_guest_context_x86_32_t and vcpu_guest_context_x86_64_t be exposed to the guest like it's done for the tools? Why? We're in arch_set_info_guest(), which is unreachable for a HVM guest on itself. Or is this in preparation of allowing HVM guests to use VCPUOP_initialise? In that case, exposing might be an option, but remember that the compat mode layout even today is used only for PV guests. So I'd rather avoid exposing both layouts to guests, and do translation instead inside the hypervisor. On ARM we deliberately arranged that the vcpu_guest_context was the same for both arm32 and arm64. Moving to PVH seems like an ideal opportunity to do something similar for x86, if not by standardising on the x86_64 layout then by declaring a new struct which can encompass both and declaring the old ones PV legacy. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
On Thu, 2015-07-23 at 13:19 +0100, Ian Jackson wrote: Chen, Tiejun writes (Re: [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy): Tiejun, please can you send a patch to fix this up. Please send just a revised version of this patch. I think the rest of the series will rebase just fine on top of it. (If I'm wrong then we will need to do something more complex.) Sorry to this. On ARM side the flag field doesn't take any affect. Signed-off-by: Tiejun Chen tiejun.c...@intel.com OK, thanks. I wasn't sure that just passing 0 was right. I will take Jan's and Ian's suggestions as acks for your patch, which looks correct to me to implement them. Can you avoid the mention of DT in the comment please, since PCI will eventually go that path. Something like No flags are defined for ARM would suffice I think. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] oxenstored: link in the systemd system library
If systemd is configured for use AND you are building oxenstored, the C systemd library must be linked in to the systemd.cxma library. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com --- Changed since v1: * Link the systemd library in to the systemd.cxma instead of the final binary. --- tools/ocaml/xenstored/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile index d861f11..59875f7 100644 --- a/tools/ocaml/xenstored/Makefile +++ b/tools/ocaml/xenstored/Makefile @@ -30,6 +30,8 @@ systemd_OBJS = systemd systemd_C_OBJS = systemd_stubs OCAML_LIBRARY += systemd +LIBS_systemd += $(LDFLAGS-y) + OBJS = define \ stdext \ trie \ -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-3.18 test] 59825: regressions - FAIL
flight 59825 linux-3.18 real [real] http://logs.test-lab.xenproject.org/osstest/logs/59825/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-pvh-intel 11 guest-start fail REGR. vs. 58581 test-amd64-i386-xl-qemut-debianhvm-amd64 11 guest-saverestore fail REGR. vs. 58581 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 58558 test-armhf-armhf-libvirt 6 xen-boot fail REGR. vs. 58581 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 11 guest-saverestore fail baseline untested test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 14 guest-localmigrate.2 fail baseline untested test-armhf-armhf-xl-rtds 14 guest-start.2 fail baseline untested test-armhf-armhf-xl-credit2 6 xen-boot fail like 58581 test-armhf-armhf-xl 6 xen-boot fail like 58581 test-armhf-armhf-xl-multivcpu 6 xen-boot fail like 58581 test-armhf-armhf-xl-xsm 6 xen-boot fail like 58581 test-armhf-armhf-libvirt-xsm 6 xen-boot fail like 58581 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58581 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58581 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58581 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 6 xen-boot fail never pass test-armhf-armhf-xl-rtds 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass version targeted for testing: linux22a6cbf9f36ee3ae2878efcbdde33e6ca00b9c4b baseline version: linuxd048c068d00da7d4cfa5ea7651933b99026958cf Last test of basis58581 2015-06-15 09:42:22 Z 38 days Failing since 58976 2015-06-29 19:43:23 Z 23 days 33 attempts Testing same since59825 2015-07-22 17:19:07 Z0 days1 attempts 346 people touched revisions under test, not listing them all jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen pass build-i386-rumpuserxen pass test-amd64-amd64-xl pass test-armhf-armhf-xl fail test-amd64-i386-xl pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail test-amd64-amd64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm fail test-amd64-i386-libvirt-xsm pass test-amd64-amd64-xl-xsm pass test-armhf-armhf-xl-xsm fail test-amd64-i386-xl-xsm pass test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-qemut-rhel6hvm-amd
Re: [Xen-devel] [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
-Original Message- From: Dario Faggioli [mailto:dario.faggi...@citrix.com] Sent: Thursday, July 23, 2015 8:50 PM To: Wu, Feng Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew Cooper; Jan Beulich Subject: Re: [Xen-devel] [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts On Thu, 2015-07-23 at 19:35 +0800, Feng Wu wrote: This patch adds the following arch hooks in scheduler: So, preliminary question: does this mean that you have identified (and fixed) the differences in behavior wrt the runstate based model, which was causing performance issue? Yes, the issue has been fixed in this version. I've been sidetracked a bit in looking at your previous patch, but did found a couple of differences, which I was about to report... But I guess that's no longer necessary, I guess, is it? Thanks for your effort, Dario! Maybe you can give a review about this new version, if you have time. Thanks, Feng Dario - vmx_pre_ctx_switch_pi(): It is called before context switch, we update the posted interrupt descriptor when the vCPU is preempted, go to sleep, or is blocked. - vmx_post_ctx_switch_pi() It is called after context switch, we update the posted interrupt descriptor when the vCPU is going to run. - arch_vcpu_wake() It will be called when waking up the vCPU, we update the posted interrupt descriptor when the vCPU is unblocked. -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Patch V4 1/3] usb: Add Xen pvUSB protocol description
On 07/23/2015 09:08 PM, Greg KH wrote: On Thu, Jul 23, 2015 at 08:46:17AM +0200, Juergen Gross wrote: On 07/23/2015 06:36 AM, Greg KH wrote: On Thu, Jul 23, 2015 at 06:04:39AM +0200, Juergen Gross wrote: On 07/23/2015 01:46 AM, Greg KH wrote: On Tue, Jun 23, 2015 at 08:53:23AM +0200, Juergen Gross wrote: Add the definition of pvUSB protocol used between the pvUSB frontend in a Xen domU and the pvUSB backend in a Xen driver domain (usually Dom0). This header was originally provided by Fujitsu for Xen based on Linux 2.6.18. Changes are: - adapt to Linux style guide Signed-off-by: Juergen Gross jgr...@suse.com --- include/xen/interface/io/usbif.h | 252 +++ Why is this a different interface than the existing ones we have today (i.e. usbip?) Where is it documented? Do the Xen developers / The interface definition is living in the Xen git repository for several years now: git://xenbits.xen.org/xen.git - xen/include/public/io/usbif.h That's header file, not a document describing the api here. I suppose you want to tell me I should add something like: Documentation/DocBook/usb/API-struct-urb.html Somewhere that people can refer to that describes this public-facing API that must not ever be broken or changed. If you want to put it in a documentation file, or a .h file, I don't care. It is used e.g. in SUSE's xen kernel since 2.6.18. I am very aware of the amount of Xen crap in SuSE's kernel, don't use that as an excuse for me to merge it to mainline :) :-) Wasn't meant as an excuse, just a hint why the interface can't be the same as for usbip. We have to ensure compatibility with those kernels This shouldn't be a kernel/kernel compability issue, as the api talks between Xen and the OS, not between different OSs, right? Depends on where the backend is living. It's the backend the frontend is talking to. There is a backend in SUSE's kernels up to SLE12. So compatibility is to be maintained to those kernels. Looks as if in future there will be one in qemu. and possibly other operating systems (BSD?, Windows?) which already might be using pvUSB with a Dom0 based on the SUSE xen kernel. Are there other operating system drivers today that use this API? Is this an API in the Xen core today that we have to support? Yes. Some more background / descriptions would be nice to have. I guess a documentation file giving a brief explanation about the interfaces of Xen wouldn't be a bad idea. This could avoid discussions like this. It shouldn't define each interface, but the classes of interfaces which are existing (between kernel and hypervisor, frontends and backends) and the stability requirements. Headers like the one we are discussing here could then refer to this document. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 04/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
-Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: Thursday, July 23, 2015 10:05 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: Tian, Kevin; Jan Beulich Subject: Re: [Xen-devel] [v4 04/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature On 23/07/15 15:01, Andrew Cooper wrote: On 23/07/15 12:35, Feng Wu wrote: VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. This patch adds variable 'iommu_intpost' to control whether enable VT-d posted-interrupt or not in the generic IOMMU code. CC: Jan Beulich jbeul...@suse.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Please patch docs/misc/xen-command-line.markdown as you are --- v4: - No changes v3: - Remove pointless initializer for 'iommu_intpost'. - Some adjustment for if no intremap then no intpost logic. * For parse_iommu_param(), move it to the end of the function, so we don't need to add the some logic when introduing the new kernel parameter 'intpost' in later patch. * Add this logic in iommu_setup() after iommu_hardware_setup() is called. xen/drivers/passthrough/iommu.c | 17 - xen/include/xen/iommu.h | 2 +- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 06cb38f..536b69f 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -39,6 +39,7 @@ static void iommu_dump_p2m_table(unsigned char key); * no-snoop Disable VT-d Snoop Control * no-qinval Disable VT-d Queued Invalidation * no-intremapDisable VT-d Interrupt Remapping + * no-intpost Disable VT-d Interrupt posting */ custom_param(iommu, parse_iommu_param); bool_t __initdata iommu_enable = 1; @@ -51,6 +52,14 @@ bool_t __read_mostly iommu_passthrough; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_intremap = 1; + +/* + * In the current implementation of VT-d posted interrupts, in some extreme + * cases, the per cpu list which saves the blocked vCPU will be very long, + * and this will affect the interrupt latency, so let this feature off by + * default until we find a good solution to resolve it. + */ +bool_t __read_mostly iommu_intpost; bool_t __read_mostly iommu_hap_pt_share = 1; bool_t __read_mostly iommu_debug; bool_t __read_mostly amd_iommu_perdev_intremap = 1; @@ -112,6 +121,9 @@ static void __init parse_iommu_param(char *s) s = ss + 1; } while ( ss ); + +if ( !iommu_intremap ) +iommu_intpost = 0; This hunk is still wrong, as I said in v3. Furthermore, you appear to have no way in v4 to enable intpost on the Xen command line. Oh, sorry, since this feature is required default off, I forget to send out the last patch, which add the Xen command line. I will re-add it in the next version. Thanks, Feng ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-linus test] 59836: regressions - FAIL
flight 59836 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/59836/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-xsm 13 guest-saverestore fail REGR. vs. 59254 test-amd64-i386-xl 13 guest-saverestore fail REGR. vs. 59254 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail REGR. vs. 59254 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 59254 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate fail REGR. vs. 59254 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 11 guest-start fail REGR. vs. 59254 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 59254 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 59254 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail like 59423-bisect Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 13 guest-saverestorefail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass version targeted for testing: linuxc5dfd654d0ec0a28fe81e7bd4d4fd984a9855e09 baseline version: linux45820c294fe1b1a9df495d57f40585ef2d069a39 Last test of basis59254 2015-07-09 04:20:48 Z 14 days Failing since 59348 2015-07-10 04:24:05 Z 13 days 14 attempts Testing same since59836 2015-07-23 10:55:54 Z0 days1 attempts 452 people touched revisions under test, not listing them all jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen pass build-i386-rumpuserxen pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl fail test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail test-amd64-amd64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-xl-xsm pass test-armhf-armhf-xl-xsm pass test-amd64-i386-xl-xsm fail test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-qemut-rhel6hvm-amd pass
Re: [Xen-devel] [v4 15/17] arm: add a dummy arch hooks for scheduler
-Original Message- From: Julien Grall [mailto:julien.gr...@citrix.com] Sent: Thursday, July 23, 2015 7:54 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: Stefano Stabellini; Ian Campbell; Tim Deegan Subject: Re: [Xen-devel] [v4 15/17] arm: add a dummy arch hooks for scheduler Hi, On 23/07/15 12:35, Feng Wu wrote: Add a dummy arch hooks for scheduler to make the build successful in ARM. I would make more sense if you introduce the dummy hooks in the patch where you effectively use it (i.e #16). Actually you also introduce arch_vcpu_wake for x86 there too. I split it just because this is the only changes to ARM. I will make them one patch any way. Thanks, Feng CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com CC: Tim Deegan t...@xen.org Signed-off-by: Feng Wu feng...@intel.com --- v4: - Newly added xen/include/asm-arm/domain.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index f1a087e..7f25fd5 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -265,6 +265,8 @@ static inline unsigned int domain_max_vcpus(const struct domain *d) return MAX_VIRT_CPUS; } +static void arch_vcpu_wake(struct vcpu *v) {} + Please use static inline here. #endif /* __ASM_DOMAIN_H__ */ /* Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Patch V4 1/3] usb: Add Xen pvUSB protocol description
On Thu, Jul 23, 2015 at 12:08:01PM -0700, Greg KH wrote: Somewhere that people can refer to that describes this public-facing API that must not ever be broken or changed. If you want to put it in a documentation file, or a .h file, I don't care. It is used e.g. in SUSE's xen kernel since 2.6.18. I am very aware of the amount of Xen crap in SuSE's kernel, don't use that as an excuse for me to merge it to mainline :) :-) Wasn't meant as an excuse, just a hint why the interface can't be the same as for usbip. We have to ensure compatibility with those kernels This shouldn't be a kernel/kernel compability issue, as the api talks between Xen and the OS, not between different OSs, right? and possibly other operating systems (BSD?, Windows?) which already might be using pvUSB with a Dom0 based on the SUSE xen kernel. Are there other operating system drivers today that use this API? Is this an API in the Xen core today that we have to support? Some more background / descriptions would be nice to have. For example Xen GPLPV drivers for Windows do have PVUSB frontend driver.. -- Pasi thanks, greg k-h ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 04/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
On 23/07/15 12:35, Feng Wu wrote: VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. This patch adds variable 'iommu_intpost' to control whether enable VT-d posted-interrupt or not in the generic IOMMU code. CC: Jan Beulich jbeul...@suse.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Please patch docs/misc/xen-command-line.markdown as you are --- v4: - No changes v3: - Remove pointless initializer for 'iommu_intpost'. - Some adjustment for if no intremap then no intpost logic. * For parse_iommu_param(), move it to the end of the function, so we don't need to add the some logic when introduing the new kernel parameter 'intpost' in later patch. * Add this logic in iommu_setup() after iommu_hardware_setup() is called. xen/drivers/passthrough/iommu.c | 17 - xen/include/xen/iommu.h | 2 +- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 06cb38f..536b69f 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -39,6 +39,7 @@ static void iommu_dump_p2m_table(unsigned char key); * no-snoop Disable VT-d Snoop Control * no-qinval Disable VT-d Queued Invalidation * no-intremapDisable VT-d Interrupt Remapping + * no-intpost Disable VT-d Interrupt posting */ custom_param(iommu, parse_iommu_param); bool_t __initdata iommu_enable = 1; @@ -51,6 +52,14 @@ bool_t __read_mostly iommu_passthrough; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_intremap = 1; + +/* + * In the current implementation of VT-d posted interrupts, in some extreme + * cases, the per cpu list which saves the blocked vCPU will be very long, + * and this will affect the interrupt latency, so let this feature off by + * default until we find a good solution to resolve it. + */ +bool_t __read_mostly iommu_intpost; bool_t __read_mostly iommu_hap_pt_share = 1; bool_t __read_mostly iommu_debug; bool_t __read_mostly amd_iommu_perdev_intremap = 1; @@ -112,6 +121,9 @@ static void __init parse_iommu_param(char *s) s = ss + 1; } while ( ss ); + +if ( !iommu_intremap ) +iommu_intpost = 0; This hunk is still wrong, as I said in v3. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] oxenstored: link in the systemd system library
On Thu, 2015-07-23 at 09:03 -0500, Jonathan Creekmore wrote: On Jul 23, 2015, at 9:00 AM, Ian Campbell ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:40 -0500, Jonathan Creekmore wrote: If systemd is configured for use AND you are building oxenstored, the C systemd library must be linked in to the systemd.cxma library. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com Acked-by: Ian Campbell ian.campb...@citrix.com OOI what was the failure mode for you without this? Runtime rather than build time I suspect? No, it was build-time. It failed to link the oxenstored binary due to missing symbols. Interesting, that (surprisingly) works for me. I suspect differing toolchain versions then! Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xenbackendd: remove xenbackendd
The functionality provided by xenbackendd has been integrated into the xl toolstack under the devd command on all platforms, because of that it no longer makes sense to maintain xenbackendd. Init scripts have also been provided for all platforms in order to launch xl devd in the background, mimicking the functionality provided by xenbackendd. Furthermore, xenbackendd is not started by default on NetBSD since a couple of releases ago (4.2 was the first release to disable the automatic startup of xenbackendd in xencommons). Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Cc: George Dunlap george.dun...@eu.citrix.com Cc: port-...@netbsd.org --- tools/Makefile | 1 - tools/hotplug/Linux/init.d/sysconfig.xencommons.in | 6 - tools/hotplug/NetBSD/block | 2 +- tools/hotplug/NetBSD/rc.d/xencommons.in| 2 - tools/hotplug/NetBSD/vif-bridge| 2 +- tools/hotplug/NetBSD/vif-ip| 2 +- tools/xenbackendd/Makefile | 41 --- tools/xenbackendd/xenbackendd.c| 325 - 8 files changed, 3 insertions(+), 378 deletions(-) delete mode 100644 tools/xenbackendd/Makefile delete mode 100644 tools/xenbackendd/xenbackendd.c diff --git a/tools/Makefile b/tools/Makefile index 45cb4b2..633fb38 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -17,7 +17,6 @@ SUBDIRS-y += xenmon SUBDIRS-y += xenstat SUBDIRS-$(CONFIG_Linux) += memshr SUBDIRS-$(CONFIG_BLKTAP2) += blktap2 -SUBDIRS-$(CONFIG_NetBSD) += xenbackendd SUBDIRS-y += libfsimage SUBDIRS-$(CONFIG_Linux) += libvchan diff --git a/tools/hotplug/Linux/init.d/sysconfig.xencommons.in b/tools/hotplug/Linux/init.d/sysconfig.xencommons.in index f0fa98d..f6f35cd 100644 --- a/tools/hotplug/Linux/init.d/sysconfig.xencommons.in +++ b/tools/hotplug/Linux/init.d/sysconfig.xencommons.in @@ -40,11 +40,5 @@ XENSTORED_ARGS= # Running xenstored on XENSTORED_ROOTDIR #XENSTORED_ROOTDIR=@XEN_LIB_STORED@ -## Type: string -## Default: Not defined, xenbackendd debug mode off -# -# Running xenbackendd in debug mode -#XENBACKENDD_DEBUG=[yes|on|1] - # qemu path #QEMU_XEN=@LIBEXEC_BIN@/qemu-system-i386 diff --git a/tools/hotplug/NetBSD/block b/tools/hotplug/NetBSD/block index 32c20b6..fc8f668 100644 --- a/tools/hotplug/NetBSD/block +++ b/tools/hotplug/NetBSD/block @@ -1,7 +1,7 @@ #!/bin/sh -e # $NetBSD: block-nbsd,v 1.1.1.1 2008/08/07 20:26:57 cegger Exp $ -# Called by xenbackendd +# # Usage: block xsdir_backend_path state DIR=$(dirname $0) diff --git a/tools/hotplug/NetBSD/rc.d/xencommons.in b/tools/hotplug/NetBSD/rc.d/xencommons.in index d7552cd..9f45cfd 100644 --- a/tools/hotplug/NetBSD/rc.d/xencommons.in +++ b/tools/hotplug/NetBSD/rc.d/xencommons.in @@ -22,8 +22,6 @@ required_files=/kern/xen/privcmd XENSTORED_PIDFILE=/var/run/xenstored.pid XENCONSOLED_PIDFILE=/var/run/xenconsoled.pid -XENBACKENDD_PIDFILE=/var/run/xenbackendd.pid -#XENBACKENDD_DEBUG=1 #XENCONSOLED_TRACE=/var/log/xen/xenconsole-trace.log #XENSTORED_TRACE=/var/log/xen/xenstore-trace.log diff --git a/tools/hotplug/NetBSD/vif-bridge b/tools/hotplug/NetBSD/vif-bridge index b58e922..f12edf0 100644 --- a/tools/hotplug/NetBSD/vif-bridge +++ b/tools/hotplug/NetBSD/vif-bridge @@ -1,7 +1,7 @@ #!/bin/sh -e # $NetBSD: vif-bridge-nbsd,v 1.1.1.1 2008/08/07 20:26:57 cegger Exp $ -# Called by xenbackendd +# # Usage: vif-bridge xsdir_backend_path state DIR=$(dirname $0) diff --git a/tools/hotplug/NetBSD/vif-ip b/tools/hotplug/NetBSD/vif-ip index 83cbfe2..44200e3 100644 --- a/tools/hotplug/NetBSD/vif-ip +++ b/tools/hotplug/NetBSD/vif-ip @@ -1,7 +1,7 @@ #!/bin/sh -e # $NetBSD: vif-ip-nbsd,v 1.1.1.1 2008/08/07 20:26:57 cegger Exp $ -# Called by xenbackendd +# # Usage: vif-ip xsdir_backend_path state DIR=$(dirname $0) diff --git a/tools/xenbackendd/Makefile b/tools/xenbackendd/Makefile deleted file mode 100644 index f52be74..000 --- a/tools/xenbackendd/Makefile +++ /dev/null @@ -1,41 +0,0 @@ -# Copyright (c) 2009 Advanced Micro Devices, Inc. -# -# This program is free software; you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation; under version 2 of the License. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. - -XEN_ROOT=$(CURDIR)/../.. -include $(XEN_ROOT)/tools/Rules.mk - -CFLAGS += -Werror -CFLAGS += $(CFLAGS_libxenstore) -CPPFLAGS += -DXEN_SCRIPT_DIR=\$(XEN_SCRIPT_DIR)\ -LDLIBS += $(LDLIBS_libxenstore) - -.PHONY: all -all: build - -.PHONY: build -build:
Re: [Xen-devel] Stable backport request for new OVMF version
Forgot to CC the list, now done... On Thu, 2015-07-23 at 14:56 +0100, Ian Campbell wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. One aspect of the specific failure I've tripped over (Debian Jessie) is that it will be a blocker for updating osstest to using the new version of Debian on the test hosts, which in turn is a prerequisite for arm64 support. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 06/15] VMX/altp2m: add code to support EPTP switching and #VE.
On 23.07.15 at 16:40, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 2:43 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: +static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v) { +struct domain *d = v-domain; +u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS; + +if ( !cpu_has_vmx_vmfunc ) +return; + +if ( cpu_has_vmx_virt_exceptions ) +mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS; + +vmx_vmcs_enter(v); + +if ( !d-is_dying altp2m_active(d) ) +{ +v-arch.hvm_vmx.secondary_exec_control |= mask; +__vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING); +__vmwrite(EPTP_LIST_ADDR, + virt_to_maddr(d-arch.altp2m_eptp)); + +if ( cpu_has_vmx_virt_exceptions ) +{ +p2m_type_t t; +mfn_t mfn; + +mfn = get_gfn_query_unlocked(d, + gfn_x(vcpu_altp2m(v).veinfo_gfn), t); + +if ( mfn_x(mfn) != INVALID_MFN ) +__vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) + PAGE_SHIFT); Considering that the VMCS field holds a byte-aligned address, why do you have the (later introduced) hvmop specify a GFN instead of a GPA? The SDM states: If the EPT-violation #VE VM-execution control is 1, the virtualization-exception information address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address must not set any bits beyond the processor's physical-address width. Ah, interesting. Knowing what to search for, I was able to find this. But to be honest, it should also be stated in the section talking about #VE, not just in the one talking about checks being done. With that, just like for the other patch my ack depends on the promise to deal with all the remaining issues. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 05/15] x86/altp2m: basic data structures and support routines.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 7:54 AM On 23.07.15 at 16:36, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 2:22 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: +out: Missed sorry. Just to repeat - labels should be indented by at least one space. But you realize this isn't the only case, and it looked like you adjusted only exactly the one place where the comment was made? I agree these kinds of things are hard to catch - we did go in and scan our v7 for comment styles, indentation, case indentation, spaces in if conditional expressions etc, But of course we missed some and this could be addressed - Andrew was also suggesting some sort of tool to do this would be good to use. thanks Ravi Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
Il 23/07/2015 16:51, Jan Beulich ha scritto: On 23.07.15 at 16:43, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:26 -0600, Jan Beulich wrote: On 23.07.15 at 15:56, ian.campb...@citrix.com wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. To be honest I didn't even know we support use of OVMF in any released version (nor in unstable). When did this change? It was present in 4.3. I don't think we build it by default since its toolchain is a bit picky, but we turn it on and test it in osstest from 4.4 onwards. You can ask for it with bios=ovmf in your hvm guest's config. Well, that means it can be used by the adventurous, but it doesn't mean we consider it supported. Jan Even if used only by the adventurous left it unable to build with newer gcc don't seems very good, even if the adventurous should be able to change Config.mk to use update ovmf git that don't fail the build. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [MINUTES] Monthly Xen.org Technical Call (2015-07-22)
On 23/07/15 10:08, Ian Campbell wrote: * For Linux: * The Xen entrypoint shall point to a stub in the same vein of the UEFI stub. * The stub will set up a basic initial set of page tables, fills in bootinfo and then jump to the native 32- or 64-bit entry point as appropriate. * The stub can/should live in linux.git (presumably arch/x86/xen) but should be self -contained and isolated from the main body of Linux code. It does setup to impedance match the Xen entry point to the Linux native entry point. * Other things (e.g. lack of ACPI) should be addressed by fixing the native Linux entry path to be able to cope without them. Likewise installing PV hooks may need hypervisor detection to be moved earlier in the native boot path. Agreed. The goal here is to use as much of the native code paths as possible to reduce the chance of Xen support being broken by x86 changes. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/4] x86/compat: Test both PV and PVH guests for compat mode
On 23.07.15 at 16:13, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:07 -0600, Jan Beulich wrote: what we called no-pm on yesterday's call? FYI it was no-dm (no device model). Of course it was - fingers typing disconnected from brain, I guess. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On 23.07.15 at 15:56, ian.campb...@citrix.com wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. To be honest I didn't even know we support use of OVMF in any released version (nor in unstable). When did this change? Jan One aspect of the specific failure I've tripped over (Debian Jessie) is that it will be a blocker for updating osstest to using the new version of Debian on the test hosts, which in turn is a prerequisite for arm64 support. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 11/15] x86/altp2m: define and implement alternate p2m HVMOP types.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 3:22 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: Signed-off-by: Ed White edmund.h.wh...@intel.com Acked-by: Jan Beulich jbeul...@suse.com And I have to withdraw this ack pending clarification of (and perhaps adjustment to) the #VE info address interface. Could we have the ack back :-) I clarified the #VE info address interface in the other email - repeating here: If the EPT-violation #VE VM-execution control is 1, the virtualization-exception information address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address must not set any bits beyond the processor's physical-address width. --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -6138,6 +6138,140 @@ static int hvmop_get_param( return rc; } +static int do_altp2m_op( +XEN_GUEST_HANDLE_PARAM(void) arg) { +struct xen_hvm_altp2m_op a; +struct domain *d = NULL; +int rc = 0; + +if ( !hvm_altp2m_supported() ) +return -EOPNOTSUPP; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.pad1 || a.pad2 || + (a.version != HVMOP_ALTP2M_INTERFACE_VERSION) || + (a.cmd HVMOP_altp2m_get_domain_state) || + (a.cmd HVMOP_altp2m_change_gfn) ) I'm afraid such a change invalidates any earlier ack, even if ti is correct. Instead of this, why don't you start numbering of the sub-ops at zero? Or if you really have a reason to start at 1, why not simply check a.cmd against zero (without using any particular sub-op value)? And then it escapes me why this can't be handled in a default case in the switch statement below anyway. Hmm - is that a requirement per se? we are consistently checking per the sub-op definition we have. Would like this to be considered as is. As I said in the cover letter we have constraints on how much more we can do this week now - so requesting the maintainers to accept v7 with the review comments you have on those recorded as pending to be addressed by us. Thanks much! Ravi Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Help
Hello, Issue in tapdisk-vbd.c Why td_queue_write(parent,treq); is called in static void __tapdisk_vbd_reissue_td_request(td_vbd_t *vbd,td_image_t *image, td_request_t treq) function as we can't write in parent vhd because it is read only. I have attached the code of tapdisk-vbd.c and block-vhd.c. And please tell me about how to write into child vhd after reading from parent vhd in block-vhd.c in case of VHD_BM_BIT_CLEAR in function vhd_queue_read(). Thanks. /* * Copyright (C) Citrix Systems Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; version 2.1 only * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #ifdef HAVE_CONFIG_H #include config.h #endif #include stdio.h #include errno.h #include fcntl.h #include regex.h #include unistd.h #include stdlib.h #include libgen.h #include sys/mman.h #include sys/ioctl.h #include sys/stat.h #include sys/types.h #include debug.h #include libvhd.h #include tapdisk-blktap.h #include tapdisk-image.h #include tapdisk-driver.h #include tapdisk-server.h #include tapdisk-vbd.h #include tapdisk-disktype.h #include tapdisk-interface.h #include tapdisk-stats.h #include tapdisk-storage.h #include tapdisk-nbdserver.h #include td-stats.h #include tapdisk-utils.h #include md5.h #define DBG(_level, _f, _a...) tlog_write(_level, _f, ##_a) #define ERR(_err, _f, _a...) tlog_error(_err, _f, ##_a) #define INFO(_f, _a...)tlog_syslog(TLOG_INFO, vbd: _f, ##_a) #define ERROR(_f, _a...) tlog_syslog(TLOG_WARN, vbd: _f, ##_a) #define TD_VBD_EIO_RETRIES 10 #define TD_VBD_EIO_SLEEP1 #define TD_VBD_WATCHDOG_TIMEOUT 10 static void tapdisk_vbd_complete_vbd_request(td_vbd_t *, td_vbd_request_t *); static int tapdisk_vbd_queue_ready(td_vbd_t *); static void tapdisk_vbd_check_queue_state(td_vbd_t *); /* * initialization */ static void tapdisk_vbd_mark_progress(td_vbd_t *vbd) { gettimeofday(vbd-ts, NULL); } td_vbd_t* tapdisk_vbd_create(uint16_t uuid) { td_vbd_t *vbd; vbd = calloc(1, sizeof(td_vbd_t)); if (!vbd) { EPRINTF(failed to allocate tapdisk state\n); return NULL; } shm_init(vbd-rrd.shm); vbd-uuid= uuid; vbd-req_timeout = TD_VBD_REQUEST_TIMEOUT; INIT_LIST_HEAD(vbd-images); INIT_LIST_HEAD(vbd-new_requests); INIT_LIST_HEAD(vbd-pending_requests); INIT_LIST_HEAD(vbd-failed_requests); INIT_LIST_HEAD(vbd-completed_requests); INIT_LIST_HEAD(vbd-next); INIT_LIST_HEAD(vbd-rings); INIT_LIST_HEAD(vbd-dead_rings); tapdisk_vbd_mark_progress(vbd); return vbd; } int tapdisk_vbd_initialize(int rfd, int wfd, uint16_t uuid) { td_vbd_t *vbd; vbd = tapdisk_server_get_vbd(uuid); if (vbd) { EPRINTF(duplicate vbds! %u\n, uuid); return -EEXIST; } vbd = tapdisk_vbd_create(uuid); tapdisk_server_add_vbd(vbd); return 0; } static inline void tapdisk_vbd_add_image(td_vbd_t *vbd, td_image_t *image) { list_add_tail(image-next, vbd-images); } static inline int tapdisk_vbd_is_last_image(td_vbd_t *vbd, td_image_t *image) { return list_is_last(image-next, vbd-images); } static inline td_image_t * tapdisk_vbd_first_image(td_vbd_t *vbd) { td_image_t *image = NULL; if (!list_empty(vbd-images)) image = list_entry(vbd-images.next, td_image_t, next); return image; } static inline td_image_t * tapdisk_vbd_last_image(td_vbd_t *vbd) { td_image_t *image = NULL; if (!list_empty(vbd-images)) image = list_entry(vbd-images.prev, td_image_t, next); return image; } static inline td_image_t * tapdisk_vbd_next_image(td_image_t *image) { return list_entry(image-next.next, td_image_t, next); } static int tapdisk_vbd_validate_chain(td_vbd_t *vbd) { return tapdisk_image_validate_chain(vbd-images); } static int vbd_stats_destroy(td_vbd_t *vbd) { int err = 0; ASSERT(vbd); err = shm_destroy(vbd-rrd.shm); if (unlikely(err)) { EPRINTF(failed to destroy RRD file: %s\n, strerror(err)); goto out; } free(vbd-rrd.shm.path); vbd-rrd.shm.path = NULL; out: return -err; } static int vbd_stats_create(td_vbd_t *vbd) { int err; ASSERT(vbd); err = mkdir(/dev/shm/metrics, S_IRUSR | S_IWUSR); if (likely(err)) { err = errno; if (unlikely(err != EEXIST)) goto out; else err = 0; } /* * FIXME Rename this to something like vbd3-domid-devid. Consider * consolidating this with the io_ring shared memory file. Check if blkback * exports the same information
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 23/07/15 12:35, Feng Wu wrote: +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, old_ire, new_ire); + +ASSERT(ret == *(__uint128_t *)old_ire); This cannot be correct. Either the cmpxchg() is required and you must cope with it failing, or the cmpxchg() is not required and this should be a plain write. It looks as if the cmpxchg() is required. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxl: Fixes to stream v2 task joining logic
Andrew Cooper writes (Re: [PATCH] tools/libxl: Fixes to stream v2 task joining logic): Migration v2 only ever has 1024 guest pages ever mapped at once (see xc_sr_save.c:write_batch() ), and doesn't leak the mapping, which means that the kernel decided that it was out of memory. Maybe it was being asked for too much. I don't think this can really be an OOM condition in the debianhvm test. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/4] x86/compat: Test both PV and PVH guests for compat mode
On Thu, 2015-07-23 at 08:07 -0600, Jan Beulich wrote: what we called no-pm on yesterday's call? FYI it was no-dm (no device model). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.6 02/13] libxl: properly clean up array in libxl_list_cpupool failure path
On Thu, 2015-07-23 at 10:22 +0100, Ian Campbell wrote: On Thu, 2015-07-23 at 08:59 +0100, Wei Liu wrote: There is change of behaviour. Previously if memory allocation fails the said function returns NULL. Now memory allocation failure is fatal. This is in line with how we deal with memory allocation failure in other places in libxl though. I think this function would benefit from making the out: label be the error path and the success case just a return ptr (just before the label). Then your error handling for cpupool_info would become libxl_cpupoolinfo_dispose(info); if (errno != ENOENT) goto out; break; and the if (failed) block would be at the out label (without the iff). I was about to give my Reviewed-by, but yes, I like Ian's suggestion too. In any case, apart from how error handling is done, I think this patch is fine, as far as dealing with cpupools goes. Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 06/15] VMX/altp2m: add code to support EPTP switching and #VE.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 2:43 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: @@ -1770,6 +1771,105 @@ static bool_t vmx_is_singlestep_supported(void) return cpu_has_monitor_trap_flag; } +static void vmx_vcpu_update_eptp(struct vcpu *v) { +struct domain *d = v-domain; +struct p2m_domain *p2m = NULL; +struct ept_data *ept; + +if ( altp2m_active(d) ) +p2m = p2m_get_altp2m(v); +if ( !p2m ) +p2m = p2m_get_hostp2m(d); + +ept = p2m-ept; +ept-asr = pagetable_get_pfn(p2m_get_pagetable(p2m)); + +vmx_vmcs_enter(v); + +__vmwrite(EPT_POINTER, ept_get_eptp(ept)); + +if ( v-arch.hvm_vmx.secondary_exec_control +SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS ) Indentation. Ok. +static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v) { +struct domain *d = v-domain; +u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS; + +if ( !cpu_has_vmx_vmfunc ) +return; + +if ( cpu_has_vmx_virt_exceptions ) +mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS; + +vmx_vmcs_enter(v); + +if ( !d-is_dying altp2m_active(d) ) +{ +v-arch.hvm_vmx.secondary_exec_control |= mask; +__vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING); +__vmwrite(EPTP_LIST_ADDR, + virt_to_maddr(d-arch.altp2m_eptp)); + +if ( cpu_has_vmx_virt_exceptions ) +{ +p2m_type_t t; +mfn_t mfn; + +mfn = get_gfn_query_unlocked(d, + gfn_x(vcpu_altp2m(v).veinfo_gfn), t); + +if ( mfn_x(mfn) != INVALID_MFN ) +__vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) + PAGE_SHIFT); Considering that the VMCS field holds a byte-aligned address, why do you have the (later introduced) hvmop specify a GFN instead of a GPA? The SDM states: If the EPT-violation #VE VM-execution control is 1, the virtualization-exception information address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address must not set any bits beyond the processor's physical-address width. Also you shouldn't be open coding pfn_to_paddr(). ok thanks, Ravi +else +v-arch.hvm_vmx.secondary_exec_control = +~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS; +} +} +else +v-arch.hvm_vmx.secondary_exec_control = ~mask; + +__vmwrite(SECONDARY_VM_EXEC_CONTROL, +v-arch.hvm_vmx.secondary_exec_control); Indentation again. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 1/2] xen: sched: reorganize cpu_disable_scheduler()
The function is called both when we want to remove a cpu from a cpupool, and during cpu teardown, for suspend or shutdown. If, however, the boot cpu (cpu 0, most of the times) is not present in the default cpupool, during suspend or shutdown, Xen crashes like this: root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0 root@Zhaman:~# shutdown -h now (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] ... (XEN) Xen call trace: (XEN)[82d0801238de] _csched_cpu_pick+0x156/0x61f (XEN)[82d080123db5] csched_cpu_pick+0xe/0x10 (XEN)[82d08012de3c] vcpu_migrate+0x18e/0x321 (XEN)[82d08012e4f8] cpu_disable_scheduler+0x1cf/0x2ac (XEN)[82d08018bb8d] __cpu_disable+0x313/0x36e (XEN)[82d080101424] take_cpu_down+0x34/0x3b (XEN)[82d08013097a] stopmachine_action+0x70/0x99 (XEN)[82d0801325f0] do_tasklet_work+0x78/0xab (XEN)[82d080132926] do_tasklet+0x5e/0x8a (XEN)[82d08016478c] idle_loop+0x56/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 15: (XEN) Assertion 'cpu nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97 (XEN) There also are problems when we try to suspend or shutdown with a cpupool configured with just one cpu (no matter, in this case, whether that is the boot cpu or not): root@Zhaman:~# xl create /etc/xen/test.cfg root@Zhaman:~# xl cpupool-migrate test Pool-1 root@Zhaman:~# xl cpupool-list -c Name CPU list Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15 Pool-1 12 root@Zhaman:~# shutdown -h now (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] (XEN) CPU:12 ... (XEN) Xen call trace: (XEN)[82d08018bb91] __cpu_disable+0x317/0x36e (XEN)[82d080101424] take_cpu_down+0x34/0x3b (XEN)[82d08013097a] stopmachine_action+0x70/0x99 (XEN)[82d0801325f0] do_tasklet_work+0x78/0xab (XEN)[82d080132926] do_tasklet+0x5e/0x8a (XEN)[82d08016478c] idle_loop+0x56/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 12: (XEN) Xen BUG at smpboot.c:895 (XEN) In both cases, the problem is the scheduler not being able to: - move all the vcpus to the boot cpu (as the boot cpu is not in the cpupool), in the former; - move the vcpus away from a cpu at all (as that is the only one cpu in the cpupool), in the latter. Solution is to distinguish, inside cpu_disable_scheduler(), the two cases of cpupool manipulation and teardown. For cpupool manipulation, it is correct to ask the scheduler to take an action, as pathological situation (like there not being any cpu in the pool where to send vcpus) are taken care of (i.e., forbidden!) already. For suspend and shutdown, we don't want the scheduler to be involved at all, as the final goal is pretty simple: send all the vcpus to the boot cpu ASAP, so we just go for it. Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com --- Changes from v3: * fix typos and language in comments, as requested during review Changes from v2: * add a missing spin_unlock, most likely eaten by a forgotten `stg refresh' (sorry!) * fix a typo Changes from v1: * BUG_ON() if, in the suspend/shutdown case, the mask of online pCPUs will ever get empty, as suggested during review; * reorganize and improve comments inside cpu_disable_scheduler() as suggested during review; * make it more clear that vcpu_move_nosched() (name changed, as suggested during review), should only be called from quite contextes, such us, during suspend or shutdown. Do that via both comments and asserts, as requested during review; * reorganize cpu_disable_scheduler() and vcpu_move_nosched() so that calling to sleep and wakeup functions are only called when necessary (i.e., *not* in case we are suspending/shutting down, as requested during review. --- Cc: Juergen Gross jgr...@suse.com --- xen/common/schedule.c | 104 + 1 file changed, 88 insertions(+), 16 deletions(-) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index df8c1d0..df64268 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -454,9 +454,10 @@ void vcpu_unblock(struct vcpu *v) * Do the actual movement of a vcpu from old to new CPU. Locks for *both* * CPUs needs to have been taken already when calling this! */ -static void vcpu_move(struct vcpu *v, unsigned int old_cpu, - unsigned int new_cpu) +static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu) { +unsigned int old_cpu = v-processor; + /* * Transfer urgency status to new CPU before switching CPUs, as * once the switch occurs, v-is_urgent is no longer protected by
[Xen-devel] [PATCH v4 0/2] xen: sched/cpupool: more fixing of (corner?) cases
Take 4, with only typos and language fixes in comments, in patch 1. This now have all the Ack-s it requires, I think. Regards, Dario --- Dario Faggioli (2): xen: sched: reorganize cpu_disable_scheduler() xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool xen/common/cpupool.c | 18 xen/common/schedule.c | 111 + 2 files changed, 112 insertions(+), 17 deletions(-) -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 2/2] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
And this time, do it right. In fact, a similar change was attempted in 93be8285a79c6 (cpupools: update domU's node-affinity on the cpupool_unassign_cpu() path). But that was buggy, and got reverted with 8395b67ab0b8a86. However, even though reverting was the right thing to do, it remains true that: - calling the function is better done in the cpupool cpu removal code, even if just for simmetry with the cpupool cpu adding path; - it is not necessary to call it during cpu teardown (for suspend or shutdown) code as we either are going down and will never come up (shutdown) or, when coming up, we want everything to be as before the tearing down process started, and so we would just undo any update made during the process. - calling it from the teardown path is not only unnecessary, but it can trigger an ASSERT(), in case we get, during the process, to remove the last online pcpu of a domain's node affinity: (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466 (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] ... ... ... (XEN) Xen call trace: (XEN)[82d0801055b9] domain_update_node_affinity+0x113/0x240 (XEN)[82d08012e676] cpu_disable_scheduler+0x334/0x3f2 (XEN)[82d08018bb8d] __cpu_disable+0x313/0x36e (XEN)[82d080101424] take_cpu_down+0x34/0x3b (XEN)[82d080130ad9] stopmachine_action+0x70/0x99 (XEN)[82d08013274f] do_tasklet_work+0x78/0xab (XEN)[82d080132a85] do_tasklet+0x5e/0x8a (XEN)[82d08016478c] idle_loop+0x56/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 12: (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466 (XEN) Therefore, for all these reasons, move the call from cpu_disable_schedule() to cpupool_unassign_cpu_helper(). While there, add some sanity checking (in the latter function), and make sure that scanning the domain list is done with domlist_read_lock held, at least when the system is 'live'. I re-tested the scenario described in here: http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310 which is what led to the revert of 93be8285a79c6, and that is working ok after this commit. Signed-off-by: dario.faggi...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com Acked-by: Juergen Gross jgr...@suse.com --- xen/common/cpupool.c | 18 ++ xen/common/schedule.c |7 ++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index b48ae17..69b984c 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) static long cpupool_unassign_cpu_helper(void *info) { int cpu = cpupool_moving_cpu; +struct cpupool *c = info; +struct domain *d; long ret; cpupool_dprintk(cpupool_unassign_cpu(pool=%d,cpu=%d)\n, cpupool_cpu_moving-cpupool_id, cpu); spin_lock(cpupool_lock); +if ( c != cpupool_cpu_moving ) +{ +ret = -EBUSY; +goto out; +} + +/* + * We need this for scanning the domain list, both in + * cpu_disable_scheduler(), and at the bottom of this function. + */ +rcu_read_lock(domlist_read_lock); ret = cpu_disable_scheduler(cpu); cpumask_set_cpu(cpu, cpupool_free_cpus); if ( !ret ) @@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info) cpupool_cpu_moving = NULL; } +for_each_domain_in_cpupool(d, c) +{ +domain_update_node_affinity(d); +} +rcu_read_unlock(domlist_read_lock); out: spin_unlock(cpupool_lock); cpupool_dprintk(cpupool_unassign_cpu ret=%ld\n, ret); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index df64268..3eefed7 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -650,6 +650,12 @@ int cpu_disable_scheduler(unsigned int cpu) if ( c == NULL ) return ret; +/* + * We'd need the domain RCU lock, but: + * - when we are called from cpupool code, it's acquired there already; + * - when we are called for CPU teardown, we're in stop-machine context, + *so that's not be a problem. + */ for_each_domain_in_cpupool ( d, c ) { for_each_vcpu ( d, v ) @@ -735,7 +741,6 @@ int cpu_disable_scheduler(unsigned int cpu) ret = -EAGAIN; } } -domain_update_node_affinity(d); } return ret; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On Thu, 2015-07-23 at 08:51 -0600, Jan Beulich wrote: On 23.07.15 at 16:43, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:26 -0600, Jan Beulich wrote: On 23.07.15 at 15:56, ian.campb...@citrix.com wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. To be honest I didn't even know we support use of OVMF in any released version (nor in unstable). When did this change? It was present in 4.3. I don't think we build it by default since its toolchain is a bit picky, but we turn it on and test it in osstest from 4.4 onwards. You can ask for it with bios=ovmf in your hvm guest's config. Well, that means it can be used by the adventurous, but it doesn't mean we consider it supported. TBH, I'm more concerned that it is a blocker for upgrade osstest to Debian Jessie, which in turn is a prerequisite for testing arm64. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/4] x86/pvh: Set 32b PVH guest mode in XEN_DOMCTL_set_address_size
On 11.07.15 at 00:20, boris.ostrov...@oracle.com wrote: Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Reviewed-by: Jan Beulich jbeul...@suse.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] oxenstored: link in the systemd system library
On Thu, 2015-07-23 at 08:40 -0500, Jonathan Creekmore wrote: If systemd is configured for use AND you are building oxenstored, the C systemd library must be linked in to the systemd.cxma library. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com Acked-by: Ian Campbell ian.campb...@citrix.com OOI what was the failure mode for you without this? Runtime rather than build time I suspect? --- Changed since v1: * Link the systemd library in to the systemd.cxma instead of the final binary. --- tools/ocaml/xenstored/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile index d861f11..59875f7 100644 --- a/tools/ocaml/xenstored/Makefile +++ b/tools/ocaml/xenstored/Makefile @@ -30,6 +30,8 @@ systemd_OBJS = systemd systemd_C_OBJS = systemd_stubs OCAML_LIBRARY += systemd +LIBS_systemd += $(LDFLAGS-y) + OBJS = define \ stdext \ trie \ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] oxenstored: link in the systemd system library
On Jul 23, 2015, at 9:00 AM, Ian Campbell ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:40 -0500, Jonathan Creekmore wrote: If systemd is configured for use AND you are building oxenstored, the C systemd library must be linked in to the systemd.cxma library. Signed-off-by: Jonathan Creekmore jonathan.creekm...@gmail.com Acked-by: Ian Campbell ian.campb...@citrix.com OOI what was the failure mode for you without this? Runtime rather than build time I suspect? No, it was build-time. It failed to link the oxenstored binary due to missing symbols. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On Thu, 2015-07-23 at 06:43 +0200, Juergen Gross wrote: On 07/22/2015 04:44 PM, Boris Ostrovsky wrote: On 07/22/2015 10:09 AM, Juergen Gross wrote: I think we have 2 possible solutions: 1. Try to handle this all in the hypervisor via CPUID mangling. 2. Add PV-topology support to the guest and indicate this capability via elfnote; only enable PV-numa if this note is present. I'd prefer the second solution. If you are okay with this, I'd try to do some patches for the pvops kernel. Why do you think that kernel patches are preferable to CPUID management? This would be all in tools, I'd think. (Well, one problem that I can think of is that AMD sometimes pokes at MSRs and/or Northbridge's PCI registers to figure out nodeID --- that we may need to have to address in the hypervisor) Doing it via CPUID is more HW specific. Trying to fake a topology for the guest from outside might lead to weird decisions in the guest e.g. regarding licenses based on socket counts. I do see the value of this, I think... If you are doing it in the guest itself you are able to address the different problems (scheduling, licensing) in different ways. ... but, at least in the case of vNUMA for instance, there still need to be a correlation between the vNUMA topology, and the CPUID topology, and vNUMA topology is decided by toolstack. Then, if you mean, within all the possible solutions that matches (i.e., that does not cause problems to!) the vNUMA setup we've been given, let's pick up one that also is best for this xxx other purpose, then I agree. What I'm not sure I see, although, is how you would be specifying the other purpose, e.g., in this case, are you thinking to another parameter saying that we want to minimize the socket count? Depending on the licensing model playing with CPUID is either good or bad. I can even imagine the CPUID configuration capabilities in xl are in use today for exactly this purpose. Using them for pv-NUMA as well will make this feature unusable for those users. Yeah, well... So, you want a VM with only one socket, because of whatever reasons (say licensing), and you're using libxl's CPUID fiddling capability to do that. Now, if you specify, for such a VM, a vNUMA layout with 2 vnodes, well, I'd call this asking for troubles. I know, strictly speaking, socket != (v)NUMA node. Still, I think this will be a corner case, way less common than just a user specifying a vNUMA topology, and getting only a fraction of the vcpus being used/usable! :-/ In summary, I probably know too few of CPUID handling to have a clear view on whether something like 'making it match the topology' --which also means, if no vNUMA, CPUID should say flat, for some definition of flat-- should leave in tools or in kernel... I just know that we need to do something *consistent*. FWIW, I was thinking that the kernel were a better place, as Juergen is saying, while now I'm more convinced that tools would be more appropriate, as Boris is saying. Thanks and Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/4] x86/compat: Test both PV and PVH guests for compat mode
On 11.07.15 at 00:20, boris.ostrov...@oracle.com wrote: Add is_pvh_32bit_domain() macro and use it alongside is_pv_32bit_domain() where necessary. Since PVH guests cannot change execution mode, has_32bit_shinfo is a good indicator of whether the guest is PVH and 32-bit. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Relative to what is in the tree right now this is fine, but ... @@ -771,7 +771,7 @@ int arch_set_info_guest( /* The context is a compat-mode one if the target domain is compat-mode; * we expect the tools to DTRT even in compat-mode callers. */ -compat = is_pv_32bit_domain(d); +compat = is_pv_32bit_domain(d) || is_pvh_32bit_domain(d); ... won't this and ... @@ -1203,7 +1204,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) { unsigned int i; const struct domain *d = v-domain; -bool_t compat = is_pv_32bit_domain(d); +bool_t compat = is_pv_32bit_domain(d) || is_pvh_32bit_domain(d); ... this get in the way of what we called no-pm on yesterday's call? I would assume that for the transitional period both ought to be able to co-exist. Plus - is this in line with what the tools are doing? Aren't they assuming !PV = native format context? I.e. don't you need to treat differently v-domain == current-domain and its opposite? Roger btw. raised a similar question on IRC earlier today... Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 04/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
On 23/07/15 15:01, Andrew Cooper wrote: On 23/07/15 12:35, Feng Wu wrote: VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. This patch adds variable 'iommu_intpost' to control whether enable VT-d posted-interrupt or not in the generic IOMMU code. CC: Jan Beulich jbeul...@suse.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Please patch docs/misc/xen-command-line.markdown as you are --- v4: - No changes v3: - Remove pointless initializer for 'iommu_intpost'. - Some adjustment for if no intremap then no intpost logic. * For parse_iommu_param(), move it to the end of the function, so we don't need to add the some logic when introduing the new kernel parameter 'intpost' in later patch. * Add this logic in iommu_setup() after iommu_hardware_setup() is called. xen/drivers/passthrough/iommu.c | 17 - xen/include/xen/iommu.h | 2 +- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 06cb38f..536b69f 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -39,6 +39,7 @@ static void iommu_dump_p2m_table(unsigned char key); * no-snoop Disable VT-d Snoop Control * no-qinval Disable VT-d Queued Invalidation * no-intremapDisable VT-d Interrupt Remapping + * no-intpost Disable VT-d Interrupt posting */ custom_param(iommu, parse_iommu_param); bool_t __initdata iommu_enable = 1; @@ -51,6 +52,14 @@ bool_t __read_mostly iommu_passthrough; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_intremap = 1; + +/* + * In the current implementation of VT-d posted interrupts, in some extreme + * cases, the per cpu list which saves the blocked vCPU will be very long, + * and this will affect the interrupt latency, so let this feature off by + * default until we find a good solution to resolve it. + */ +bool_t __read_mostly iommu_intpost; bool_t __read_mostly iommu_hap_pt_share = 1; bool_t __read_mostly iommu_debug; bool_t __read_mostly amd_iommu_perdev_intremap = 1; @@ -112,6 +121,9 @@ static void __init parse_iommu_param(char *s) s = ss + 1; } while ( ss ); + +if ( !iommu_intremap ) +iommu_intpost = 0; This hunk is still wrong, as I said in v3. Furthermore, you appear to have no way in v4 to enable intpost on the Xen command line. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 3/4] x86/pvh: Handle hypercalls for 32b PVH guests
On 11.07.15 at 00:20, boris.ostrov...@oracle.com wrote: Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Changes in v3: * Defined compat_mmuext_op(). (XEN_GUEST_HANDLE_PARAM(mmuext_op_compat_t) is not defined in header files so I used 'void' type. How is it not? It's in compat/xen.h (which is a generated header). @@ -4951,6 +4950,29 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = { [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation }; +extern int compat_mmuext_op(XEN_GUEST_HANDLE_PARAM(void) cmp_uops, +unsigned int count, +XEN_GUEST_HANDLE_PARAM(uint) pdone, +unsigned int foreigndom); +static hvm_hypercall_t *const pvh_hypercall32_table[NR_hypercalls] = { +HYPERCALL(platform_op), +COMPAT_CALL(memory_op), +HYPERCALL(xen_version), +HYPERCALL(console_io), +[ __HYPERVISOR_grant_table_op ] = +(hvm_hypercall_t *)hvm_grant_table_op_compat32, +COMPAT_CALL(vcpu_op), +COMPAT_CALL(mmuext_op), +HYPERCALL(xsm_op), +COMPAT_CALL(sched_op), +HYPERCALL(event_channel_op), +[ __HYPERVISOR_physdev_op ] = (hvm_hypercall_t *)hvm_physdev_op_compat32, +HYPERCALL(hvm_op), +HYPERCALL(sysctl), +HYPERCALL(domctl), +[ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation Looks like you didn't fully sync with staging - did you forget that it was you who added xenpmu_op to the 64-bit counterpart? Without that ... @@ -4981,7 +5003,7 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) return viridian_hypercall(regs); if ( (eax = NR_hypercalls) || - (is_pvh_domain(currd) ? !pvh_hypercall64_table[eax] + (is_pvh_domain(currd) ? !pvh_hypercall32_table[eax] : !hvm_hypercall32_table[eax]) ) ... this will break (as we're assuming 32- and 64-bit tables to be fully in sync here; there's still the pending work item of constructing these tables so that this has a better chance of not getting broken). Also, since you're touching this here anyway, could I ask you to adjust this along the lines of the change you do further down, i.e. to become !(is_pvh_domain(currd) ? pvh_hypercall32_table : hvm_hypercall32_table)[eax] ) Thanks, Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
On 07/23/2015 04:07 PM, Dario Faggioli wrote: On Thu, 2015-07-23 at 06:43 +0200, Juergen Gross wrote: On 07/22/2015 04:44 PM, Boris Ostrovsky wrote: On 07/22/2015 10:09 AM, Juergen Gross wrote: I think we have 2 possible solutions: 1. Try to handle this all in the hypervisor via CPUID mangling. 2. Add PV-topology support to the guest and indicate this capability via elfnote; only enable PV-numa if this note is present. I'd prefer the second solution. If you are okay with this, I'd try to do some patches for the pvops kernel. Why do you think that kernel patches are preferable to CPUID management? This would be all in tools, I'd think. (Well, one problem that I can think of is that AMD sometimes pokes at MSRs and/or Northbridge's PCI registers to figure out nodeID --- that we may need to have to address in the hypervisor) Doing it via CPUID is more HW specific. Trying to fake a topology for the guest from outside might lead to weird decisions in the guest e.g. regarding licenses based on socket counts. I do see the value of this, I think... If you are doing it in the guest itself you are able to address the different problems (scheduling, licensing) in different ways. ... but, at least in the case of vNUMA for instance, there still need to be a correlation between the vNUMA topology, and the CPUID topology, and vNUMA topology is decided by toolstack. Then, if you mean, within all the possible solutions that matches (i.e., that does not cause problems to!) the vNUMA setup we've been given, let's pick up one that also is best for this xxx other purpose, then I agree. What I'm not sure I see, although, is how you would be specifying the other purpose, e.g., in this case, are you thinking to another parameter saying that we want to minimize the socket count? Depending on the licensing model playing with CPUID is either good or bad. I can even imagine the CPUID configuration capabilities in xl are in use today for exactly this purpose. Using them for pv-NUMA as well will make this feature unusable for those users. Yeah, well... So, you want a VM with only one socket, because of whatever reasons (say licensing), and you're using libxl's CPUID fiddling capability to do that. Now, if you specify, for such a VM, a vNUMA layout with 2 vnodes, well, I'd call this asking for troubles. I know, strictly speaking, socket != (v)NUMA node. Still, I think this will be a corner case, way less common than just a user specifying a vNUMA topology, and getting only a fraction of the vcpus being used/usable! :-/ In summary, I probably know too few of CPUID handling to have a clear view on whether something like 'making it match the topology' --which also means, if no vNUMA, CPUID should say flat, for some definition of flat-- should leave in tools or in kernel... I just know that we need to do something *consistent*. FWIW, I was thinking that the kernel were a better place, as Juergen is saying, while now I'm more convinced that tools would be more appropriate, as Boris is saying. I'm just about to write down all sources of information the linux kernel is using to build topology information and for what purpose this data is being used. This should help to decide which way to go. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 05/15] x86/altp2m: basic data structures and support routines.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 2:22 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: @@ -6569,6 +6571,25 @@ void hvm_toggle_singlestep(struct vcpu *v) v-arch.hvm_vcpu.single_step = !v-arch.hvm_vcpu.single_step; } +void altp2m_vcpu_update_p2m(struct vcpu *v) { +if ( hvm_funcs.altp2m_vcpu_update_p2m ) +hvm_funcs.altp2m_vcpu_update_p2m(v); +} + +void altp2m_vcpu_update_vmfunc_ve(struct vcpu *v) { +if ( hvm_funcs.altp2m_vcpu_update_vmfunc_ve ) +hvm_funcs.altp2m_vcpu_update_vmfunc_ve(v); +} + +bool_t altp2m_vcpu_emulate_ve(struct vcpu *v) { +if ( hvm_funcs.altp2m_vcpu_emulate_ve ) +return hvm_funcs.altp2m_vcpu_emulate_ve(v); +return 0; +} These are _still_ not inline functions (in hvm.h), and 00/15 also doesn't mention that this was intentionally left out. Yup possibly missed this one. @@ -498,6 +498,28 @@ int hap_enable(struct domain *d, u32 mode) goto out; } +if ( hvm_altp2m_supported() ) +{ +/* Init alternate p2m data */ +if ( (d-arch.altp2m_eptp = alloc_xenheap_page()) == NULL ) +{ +rv = -ENOMEM; +goto out; +} + +for ( i = 0; i MAX_EPTP; i++ ) +d-arch.altp2m_eptp[i] = INVALID_MFN; And there is _still_ EPT-specific code left in a generic file. This was mentioned explicitly in the cover letter and in the patch header - Altough our error was that the comment was only in patch 10, and should have been in patch 5 also. Patch 10: ... not done - abstracting some ept functionality from p2m @@ -183,6 +184,43 @@ static void p2m_teardown_nestedp2m(struct domain *d) } } +static void p2m_teardown_altp2m(struct domain *d) { +unsigned int i; +struct p2m_domain *p2m; + +for ( i = 0; i MAX_ALTP2M; i++ ) +{ +if ( !d-arch.altp2m_p2m[i] ) +continue; +p2m = d-arch.altp2m_p2m[i]; +p2m_free_one(p2m); +d-arch.altp2m_p2m[i] = NULL; If you already think it's necessary to latch the array entry into a local variable, why don't you zap the array entry _before_ freeing the structure? Could be done. @@ -1940,6 +1988,59 @@ int unmap_mmio_regions(struct domain *d, return err; } +unsigned int p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp) +{ +struct p2m_domain *p2m; +struct ept_data *ept; +unsigned int i; + +altp2m_list_lock(d); + +for ( i = 0; i MAX_ALTP2M; i++ ) +{ +if ( d-arch.altp2m_eptp[i] == INVALID_MFN ) +continue; + +p2m = d-arch.altp2m_p2m[i]; +ept = p2m-ept; + +if ( eptp == ept_get_eptp(ept) ) +goto out; +} + +i = INVALID_ALTP2M; + +out: Missed sorry. Just to repeat - labels should be indented by at least one space. And you already know what the comment is regarding this being EPT-specific code (and here one can't even debate whether it's just unfortunate naming, since ept_get_eptp() is _very_ EPT- specific, and that macro - if headers were properly structured - shouldn't even be visible here). Was mentioned albeit in another patch (should have been mentioned here also) --- /dev/null +++ b/xen/include/asm-x86/altp2m.h @@ -0,0 +1,38 @@ +/* + * Alternate p2m HVM + * Copyright (c) 2014, Intel Corporation. + * + * This program is free software; you can redistribute it and/or +modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but +WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +License for + * more details. + * + * You should have received a copy of the GNU General Public License +along with + * this program; if not, write to the Free Software Foundation, Inc., +59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#ifndef _ALTP2M_H +#define _ALTP2M_H I'm sure I said so before - this is not a specific enough guard symbol. It should have at least an _X86 prefix. Right (don't think I saw this before but comment makes sense) +#include xen/types.h +#include xen/sched.h /* for struct vcpu, struct domain */ +#include asm/hvm/vcpu.h /* for vcpu_altp2m */ I don't see the type mentioned in the comment used anywhere below - is this a stray include? --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -233,6 +233,10 @@ struct paging_vcpu { typedef xen_domctl_cpuid_t cpuid_input_t; #define MAX_NESTEDP2M 10 + +#define MAX_ALTP2M ((uint16_t)10) +#define INVALID_ALTP2M ((uint16_t)~0) Didn't you claim to have removed all stray casts? Considering how late we are with this, this patch can have my ack only provided you promise
Re: [Xen-devel] [PATCH OSSTEST v8 00/14] add distro domU testing flight
On Wed, 2015-07-08 at 13:30 +0100, Ian Campbell wrote: I retained acks even when changing things due to either the moving of the make-*flight filter or the moving of the runvar docs to the script, otherwise I dropped them, I hope that is ok. Summary of (A)cks, (M)odified and (N)ew (NM==Replaced something): Ian acked the rest and after discussion with him I've pushed this to pretest. Cheers, Ian. AMmfi-common: Allow make-*flight to filter the set of build jobs to include MTestSupport: Add helper to fetch a URL on a host AMdistros: add support for installing Debian PV guests via d -i, flight and jobs AMdistros: support booting Debian PV (d-i installed) guests with pvgrub. Mdistros: Support pvgrub for Wheezy too. A Test pygrub and pvgrub on the regular flights A distros: add branch infrastructure N crontab-cambridge: Use hard tabs for alignment. Mdistros: Run one suite per day on a weekly basis A Debian: Handle lack of bootloader support in d-i on ARM. A ts-debian-di-install: Refactor root_disk specification A make-flight: refactor PV debian tests MAdd testing of file backed disk formats make-distros-flight: Use ftp.debian.org directly Results for an adhoc xen-unstable flight are at http://osstest.xs.citrite.net/~osstest/testlogs/logs/37711/ And for Jessie: http://osstest.xs.citrite.net/~osstest/testlogs/logs/37717/ Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 06/15] VMX/altp2m: add code to support EPTP switching and #VE.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 8:00 AM To: Sahita, Ravi Cc: Andrew Cooper; Wei Liu; George Dunlap; Ian Jackson; White, Edmund H; Nakajima, Jun; xen-devel@lists.xen.org; tleng...@novetta.com; Daniel De Graaf; Tim Deegan Subject: RE: [PATCH v7 06/15] VMX/altp2m: add code to support EPTP switching and #VE. On 23.07.15 at 16:40, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 2:43 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: +static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v) { +struct domain *d = v-domain; +u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS; + +if ( !cpu_has_vmx_vmfunc ) +return; + +if ( cpu_has_vmx_virt_exceptions ) +mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS; + +vmx_vmcs_enter(v); + +if ( !d-is_dying altp2m_active(d) ) +{ +v-arch.hvm_vmx.secondary_exec_control |= mask; +__vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING); +__vmwrite(EPTP_LIST_ADDR, + virt_to_maddr(d-arch.altp2m_eptp)); + +if ( cpu_has_vmx_virt_exceptions ) +{ +p2m_type_t t; +mfn_t mfn; + +mfn = get_gfn_query_unlocked(d, + gfn_x(vcpu_altp2m(v).veinfo_gfn), t); + +if ( mfn_x(mfn) != INVALID_MFN ) +__vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) + PAGE_SHIFT); Considering that the VMCS field holds a byte-aligned address, why do you have the (later introduced) hvmop specify a GFN instead of a GPA? The SDM states: If the EPT-violation #VE VM-execution control is 1, the virtualization-exception information address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address must not set any bits beyond the processor's physical-address width. Ah, interesting. Knowing what to search for, I was able to find this. But to be honest, it should also be stated in the section talking about #VE, not just in the one talking about checks being done. With that, just like for the other patch my ack depends on the promise to deal with all the remaining issues. Yes our SDM has information that's not the best organized... Agreed on the remaining issues Thanks Ravi Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 10/15] x86/altp2m: add remaining support routines.
On 23.07.15 at 16:51, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 3:06 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: @@ -758,6 +765,38 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx); /* Check to see if vcpu should be switched to a different p2m. */ void p2m_altp2m_check(struct vcpu *v, uint16_t idx); The context here suggests that I overlooked a still not replaced uint16_t. Ok. Just wanted to make sure these are also ok to do post 4.6 I guess so, but you realize you're stretching it? Plus the patch can't go in as is anyway, considering the off-by-one bugs found. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
El 23/07/15 a les 13.41, Ian Campbell ha escrit: On Thu, 2015-07-23 at 05:29 -0600, Jan Beulich wrote: On 23.07.15 at 12:25, roger@citrix.com wrote: El 13/07/15 a les 16.01, Jan Beulich ha escrit: On 03.07.15 at 13:34, roger@citrix.com wrote: --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -795,6 +795,15 @@ int arch_set_info_guest( c.nat-fs_base || c.nat-gs_base_user)) ) return -EINVAL; } +else if ( is_hvm_domain(d) ) +{ +if ( c(ctrlreg[0]) || c(ctrlreg[1]) || c(ctrlreg[2]) || + c(ctrlreg[3]) || c(ctrlreg[4]) || c(ctrlreg[5]) || + c(ctrlreg[6]) || c(ctrlreg[7]) || c(ldt_base) || + c(ldt_ents) || c(kernel_ss) || c(kernel_sp) || + c(gdt_ents) ) +return -EINVAL; +} In addition to what Andrew said - is the use of c() here really correct considering compat = is_pv_32bit_domain(d); #define c(fld) (compat ? (c.cmp-fld) : (c.nat-fld)) near the beginning of the function? I've been thinking about this. Since HVM vCPUs are always started in 32bit mode, I would ideally like to use the vcpu_guest_context_x86_32_t struct to set the vCPU context. This means that for HVM guests compat should always be true. The problem is that inside of the guest there's no notion of vcpu_guest_context_x86_32_t or vcpu_guest_context_x86_64_t, there's only vcpu_guest_context, which defaults to the native bitness of the guest. So if your guest is running in 64bit mode vcpu_guest_context is aliased to vcpu_guest_context_x86_64_t by default. TBH I'm not sure about the best way to solve this. Should vcpu_guest_context_x86_32_t and vcpu_guest_context_x86_64_t be exposed to the guest like it's done for the tools? Why? We're in arch_set_info_guest(), which is unreachable for a HVM guest on itself. Or is this in preparation of allowing HVM guests to use VCPUOP_initialise? In that case, exposing might be an option, but remember that the compat mode layout even today is used only for PV guests. So I'd rather avoid exposing both layouts to guests, and do translation instead inside the hypervisor. On ARM we deliberately arranged that the vcpu_guest_context was the same for both arm32 and arm64. Moving to PVH seems like an ideal opportunity to do something similar for x86, if not by standardising on the x86_64 layout then by declaring a new struct which can encompass both and declaring the old ones PV legacy. IMHO introducing a new structure that gets rid of all the PV-only fields seems like a good option: struct vcpu_hvm_context { #define _VGCF_online 5 #define VGCF_online(1_VGCF_online) uint32_t flags; /* VGCF_* flags */ struct cpu_hvm_user_regs user_regs; /* User-level CPU registers */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ uint32_t ctrlreg[8];/* CR0-CR7 (control registers) */ uint32_t debugreg[8]; /* DB0-DB7 (debug registers)*/ }; I'm also seriously considering getting rid of ctrlreg and debugreg. Since HVM VCPUs will always be started in 32bit flat mode, it doesn't make sense IMHO to have both the 32 and the 64 version of the registers, so cpu_hvm_user_regs is always going to be: struct cpu_hvm_user_regs { uint32_t ebx; uint32_t ecx; uint32_t edx; uint32_t esi; uint32_t edi; uint32_t ebp; uint32_t eax; uint32_t eip; uint32_t esp; uint32_t eflags; uint16_t cs; uint16_t ss; uint16_t es; uint16_t ds; uint16_t fs; uint16_t gs; }; We could however do something similar to what's done in ARM and have a union of both the 32 and the 64bit registers in case we want to start the vCPU in 64bit mode sometime in the future. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 11/15] x86/altp2m: define and implement alternate p2m HVMOP types.
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 8:09 AM On 23.07.15 at 16:56, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 3:22 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: Signed-off-by: Ed White edmund.h.wh...@intel.com Acked-by: Jan Beulich jbeul...@suse.com And I have to withdraw this ack pending clarification of (and perhaps adjustment to) the #VE info address interface. Could we have the ack back :-) I clarified the #VE info address interface in the other email - repeating here: If the EPT-violation #VE VM-execution control is 1, the virtualization-exception information address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address must not set any bits beyond the processor's physical-address width. Yes, for this aspect. --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -6138,6 +6138,140 @@ static int hvmop_get_param( return rc; } +static int do_altp2m_op( +XEN_GUEST_HANDLE_PARAM(void) arg) { +struct xen_hvm_altp2m_op a; +struct domain *d = NULL; +int rc = 0; + +if ( !hvm_altp2m_supported() ) +return -EOPNOTSUPP; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.pad1 || a.pad2 || + (a.version != HVMOP_ALTP2M_INTERFACE_VERSION) || + (a.cmd HVMOP_altp2m_get_domain_state) || + (a.cmd HVMOP_altp2m_change_gfn) ) I'm afraid such a change invalidates any earlier ack, even if ti is correct. Instead of this, why don't you start numbering of the sub-ops at zero? Or if you really have a reason to start at 1, why not simply check a.cmd against zero (without using any particular sub-op value)? And then it escapes me why this can't be handled in a default case in the switch statement below anyway. Hmm - is that a requirement per se? we are consistently checking per the sub-op definition we have. Well, in a way. But doing range checks like this means future additions of sub- ops would always need to touch this code. Quite different from doing it in the default case of a switch statement. Plus, can you see how the expression is going to look like if in interface version 2 you need to remove one or two of the current entries, replacing them with new, higher numbers? Yes in a revision I would handle this with the default case. Would like this to be considered as is. As I said in the cover letter we have constraints on how much more we can do this week now - so requesting the maintainers to accept v7 with the review comments you have on those recorded as pending to be addressed by us. Yes, on that basis, albeit extremely hesitantly to be honest. If any other maintainer would be as hesitant as I am about this, I would likely put the two together to yield a NAK. Thanks for your flexibility and honesty on this - hope other maintainers are ok with v7 in the same way, with keeping the other comments you have logged already as pending for post 4.6. Ravi Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 23.07.15 at 17:55, andrew.coop...@citrix.com wrote: On 23/07/15 16:52, Jan Beulich wrote: On 23.07.15 at 15:51, andrew.coop...@citrix.com wrote: On 23/07/15 12:35, Feng Wu wrote: +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, old_ire, new_ire); + +ASSERT(ret == *(__uint128_t *)old_ire); This cannot be correct. Either the cmpxchg() is required and you must cope with it failing, or the cmpxchg() is not required and this should be a plain write. Not exactly: The cmpxchg() is required for this to be an atomic 128-bit write. And hence I would view the ASSERT() as appropriate - it validates that the entry didn't change behind our back. But p is an active descriptor, which means hardware is liable to change it behind our back. I inquired about this on the previous round and was told hardware doesn't alter the descriptor. Comparing this with the spec, I couldn't spot any fields that I would suspect getting written. Which fields do you have in mind? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On Thu, 2015-07-23 at 17:48 +0200, Roger Pau Monné wrote: El 23/07/15 a les 17.32, Jan Beulich ha escrit: On 23.07.15 at 17:10, roger@citrix.com wrote: IMHO introducing a new structure that gets rid of all the PV-only fields seems like a good option: struct vcpu_hvm_context { #define _VGCF_online 5 #define VGCF_online(1_VGCF_online) uint32_t flags; /* VGCF_* flags */ struct cpu_hvm_user_regs user_regs; /* User-level CPU registers */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ uint32_t ctrlreg[8];/* CR0-CR7 (control registers) */ uint32_t debugreg[8]; /* DB0-DB7 (debug registers) */ }; I'm also seriously considering getting rid of ctrlreg and debugreg. Since HVM VCPUs will always be started in 32bit flat mode, it doesn't make sense IMHO to have both the 32 and the 64 version of the registers, so cpu_hvm_user_regs is always going to be: struct cpu_hvm_user_regs { uint32_t ebx; uint32_t ecx; uint32_t edx; uint32_t esi; uint32_t edi; uint32_t ebp; uint32_t eax; uint32_t eip; uint32_t esp; uint32_t eflags; uint16_t cs; uint16_t ss; uint16_t es; uint16_t ds; uint16_t fs; uint16_t gs; }; We could however do something similar to what's done in ARM and have a union of both the 32 and the 64bit registers in case we want to start the vCPU in 64bit mode sometime in the future. What you gave above is suitable only for VCPUOP_initialise afaict. Did you intend this to be the case? Certainly not, this should also be used by XEN_DOMCTL_setvcpucontext. I'm afraid I'm missing something obvious, but I don't see why this couldn't be used by XEN_DOMCTL_setvcpucontext TBH, can you please clarify? set/getvcpucontext need to pickle all state for save/restore/migration, not just the start of day state. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] target_run_apt: Correctly escape line wrapping
Otherwise the envvars are on the preceding line and therefore not set in the apt-get process. I broke this in 6fea4be08306 apt: lock osstest's usages of apt-get against each other, committed in January. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- Osstest/TestSupport.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm index 0fbc519..a8b66ef 100644 --- a/Osstest/TestSupport.pm +++ b/Osstest/TestSupport.pm @@ -432,7 +432,7 @@ sub target_putfile_root (;$) { sub target_run_apt { my ($ho, @aptopts) = @_; target_cmd_root($ho, -DEBIAN_PRIORITY=critical UCF_FORCE_CONFFOLD=y \ +DEBIAN_PRIORITY=critical UCF_FORCE_CONFFOLD=y \\ with-lock-ex -w /var/lock/osstest-apt apt-get @aptopts, 3000); } sub target_install_packages { -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 10/15] x86/altp2m: add remaining support routines.
On 07/23/2015 03:51 PM, Sahita, Ravi wrote: +int p2m_destroy_altp2m_by_id(struct domain *d, unsigned int idx) { +struct p2m_domain *p2m; +int rc = -EINVAL; + +if ( !idx || idx MAX_ALTP2M ) = (and then also elsewhere further down)? Right. [snip] Just wanted to make sure these are also ok to do post 4.6 Well the off-by-one errors certainly need to be fixed for 4.6. If this was the only thing holding it up, the committer could fix it up on check-in, or we could take a fix-up patch afterwards. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v3 02/13] Introduce ts-saverestore-support-check
On Wed, 2015-07-22 at 11:13 +0100, Wei Liu wrote: We need this script because we're going to separate the concept of save / restore and migration later. Signed-off-by: Wei Liu wei.l...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On 23.07.15 at 16:58, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:51 -0600, Jan Beulich wrote: On 23.07.15 at 16:43, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:26 -0600, Jan Beulich wrote: On 23.07.15 at 15:56, ian.campb...@citrix.com wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. To be honest I didn't even know we support use of OVMF in any released version (nor in unstable). When did this change? It was present in 4.3. I don't think we build it by default since its toolchain is a bit picky, but we turn it on and test it in osstest from 4.4 onwards. You can ask for it with bios=ovmf in your hvm guest's config. Well, that means it can be used by the adventurous, but it doesn't mean we consider it supported. TBH, I'm more concerned that it is a blocker for upgrade osstest to Debian Jessie, which in turn is a prerequisite for testing arm64. If it was explicitly unsupported, I could say do whatever you need to do to it, but with not being sure what status it has I'm kind of wondering what other changes (i.e. apart from fixing the build) the tree(s) would get which then may break someone. But in the end it's probably okay to do the change. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On 23.07.15 at 17:10, roger@citrix.com wrote: IMHO introducing a new structure that gets rid of all the PV-only fields seems like a good option: struct vcpu_hvm_context { #define _VGCF_online 5 #define VGCF_online(1_VGCF_online) uint32_t flags; /* VGCF_* flags */ struct cpu_hvm_user_regs user_regs; /* User-level CPU registers */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ uint32_t ctrlreg[8];/* CR0-CR7 (control registers) */ uint32_t debugreg[8]; /* DB0-DB7 (debug registers) */ }; I'm also seriously considering getting rid of ctrlreg and debugreg. Since HVM VCPUs will always be started in 32bit flat mode, it doesn't make sense IMHO to have both the 32 and the 64 version of the registers, so cpu_hvm_user_regs is always going to be: struct cpu_hvm_user_regs { uint32_t ebx; uint32_t ecx; uint32_t edx; uint32_t esi; uint32_t edi; uint32_t ebp; uint32_t eax; uint32_t eip; uint32_t esp; uint32_t eflags; uint16_t cs; uint16_t ss; uint16_t es; uint16_t ds; uint16_t fs; uint16_t gs; }; We could however do something similar to what's done in ARM and have a union of both the 32 and the 64bit registers in case we want to start the vCPU in 64bit mode sometime in the future. What you gave above is suitable only for VCPUOP_initialise afaict. Did you intend this to be the case? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 0/2] xen: sched/cpupool: more fixing of (corner?) cases
On 23.07.15 at 16:45, dario.faggi...@citrix.com wrote: Take 4, with only typos and language fixes in comments, in patch 1. This now have all the Ack-s it requires, I think. So it does. Dario Faggioli (2): xen: sched: reorganize cpu_disable_scheduler() xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Especially for the first one, though, the title suggests mere cleanup (i.e. not to go in now), while the description of it looks more like a bug fix. Considering this I'd prefer to have a release ack. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
El 23/07/15 a les 17.32, Jan Beulich ha escrit: On 23.07.15 at 17:10, roger@citrix.com wrote: IMHO introducing a new structure that gets rid of all the PV-only fields seems like a good option: struct vcpu_hvm_context { #define _VGCF_online 5 #define VGCF_online(1_VGCF_online) uint32_t flags; /* VGCF_* flags */ struct cpu_hvm_user_regs user_regs; /* User-level CPU registers */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ uint32_t ctrlreg[8];/* CR0-CR7 (control registers) */ uint32_t debugreg[8]; /* DB0-DB7 (debug registers) */ }; I'm also seriously considering getting rid of ctrlreg and debugreg. Since HVM VCPUs will always be started in 32bit flat mode, it doesn't make sense IMHO to have both the 32 and the 64 version of the registers, so cpu_hvm_user_regs is always going to be: struct cpu_hvm_user_regs { uint32_t ebx; uint32_t ecx; uint32_t edx; uint32_t esi; uint32_t edi; uint32_t ebp; uint32_t eax; uint32_t eip; uint32_t esp; uint32_t eflags; uint16_t cs; uint16_t ss; uint16_t es; uint16_t ds; uint16_t fs; uint16_t gs; }; We could however do something similar to what's done in ARM and have a union of both the 32 and the 64bit registers in case we want to start the vCPU in 64bit mode sometime in the future. What you gave above is suitable only for VCPUOP_initialise afaict. Did you intend this to be the case? Certainly not, this should also be used by XEN_DOMCTL_setvcpucontext. I'm afraid I'm missing something obvious, but I don't see why this couldn't be used by XEN_DOMCTL_setvcpucontext TBH, can you please clarify? Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On 23.07.15 at 17:48, roger@citrix.com wrote: El 23/07/15 a les 17.32, Jan Beulich ha escrit: On 23.07.15 at 17:10, roger@citrix.com wrote: IMHO introducing a new structure that gets rid of all the PV-only fields seems like a good option: struct vcpu_hvm_context { #define _VGCF_online 5 #define VGCF_online(1_VGCF_online) uint32_t flags; /* VGCF_* flags */ struct cpu_hvm_user_regs user_regs; /* User-level CPU registers */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ uint32_t ctrlreg[8];/* CR0-CR7 (control registers) */ uint32_t debugreg[8]; /* DB0-DB7 (debug registers) */ }; I'm also seriously considering getting rid of ctrlreg and debugreg. Since HVM VCPUs will always be started in 32bit flat mode, it doesn't make sense IMHO to have both the 32 and the 64 version of the registers, so cpu_hvm_user_regs is always going to be: struct cpu_hvm_user_regs { uint32_t ebx; uint32_t ecx; uint32_t edx; uint32_t esi; uint32_t edi; uint32_t ebp; uint32_t eax; uint32_t eip; uint32_t esp; uint32_t eflags; uint16_t cs; uint16_t ss; uint16_t es; uint16_t ds; uint16_t fs; uint16_t gs; }; We could however do something similar to what's done in ARM and have a union of both the 32 and the 64bit registers in case we want to start the vCPU in 64bit mode sometime in the future. What you gave above is suitable only for VCPUOP_initialise afaict. Did you intend this to be the case? Certainly not, this should also be used by XEN_DOMCTL_setvcpucontext. I'm afraid I'm missing something obvious, but I don't see why this couldn't be used by XEN_DOMCTL_setvcpucontext TBH, can you please clarify? Just like for HVM (and even more so with the plan to allow PVH to freely switch between 32- and 64-bit modes) the full 64-bit context should be got and set in a single domctl imo. If the caller was to first figure out what bitness a to be migrated guest was paused in (and that may differ between vCPU-s), things would likely become quite ugly. Not to speak of the need for a 64-bit tool stack to then have a way to specify which format the structure is in for a set operation. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 11/15] x86/altp2m: define and implement alternate p2m HVMOP types.
On 23.07.15 at 16:56, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 23, 2015 3:22 AM On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: Signed-off-by: Ed White edmund.h.wh...@intel.com Acked-by: Jan Beulich jbeul...@suse.com And I have to withdraw this ack pending clarification of (and perhaps adjustment to) the #VE info address interface. Could we have the ack back :-) I clarified the #VE info address interface in the other email - repeating here: If the EPT-violation #VE VM-execution control is 1, the virtualization-exception information address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address must not set any bits beyond the processor's physical-address width. Yes, for this aspect. --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -6138,6 +6138,140 @@ static int hvmop_get_param( return rc; } +static int do_altp2m_op( +XEN_GUEST_HANDLE_PARAM(void) arg) { +struct xen_hvm_altp2m_op a; +struct domain *d = NULL; +int rc = 0; + +if ( !hvm_altp2m_supported() ) +return -EOPNOTSUPP; + +if ( copy_from_guest(a, arg, 1) ) +return -EFAULT; + +if ( a.pad1 || a.pad2 || + (a.version != HVMOP_ALTP2M_INTERFACE_VERSION) || + (a.cmd HVMOP_altp2m_get_domain_state) || + (a.cmd HVMOP_altp2m_change_gfn) ) I'm afraid such a change invalidates any earlier ack, even if ti is correct. Instead of this, why don't you start numbering of the sub-ops at zero? Or if you really have a reason to start at 1, why not simply check a.cmd against zero (without using any particular sub-op value)? And then it escapes me why this can't be handled in a default case in the switch statement below anyway. Hmm - is that a requirement per se? we are consistently checking per the sub-op definition we have. Well, in a way. But doing range checks like this means future additions of sub-ops would always need to touch this code. Quite different from doing it in the default case of a switch statement. Plus, can you see how the expression is going to look like if in interface version 2 you need to remove one or two of the current entries, replacing them with new, higher numbers? Would like this to be considered as is. As I said in the cover letter we have constraints on how much more we can do this week now - so requesting the maintainers to accept v7 with the review comments you have on those recorded as pending to be addressed by us. Yes, on that basis, albeit extremely hesitantly to be honest. If any other maintainer would be as hesitant as I am about this, I would likely put the two together to yield a NAK. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xenconsole: Allow non-interactive use
On Thursday, 23.07.2015 at 09:48, Ian Campbell wrote: On Wed, 2015-07-22 at 19:08 +0200, Martin Lucina wrote: If xenconsole is run with stdin closed or redirected to /dev/null, console_loop() will return immediately due to failure to read from STDIN_FILENO. This patch tests if stdin and stdout are both connected to a TTY and, if not, xenconsole will not attempt to read from stdin or modify stdout terminal attributes. Existing behaviour when xenconsole is run from a terminal does not change. This allows for non-interactive use, eg. running xl create -c under systemd or piping the output of xl console to another command. Signed-off-by: Martin Lucina mar...@lucina.net Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com WRT the 4.6 freeze I'm torn between calling this a feature or a bugfix. A pair of nits, which probably aren't worth acting on: @@ -176,8 +177,13 @@ static int console_loop(int fd, struct xs_handle *xs, char *pty_path) fd_set fds; FD_ZERO(fds); - FD_SET(STDIN_FILENO, fds); - max_fd = STDIN_FILENO; + if (interactive) { + FD_SET(STDIN_FILENO, fds); + max_fd = STDIN_FILENO; + } + else { + max_fd = -1; + } Looking at the rest of the file and tools/console subtree it seems the prevailing coding style is: } else max_fd = -1; (i.e. } brace on the same line as the else and no {} for single statements after an else). But maybe it would be better to set max_fd = -1 on declaration and do the max dance here as with the following cases? Declaring max_fd = -1 is indeed clearer, I can do a v2 with that change if you like. One other bug that my change makes potentially easier to trigger is that you can run xl console DOMID multiple times with the same DOMID and the result is broken; each instance gets part of the data written to the console. Any ideas on how to address this in a simple fashion? Martin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xenconsole: Allow non-interactive use
On Thu, 2015-07-23 at 17:09 +0200, Martin Lucina wrote: But maybe it would be better to set max_fd = -1 on declaration and do the max dance here as with the following cases? Declaring max_fd = -1 is indeed clearer, I can do a v2 with that change if you like. If you are happy to then yes, please. One other bug that my change makes potentially easier to trigger is that you can run xl console DOMID multiple times with the same DOMID and the result is broken; each instance gets part of the data written to the console. Any ideas on how to address this in a simple fashion? Perhaps the client should take some exclusive lock (fcntl based?) on an fd of an open file with domid in the name. Failure to get the lock should result in the client exiting with some message indicating the console is in use. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On 23.07.15 at 17:31, wei.l...@citrix.com wrote: On Thu, Jul 23, 2015 at 09:28:15AM -0600, Jan Beulich wrote: On 23.07.15 at 16:58, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:51 -0600, Jan Beulich wrote: On 23.07.15 at 16:43, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:26 -0600, Jan Beulich wrote: On 23.07.15 at 15:56, ian.campb...@citrix.com wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. To be honest I didn't even know we support use of OVMF in any released version (nor in unstable). When did this change? It was present in 4.3. I don't think we build it by default since its toolchain is a bit picky, but we turn it on and test it in osstest from 4.4 onwards. You can ask for it with bios=ovmf in your hvm guest's config. Well, that means it can be used by the adventurous, but it doesn't mean we consider it supported. TBH, I'm more concerned that it is a blocker for upgrade osstest to Debian Jessie, which in turn is a prerequisite for testing arm64. If it was explicitly unsupported, I could say do whatever you need to do to it, but with not being sure what status it has I'm kind of wondering what other changes (i.e. apart from fixing the build) the tree(s) would get which then may break someone. But in the end it's probably okay to do the change. It's marked as experimental, not supported and not built by default -- if that makes you feel more comfortable. It does. So Ian, feel free to go ahead. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 23.07.15 at 15:51, andrew.coop...@citrix.com wrote: On 23/07/15 12:35, Feng Wu wrote: +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, old_ire, new_ire); + +ASSERT(ret == *(__uint128_t *)old_ire); This cannot be correct. Either the cmpxchg() is required and you must cope with it failing, or the cmpxchg() is not required and this should be a plain write. Not exactly: The cmpxchg() is required for this to be an atomic 128-bit write. And hence I would view the ASSERT() as appropriate - it validates that the entry didn't change behind our back. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v3 01/13] toolstack: save / restore check
On Wed, 2015-07-22 at 11:13 +0100, Wei Liu wrote: Introduce check_for_command function and use it to check save / restore functionality. Signed-off-by: Wei Liu wei.l...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On Thu, Jul 23, 2015 at 09:28:15AM -0600, Jan Beulich wrote: On 23.07.15 at 16:58, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:51 -0600, Jan Beulich wrote: On 23.07.15 at 16:43, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 08:26 -0600, Jan Beulich wrote: On 23.07.15 at 15:56, ian.campb...@citrix.com wrote: The version of OVMF in 4.5 (and presumably earlier) doesn't build with the gcc in modern Linux distros (I noticed it with 4.9 from Debian/Jessie). I think we should update Config.mk:OVMF_UPSTREAM_REVISION to cb9a7ebabcd6b8a49dc0854b2f9592d732b5afbd in all stable branches. That version is the one currently in master. To be honest I didn't even know we support use of OVMF in any released version (nor in unstable). When did this change? It was present in 4.3. I don't think we build it by default since its toolchain is a bit picky, but we turn it on and test it in osstest from 4.4 onwards. You can ask for it with bios=ovmf in your hvm guest's config. Well, that means it can be used by the adventurous, but it doesn't mean we consider it supported. TBH, I'm more concerned that it is a blocker for upgrade osstest to Debian Jessie, which in turn is a prerequisite for testing arm64. If it was explicitly unsupported, I could say do whatever you need to do to it, but with not being sure what status it has I'm kind of wondering what other changes (i.e. apart from fixing the build) the tree(s) would get which then may break someone. But in the end it's probably okay to do the change. It's marked as experimental, not supported and not built by default -- if that makes you feel more comfortable. Wei. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 23/07/15 16:52, Jan Beulich wrote: On 23.07.15 at 15:51, andrew.coop...@citrix.com wrote: On 23/07/15 12:35, Feng Wu wrote: +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, old_ire, new_ire); + +ASSERT(ret == *(__uint128_t *)old_ire); This cannot be correct. Either the cmpxchg() is required and you must cope with it failing, or the cmpxchg() is not required and this should be a plain write. Not exactly: The cmpxchg() is required for this to be an atomic 128-bit write. And hence I would view the ASSERT() as appropriate - it validates that the entry didn't change behind our back. But p is an active descriptor, which means hardware is liable to change it behind our back. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On Thu, 2015-07-23 at 09:37 -0600, Jan Beulich wrote: It's marked as experimental, not supported and not built by default -- if that makes you feel more comfortable. It does. So Ian, feel free to go ahead. Thanks. I suppose you were addressing Ian J here since I don't typically do stable release work. I'm happy to do the grunt work on this occasion though... Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 13/15] x86/altp2m: XSM hooks for altp2m HVM ops
On 23.07.15 at 01:01, edmund.h.wh...@intel.com wrote: --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -6005,6 +6005,9 @@ static int hvmop_set_param( nestedhvm_vcpu_destroy(v); break; case HVM_PARAM_ALTP2M: +rc = xsm_hvm_param_altp2mhvm(XSM_PRIV, d); +if ( rc ) +break; if ( a.value 1 ) rc = -EINVAL; if ( a.value @@ -6189,6 +6192,9 @@ static int do_altp2m_op( goto out; } +if ( (rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d ? d : current-domain)) ) +goto out; + Shouldn't this have got updated after the changes to patch 11? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
On 23/07/15 17:00, Jan Beulich wrote: On 23.07.15 at 17:55, andrew.coop...@citrix.com wrote: On 23/07/15 16:52, Jan Beulich wrote: On 23.07.15 at 15:51, andrew.coop...@citrix.com wrote: On 23/07/15 12:35, Feng Wu wrote: +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, old_ire, new_ire); + +ASSERT(ret == *(__uint128_t *)old_ire); This cannot be correct. Either the cmpxchg() is required and you must cope with it failing, or the cmpxchg() is not required and this should be a plain write. Not exactly: The cmpxchg() is required for this to be an atomic 128-bit write. And hence I would view the ASSERT() as appropriate - it validates that the entry didn't change behind our back. But p is an active descriptor, which means hardware is liable to change it behind our back. I inquired about this on the previous round and was told hardware doesn't alter the descriptor. Ah - apologies for not noticing this. Comparing this with the spec, I couldn't spot any fields that I would suspect getting written. Which fields do you have in mind? None in particular. I called it out because ASSERT(cmpxchg(...) == old) reads as if it is buggy. If hardware will genuinely never update the descriptor, then a comment should be put in here explaining why the assert is safe in this instance. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Stable backport request for new OVMF version
On 23.07.15 at 17:59, ian.campb...@citrix.com wrote: On Thu, 2015-07-23 at 09:37 -0600, Jan Beulich wrote: It's marked as experimental, not supported and not built by default -- if that makes you feel more comfortable. It does. So Ian, feel free to go ahead. Thanks. I suppose you were addressing Ian J here since I don't typically do stable release work. No, since this was mainly for ARM (for which I recall you doing backports), I did indeed mean you. I'm happy to do the grunt work on this occasion though... Thanks. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On 23/07/15 17:00, Ian Campbell wrote: On Thu, 2015-07-23 at 17:48 +0200, Roger Pau Monné wrote: El 23/07/15 a les 17.32, Jan Beulich ha escrit: On 23.07.15 at 17:10, roger@citrix.com wrote: IMHO introducing a new structure that gets rid of all the PV-only fields seems like a good option: struct vcpu_hvm_context { #define _VGCF_online 5 #define VGCF_online(1_VGCF_online) uint32_t flags; /* VGCF_* flags */ struct cpu_hvm_user_regs user_regs; /* User-level CPU registers */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */ uint32_t ctrlreg[8];/* CR0-CR7 (control registers) */ uint32_t debugreg[8]; /* DB0-DB7 (debug registers) */ }; I'm also seriously considering getting rid of ctrlreg and debugreg. Since HVM VCPUs will always be started in 32bit flat mode, it doesn't make sense IMHO to have both the 32 and the 64 version of the registers, so cpu_hvm_user_regs is always going to be: struct cpu_hvm_user_regs { uint32_t ebx; uint32_t ecx; uint32_t edx; uint32_t esi; uint32_t edi; uint32_t ebp; uint32_t eax; uint32_t eip; uint32_t esp; uint32_t eflags; uint16_t cs; uint16_t ss; uint16_t es; uint16_t ds; uint16_t fs; uint16_t gs; }; We could however do something similar to what's done in ARM and have a union of both the 32 and the 64bit registers in case we want to start the vCPU in 64bit mode sometime in the future. What you gave above is suitable only for VCPUOP_initialise afaict. Did you intend this to be the case? Certainly not, this should also be used by XEN_DOMCTL_setvcpucontext. I'm afraid I'm missing something obvious, but I don't see why this couldn't be used by XEN_DOMCTL_setvcpucontext TBH, can you please clarify? set/getvcpucontext need to pickle all state for save/restore/migration, not just the start of day state. HVM migration doesn't use set/getvcpucontext. I would expect an HVMLite-based-PVH to follow suit and just use set/gethvmcontext. (In fact, while doing migration v2 I was half thinking of making PV follow HVM suit somewhat, to simplify the toolstack side of things) ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v3 05/13] toolstack: distinguish local and remote migration support
On Wed, 2015-07-22 at 11:13 +0100, Wei Liu wrote: +# Mode should be either 1 (local) or 0 (remote) +our ($whhost, $gn, $mode) = @ARGV; I think $mode would be better given a more boolean name, in this case $local. However that's a minor nit so with or without that changed: Acked-by: Ian Campbell ian.campb...@citrix.com Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [MINUTES] Monthly Xen.org Technical Call (2015-07-22)
On Thu, 2015-07-23 at 13:17 +0200, Roger Pau Monné wrote: El 23/07/15 a les 12.59, Konrad Rzeszutek Wilk ha escrit: [...] We forgot to speak about dom0. This work outlined will lay out how to do it - but the pieces for dom0 are not implemented and would need work (which actually may be following most of the is_pvh in the hypervisor). Let's do that once we have DomU nailed down and the guest ABI marked as stable. IIRC someone on the call pointed out that one large piece of the Dom0 work would be very similar to the PCI passthrough work, so it makes sense to defer them together. Some work will indeed be needed for Dom0, for example I would like to avoid using the current Dom0 builder (construct_dom0, that is suited for PV, but all the PVH hacks just make it almost impossible to understand) and start from scratch with a more sane approach. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests
On 23.07.15 at 12:25, roger@citrix.com wrote: El 13/07/15 a les 16.01, Jan Beulich ha escrit: On 03.07.15 at 13:34, roger@citrix.com wrote: --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -795,6 +795,15 @@ int arch_set_info_guest( c.nat-fs_base || c.nat-gs_base_user)) ) return -EINVAL; } +else if ( is_hvm_domain(d) ) +{ +if ( c(ctrlreg[0]) || c(ctrlreg[1]) || c(ctrlreg[2]) || + c(ctrlreg[3]) || c(ctrlreg[4]) || c(ctrlreg[5]) || + c(ctrlreg[6]) || c(ctrlreg[7]) || c(ldt_base) || + c(ldt_ents) || c(kernel_ss) || c(kernel_sp) || + c(gdt_ents) ) +return -EINVAL; +} In addition to what Andrew said - is the use of c() here really correct considering compat = is_pv_32bit_domain(d); #define c(fld) (compat ? (c.cmp-fld) : (c.nat-fld)) near the beginning of the function? I've been thinking about this. Since HVM vCPUs are always started in 32bit mode, I would ideally like to use the vcpu_guest_context_x86_32_t struct to set the vCPU context. This means that for HVM guests compat should always be true. The problem is that inside of the guest there's no notion of vcpu_guest_context_x86_32_t or vcpu_guest_context_x86_64_t, there's only vcpu_guest_context, which defaults to the native bitness of the guest. So if your guest is running in 64bit mode vcpu_guest_context is aliased to vcpu_guest_context_x86_64_t by default. TBH I'm not sure about the best way to solve this. Should vcpu_guest_context_x86_32_t and vcpu_guest_context_x86_64_t be exposed to the guest like it's done for the tools? Why? We're in arch_set_info_guest(), which is unreachable for a HVM guest on itself. Or is this in preparation of allowing HVM guests to use VCPUOP_initialise? In that case, exposing might be an option, but remember that the compat mode layout even today is used only for PV guests. So I'd rather avoid exposing both layouts to guests, and do translation instead inside the hypervisor. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxl: Fixes to stream v2 task joining logic
On Thu, Jul 23, 2015 at 12:09:38PM +0100, Andrew Cooper wrote: During review of the libxl migration v2 series, I changes the task joining logic, but clearly didn't think the result through properly. This patch fixes several errors. 1) Do not call check_all_finished() in the success cases of libxl__xc_domain_{save,restore}_done(). It serves no specific purpose as the save helper state will report itself as inactive by this point, and avoids triggering a second stream-completion_callback() in the case that write_toolstack_record()/stream_continue() record errors synchronously themselves. 2) Only ever set stream-rc in stream_complete(). The first version of the migration v2 series had separate rc and joined_rc parameters, where this logic worked. However when combining the two, the teardown path fails to trigger if stream_done() records stream-rc itself. A side effect of this is that stream_done() needs to take an rc parameter. 3) Avoid stacking of check_all_finished() via synchronous teardown of tasks. If the _abort() functions call back synchronously, stream-completion_callback() ends up getting called twice, as first and last check_all_finished() frames observe each task being finished. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- I found this while working to fix the toolstack record issue, but am posting this bugfix ahead of the other work as OSSTest has tripped over the issue. This change itself doesn't seem to have anything to do with libxc. In OSSTest the error that triggers this knock on effect is the failure of xc_map_foreign_bulk. Does that mean this patch only fix half of the problem seen in OSSTest? Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used
This patch adds an API which is used to update the IRTE for posted-interrupt when guest changes MSI/MSI-X information. CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Feng Wu feng...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v4: - Don't inline setup_posted_irte() - const struct pi_desc *pi_desc for setup_posted_irte() - return -EINVAL when pirq_spin_lock_irq_desc() fails. - Make some variables const - Release irq desc lock earlier in pi_update_irte() - Remove the pointless do-while() loop when doing cmpxchg16b() v3: - Remove adding PDA_MASK() when updating 'pda_l' and 'pda_h' for IRTE. - Change the return type of pi_update_irte() to int. - Remove some pointless printk message in pi_update_irte(). - Use structure assignment instead of memcpy() for irte copy. xen/drivers/passthrough/vtd/intremap.c | 107 + xen/drivers/passthrough/vtd/iommu.h| 2 + xen/include/asm-x86/iommu.h| 2 + 3 files changed, 111 insertions(+) diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c index b7a42f6..724ba83 100644 --- a/xen/drivers/passthrough/vtd/intremap.c +++ b/xen/drivers/passthrough/vtd/intremap.c @@ -900,3 +900,110 @@ void iommu_disable_x2apic_IR(void) for_each_drhd_unit ( drhd ) disable_qinval(drhd-iommu); } + +static void setup_posted_irte( +struct iremap_entry *new_ire, const struct pi_desc *pi_desc, uint8_t gvec) +{ +new_ire-post.urg = 0; +new_ire-post.vector = gvec; +new_ire-post.pda_l = virt_to_maddr(pi_desc) (32 - PDA_LOW_BIT); +new_ire-post.pda_h = virt_to_maddr(pi_desc) 32; + +new_ire-post.res_1 = 0; +new_ire-post.res_2 = 0; +new_ire-post.res_3 = 0; +new_ire-post.res_4 = 0; +new_ire-post.res_5 = 0; + +new_ire-post.im = 1; +} + +/* + * This function is used to update the IRTE for posted-interrupt + * when guest changes MSI/MSI-X information. + */ +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec) +{ +struct irq_desc *desc; +const struct msi_desc *msi_desc; +int remap_index; +int rc = 0; +const struct pci_dev *pci_dev; +const struct acpi_drhd_unit *drhd; +struct iommu *iommu; +struct ir_ctrl *ir_ctrl; +struct iremap_entry *iremap_entries = NULL, *p = NULL; +struct iremap_entry new_ire, old_ire; +const struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc; +unsigned long flags; +__uint128_t ret; + +desc = pirq_spin_lock_irq_desc(pirq, NULL); +if ( !desc ) +return -EINVAL; + +msi_desc = desc-msi_desc; +if ( !msi_desc ) +{ +rc = -EBADSLT; +goto unlock_out; +} + +pci_dev = msi_desc-dev; +if ( !pci_dev ) +{ +rc = -ENODEV; +goto unlock_out; +} + +remap_index = msi_desc-remap_index; + +/* + * For performance concern, we will store the 'iommu' pointer in + * 'struct msi_desc' in some other place, so we don't need to waste + * time searching it here. I will fix this soon. + */ +drhd = acpi_find_matched_drhd_unit(pci_dev); +if ( !drhd ) +{ +rc = -ENODEV; +goto unlock_out; +} + +iommu = drhd-iommu; +ir_ctrl = iommu_ir_ctrl(iommu); +if ( !ir_ctrl ) +{ +rc = -ENODEV; +goto unlock_out; +} + +spin_unlock_irq(desc-lock); + +spin_lock_irqsave(ir_ctrl-iremap_lock, flags); + +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +old_ire = new_ire = *p; + +/* Setup/Update interrupt remapping table entry. */ +setup_posted_irte(new_ire, pi_desc, gvec); +ret = cmpxchg16b(p, old_ire, new_ire); + +ASSERT(ret == *(__uint128_t *)old_ire); + +iommu_flush_cache_entry(p, sizeof(struct iremap_entry)); +iommu_flush_iec_index(iommu, 0, remap_index); + +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); + +spin_unlock_irqrestore(ir_ctrl-iremap_lock, flags); + +return 0; + + unlock_out: +spin_unlock_irq(desc-lock); + +return rc; +} diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h index 0be6c2e..692473f 100644 --- a/xen/drivers/passthrough/vtd/iommu.h +++ b/xen/drivers/passthrough/vtd/iommu.h @@ -330,6 +330,8 @@ struct iremap_entry { }; }; +#define PDA_LOW_BIT26 + /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */ #define IREMAP_PAGE_ORDER 8 diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h index e7a65da..2a1523e 100644 --- a/xen/include/asm-x86/iommu.h +++ b/xen/include/asm-x86/iommu.h @@ -32,6 +32,8 @@ int iommu_supports_eim(void); int iommu_enable_x2apic_IR(void); void iommu_disable_x2apic_IR(void); +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec); + #endif /*
[Xen-devel] [v4 13/17] vmx: posted-interrupt handling when vCPU is blocked
This patch includes the following aspects: - Add a global vector to wake up the blocked vCPU when an interrupt is being posted to it (This part was sugguested by Yang Zhang yang.z.zh...@intel.com). - Adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. - Define two per-cpu variables: * pi_blocked_vcpu: A list storing the vCPUs which were blocked on this pCPU. * pi_blocked_vcpu_lock: The spinlock to protect pi_blocked_vcpu. CC: Kevin Tian kevin.t...@intel.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Feng Wu feng...@intel.com --- v4: - Use local variables in pi_wakeup_interrupt() - Remove vcpu from the blocked list when pi_desc.on==1, this - avoid kick vcpu multiple times. - Remove tasklet v3: - This patch is generated by merging the following three patches in v2: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU [RFC v2 10/15] vmx: Define two per-cpu variables [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' - Make pi_wakeup_interrupt() static - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' xen/arch/x86/hvm/vmx/vmcs.c| 3 ++ xen/arch/x86/hvm/vmx/vmx.c | 63 ++ xen/include/asm-x86/hvm/vmx/vmcs.h | 3 ++ xen/include/asm-x86/hvm/vmx/vmx.h | 5 +++ 4 files changed, 74 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 11dc1b5..0c5ce3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -631,6 +631,9 @@ int vmx_cpu_up(void) if ( cpu_has_vmx_vpid ) vpid_sync_all(); +INIT_LIST_HEAD(per_cpu(pi_blocked_vcpu, cpu)); +spin_lock_init(per_cpu(pi_blocked_vcpu_lock, cpu)); + return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index cd5bfab..d137ea0 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -82,7 +82,15 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content); static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); static void vmx_invlpg_intercept(unsigned long vaddr); +/* + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we + * can find which vCPU should be waken up. + */ +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); + uint8_t __read_mostly posted_intr_vector; +uint8_t __read_mostly pi_wakeup_vector; static int vmx_domain_initialise(struct domain *d) { @@ -105,6 +113,9 @@ static int vmx_vcpu_initialise(struct vcpu *v) spin_lock_init(v-arch.hvm_vmx.vmcs_lock); +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_vcpu_on_set_list); + v-arch.schedule_tail= vmx_do_resume; v-arch.ctxt_switch_from = vmx_ctxt_switch_from; v-arch.ctxt_switch_to = vmx_ctxt_switch_to; @@ -1849,6 +1860,53 @@ static struct hvm_function_table __initdata vmx_function_table = { .enable_msr_exit_interception = vmx_enable_msr_exit_interception, }; +/* + * Handle VT-d posted-interrupt when VCPU is blocked. + */ +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) +{ +struct arch_vmx_struct *vmx, *tmp; +struct vcpu *v; +spinlock_t *lock = this_cpu(pi_blocked_vcpu_lock); +struct list_head *blocked_vcpus = this_cpu(pi_blocked_vcpu); +LIST_HEAD(list); + +spin_lock(lock); + +/* + * XXX: The length of the list depends on how many vCPU is current + * blocked on this specific pCPU. This may hurt the interrupt latency + * if the list grows to too many entries. + */ +list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list) +{ +if ( pi_test_on(vmx-pi_desc) ) +{ +list_del_init(vmx-pi_blocked_vcpu_list); + +/* + * We cannot call vcpu_unblock here, since it also needs + * 'pi_blocked_vcpu_lock', we store the vCPUs with ON + * set in another list and unblock them after we release + * 'pi_blocked_vcpu_lock'. + */ +list_add_tail(vmx-pi_vcpu_on_set_list, list); +} +} + +spin_unlock(lock); + +list_for_each_entry_safe(vmx, tmp, list, pi_vcpu_on_set_list) +{ +v = container_of(vmx, struct vcpu, arch.hvm_vmx); +list_del_init(vmx-pi_vcpu_on_set_list); +vcpu_unblock(v); +} + +ack_APIC_irq(); +
[Xen-devel] [v4 08/17] vmx: Initialize VT-d Posted-Interrupts Descriptor
This patch initializes the VT-d Posted-interrupt Descriptor. CC: Kevin Tian kevin.t...@intel.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Feng Wu feng...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v4: - No changes v3: - Move pi_desc_init() to xen/arch/x86/hvm/vmx/vmcs.c - Remove the 'inline' flag of pi_desc_init() xen/arch/x86/hvm/vmx/vmcs.c | 18 ++ xen/include/asm-x86/hvm/vmx/vmx.h | 2 ++ 2 files changed, 20 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 3aff365..11dc1b5 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -40,6 +40,7 @@ #include asm/flushtlb.h #include asm/shadow.h #include asm/tboot.h +#include asm/apic.h static bool_t __read_mostly opt_vpid_enabled = 1; boolean_param(vpid, opt_vpid_enabled); @@ -921,6 +922,20 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val) virtual_vmcs_exit(vvmcs); } +static void pi_desc_init(struct vcpu *v) +{ +uint32_t dest; + +v-arch.hvm_vmx.pi_desc.nv = posted_intr_vector; + +dest = cpu_physical_id(v-processor); + +if ( x2apic_enabled ) +v-arch.hvm_vmx.pi_desc.ndst = dest; +else +v-arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK); +} + static int construct_vmcs(struct vcpu *v) { struct domain *d = v-domain; @@ -1054,6 +1069,9 @@ static int construct_vmcs(struct vcpu *v) if ( cpu_has_vmx_posted_intr_processing ) { +if ( iommu_intpost ) +pi_desc_init(v); + __vmwrite(PI_DESC_ADDR, virt_to_maddr(v-arch.hvm_vmx.pi_desc)); __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector); } diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index 2eb7cbd..b53973e 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -89,6 +89,8 @@ typedef enum { #define EPT_EMT_WB 6 #define EPT_EMT_RSV27 +#define PI_xAPIC_NDST_MASK 0xFF00 + void vmx_asm_vmexit_handler(struct cpu_user_regs); void vmx_asm_do_vmentry(void); void vmx_intr_assist(void); -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v4 12/17] Update IRTE according to guest interrupt config changes
When guest changes its interrupt configuration (such as, vector, etc.) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected to guests without VM-Exit. For lowest-priority interrupts, we use vector-hashing mechamisn to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt. For multicast/broadcast vCPU, we cannot handle it via interrupt posting, still use interrupt remapping. CC: Jan Beulich jbeul...@suse.com Signed-off-by: Feng Wu feng...@intel.com --- v4: - Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu() - Make 'dest_id' uint32_t - Rename 'size' to 'bitmap_array_size' - find_next_bit() and find_first_bit() always return unsigned int, so no need to check whether the return value is less than 0. - Message error level XENLOG_G_WARNING - XENLOG_G_INFO - Remove useless warning message - Create a seperate function vector_hashing_dest() to find the - destination of lowest-priority interrupts. - Change some comments v3: - Use bitmap to store the all the possible destination vCPUs of an interrupt, then trying to find the right destination from the bitmap - Typo and some small changes xen/drivers/passthrough/io.c | 124 ++- 1 file changed, 123 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 9b77334..546b962 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,7 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include asm/io_apic.h static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -199,6 +200,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci) xfree(dpci); } +/* + * This routine handles lowest-priority interrupts using vector-hashing + * mechanism. As an example, modern Intel CPUs use this method to handle + * lowest-priority interrupts. + * + * Here is the details about the vector-hashing mechanism: + * 1. For lowest-priority interrupts, store all the possible destination + *vCPUs in an array. + * 2. Use gvec % max number of destination vCPUs to find the right + *destination vCPU in the array for the lowest-priority interrupt. + */ +static struct vcpu *vector_hashing_dest(const struct domain *d, +uint32_t dest_id, +bool_t dest_mode, +uint8_t gvec) + +{ +unsigned long *dest_vcpu_bitmap; +unsigned int dest_vcpu_num = 0, idx; +unsigned int bitmap_array_size = BITS_TO_LONGS(d-max_vcpus); +struct vcpu *v, *dest = NULL; +unsigned int i; + +dest_vcpu_bitmap = xzalloc_array(unsigned long, bitmap_array_size); +if ( !dest_vcpu_bitmap ) +{ +dprintk(XENLOG_G_INFO, +dom%d: failed to allocate memory\n, d-domain_id); +return NULL; +} + +for_each_vcpu ( d, v ) +{ +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0, +dest_id, dest_mode) ) +continue; + +__set_bit(v-vcpu_id, dest_vcpu_bitmap); +dest_vcpu_num++; +} + +if ( dest_vcpu_num != 0 ) +{ +idx = 0; + +for ( i = gvec % dest_vcpu_num; i = 0; i--) +{ +idx = find_next_bit(dest_vcpu_bitmap, d-max_vcpus, idx) + 1; +BUG_ON(idx = d-max_vcpus); +} +idx--; + +dest = d-vcpu[idx]; +} + +xfree(dest_vcpu_bitmap); + +return dest; +} + +/* + * The purpose of this routine is to find the right destination vCPU for + * an interrupt which will be delivered by VT-d posted-interrupt. There + * are several cases as below: + * + * - For lowest-priority interrupts, use vector-hashing mechanism to find + * the destination. + * - Otherwise, for single destination interrupt, it is straightforward to + * find the destination vCPU and return true. + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting, + * so return NULL. + */ +static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id, + bool_t dest_mode, uint8_t delivery_mode, + uint8_t gvec) +{ +unsigned int dest_vcpu_num = 0; +struct vcpu *v, *dest = NULL; + +if ( delivery_mode == dest_LowestPrio ) +return vector_hashing_dest(d, dest_id, dest_mode, gvec); + +for_each_vcpu ( d, v ) +{ +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0, +dest_id, dest_mode) ) +continue; + +dest_vcpu_num++; +dest = v; +} + +/* + * For fixed destination, we only handle single-destination + * interrupts. + */ +if ( dest_vcpu_num == 1 ) + return dest; + +
[Xen-devel] [v4 14/17] vmx: Properly handle notification event when vCPU is running
When a vCPU is running in Root mode and a notification event has been injected to it. we need to set VCPU_KICK_SOFTIRQ for the current cpu, so the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time. CC: Kevin Tian kevin.t...@intel.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Feng Wu feng...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v4: - Coding style. v3: - Make pi_notification_interrupt() static xen/arch/x86/hvm/vmx/vmx.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index d137ea0..f2b7fe0 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1907,6 +1907,51 @@ static void pi_wakeup_interrupt(struct cpu_user_regs *regs) this_cpu(irq_count)++; } +/* Handle VT-d posted-interrupt when VCPU is running. */ +static void pi_notification_interrupt(struct cpu_user_regs *regs) +{ +/* + * We get here when a vCPU is running in root-mode (such as via hypercall, + * or any other reasons which can result in VM-Exit), and before vCPU is + * back to non-root, external interrupts from an assigned device happen + * and a notification event is delivered to this logical CPU. + * + * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like + * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will + * be synced to vIRR before VM-Exit in time. + * + * Please refer to the following code fragments from + * xen/arch/x86/hvm/vmx/entry.S: + * + * .Lvmx_do_vmentry + * + * .. + * point 1 + * + * cmp %ecx,(%rdx,%rax,1) + * jnz .Lvmx_process_softirqs + * + * .. + * + * je .Lvmx_launch + * + * .. + * + * .Lvmx_process_softirqs: + * sti + * call do_softirq + * jmp .Lvmx_do_vmentry + * + * If VT-d engine issues a notification event at point 1 above, it cannot + * be delivered to the guest during this VM-entry without raising the + * softirq in this notification handler. + */ +raise_softirq(VCPU_KICK_SOFTIRQ); + +ack_APIC_irq(); +this_cpu(irq_count)++; +} + const struct hvm_function_table * __init start_vmx(void) { set_in_cr4(X86_CR4_VMXE); @@ -1944,7 +1989,7 @@ const struct hvm_function_table * __init start_vmx(void) if ( cpu_has_vmx_posted_intr_processing ) { -alloc_direct_apic_vector(posted_intr_vector, event_check_interrupt); +alloc_direct_apic_vector(posted_intr_vector, pi_notification_interrupt); if ( iommu_intpost ) alloc_direct_apic_vector(pi_wakeup_vector, pi_wakeup_interrupt); -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
This patch adds the following arch hooks in scheduler: - vmx_pre_ctx_switch_pi(): It is called before context switch, we update the posted interrupt descriptor when the vCPU is preempted, go to sleep, or is blocked. - vmx_post_ctx_switch_pi() It is called after context switch, we update the posted interrupt descriptor when the vCPU is going to run. - arch_vcpu_wake() It will be called when waking up the vCPU, we update the posted interrupt descriptor when the vCPU is unblocked. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Kevin Tian kevin.t...@intel.com CC: George Dunlap george.dun...@eu.citrix.com CC: Dario Faggioli dario.faggi...@citrix.com Sugguested-by: Dario Faggioli dario.faggi...@citrix.com Signed-off-by: Feng Wu feng...@intel.com --- v4: - Newly added xen/arch/x86/domain.c | 10 +++ xen/arch/x86/hvm/vmx/vmx.c | 145 + xen/common/schedule.c | 2 + xen/include/asm-x86/domain.h | 3 + xen/include/asm-x86/hvm/hvm.h | 2 + xen/include/asm-x86/hvm/vmx/vmcs.h | 8 ++ 6 files changed, 170 insertions(+) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index db073a6..36fd2d5 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1550,9 +1550,19 @@ void context_switch(struct vcpu *prev, struct vcpu *next) set_current(next); +/* + * We need to update posted interrupt descriptor for each context switch, + * hence cannot use the lazy context switch for this. + */ +if ( !is_idle_vcpu(prev) prev-arch.pi_ctxt_switch_from ) +prev-arch.pi_ctxt_switch_from(prev); + if ( (per_cpu(curr_vcpu, cpu) == next) || (is_idle_vcpu(next) cpu_online(cpu)) ) { +if ( !is_idle_vcpu(next) next-arch.pi_ctxt_switch_to ) +next-arch.pi_ctxt_switch_to(next); + local_irq_enable(); } else diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index f2b7fe0..f8b3601 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -67,6 +67,8 @@ enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised }; static void vmx_ctxt_switch_from(struct vcpu *v); static void vmx_ctxt_switch_to(struct vcpu *v); +static void vmx_pre_ctx_switch_pi(struct vcpu *v); +static void vmx_post_ctx_switch_pi(struct vcpu *v); static int vmx_alloc_vlapic_mapping(struct domain *d); static void vmx_free_vlapic_mapping(struct domain *d); @@ -116,10 +118,17 @@ static int vmx_vcpu_initialise(struct vcpu *v) INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); INIT_LIST_HEAD(v-arch.hvm_vmx.pi_vcpu_on_set_list); +v-arch.hvm_vmx.pi_block_cpu = -1; + +spin_lock_init(v-arch.hvm_vmx.pi_lock); + v-arch.schedule_tail= vmx_do_resume; v-arch.ctxt_switch_from = vmx_ctxt_switch_from; v-arch.ctxt_switch_to = vmx_ctxt_switch_to; +v-arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi; +v-arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi; + if ( (rc = vmx_create_vmcs(v)) != 0 ) { dprintk(XENLOG_WARNING, @@ -715,6 +724,141 @@ static void vmx_fpu_leave(struct vcpu *v) } } +void arch_vcpu_wake(struct vcpu *v) +{ +unsigned long gflags; + +if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v-domain) ) +return; + +spin_lock_irqsave(v-arch.hvm_vmx.pi_lock, gflags); + +if ( likely(vcpu_runnable(v)) || + !test_bit(_VPF_blocked, v-pause_flags) ) +{ +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc; +unsigned long flags; + +/* + * We don't need to send notification event to a non-running + * vcpu, the interrupt information will be delivered to it before + * VM-ENTRY when the vcpu is scheduled to run next time. + */ +pi_set_sn(pi_desc); + +/* + * Set 'NV' feild back to posted_intr_vector, so the + * Posted-Interrupts can be delivered to the vCPU by + * VT-d HW after it is scheduled to run. + */ +write_atomic((uint8_t*)pi_desc-nv, posted_intr_vector); + +/* + * Delete the vCPU from the related block list + * if we are resuming from blocked state + */ +if ( v-arch.hvm_vmx.pi_block_cpu != -1 ) +{ +spin_lock_irqsave(per_cpu(pi_blocked_vcpu_lock, + v-arch.hvm_vmx.pi_block_cpu), flags); +list_del_init(v-arch.hvm_vmx.pi_blocked_vcpu_list); +spin_unlock_irqrestore(per_cpu(pi_blocked_vcpu_lock, +v-arch.hvm_vmx.pi_block_cpu), flags); +} +} + +spin_unlock_irqrestore(v-arch.hvm_vmx.pi_lock, gflags); +} + +static void vmx_pre_ctx_switch_pi(struct vcpu *v) +{ +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc; +struct pi_desc old, new; +unsigned long flags, gflags; + +if (
[Xen-devel] [v4 17/17] VT-d: Dump the posted format IRTE
Add the utility to dump the posted format IRTE. CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com --- v4: - Newly added xen/drivers/passthrough/vtd/utils.c | 41 - 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c index a5fe237..7f73a84 100644 --- a/xen/drivers/passthrough/vtd/utils.c +++ b/xen/drivers/passthrough/vtd/utils.c @@ -211,6 +211,9 @@ static void dump_iommu_info(unsigned char key) ecap_intr_remap(iommu-ecap) ? : not , (status DMA_GSTS_IRES) ? and enabled : ); +printk( Interrupt Posting: %ssupported.\n, +cap_intr_post(iommu-ecap) ? : not ); + if ( status DMA_GSTS_IRES ) { /* Dump interrupt remapping table. */ @@ -221,6 +224,7 @@ static void dump_iommu_info(unsigned char key) printk( Interrupt remapping table (nr_entry=%#x. Only dump P=1 entries here):\n, nr_entry); +printk (Entries for remapped format:\n); printk( SVT SQ SID DST V AVL DLM TM RH DM FPD P\n); for ( i = 0; i nr_entry; i++ ) @@ -238,7 +242,7 @@ static void dump_iommu_info(unsigned char key) else p = iremap_entries[i % (1 IREMAP_ENTRY_ORDER)]; -if ( !p-remap.p ) +if ( !p-remap.p || p-remap.im ) continue; printk( %04x: %x %x %04x %08x %02x%x %x %x %x %x %x %x\n, i, @@ -248,6 +252,41 @@ static void dump_iommu_info(unsigned char key) (u32)p-remap.dm, (u32)p-remap.fpd, (u32)p-remap.p); print_cnt++; } + +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); + +iremap_entries = NULL; +printk (\nEntries for posted format:\n); +printk( SVT SQ SID PDA V URG AVL FPD P\n); +for ( i = 0; i nr_entry; i++ ) +{ +struct iremap_entry *p; +if ( i % (1 IREMAP_ENTRY_ORDER) == 0 ) +{ +/* This entry across page boundry */ +if ( iremap_entries ) +unmap_vtd_domain_page(iremap_entries); + +GET_IREMAP_ENTRY(iremap_maddr, i, + iremap_entries, p); +} +else +p = iremap_entries[i % (1 IREMAP_ENTRY_ORDER)]; + +if ( !p-post.p || !p-post.im ) +continue; + +printk( %04x: %x %x %04x %16lx %02x%x %x %x %x\n, +i, +(u32)p-post.svt, (u32)p-post.sq, (u32)p-post.sid, +(((u64)p-post.pda_h 32) | (p-post.pda_l 6)), +(u32)p-post.vector, (u32)p-post.urg, +(u32)p-post.avail, (u32)p-post.fpd, (u32)p-post.p); + +print_cnt++; +} + if ( iremap_entries ) unmap_vtd_domain_page(iremap_entries); if ( iommu_ir_ctrl(iommu)-iremap_num != print_cnt ) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v4 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
Extend struct iremap_entry according to VT-d Posted-Interrupts Spec. CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Feng Wu feng...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v4: - res_4 is not a bitfiled, correct it. - Expose 'im' to remapped irte as well. v3: - Use u32 instead of u64 to define the bitfields in 'struct iremap_entry' - Limit using bitfield if possible xen/drivers/passthrough/vtd/intremap.c | 92 +- xen/drivers/passthrough/vtd/iommu.h| 43 ++-- xen/drivers/passthrough/vtd/utils.c| 10 ++-- 3 files changed, 81 insertions(+), 64 deletions(-) diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c index 0333686..b7a42f6 100644 --- a/xen/drivers/passthrough/vtd/intremap.c +++ b/xen/drivers/passthrough/vtd/intremap.c @@ -123,9 +123,9 @@ static u16 hpetid_to_bdf(unsigned int hpet_id) static void set_ire_sid(struct iremap_entry *ire, unsigned int svt, unsigned int sq, unsigned int sid) { -ire-hi.svt = svt; -ire-hi.sq = sq; -ire-hi.sid = sid; +ire-remap.svt = svt; +ire-remap.sq = sq; +ire-remap.sid = sid; } static void set_ioapic_source_id(int apic_id, struct iremap_entry *ire) @@ -220,7 +220,7 @@ static unsigned int alloc_remap_entry(struct iommu *iommu, unsigned int nr) else p = iremap_entries[i % (1 IREMAP_ENTRY_ORDER)]; -if ( p-lo_val || p-hi_val ) /* not a free entry */ +if ( p-lo || p-hi ) /* not a free entry */ found = 0; else if ( ++found == nr ) break; @@ -254,7 +254,7 @@ static int remap_entry_to_ioapic_rte( GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, index, iremap_entries, iremap_entry); -if ( iremap_entry-hi_val == 0 iremap_entry-lo_val == 0 ) +if ( iremap_entry-hi == 0 iremap_entry-lo == 0 ) { dprintk(XENLOG_ERR VTDPREFIX, %s: index (%d) get an empty entry!\n, @@ -264,13 +264,13 @@ static int remap_entry_to_ioapic_rte( return -EFAULT; } -old_rte-vector = iremap_entry-lo.vector; -old_rte-delivery_mode = iremap_entry-lo.dlm; -old_rte-dest_mode = iremap_entry-lo.dm; -old_rte-trigger = iremap_entry-lo.tm; +old_rte-vector = iremap_entry-remap.vector; +old_rte-delivery_mode = iremap_entry-remap.dlm; +old_rte-dest_mode = iremap_entry-remap.dm; +old_rte-trigger = iremap_entry-remap.tm; old_rte-__reserved_2 = 0; old_rte-dest.logical.__reserved_1 = 0; -old_rte-dest.logical.logical_dest = iremap_entry-lo.dst 8; +old_rte-dest.logical.logical_dest = iremap_entry-remap.dst 8; unmap_vtd_domain_page(iremap_entries); spin_unlock_irqrestore(ir_ctrl-iremap_lock, flags); @@ -318,27 +318,28 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu, if ( rte_upper ) { if ( x2apic_enabled ) -new_ire.lo.dst = value; +new_ire.remap.dst = value; else -new_ire.lo.dst = (value 24) 8; +new_ire.remap.dst = (value 24) 8; } else { *(((u32 *)new_rte) + 0) = value; -new_ire.lo.fpd = 0; -new_ire.lo.dm = new_rte.dest_mode; -new_ire.lo.tm = new_rte.trigger; -new_ire.lo.dlm = new_rte.delivery_mode; +new_ire.remap.fpd = 0; +new_ire.remap.dm = new_rte.dest_mode; +new_ire.remap.tm = new_rte.trigger; +new_ire.remap.dlm = new_rte.delivery_mode; /* Hardware require RH = 1 for LPR delivery mode */ -new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio); -new_ire.lo.avail = 0; -new_ire.lo.res_1 = 0; -new_ire.lo.vector = new_rte.vector; -new_ire.lo.res_2 = 0; +new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio); +new_ire.remap.avail = 0; +new_ire.remap.res_1 = 0; +new_ire.remap.vector = new_rte.vector; +new_ire.remap.res_2 = 0; set_ioapic_source_id(IO_APIC_ID(apic), new_ire); -new_ire.hi.res_1 = 0; -new_ire.lo.p = 1; /* finally, set present bit */ +new_ire.remap.res_3 = 0; +new_ire.remap.res_4 = 0; +new_ire.remap.p = 1; /* finally, set present bit */ /* now construct new ioapic rte entry */ remap_rte-vector = new_rte.vector; @@ -511,7 +512,7 @@ static int remap_entry_to_msi_msg( GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, index, iremap_entries, iremap_entry); -if ( iremap_entry-hi_val == 0 iremap_entry-lo_val == 0 ) +if ( iremap_entry-hi == 0 iremap_entry-lo == 0 ) { dprintk(XENLOG_ERR VTDPREFIX, %s: index (%d) get an empty entry!\n, @@ -524,25 +525,25 @@ static int remap_entry_to_msi_msg( msg-address_hi = MSI_ADDR_BASE_HI; msg-address_lo = MSI_ADDR_BASE_LO | -((iremap_entry-lo.dm == 0)
[Xen-devel] [v4 15/17] arm: add a dummy arch hooks for scheduler
Add a dummy arch hooks for scheduler to make the build successful in ARM. CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com CC: Tim Deegan t...@xen.org Signed-off-by: Feng Wu feng...@intel.com --- v4: - Newly added xen/include/asm-arm/domain.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index f1a087e..7f25fd5 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -265,6 +265,8 @@ static inline unsigned int domain_max_vcpus(const struct domain *d) return MAX_VIRT_CPUS; } +static void arch_vcpu_wake(struct vcpu *v) {} + #endif /* __ASM_DOMAIN_H__ */ /* -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v4 03/17] Add cmpxchg16b support for x86-64
This patch adds cmpxchg16b support for x86-64, so software can perform 128-bit atomic write/read. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Feng Wu feng...@intel.com --- v4: - Use pointer as the parameter of __cmpxchg16b(). - Use gcc's __uint128_t built-in type - Make the parameters of __cmpxchg16b() void * v3: - Newly added. xen/include/asm-x86/x86_64/system.h | 32 1 file changed, 32 insertions(+) diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h index 662813a..0267797 100644 --- a/xen/include/asm-x86/x86_64/system.h +++ b/xen/include/asm-x86/x86_64/system.h @@ -6,6 +6,38 @@ (unsigned long)(n),sizeof(*(ptr /* + * Atomic 16 bytes compare and exchange. Compare OLD with MEM, if + * identical, store NEW in MEM. Return the initial value in MEM. + * Success is indicated by comparing RETURN with OLD. + * + * This function can only be called when cpu_has_cx16 is ture. + */ + +static always_inline __uint128_t __cmpxchg16b( +volatile void *ptr, void *old, void *new) +{ +uint64_t prev_high, prev_low; +uint64_t new_high = (*(__uint128_t *)new) 64; +uint64_t new_low = (uint64_t)(*(__uint128_t *)new); +uint64_t old_high = (*(__uint128_t *)old) 64; +uint64_t old_low = (uint64_t)(*(__uint128_t *)old); + +ASSERT(cpu_has_cx16); + +asm volatile ( lock; cmpxchg16b %4 + : =d (prev_high), =a (prev_low) + : c (new_high), b (new_low), + m (*__xg((volatile void *)ptr)), + 0 (old_high), 1 (old_low) + : memory ); + +return ( ((__uint128_t)prev_high) 64 | prev_low ); +} + +#define cmpxchg16b(ptr,o,n) \ +__cmpxchg16b((ptr), (__uint128_t *)(o), (__uint128_t *)(n)) + +/* * This function causes value _o to be changed to _n at location _p. * If this access causes a fault then we return 1, otherwise we return 0. * If no fault occurs then _o is updated to the value we saw at _p. If this -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel