Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop
On 04/06/2015 01:44 PM, Andrew Cooper wrote:
> On 06/04/2015 16:29, Andy Lutomirski wrote:
>> On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
>>> On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
>>>> [cc: Boris and Konrad. Whoops]
>>>> On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>>>> We don't use irq_enable_sysexit on 64-bit kernels any more. Remove
>>> Is there a commit (or name of patch) that explains why 32-bit-user-space-on-64-bit kernels are unsavory?
>> sysexit never tasted very good :-p We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're trying an unconventional approach to making the code faster and less scary. As a result, 64-bit kernels won't use sysexit any more. Hopefully Xen is okay with the slightly sneaky thing we're doing. AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I don't expect there to be any problem.
> 64-bit PV kernels must bounce through Xen to switch from the kernel to the user pagetables (since kernel and userspace both actually run in ring 3 with user pages). As a result, exit to userspace ends up as a hypercall into Xen which has an effect very similar to an `iret`, but with some extra fixup in the background. I can't foresee any Xen issues as a result of this patch.

I ran tip plus this patch (plus another patch that fixes an unrelated Xen regression in tip) through our test suite and it completed without problems. I also ran some very simple 32-bit programs in a 64-bit PV guest and didn't see any problems there either.

-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop
On 06/04/2015 16:29, Andy Lutomirski wrote:
> On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
>> On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
>>> [cc: Boris and Konrad. Whoops]
>>> On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>>> We don't use irq_enable_sysexit on 64-bit kernels any more. Remove
>> Is there a commit (or name of patch) that explains why 32-bit-user-space-on-64-bit kernels are unsavory?
> sysexit never tasted very good :-p We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're trying an unconventional approach to making the code faster and less scary. As a result, 64-bit kernels won't use sysexit any more. Hopefully Xen is okay with the slightly sneaky thing we're doing. AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I don't expect there to be any problem.

64-bit PV kernels must bounce through Xen to switch from the kernel to the user pagetables (since kernel and userspace both actually run in ring 3 with user pages). As a result, exit to userspace ends up as a hypercall into Xen which has an effect very similar to an `iret`, but with some extra fixup in the background. I can't foresee any Xen issues as a result of this patch.

~Andrew
Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop
On Mon, Apr 6, 2015 at 11:30 AM, Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:
> On 04/06/2015 01:44 PM, Andrew Cooper wrote:
>> On 06/04/2015 16:29, Andy Lutomirski wrote:
>>> On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
>>>> On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
>>>>> [cc: Boris and Konrad. Whoops]
>>>>> On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>>>>> We don't use irq_enable_sysexit on 64-bit kernels any more. Remove
>>>> Is there a commit (or name of patch) that explains why 32-bit-user-space-on-64-bit kernels are unsavory?
>>> sysexit never tasted very good :-p We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're trying an unconventional approach to making the code faster and less scary. As a result, 64-bit kernels won't use sysexit any more. Hopefully Xen is okay with the slightly sneaky thing we're doing. AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I don't expect there to be any problem.
>> 64-bit PV kernels must bounce through Xen to switch from the kernel to the user pagetables (since kernel and userspace both actually run in ring 3 with user pages). As a result, exit to userspace ends up as a hypercall into Xen which has an effect very similar to an `iret`, but with some extra fixup in the background. I can't foresee any Xen issues as a result of this patch.
> I ran tip plus this patch (plus another patch that fixes an unrelated Xen regression in tip) through our test suite and it completed without problems. I also ran some very simple 32-bit programs in a 64-bit PV guest and didn't see any problems there either.
> -boris

At the risk of redundancy, did you test on Intel hardware? At least on native systems, the code in question never executes on AMD systems.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop
On 04/06/2015 04:03 PM, Andy Lutomirski wrote:
> On Mon, Apr 6, 2015 at 11:30 AM, Boris Ostrovsky <boris.ostrov...@oracle.com> wrote:
>> On 04/06/2015 01:44 PM, Andrew Cooper wrote:
>>> On 06/04/2015 16:29, Andy Lutomirski wrote:
>>>> On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
>>>>> On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
>>>>>> [cc: Boris and Konrad. Whoops]
>>>>>> On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>>>>>> We don't use irq_enable_sysexit on 64-bit kernels any more. Remove
>>>>> Is there a commit (or name of patch) that explains why 32-bit-user-space-on-64-bit kernels are unsavory?
>>>> sysexit never tasted very good :-p We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're trying an unconventional approach to making the code faster and less scary. As a result, 64-bit kernels won't use sysexit any more. Hopefully Xen is okay with the slightly sneaky thing we're doing. AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I don't expect there to be any problem.
>>> 64-bit PV kernels must bounce through Xen to switch from the kernel to the user pagetables (since kernel and userspace both actually run in ring 3 with user pages). As a result, exit to userspace ends up as a hypercall into Xen which has an effect very similar to an `iret`, but with some extra fixup in the background. I can't foresee any Xen issues as a result of this patch.
>> I ran tip plus this patch (plus another patch that fixes an unrelated Xen regression in tip) through our test suite and it completed without problems. I also ran some very simple 32-bit programs in a 64-bit PV guest and didn't see any problems there either.
> At the risk of redundancy, did you test on Intel hardware? At least on native systems, the code in question never executes on AMD systems.

Yes, the tests ran on Intel. I left them scheduled for overnight runs too, and those will be executed on both AMD and Intel.

-boris
Re: [Xen-devel] [PATCH] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk
Hi Pranav,

Thank you for the patch.

On 06/04/2015 10:54, Pranavkumar Sawargaonkar wrote:
> In old X-Gene Storm firmware and DT, secure mode addresses have been
> mentioned in GICv2 node. In this case maintenance interrupt is used
> instead of EOI HW method.
>
> This patch checks the GIC Distributor Base Address to enable EOI quirk
> for old firmware.
>
> Ref: http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html
>
> Signed-off-by: Pranavkumar Sawargaonkar <pranavku...@linaro.org>
> ---
>  xen/arch/arm/platforms/xgene-storm.c | 37 +-
>  1 file changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c
> index eee650e..dd7cbfc 100644
> --- a/xen/arch/arm/platforms/xgene-storm.c
> +++ b/xen/arch/arm/platforms/xgene-storm.c
> @@ -22,6 +22,7 @@
>  #include <asm/platform.h>
>  #include <xen/stdbool.h>
>  #include <xen/vmap.h>
> +#include <xen/device_tree.h>
>  #include <asm/io.h>
>  #include <asm/gic.h>
> @@ -35,9 +36,41 @@ static u64 reset_addr, reset_size;
>  static u32 reset_mask;
>  static bool reset_vals_valid = false;
>
> +#define XGENE_SEC_GICV2_DIST_ADDR 0x7801
> +static u32 quirk_guest_pirq_need_eoi;

This variable will mostly be read, so I would add __read_mostly.

> +
> +static void xgene_check_pirq_eoi(void)

If I'm not mistaken, this function is only called during Xen initialization, so I would add __init.

> +{
> +    struct dt_device_node *node;
> +    int res;
> +    paddr_t dbase;
> +
> +    dt_for_each_device_node( dt_host, node )
> +    {

It would be better to create a new callback for platform-specific GIC initialization and use dt_interrupt_controller. This would avoid this loop and relying on there being only one interrupt controller in the DT.

> +        if ( !dt_get_property(node, "interrupt-controller", NULL) )
> +            continue;
> +
> +        res = dt_device_get_address(node, 0, &dbase, NULL);
> +        if ( !dbase )
> +            panic("%s: Cannot find a valid address for the distributor",
> +                  __func__);
> +
> +        /*
> +         * In old X-Gene Storm firmware and DT, secure mode addresses have
> +         * been mentioned in GICv2 node. We have to use maintenance interrupt
> +         * instead of EOI HW in this case. We check the GIC Distributor Base
> +         * Address to maintain compatibility with older firmware.
> +         */
> +        if (dbase == XGENE_SEC_GICV2_DIST_ADDR)

Coding style: if ( ... )

> +            quirk_guest_pirq_need_eoi = PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
> +        else
> +            quirk_guest_pirq_need_eoi = 0;

I would print a warning in order to notify the user that his platform would be slow/buggy...

> +    }
> +}
> +
>  static uint32_t xgene_storm_quirks(void)
>  {
> -    return PLATFORM_QUIRK_GIC_64K_STRIDE | PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
> +    return PLATFORM_QUIRK_GIC_64K_STRIDE | quirk_guest_pirq_need_eoi;

This function is called every time Xen injects a physical IRQ to a guest (i.e. very often). It might be better to create a variable "quirks" which will contain PLATFORM_QUIRK_GIC_64K_STRIDE and, when necessary, PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI. That would avoid the OR at each call.

Regards,

--
Julien Grall
[Xen-devel] [xen-4.3-testing test] 50332: regressions - FAIL
flight 50332 xen-4.3-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50332/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-freebsd10-i386  8 guest-start                fail REGR. vs. 36755

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host     fail like 36755

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386      1 build-check(1)       blocked n/a
 test-amd64-amd64-rumpuserxen-amd64    1 build-check(1)       blocked n/a
 test-amd64-i386-libvirt              10 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 debian-hvm-install   fail never pass
 test-amd64-amd64-libvirt             10 migrate-support-check fail never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64   7 debian-hvm-install   fail never pass
 test-armhf-armhf-xl-arndale           5 xen-boot             fail never pass
 test-armhf-armhf-xl-cubietruck        5 xen-boot             fail never pass
 test-armhf-armhf-xl-credit2           5 xen-boot             fail never pass
 test-armhf-armhf-libvirt              5 xen-boot             fail never pass
 test-armhf-armhf-xl-sedf-pin          5 xen-boot             fail never pass
 test-armhf-armhf-xl-multivcpu         5 xen-boot             fail never pass
 test-amd64-i386-xl-win7-amd64        14 guest-stop           fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop       fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64  14 guest-stop           fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop           fail never pass
 test-amd64-i386-xl-qemut-win7-amd64  14 guest-stop           fail never pass
 test-amd64-amd64-xl-win7-amd64       14 guest-stop           fail never pass
 build-amd64-rumpuserxen               6 xen-build            fail never pass
 build-i386-rumpuserxen                6 xen-build            fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop           fail never pass
 test-armhf-armhf-xl-sedf              5 xen-boot             fail never pass
 test-armhf-armhf-xl                   5 xen-boot             fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1   14 guest-stop           fail never pass
 test-amd64-i386-xend-qemut-winxpsp3  17 leak-check/check     fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop       fail never pass
 test-amd64-i386-xend-winxpsp3        17 leak-check/check     fail never pass
 test-amd64-amd64-xl-winxpsp3         14 guest-stop           fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3   14 guest-stop           fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3   14 guest-stop           fail never pass

version targeted for testing:
 xen        46ed0083a76efa82713ea979b312fa69250380b2
baseline version:
 xen        c58b16ef1572176cf2f6a424b527b5ed4bb73f17

People who touched revisions under test:
 Andrew Cooper <andrew.coop...@citrix.com>
 Ian Campbell <ian.campb...@citrix.com>
 Ian Jackson <ian.jack...@eu.citrix.com>
 Jan Beulich <jbeul...@suse.com>
 Konrad Rzeszutek Wilk <konrad.w...@oracle.com>

jobs:
 build-amd64                                   pass
 build-armhf                                   pass
 build-i386                                    pass
 build-amd64-libvirt                           pass
 build-armhf-libvirt                           pass
 build-i386-libvirt                            pass
 build-amd64-pvops                             pass
 build-armhf-pvops                             pass
 build-i386-pvops                              pass
 build-amd64-rumpuserxen                       fail
 build-i386-rumpuserxen                        fail
 test-amd64-amd64-xl                           pass
 test-armhf-armhf-xl                           fail
 test-amd64-i386-xl                            pass
 test-amd64-i386-rhel6hvm-amd                  pass
 test-amd64-i386-qemut-rhel6hvm-amd            pass
 test-amd64-i386-qemuu-rhel6hvm-amd            pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64     pass
 test-amd64-i386-xl-qemut-debianhvm-amd64      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64     pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64      pass
 test-amd64-i386-freebsd10-amd64               pass
[Xen-devel] [PATCH v6 5/5] libxl: Add interface for querying hypervisor about PCI topology
.. and use this new interface to display it along with CPU topology and NUMA information when 'xl info -n' command is issued The output will look like ... cpu_topology : cpu:coresocket node 0: 000 ... device topology: device node :00:00.0 0 :00:01.0 0 ... Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Changes in v6: * xc_pcitopoinfo() now has to deal with hypercall not finishing processing whole array (due to changes in patch 2): do_sysctl() is now called in a loop). tools/libxc/include/xenctrl.h |3 ++ tools/libxc/xc_misc.c | 44 ++ tools/libxl/libxl.c | 42 + tools/libxl/libxl.h | 12 +++ tools/libxl/libxl_freebsd.c | 12 +++ tools/libxl/libxl_internal.h |5 +++ tools/libxl/libxl_linux.c | 69 + tools/libxl/libxl_netbsd.c| 12 +++ tools/libxl/libxl_types.idl |7 tools/libxl/libxl_utils.c |8 + tools/libxl/xl_cmdimpl.c | 40 +++ 11 files changed, 247 insertions(+), 7 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 4cf8daf..787c29d 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1229,6 +1229,7 @@ typedef xen_sysctl_physinfo_t xc_physinfo_t; typedef xen_sysctl_cputopo_t xc_cputopo_t; typedef xen_sysctl_numainfo_t xc_numainfo_t; typedef xen_sysctl_meminfo_t xc_meminfo_t; +typedef xen_sysctl_pcitopoinfo_t xc_pcitopoinfo_t; typedef uint32_t xc_cpu_to_node_t; typedef uint32_t xc_cpu_to_socket_t; @@ -1242,6 +1243,8 @@ int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, xc_cputopo_t *cputopo); int xc_numainfo(xc_interface *xch, unsigned *max_nodes, xc_meminfo_t *meminfo, uint32_t *distance); +int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs, + physdev_pci_device_t *devs, uint32_t *nodes); int xc_sched_id(xc_interface *xch, int *sched_id); diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 97cbe63..6e1d50e 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -244,6 +244,50 @@ out: return ret; } +int xc_pcitopoinfo(xc_interface *xch, unsigned 
num_devs, + physdev_pci_device_t *devs, + uint32_t *nodes) +{ +int ret = 0; +DECLARE_SYSCTL; +DECLARE_HYPERCALL_BOUNCE(devs, num_devs * sizeof(*devs), + XC_HYPERCALL_BUFFER_BOUNCE_IN); +DECLARE_HYPERCALL_BOUNCE(nodes, num_devs* sizeof(*nodes), + XC_HYPERCALL_BUFFER_BOUNCE_BOTH); + +if ( (ret = xc_hypercall_bounce_pre(xch, devs)) ) +goto out; +if ( (ret = xc_hypercall_bounce_pre(xch, nodes)) ) +goto out; + +sysctl.u.pcitopoinfo.first_dev = 0; +sysctl.u.pcitopoinfo.num_devs = num_devs; +set_xen_guest_handle(sysctl.u.pcitopoinfo.devs, devs); +set_xen_guest_handle(sysctl.u.pcitopoinfo.nodes, nodes); + +sysctl.cmd = XEN_SYSCTL_pcitopoinfo; + +while ( sysctl.u.pcitopoinfo.first_dev num_devs ) +{ +if ( (ret = do_sysctl(xch, sysctl)) != 0 ) +{ +/* + * node[] is set to XEN_INVALID_NODE_ID for invalid devices, + * we can just skip those entries. + */ +if ( errno == ENODEV ) +errno = ret = 0; +else +break; +} +} + + out: +xc_hypercall_bounce_post(xch, devs); +xc_hypercall_bounce_post(xch, nodes); + +return ret; +} int xc_sched_id(xc_interface *xch, int *sched_id) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 45cd318..5b3423d 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5137,6 +5137,48 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out) return ret; } +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs) +{ +GC_INIT(ctx); +physdev_pci_device_t *devs; +uint32_t *nodes; +libxl_pcitopology *ret = NULL; +int i; + +*num_devs = libxl__pci_numdevs(gc); +if (*num_devs 0) { +LOG(ERROR, Unable to determine number of PCI devices); +goto out; +} + +devs = libxl__zalloc(gc, sizeof(*devs) * *num_devs); +nodes = libxl__zalloc(gc, sizeof(*nodes) * *num_devs); + +if (libxl__pci_topology_init(gc, devs, *num_devs)) { +LOGE(ERROR, Cannot initialize PCI hypercall structure); +goto out; +} + +if (xc_pcitopoinfo(ctx-xch, *num_devs, devs, nodes) != 0) { +LOGE(ERROR, PCI topology info hypercall failed); +goto out; +} + +ret 
= libxl__zalloc(NOGC, sizeof(libxl_pcitopology) * *num_devs); + +for (i = 0; i *num_devs; i++) { +ret[i].seg = devs[i].seg; +ret[i].bus =
[Xen-devel] [PATCH v6 2/5] sysctl: Add sysctl interface for querying PCI topology
Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
---
Changes in v6:
 * Dropped continuations, the sysctl now returns after 64 iterations if necessary
 * -ENODEV returned if device is not found
 * sysctl's first_dev is now expected to be used by userspace to continue the query
 * Added XSM hooks

 docs/misc/xsm-flask.txt             |  1 +
 xen/common/sysctl.c                 | 58 +++
 xen/include/public/sysctl.h         | 30 ++
 xen/xsm/flask/hooks.c               |  1 +
 xen/xsm/flask/policy/access_vectors |  1 +
 5 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/docs/misc/xsm-flask.txt b/docs/misc/xsm-flask.txt
index 90a2aef..4e0f14f 100644
--- a/docs/misc/xsm-flask.txt
+++ b/docs/misc/xsm-flask.txt
@@ -121,6 +121,7 @@ __HYPERVISOR_sysctl (xen/include/public/sysctl.h)
  * XEN_SYSCTL_cpupool_op
  * XEN_SYSCTL_scheduler_op
  * XEN_SYSCTL_coverage_op
+ * XEN_SYSCTL_pcitopoinfo

 __HYPERVISOR_memory_op (xen/include/public/memory.h)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index d75440e..449ff70 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -399,6 +399,64 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         break;
 #endif

+#ifdef HAS_PCI
+    case XEN_SYSCTL_pcitopoinfo:
+    {
+        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
+        unsigned dev_cnt = 0;
+
+        if ( guest_handle_is_null(ti->devs) ||
+             guest_handle_is_null(ti->nodes) ||
+             (ti->first_dev > ti->num_devs) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        while ( ti->first_dev < ti->num_devs )
+        {
+            physdev_pci_device_t dev;
+            uint32_t node;
+            struct pci_dev *pdev;
+
+            if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            spin_lock(&pcidevs_lock);
+            pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
+            if ( !pdev )
+            {
+                ret = -ENODEV;
+                node = XEN_INVALID_NODE_ID;
+            }
+            else if ( pdev->node == NUMA_NO_NODE )
+                node = XEN_INVALID_NODE_ID;
+            else
+                node = pdev->node;
+            spin_unlock(&pcidevs_lock);
+
+            if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            ti->first_dev++;
+
+            if ( (++dev_cnt > 0x3f) && hypercall_preempt_check() )
+                break;
+        }
+
+        if ( (ret != -EFAULT) &&
+             __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
+            ret = -EFAULT;
+    }
+    break;
+#endif
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;

diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5aa3708..877b661 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -33,6 +33,7 @@
 #include "xen.h"
 #include "domctl.h"
+#include "physdev.h"

 #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C

@@ -668,6 +669,33 @@ struct xen_sysctl_psr_cmt_op {
 typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t);

+/* XEN_SYSCTL_pcitopoinfo */
+struct xen_sysctl_pcitopoinfo {
+    /* IN: Number of elements in 'pcitopo' and 'nodes' arrays. */
+    uint32_t num_devs;
+
+    /*
+     * IN/OUT:
+     *  IN: First element of pcitopo array that needs to be processed by
+     *      the hypervisor.
+     * OUT: Index of the first still unprocessed element of pcitopo array.
+     */
+    uint32_t first_dev;
+
+    /* IN: list of devices for which node IDs are requested. */
+    XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
+
+    /*
+     * OUT: node identifier for each device.
+     * If information for a particular device is not available then set
+     * to XEN_INVALID_NODE_ID. In addition, if the device is not known to the
+     * hypervisor, sysctl will stop further processing and return -ENODEV.
+     */
+    XEN_GUEST_HANDLE_64(uint32) nodes;
+};
+typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -690,12 +718,14 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
 #define XEN_SYSCTL_psr_cmt_op                    21
+#define XEN_SYSCTL_pcitopoinfo                   22
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
         struct xen_sysctl_cputopoinfo
[Xen-devel] [PATCH v6 1/5] sysctl: Make XEN_SYSCTL_numainfo a little more efficient
A number of changes to XEN_SYSCTL_numainfo interface: * Make sysctl NUMA topology query use fewer copies by combining some fields into a single structure and copying distances for each node in a single copy. * NULL meminfo and distance handles are a request for maximum number of nodes (num_nodes). If those handles are valid and num_nodes is is smaller than the number of nodes in the system then -ENOBUFS is returned (and correct num_nodes is provided) * Instead of using max_node_index for passing number of nodes keep this value in num_nodes: almost all uses of max_node_index required adding or subtracting one to eventually get to number of nodes anyway. * Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add XEN_INVALID_NODE_DIST. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Acked-by: Ian Campbell ian.campb...@citrix.com --- Changes in v6: * updated uint32 to unsigned in sysctl * updated syctl.h comment to reflect right logic for meminfo/distance test * declared distance[] array static to move it off the stack in sysctl * Fixed loop control variable initialization in sysctl tools/libxl/libxl.c | 66 ++- tools/python/xen/lowlevel/xc/xc.c | 58 +--- xen/common/sysctl.c | 78 + xen/include/public/sysctl.h | 53 ++--- 4 files changed, 131 insertions(+), 124 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 2a735b3..b7d6bb0 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5154,65 +5154,59 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) { GC_INIT(ctx); xc_numainfo_t ninfo; -DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize); -DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree); -DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists); +DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo); +DECLARE_HYPERCALL_BUFFER(uint32_t, distance); libxl_numainfo *ret = NULL; -int i, j, max_nodes; +int i, j; -max_nodes = libxl_get_max_nodes(ctx); -if (max_nodes 0) -{ +set_xen_guest_handle(ninfo.meminfo, 
HYPERCALL_BUFFER_NULL); +set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL); +if (xc_numainfo(ctx-xch, ninfo) != 0) { LIBXL__LOG(ctx, XTL_ERROR, Unable to determine number of NODES); ret = NULL; goto out; } -memsize = xc_hypercall_buffer_alloc -(ctx-xch, memsize, sizeof(*memsize) * max_nodes); -memfree = xc_hypercall_buffer_alloc -(ctx-xch, memfree, sizeof(*memfree) * max_nodes); -node_dists = xc_hypercall_buffer_alloc -(ctx-xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes); -if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) { +meminfo = xc_hypercall_buffer_alloc(ctx-xch, meminfo, +sizeof(*meminfo) * ninfo.num_nodes); +distance = xc_hypercall_buffer_alloc(ctx-xch, distance, + sizeof(*distance) * + ninfo.num_nodes * ninfo.num_nodes); +if ((meminfo == NULL) || (distance == NULL)) { LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, Unable to allocate hypercall arguments); goto fail; } -set_xen_guest_handle(ninfo.node_to_memsize, memsize); -set_xen_guest_handle(ninfo.node_to_memfree, memfree); -set_xen_guest_handle(ninfo.node_to_node_distance, node_dists); -ninfo.max_node_index = max_nodes - 1; +set_xen_guest_handle(ninfo.meminfo, meminfo); +set_xen_guest_handle(ninfo.distance, distance); if (xc_numainfo(ctx-xch, ninfo) != 0) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, getting numainfo); goto fail; } -if (ninfo.max_node_index max_nodes - 1) -max_nodes = ninfo.max_node_index + 1; +*nr = ninfo.num_nodes; -*nr = max_nodes; +ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * ninfo.num_nodes); +for (i = 0; i ninfo.num_nodes; i++) +ret[i].dists = libxl__calloc(NOGC, ninfo.num_nodes, sizeof(*distance)); -ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes); -for (i = 0; i max_nodes; i++) -ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists)); - -for (i = 0; i max_nodes; i++) { -#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? 
\ -LIBXL_NUMAINFO_INVALID_ENTRY : mem[i] -ret[i].size = V(memsize, i); -ret[i].free = V(memfree, i); -ret[i].num_dists = max_nodes; -for (j = 0; j ret[i].num_dists; j++) -ret[i].dists[j] = V(node_dists, i * max_nodes + j); +for (i = 0; i ninfo.num_nodes; i++) { +#define V(val, invalid) (val == invalid) ? \ + LIBXL_NUMAINFO_INVALID_ENTRY : val +ret[i].size = V(meminfo[i].memsize, XEN_INVALID_MEM_SZ); +
[Xen-devel] [PATCH v6 4/5] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
xc_numainfo() is not expected to be used on a hot path and therefore hypercall buffer management can be pushed into libxc. This will simplify life for callers. Also update error logging macros. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Changes in v6: * Dropped separate NULL buffer tests in xc_numainfo() * Moved test for buffers validity (either both or neither are NULL) to be the the first thing to do in xc_numainfo() tools/libxc/include/xenctrl.h |4 ++- tools/libxc/xc_misc.c | 36 +- tools/libxl/libxl.c | 51 tools/python/xen/lowlevel/xc/xc.c | 38 ++- 4 files changed, 63 insertions(+), 66 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index f298702..4cf8daf 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys); typedef xen_sysctl_physinfo_t xc_physinfo_t; typedef xen_sysctl_cputopo_t xc_cputopo_t; typedef xen_sysctl_numainfo_t xc_numainfo_t; +typedef xen_sysctl_meminfo_t xc_meminfo_t; typedef uint32_t xc_cpu_to_node_t; typedef uint32_t xc_cpu_to_socket_t; @@ -1239,7 +1240,8 @@ typedef uint32_t xc_node_to_node_dist_t; int xc_physinfo(xc_interface *xch, xc_physinfo_t *info); int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, xc_cputopo_t *cputopo); -int xc_numainfo(xc_interface *xch, xc_numainfo_t *info); +int xc_numainfo(xc_interface *xch, unsigned *max_nodes, +xc_meminfo_t *meminfo, uint32_t *distance); int xc_sched_id(xc_interface *xch, int *sched_id); diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index 630a86c..97cbe63 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -204,22 +204,44 @@ out: return ret; } -int xc_numainfo(xc_interface *xch, -xc_numainfo_t *put_info) +int xc_numainfo(xc_interface *xch, unsigned *max_nodes, +xc_meminfo_t *meminfo, uint32_t *distance) { int ret; DECLARE_SYSCTL; +DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo), + 
XC_HYPERCALL_BUFFER_BOUNCE_OUT); +DECLARE_HYPERCALL_BOUNCE(distance, + *max_nodes * *max_nodes * sizeof(*distance), + XC_HYPERCALL_BUFFER_BOUNCE_OUT); + +if ( !!meminfo ^ !!distance ) +{ +errno = EINVAL; +return -1; +} + +if ( (ret = xc_hypercall_bounce_pre(xch, meminfo)) ) +goto out; +if ((ret = xc_hypercall_bounce_pre(xch, distance)) ) +goto out; + +sysctl.u.numainfo.num_nodes = *max_nodes; +set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo); +set_xen_guest_handle(sysctl.u.numainfo.distance, distance); sysctl.cmd = XEN_SYSCTL_numainfo; -memcpy(sysctl.u.numainfo, put_info, sizeof(*put_info)); +if ( (ret = do_sysctl(xch, sysctl)) != 0 ) +goto out; -if ((ret = do_sysctl(xch, sysctl)) != 0) -return ret; +*max_nodes = sysctl.u.numainfo.num_nodes; -memcpy(put_info, sysctl.u.numainfo, sizeof(*put_info)); +out: +xc_hypercall_bounce_post(xch, meminfo); +xc_hypercall_bounce_post(xch, distance); -return 0; +return ret; } diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 697c86d..45cd318 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5140,61 +5140,44 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out) libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr) { GC_INIT(ctx); -xc_numainfo_t ninfo; -DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo); -DECLARE_HYPERCALL_BUFFER(uint32_t, distance); +xc_meminfo_t *meminfo; +uint32_t *distance; libxl_numainfo *ret = NULL; int i, j; +unsigned num_nodes; -set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL); -set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL); -if (xc_numainfo(ctx-xch, ninfo) != 0) { -LIBXL__LOG(ctx, XTL_ERROR, Unable to determine number of NODES); -ret = NULL; +if (xc_numainfo(ctx-xch, num_nodes, NULL, NULL)) { +LOGEV(ERROR, errno, Unable to determine number of nodes); goto out; } -meminfo = xc_hypercall_buffer_alloc(ctx-xch, meminfo, -sizeof(*meminfo) * ninfo.num_nodes); -distance = xc_hypercall_buffer_alloc(ctx-xch, distance, - 
sizeof(*distance) * - ninfo.num_nodes * ninfo.num_nodes); -if ((meminfo == NULL) || (distance == NULL)) { -LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, -Unable to allocate hypercall arguments); -goto fail; -} +meminfo = libxl__zalloc(gc,
[Xen-devel] [PATCH v6 0/5] Display IO topology when PXM data is available (plus some cleanup)
Changes in v6: * PCI topology interface changes: no continuations, userspace will be dealing with unfinished sysctl (patches 2 and 5) * Unknown device will cause ENODEV in sysctl * No NULL tests in libxc * Loop control initialization fix (similar to commit 26da081ac91a) * Other minor changes (see per-patch notes) Changes in v5: * Make CPU topology and NUMA info sysctls behave more like XEN_DOMCTL_get_vcpu_msrs when passed NULL buffers. This required toolstack changes as well * Don't use 8-bit data types in interfaces * Fold interface version update into patch#3 Changes in v4: * Split cputopology and NUMA info changes into separate patches * Added patch#1 (partly because patch#4 needs to know when when distance is invalid, i.e. NUMA_NO_DISTANCE) * Split sysctl version update into a separate patch * Other changes are listed in each patch * NOTE: I did not test python's xc changes since I don't think I know how. Changes in v3: * Added patch #1 to more consistently define nodes as a u8 and properly use NUMA_NO_NODE. * Make changes to xen_sysctl_numainfo, similar to those made to xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same patch #3 to avoid bumping interface version twice. Perhaps it's better to split it into two?) * Instead of copying data for each loop index allocate a buffer and copy once for all three queries in sysctl.c. * Move hypercall buffer management from libxl to libxc (as requested by Dario, patches #5 and #6). * Report topology info for offlined CPUs as well * Added LIBXL_HAVE_PCITOPO macro Changes in v2: * Split topology sysctls into two --- one for CPU topology and the other for devices * Avoid long loops in the hypervisor by using continuations. 
   (I am not particularly happy about using first_dev in the interface, suggestions for a better interface would be appreciated)
 * Use proper libxl conventions for interfaces
 * Avoid hypervisor stack corruption when copying PXM data from guest

A few patches that add an interface for querying the hypervisor about device topology, and allow 'xl info -n' to display this information if a PXM object is provided by ACPI. This series also makes some optimizations and cleanup of the current CPU topology and NUMA sysctl queries.

Boris Ostrovsky (5):
  sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  sysctl: Add sysctl interface for querying PCI topology
  libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  libxl: Add interface for querying hypervisor about PCI topology

 docs/misc/xsm-flask.txt             |   1 +
 tools/libxc/include/xenctrl.h       |  12 ++-
 tools/libxc/xc_misc.c               | 103 +++---
 tools/libxl/libxl.c                 | 160 ++-
 tools/libxl/libxl.h                 |  12 +++
 tools/libxl/libxl_freebsd.c         |  12 +++
 tools/libxl/libxl_internal.h        |   5 +
 tools/libxl/libxl_linux.c           |  69 +++
 tools/libxl/libxl_netbsd.c          |  12 +++
 tools/libxl/libxl_types.idl         |   7 ++
 tools/libxl/libxl_utils.c           |   8 ++
 tools/libxl/xl_cmdimpl.c            |  40 +++--
 tools/misc/xenpm.c                  |  51 +--
 tools/python/xen/lowlevel/xc/xc.c   |  74 ++--
 xen/common/sysctl.c                 | 136 ++
 xen/include/public/sysctl.h         |  83 +-
 xen/xsm/flask/hooks.c               |   1 +
 xen/xsm/flask/policy/access_vectors |   1 +
 18 files changed, 554 insertions(+), 233 deletions(-)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 3/5] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
xc_cputopoinfo() is not expected to be used on a hot path and therefore hypercall buffer management can be pushed into libxc. This will simplify life for callers. Also update error reporting macros. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com --- Changes in v6: * Dropped NULL buffer test in xc_cputopoinfo() tools/libxc/include/xenctrl.h |5 ++- tools/libxc/xc_misc.c | 23 +++- tools/libxl/libxl.c | 37 -- tools/misc/xenpm.c| 51 - tools/python/xen/lowlevel/xc/xc.c | 20 ++ 5 files changed, 61 insertions(+), 75 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 552ace8..f298702 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch, int xc_send_debug_keys(xc_interface *xch, char *keys); typedef xen_sysctl_physinfo_t xc_physinfo_t; -typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t; +typedef xen_sysctl_cputopo_t xc_cputopo_t; typedef xen_sysctl_numainfo_t xc_numainfo_t; typedef uint32_t xc_cpu_to_node_t; @@ -1237,7 +1237,8 @@ typedef uint64_t xc_node_to_memfree_t; typedef uint32_t xc_node_to_node_dist_t; int xc_physinfo(xc_interface *xch, xc_physinfo_t *info); -int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info); +int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, + xc_cputopo_t *cputopo); int xc_numainfo(xc_interface *xch, xc_numainfo_t *info); int xc_sched_id(xc_interface *xch, diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index be68291..630a86c 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -177,22 +177,31 @@ int xc_physinfo(xc_interface *xch, return 0; } -int xc_cputopoinfo(xc_interface *xch, - xc_cputopoinfo_t *put_info) +int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus, + xc_cputopo_t *cputopo) { int ret; DECLARE_SYSCTL; +DECLARE_HYPERCALL_BOUNCE(cputopo, *max_cpus * sizeof(*cputopo), + XC_HYPERCALL_BUFFER_BOUNCE_OUT); -sysctl.cmd = XEN_SYSCTL_cputopoinfo; +if ( (ret = 
xc_hypercall_bounce_pre(xch, cputopo)) ) +goto out; -memcpy(sysctl.u.cputopoinfo, put_info, sizeof(*put_info)); +sysctl.u.cputopoinfo.num_cpus = *max_cpus; +set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo, cputopo); + +sysctl.cmd = XEN_SYSCTL_cputopoinfo; if ( (ret = do_sysctl(xch, sysctl)) != 0 ) -return ret; +goto out; -memcpy(put_info, sysctl.u.cputopoinfo, sizeof(*put_info)); +*max_cpus = sysctl.u.cputopoinfo.num_cpus; -return 0; +out: +xc_hypercall_bounce_post(xch, cputopo); + +return ret; } int xc_numainfo(xc_interface *xch, diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index b7d6bb0..697c86d 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5100,37 +5100,28 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo) libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out) { GC_INIT(ctx); -xc_cputopoinfo_t tinfo; -DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo); +xc_cputopo_t *cputopo; libxl_cputopology *ret = NULL; int i; +unsigned num_cpus; -/* Setting buffer to NULL makes the hypercall return number of CPUs */ -set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL); -if (xc_cputopoinfo(ctx-xch, tinfo) != 0) +/* Setting buffer to NULL makes the call return number of CPUs */ +if (xc_cputopoinfo(ctx-xch, num_cpus, NULL)) { -LIBXL__LOG(ctx, XTL_ERROR, Unable to determine number of CPUS); -ret = NULL; +LOGEV(ERROR, errno, Unable to determine number of CPUS); goto out; } -cputopo = xc_hypercall_buffer_alloc(ctx-xch, cputopo, -sizeof(*cputopo) * tinfo.num_cpus); -if (cputopo == NULL) { -LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM, -Unable to allocate hypercall arguments); -goto fail; -} -set_xen_guest_handle(tinfo.cputopo, cputopo); +cputopo = libxl__zalloc(gc, sizeof(*cputopo) * num_cpus); -if (xc_cputopoinfo(ctx-xch, tinfo) != 0) { -LIBXL__LOG_ERRNO(ctx, XTL_ERROR, CPU topology info hypercall failed); -goto fail; +if (xc_cputopoinfo(ctx-xch, num_cpus, cputopo)) { +LOGEV(ERROR, errno, CPU topology info 
hypercall failed); +goto out; } -ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * tinfo.num_cpus); +ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * num_cpus); -for (i = 0; i tinfo.num_cpus; i++) { +for (i = 0; i num_cpus; i++) { #define V(map, i, invalid) ( cputopo[i].map == invalid) ? \ LIBXL_CPUTOPOLOGY_INVALID_ENTRY : cputopo[i].map
[Xen-devel] [PATCH v15 01/15] qspinlock: A simple generic 4-byte queue spinlock
This patch introduces a new generic queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair. It has about the same speed in the single-thread case, and it can be much faster under high contention, especially when the spinlock is embedded within the data structure to be protected. Only in light-to-moderate contention, where the average queue depth is around 1-3, will this queue spinlock potentially be a bit slower due to the higher slowpath overhead.

This queue spinlock is especially suited to NUMA machines with a large number of cores, as the chance of spinlock contention is much higher on those machines. The cost of contention is also higher because of slower inter-node memory traffic.

Because spinlocks are acquired with preemption disabled, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending on one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI contexts, a CPU can have at most 4 concurrent lock-waiting activities. By allocating a set of per-cpu queue nodes and using them to form a waiting queue, we can encode the queue node address into a much smaller 24-bit value (comprising the CPU number and queue node index), leaving one byte for the lock. Please note that the queue node is only needed while waiting for the lock. Once the lock is acquired, the queue node can be released to be used later.
Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- include/asm-generic/qspinlock.h | 132 + include/asm-generic/qspinlock_types.h | 58 + kernel/Kconfig.locks |7 + kernel/locking/Makefile |1 + kernel/locking/mcs_spinlock.h |1 + kernel/locking/qspinlock.c| 209 + 6 files changed, 408 insertions(+), 0 deletions(-) create mode 100644 include/asm-generic/qspinlock.h create mode 100644 include/asm-generic/qspinlock_types.h create mode 100644 kernel/locking/qspinlock.c diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h new file mode 100644 index 000..315d6dc --- /dev/null +++ b/include/asm-generic/qspinlock.h @@ -0,0 +1,132 @@ +/* + * Queue spinlock + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * + * Authors: Waiman Long waiman.l...@hp.com + */ +#ifndef __ASM_GENERIC_QSPINLOCK_H +#define __ASM_GENERIC_QSPINLOCK_H + +#include asm-generic/qspinlock_types.h + +/** + * queue_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queue spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queue_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(lock-val); +} + +/** + * queue_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queue spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + * + * N.B. 
Whenever there are tasks waiting for the lock, it is considered + * locked wrt the lockref code to avoid lock stealing by the lockref + * code and change things underneath the lock. This also allows some + * optimizations to be applied without conflict with lockref. + */ +static __always_inline int queue_spin_value_unlocked(struct qspinlock lock) +{ + return !atomic_read(lock.val); +} + +/** + * queue_spin_is_contended - check if the lock is contended + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock contended, 0 otherwise + */ +static __always_inline int queue_spin_is_contended(struct qspinlock *lock) +{ + return atomic_read(lock-val) ~_Q_LOCKED_MASK; +} +/** + * queue_spin_trylock - try to acquire the queue spinlock + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static __always_inline int queue_spin_trylock(struct qspinlock *lock) +{ + if (!atomic_read(lock-val) + (atomic_cmpxchg(lock-val, 0, _Q_LOCKED_VAL) == 0)) + return 1; + return 0; +} + +extern void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val); + +/** + * queue_spin_lock - acquire a queue spinlock + * @lock: Pointer to queue spinlock structure + */ +static __always_inline void queue_spin_lock(struct qspinlock *lock) +{ + u32
[Xen-devel] [PATCH v15 11/15] pvqspinlock, x86: Enable PV qspinlock for KVM
This patch adds the necessary KVM specific code to allow KVM to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/kernel/kvm.c | 43 +++ kernel/Kconfig.locks |2 +- 2 files changed, 44 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index e354cc6..4bb42c0 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -584,6 +584,39 @@ static void kvm_kick_cpu(int cpu) kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid); } + +#ifdef CONFIG_QUEUE_SPINLOCK + +#include asm/qspinlock.h + +static void kvm_wait(u8 *ptr, u8 val) +{ + unsigned long flags; + + if (in_nmi()) + return; + + local_irq_save(flags); + + if (READ_ONCE(*ptr) != val) + goto out; + + /* +* halt until it's our turn and kicked. Note that we do safe halt +* for irq enabled case to avoid hang when lock info is overwritten +* in irq spinlock slowpath and no spurious interrupt occur to save us. +*/ + if (arch_irqs_disabled_flags(flags)) + halt(); + else + safe_halt(); + +out: + local_irq_restore(flags); +} + +#else /* !CONFIG_QUEUE_SPINLOCK */ + enum kvm_contention_stat { TAKEN_SLOW, TAKEN_SLOW_PICKUP, @@ -817,6 +850,8 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket) } } +#endif /* !CONFIG_QUEUE_SPINLOCK */ + /* * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present. 
*/ @@ -828,8 +863,16 @@ void __init kvm_spinlock_init(void) if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) return; +#ifdef CONFIG_QUEUE_SPINLOCK + __pv_init_lock_hash(); + pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath; + pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock); + pv_lock_ops.wait = kvm_wait; + pv_lock_ops.kick = kvm_kick_cpu; +#else /* !CONFIG_QUEUE_SPINLOCK */ pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); pv_lock_ops.unlock_kick = kvm_unlock_kick; +#endif } static __init int kvm_spinlock_init_jump(void) diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index c6a8f7c..537b13e 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK config QUEUE_SPINLOCK def_bool y if ARCH_USE_QUEUE_SPINLOCK - depends on SMP !PARAVIRT_SPINLOCKS + depends on SMP (!PARAVIRT_SPINLOCKS || !XEN) config ARCH_USE_QUEUE_RWLOCK bool -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v15 04/15] qspinlock: Extract out code snippets for the next patch
This is a preparatory patch that extracts out the following 2 code snippets to prepare for the next performance optimization patch. 1) the logic for the exchange of new and previous tail code words into a new xchg_tail() function. 2) the logic for clearing the pending bit and setting the locked bit into a new clear_pending_set_locked() function. This patch also simplifies the trylock operation before queuing by calling queue_spin_trylock() directly. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- include/asm-generic/qspinlock_types.h |2 + kernel/locking/qspinlock.c| 79 - 2 files changed, 50 insertions(+), 31 deletions(-) diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index 9c3f5c2..ef36613 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -58,6 +58,8 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_BITS (32 - _Q_TAIL_CPU_OFFSET) #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) +#define _Q_TAIL_MASK (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK) + #define _Q_LOCKED_VAL (1U _Q_LOCKED_OFFSET) #define _Q_PENDING_VAL (1U _Q_PENDING_OFFSET) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 0351f78..11f6ad9 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -97,6 +97,42 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) /** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queue spinlock structure + * + * *,1,0 - *,0,1 + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, lock-val); +} + +/** + * xchg_tail - Put in the new queue tail code word retrieve previous one + * @lock : Pointer to queue spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* - n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + u32 old, new, val = atomic_read(lock-val); + + for (;;) { + new = (val _Q_LOCKED_PENDING_MASK) | tail; + old = atomic_cmpxchg(lock-val, val, new); + if (old == val) + break; + + val = old; + } + return old; +} + +/** * queue_spin_lock_slowpath - acquire the queue spinlock * @lock: Pointer to queue spinlock structure * @val: Current value of the queue spinlock 32-bit word @@ -178,15 +214,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) * * *,1,0 - *,0,1 */ - for (;;) { - new = (val ~_Q_PENDING_MASK) | _Q_LOCKED_VAL; - - old = atomic_cmpxchg(lock-val, val, new); - if (old == val) - break; - - val = old; - } + clear_pending_set_locked(lock); return; /* @@ -203,37 +231,26 @@ queue: node-next = NULL; /* -* We have already touched the queueing cacheline; don't bother with -* pending stuff. -* -* trylock || xchg(lock, node) -* -* 0,0,0 - 0,0,1 ; no tail, not locked - no tail, locked. -* p,y,x - n,y,x ; tail was p - tail is n; preserving locked. +* We touched a (possibly) cold cacheline in the per-cpu queue node; +* attempt the trylock once more in the hope someone let go while we +* weren't watching. */ - for (;;) { - new = _Q_LOCKED_VAL; - if (val) - new = tail | (val _Q_LOCKED_PENDING_MASK); - - old = atomic_cmpxchg(lock-val, val, new); - if (old == val) - break; - - val = old; - } + if (queue_spin_trylock(lock)) + goto release; /* -* we won the trylock; forget about queueing. 
+* We have already touched the queueing cacheline; don't bother with +* pending stuff. +* +* p,*,* - n,*,* */ - if (new == _Q_LOCKED_VAL) - goto release; + old = xchg_tail(lock, tail); /* * if there was a previous node; link it and wait until reaching the * head of the waitqueue. */ - if (old ~_Q_LOCKED_PENDING_MASK) { + if (old _Q_TAIL_MASK) { prev = decode_tail(old); WRITE_ONCE(prev-next, node); -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v15 07/15] qspinlock: Revert to test-and-set on hypervisors
From: Peter Zijlstra (Intel) pet...@infradead.org When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/qspinlock.h | 14 ++ include/asm-generic/qspinlock.h |7 +++ kernel/locking/qspinlock.c |3 +++ 3 files changed, 24 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index 222995b..64c925e 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -1,6 +1,7 @@ #ifndef _ASM_X86_QSPINLOCK_H #define _ASM_X86_QSPINLOCK_H +#include asm/cpufeature.h #include asm-generic/qspinlock_types.h #definequeue_spin_unlock queue_spin_unlock @@ -15,6 +16,19 @@ static inline void queue_spin_unlock(struct qspinlock *lock) smp_store_release((u8 *)lock, 0); } +#define virt_queue_spin_lock virt_queue_spin_lock + +static inline bool virt_queue_spin_lock(struct qspinlock *lock) +{ + if (!static_cpu_has(X86_FEATURE_HYPERVISOR)) + return false; + + while (atomic_cmpxchg(lock-val, 0, _Q_LOCKED_VAL) != 0) + cpu_relax(); + + return true; +} + #include asm-generic/qspinlock.h #endif /* _ASM_X86_QSPINLOCK_H */ diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h index 315d6dc..bcbbc5e 100644 --- a/include/asm-generic/qspinlock.h +++ b/include/asm-generic/qspinlock.h @@ -111,6 +111,13 @@ static inline void queue_spin_unlock_wait(struct qspinlock *lock) cpu_relax(); } +#ifndef virt_queue_spin_lock +static __always_inline bool virt_queue_spin_lock(struct qspinlock *lock) +{ + return false; +} +#endif + /* * Initializier */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 99503ef..fc2e5ab 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -249,6 +249,9 @@ void queue_spin_lock_slowpath(struct qspinlock 
*lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS = (1U _Q_TAIL_CPU_BITS)); + if (virt_queue_spin_lock(lock)) + return; + /* * wait for in-progress pending-locked hand-overs * -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v15 03/15] qspinlock: Add pending bit
From: Peter Zijlstra (Intel) pet...@infradead.org

Because the qspinlock needs to touch a second cacheline (the per-cpu mcs_nodes[]), add a pending bit and allow a single in-word spinner before we punt to the second cacheline.

It is possible to observe the pending bit without the locked bit when the last owner has just released but the pending owner has not yet taken ownership. In this case we would normally queue -- because the pending bit is already taken. However, in this case the pending bit is guaranteed to be released 'soon', therefore wait for it and avoid queueing.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h | 12 +++- kernel/locking/qspinlock.c| 119 +++-- 2 files changed, 107 insertions(+), 24 deletions(-) diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index c9348d8..9c3f5c2 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -36,8 +36,9 @@ typedef struct qspinlock { * Bitfields in the atomic value: * * 0- 7: locked byte - * 8- 9: tail index - * 10-31: tail cpu (+1) + * 8: pending + * 9-10: tail index + * 11-31: tail cpu (+1) */ #define_Q_SET_MASK(type) (((1U _Q_ ## type ## _BITS) - 1)\ _Q_ ## type ## _OFFSET) @@ -45,7 +46,11 @@ typedef struct qspinlock { #define _Q_LOCKED_BITS 8 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED) -#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#define _Q_PENDING_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#define _Q_PENDING_BITS1 +#define _Q_PENDING_MASK_Q_SET_MASK(PENDING) + +#define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS) #define _Q_TAIL_IDX_BITS 2 #define _Q_TAIL_IDX_MASK _Q_SET_MASK(TAIL_IDX) @@ -54,5 +59,6 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) #define _Q_LOCKED_VAL (1U _Q_LOCKED_OFFSET) +#define _Q_PENDING_VAL (1U _Q_PENDING_OFFSET) #endif /* 
__ASM_GENERIC_QSPINLOCK_TYPES_H */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 3456819..0351f78 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -94,24 +94,28 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) return per_cpu_ptr(mcs_nodes[idx], cpu); } +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + /** * queue_spin_lock_slowpath - acquire the queue spinlock * @lock: Pointer to queue spinlock structure * @val: Current value of the queue spinlock 32-bit word * - * (queue tail, lock value) - * - * fast :slow : unlock - *: : - * uncontended (0,0) --:-- (0,1) :-- (*,0) - *: | ^./ : - *: v \ | : - * uncontended:(n,x) --+-- (n,0) | : - * queue: | ^--' | : - *: v | : - * contended :(*,x) --+-- (*,0) - (*,1) ---' : - * queue: ^--' : + * (queue tail, pending bit, lock value) * + * fast :slow :unlock + * : : + * uncontended (0,0,0) -:-- (0,0,1) --:-- (*,*,0) + * : | ^.--. / : + * : v \ \| : + * pending :(0,1,1) +-- (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended :(n,x,y) +-- (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended :(*,x,y) +-- (*,0,0) --- (*,0,1) -' : + * queue : ^--' : */ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) { @@ -121,6 +125,75 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS = (1U _Q_TAIL_CPU_BITS)); + /* +* wait for in-progress pending-locked hand-overs +* +* 0,1,0 - 0,0,1 +*/ + if (val == _Q_PENDING_VAL) { + while ((val = atomic_read(lock-val)) == _Q_PENDING_VAL) +
[Xen-devel] [PATCH v15 05/15] qspinlock: Optimize for smaller NR_CPUS
From: Peter Zijlstra (Intel) pet...@infradead.org When we allow for a max NR_CPUS 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all the repeated compxchg() operations. This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obviating one atomic operation entirely. This optimization is needed to make the qspinlock achieve performance parity with ticket spinlock at light load. All this is horribly broken on Alpha pre EV56 (and any other arch that cannot do single-copy atomic byte stores). Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h | 13 ++ kernel/locking/qspinlock.c| 69 - 2 files changed, 81 insertions(+), 1 deletions(-) diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index ef36613..f01b55d 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -35,6 +35,14 @@ typedef struct qspinlock { /* * Bitfields in the atomic value: * + * When NR_CPUS 16K + * 0- 7: locked byte + * 8: pending + * 9-15: not used + * 16-17: tail index + * 18-31: tail cpu (+1) + * + * When NR_CPUS = 16K * 0- 7: locked byte * 8: pending * 9-10: tail index @@ -47,7 +55,11 @@ typedef struct qspinlock { #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED) #define _Q_PENDING_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#if CONFIG_NR_CPUS (1U 14) +#define _Q_PENDING_BITS8 +#else #define _Q_PENDING_BITS1 +#endif #define _Q_PENDING_MASK_Q_SET_MASK(PENDING) #define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS) @@ -58,6 +70,7 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_BITS (32 - 
_Q_TAIL_CPU_OFFSET) #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) +#define _Q_TAIL_OFFSET _Q_TAIL_IDX_OFFSET #define _Q_TAIL_MASK (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK) #define _Q_LOCKED_VAL (1U _Q_LOCKED_OFFSET) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 11f6ad9..bcc99e6 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -24,6 +24,7 @@ #include linux/percpu.h #include linux/hardirq.h #include linux/mutex.h +#include asm/byteorder.h #include asm/qspinlock.h /* @@ -56,6 +57,10 @@ * node; whereby avoiding the need to carry a node from lock to unlock, and * preserving existing lock API. This also makes the unlock code simpler and * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * */ #include mcs_spinlock.h @@ -96,6 +101,62 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) +/* + * By using the whole 2nd least significant byte for the pending bit, we + * can allow better optimization of the lock acquisition for the pending + * bit holder. + */ +#if _Q_PENDING_BITS == 8 + +struct __qspinlock { + union { + atomic_t val; + struct { +#ifdef __LITTLE_ENDIAN + u16 locked_pending; + u16 tail; +#else + u16 tail; + u16 locked_pending; +#endif + }; + }; +}; + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queue spinlock structure + * + * *,1,0 - *,0,1 + * + * Lock stealing is not allowed if this function is used. 
+ */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + struct __qspinlock *l = (void *)lock; + + WRITE_ONCE(l-locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word retrieve previous one + * @lock : Pointer to queue spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* - n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + struct __qspinlock *l = (void *)lock; + + return (u32)xchg(l-tail, tail _Q_TAIL_OFFSET) _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + /** * clear_pending_set_locked - take ownership and clear the pending bit. * @lock: Pointer to queue spinlock structure @@ -131,6 +192,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) } return old; } +#endif /* _Q_PENDING_BITS == 8 */
Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
-Original Message- From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] Sent: Friday, April 03, 2015 9:37 PM To: Wu, Feng Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running On Fri, Apr 03, 2015 at 02:00:24AM +, Wu, Feng wrote: -Original Message- From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] Sent: Friday, April 03, 2015 3:15 AM To: Tian, Kevin Cc: Wu, Feng; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running On Thu, Apr 02, 2015 at 06:08:12AM +, Tian, Kevin wrote: From: Wu, Feng Sent: Friday, March 27, 2015 12:58 PM -Original Message- From: Zhang, Yang Z Sent: Friday, March 27, 2015 12:44 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: jbeul...@suse.com; k...@xen.org; Tian, Kevin Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running Wu, Feng wrote on 2015-03-27: Zhang, Yang Z wrote on 2015-03-25: when vCPU is running Wu, Feng wrote on 2015-03-25: When a vCPU is running in Root mode and a notification event has been injected to it. we need to set VCPU_KICK_SOFTIRQ for the current cpu, so the pending interrupt in PIRR will be synced to vIRR before This would imply that we had VMEXIT-ed due to pending interrupt? And we end up calling 'do_IRQ'? If so then the DPCI_SOFTIRQ ends up being set and you stll end up calling the softirq code? No. Here is the scenario for the description of this patch: When vCPU is running in root-mode (such as via hypercall, or any other reasons which can result in VM-Exit), and before vCPU is back to non-root, external interrupts happen. Notice that the VM-exit is not caused by this external interrupt. Thank you for the explanation. You might want to add that in the commit along with the explanation of the code flow below! Good idea! 
Thank you! Thanks, Feng Thanks, Feng VM-Exit in time. Shouldn't the pending interrupt be synced unconditionally before next vmentry? What happens if we didn't set the softirq? If we didn't set the softirq in the notification handler, the interrupts happened exactly before VM-entry cannot be delivered to guest at this time. Please see the following code fragments from xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments) .Lvmx_do_vmentry .. /* If Vt-d engine issues a notification event here, * it cannot be delivered to guest during this VM-entry * without raising the softirq in notification handler. */ cmp %ecx,(%rdx,%rax,1) jnz .Lvmx_process_softirqs .. je .Lvmx_launch .. .Lvmx_process_softirqs: sti call do_softirq jmp .Lvmx_do_vmentry You are right! This helps me to recall why raise the softirq when delivering the PI. Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it sets the softirq for this purpose, however, when VT-d HW delivers PI, we have no control to the HW itself, hence we need to set this softirq in the Notification Event handler. could you include this information in the comment so others can easily understand this requirement? from code you only mentioned VCPU_KICK _SOFTIRQ is required, but how it leads to PIRR-VIRR sync is not explained. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Nested Virt - Xen 4.4 through 4.6 - Hyper-V; Can't boot after enabling Hyper-V
Hi -- I've been trying to get nested virtualization working with Xen so that I could boot Windows and use Hyper-V related features, however I have not had much success. Using Windows 8.1 or Windows 2012r2, I'm able to install Windows, select and install the Hyper-V features, and start rebooting. However, at that point, the Windows VM only partially boots, then drops me to a screen stating:

  Your PC needs to restart. Please hold down the power button.
  Error Code: 0x001E
  Parameters: 0xC096 0xF80315430485 0x 0x

Restarting does not yield any different results. I've set up Xen in accordance with the notes for patches and config options here: http://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xen

Trying Xen 4.4.2 stable, 4.5.1 staging, and 4.6 staging, I applied the patch labeled (2/2) from the wiki link above, compiled, and used the three options provided for the DomU running Windows (hap, nestedhvm, and cpuid mask). Windows installs and allows me to turn on Hyper-V features on all versions of Xen listed above, however all give the same or similar message on reboot... I'm never able to get to a running state.

I've tried this on two separate systems. One has an Intel E5-1620 v2, and the other is an E5-1650 (original, v1 I guess). All the virtualization options are enabled in the BIOS. If the cpuid mask is removed from the DomU config, Windows boots, however I'm unable to start any virtual machines (there was a message in the Windows event log about a component not being started in regards to Hyper-V). Has anyone else run into similar issues? Any thoughts on next steps?
[Xen-devel] [PATCH v15 14/15] pvqspinlock: Improve slowpath performance by avoiding cmpxchg
In the pv_scan_next() function, the slow cmpxchg atomic operation is performed even if the other CPU is not even close to being halted. This extra cmpxchg can harm slowpath performance.

This patch introduces the new mayhalt flag to indicate if the other spinning CPU is close to being halted or not. The current threshold for x86 is 2k cpu_relax() calls. If this flag is not set, the other spinning CPU will have at least 2k more cpu_relax() calls before it can enter the halt state. This should give enough time for the setting of the locked flag in struct mcs_spinlock to propagate to that CPU without using an atomic op.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 kernel/locking/qspinlock_paravirt.h | 28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index a210061..a9fe10d 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -16,7 +16,8 @@
  * native_queue_spin_unlock().
  */
-#define _Q_SLOW_VAL	(3U << _Q_LOCKED_OFFSET)
+#define _Q_SLOW_VAL	(3U << _Q_LOCKED_OFFSET)
+#define MAYHALT_THRESHOLD	(SPIN_THRESHOLD >> 4)
 
 /*
  * The vcpu_hashed is a special state that is set by the new lock holder on
@@ -36,6 +37,7 @@ struct pv_node {
 	int		cpu;
 	u8		state;
+	u8		mayhalt;
 };
 
 /*
@@ -187,6 +189,7 @@ static void pv_init_node(struct mcs_spinlock *node)
 	pn->cpu = smp_processor_id();
 	pn->state = vcpu_running;
+	pn->mayhalt = false;
 }
 
 /*
@@ -203,17 +206,27 @@ static void pv_wait_node(struct mcs_spinlock *node)
 	for (loop = SPIN_THRESHOLD; loop; loop--) {
 		if (READ_ONCE(node->locked))
 			return;
+		if (loop == MAYHALT_THRESHOLD)
+			xchg(&pn->mayhalt, true);
 		cpu_relax();
 	}
 
 	/*
-	 * Order pn->state vs pn->locked thusly:
+	 * Order pn->state/pn->mayhalt vs pn->locked thusly:
 	 *
-	 * [S] pn->state = vcpu_halted	  [S] next->locked = 1
+	 * [S] pn->mayhalt = 1		  [S] next->locked = 1
+	 *     MB, delay		      barrier()
+	 * [S] pn->state = vcpu_halted	  [L] pn->mayhalt
 	 *     MB			      MB
 	 * [L] pn->locked		[RmW] pn->state = vcpu_hashed
 	 *
 	 * Matches the cmpxchg() from pv_scan_next().
+	 *
+	 * As the new lock holder may quit (when pn->mayhalt is not
+	 * set) without memory barrier, a sufficiently long delay is
+	 * inserted between the setting of pn->mayhalt and pn->state
+	 * to ensure that there is enough time for the new pn->locked
+	 * value to be propagated here to be checked below.
 	 */
 	(void)xchg(&pn->state, vcpu_halted);
@@ -226,6 +239,7 @@ static void pv_wait_node(struct mcs_spinlock *node)
 	 * needs to move on to pv_wait_head().
 	 */
 	(void)cmpxchg(&pn->state, vcpu_halted, vcpu_running);
+	pn->mayhalt = false;
 }
 
 /*
@@ -246,6 +260,14 @@ static void pv_scan_next(struct qspinlock *lock, struct mcs_spinlock *node)
 	struct __qspinlock *l = (void *)lock;
 
 	/*
+	 * If mayhalt is not set, there is enough time for the just set value
+	 * in pn->locked to be propagated to the other CPU before it is time
+	 * to halt.
+	 */
+	if (!READ_ONCE(pn->mayhalt))
+		return;
+
+	/*
 	 * Transition CPU state: halted => hashed
 	 * Quit if the transition failed.
 	 */
-- 
1.7.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
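The mayhalt handshake above can be reduced to a small single-threaded model: the spinner advertises that it may halt only once fewer than MAYHALT_THRESHOLD spins remain, and the lock holder pays for the atomic op only when that flag is set. The names below mirror the patch but this is a sketch under simplified assumptions, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

#define SPIN_THRESHOLD_MODEL    32
#define MAYHALT_THRESHOLD_MODEL (SPIN_THRESHOLD_MODEL >> 4)

struct pv_node_model {
    bool mayhalt;               /* "I am close to halting" */
};

static int expensive_cmpxchgs;  /* counts the slow atomic we try to avoid */

/* spin for `spins` iterations before the lock shows up; advertise the
 * possible halt once the remaining budget drops to the threshold */
static void wait_node_model(struct pv_node_model *pn, int spins)
{
    for (int loop = SPIN_THRESHOLD_MODEL; loop && spins; loop--, spins--) {
        if (loop == MAYHALT_THRESHOLD_MODEL)
            pn->mayhalt = true;
    }
}

static void scan_next_model(struct pv_node_model *pn)
{
    if (!pn->mayhalt)
        return;                 /* fast exit: no atomic needed */
    expensive_cmpxchgs++;       /* stand-in for cmpxchg(&pn->state, ...) */
}

/* convenience wrapper: returns how many atomics the scan cost */
static int scan_cost_after(int spins)
{
    struct pv_node_model pn = { .mayhalt = false };
    int before = expensive_cmpxchgs;

    wait_node_model(&pn, spins);
    scan_next_model(&pn);
    return expensive_cmpxchgs - before;
}
```

A waiter whose lock arrives early never sets mayhalt, so the subsequent scan skips the atomic entirely; only a waiter that burned most of its spin budget triggers it.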
[Xen-devel] [PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support
v14-v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimization. - Simplified the Xen PV qspinlock code as suggested by David Vrabel david.vra...@citrix.com. - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test with and without the qspinlock patch under different cpufreq drivers and scaling governors. v13-v14: - Patches 1 2: Add queue_spin_unlock_wait() to accommodate commit 78bff1c86 from Oleg Nesterov. - Fix the system hang problem when using PV qspinlock in an over-committed guest due to a racing condition in the pv_set_head_in_tail() function. - Increase the MAYHALT_THRESHOLD from 10 to 1024. - Change kick_cpu into a regular function pointer instead of a callee-saved function. - Change lock statistics code to use separate bits for different statistics. v12-v13: - Change patch 9 to generate separate versions of the queue_spin_lock_slowpath functions for bare metal and PV guest. This reduces the performance impact of the PV code on bare metal systems. v11-v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. v10-v11: - Use a simple test-and-set unfair lock to simplify the code, but performance may suffer a bit for large guest with many CPUs. - Take out Raghavendra KT's test results as the unfair lock changes may render some of his results invalid. - Add PV support without increasing the size of the core queue node structure. - Other minor changes to address some of the feedback comments. v9-v10: - Make some minor changes to qspinlock.c to accommodate review feedback. - Change author to PeterZ for 2 of the patches. 
- Include Raghavendra KT's test results in patch 18. v8-v9: - Integrate PeterZ's version of the queue spinlock patch with some modification: http://lkml.kernel.org/r/20140310154236.038181...@infradead.org - Break the more complex patches into smaller ones to ease review effort. - Fix a racing condition in the PV qspinlock code. v7-v8: - Remove one unneeded atomic operation from the slowpath, thus improving performance. - Simplify some of the codes and add more comments. - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable unfair lock. - Reduce unfair lock slowpath lock stealing frequency depending on its distance from the queue head. - Add performance data for IvyBridge-EX CPU. v6-v7: - Remove an atomic operation from the 2-task contending code - Shorten the names of some macros - Make the queue waiter to attempt to steal lock when unfair lock is enabled. - Remove lock holder kick from the PV code and fix a race condition - Run the unfair lock PV code on overcommitted KVM guests to collect performance data. v5-v6: - Change the optimized 2-task contending code to make it fairer at the expense of a bit of performance. - Add a patch to support unfair queue spinlock for Xen. - Modify the PV qspinlock code to follow what was done in the PV ticketlock. - Add performance data for the unfair lock as well as the PV support code. v4-v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue head stay alive as much as possible. 
v3-v4: - Remove debugging code and fix a configuration error - Simplify the qspinlock structure and streamline the code to make it perform a bit better - Add an x86 version of asm/qspinlock.h for holding x86 specific optimization. - Add an optimized x86 code path for 2 contending tasks to improve low contention performance. v2-v3: - Simplify the code by using numerous mode only without an unfair option. - Use the latest smp_load_acquire()/smp_store_release() barriers. - Move the queue spinlock code to kernel/locking. - Make the use of queue spinlock the default for x86-64 without user configuration. - Additional performance tuning. v1-v2: - Add some more comments to document what the code does. - Add a numerous CPU mode to support = 16K CPUs - Add a configuration option to allow lock stealing which can further improve performance in many cases. - Enable wakeup of queue head CPU at unlock time for non-numerous CPU mode. This patch set has 3 different sections: 1) Patches 1-6: Introduces a queue-based
[Xen-devel] [PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time
Before this patch, a CPU may have been kicked twice before getting the lock - one before it becomes queue head and once before it gets the lock. All these CPU kicking and halting (VMEXIT) can be expensive and slow down system performance, especially in an overcommitted guest. This patch add a new vCPU state (vcpu_hashed) which enables the code to delay CPU kicking until at unlock time. Once this state is set, the new lock holder will set _Q_SLOW_VAL and fill in the hash table on behalf of the halted queue head vCPU. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 10 ++-- kernel/locking/qspinlock_paravirt.h | 76 +-- 2 files changed, 59 insertions(+), 27 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 33b3f54..b9ba83b 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -239,8 +239,8 @@ static __always_inline void set_locked(struct qspinlock *lock) static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { } -static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { } - +static __always_inline void __pv_scan_next(struct qspinlock *lock, + struct mcs_spinlock *node) { } static __always_inline void __pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node) { } @@ -248,7 +248,7 @@ static __always_inline void __pv_wait_head(struct qspinlock *lock, #define pv_init_node __pv_init_node #define pv_wait_node __pv_wait_node -#define pv_kick_node __pv_kick_node +#define pv_scan_next __pv_scan_next #define pv_wait_head __pv_wait_head @@ -441,7 +441,7 @@ queue: cpu_relax(); arch_mcs_spin_unlock_contended(next-locked); - pv_kick_node(next); + pv_scan_next(lock, next); release: /* @@ -462,7 +462,7 @@ EXPORT_SYMBOL(queue_spin_lock_slowpath); #undef pv_init_node #undef pv_wait_node -#undef pv_kick_node +#undef pv_scan_next #undef pv_wait_head #undef queue_spin_lock_slowpath diff --git 
a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h index 49dbd39..a210061 100644 --- a/kernel/locking/qspinlock_paravirt.h +++ b/kernel/locking/qspinlock_paravirt.h @@ -18,9 +18,16 @@ #define _Q_SLOW_VAL(3U _Q_LOCKED_OFFSET) +/* + * The vcpu_hashed is a special state that is set by the new lock holder on + * the new queue head to indicate that _Q_SLOW_VAL is set and hash entry + * filled. With this state, the queue head CPU will always be kicked even + * if it is not halted to avoid potential racing condition. + */ enum vcpu_state { vcpu_running = 0, vcpu_halted, + vcpu_hashed }; struct pv_node { @@ -97,7 +104,13 @@ static inline u32 hash_align(u32 hash) return hash ~(PV_HB_PER_LINE - 1); } -static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node) +/* + * Set up an entry in the lock hash table + * This is not inlined to reduce size of generated code as it is included + * twice and is used only in the slowest path of handling CPU halting. + */ +static noinline struct qspinlock ** +pv_hash(struct qspinlock *lock, struct pv_node *node) { unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits); struct pv_hash_bucket *hb, *end; @@ -178,7 +191,8 @@ static void pv_init_node(struct mcs_spinlock *node) /* * Wait for node-locked to become true, halt the vcpu after a short spin. - * pv_kick_node() is used to wake the vcpu again. + * pv_scan_next() is used to set _Q_SLOW_VAL and fill in hash table on its + * behalf. */ static void pv_wait_node(struct mcs_spinlock *node) { @@ -189,7 +203,6 @@ static void pv_wait_node(struct mcs_spinlock *node) for (loop = SPIN_THRESHOLD; loop; loop--) { if (READ_ONCE(node-locked)) return; - cpu_relax(); } @@ -198,17 +211,21 @@ static void pv_wait_node(struct mcs_spinlock *node) * * [S] pn-state = vcpu_halted[S] next-locked = 1 * MB MB -* [L] pn-locked [RmW] pn-state = vcpu_running +* [L] pn-locked [RmW] pn-state = vcpu_hashed * -* Matches the xchg() from pv_kick_node(). 
+* Matches the cmpxchg() from pv_scan_next(). */ (void)xchg(pn-state, vcpu_halted); if (!READ_ONCE(node-locked)) pv_wait(pn-state, vcpu_halted); - /* Make sure that state is correct for spurious wakeup */ - WRITE_ONCE(pn-state, vcpu_running); + /* +
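The halted-to-hashed handover that pv_scan_next() performs can be illustrated with C11 atomics in place of the kernel's cmpxchg(). The point of the cmpxchg is that a queue head which raced back to vcpu_running must not be transitioned; this is an illustrative model, not the kernel code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Same three states as the patch: vcpu_hashed tells the halted queue
 * head that the lock holder already set _Q_SLOW_VAL and filled the
 * hash table on its behalf. */
enum vcpu_state_model { vcpu_running = 0, vcpu_halted, vcpu_hashed };

/* Try the halted -> hashed transition; returns 1 on success, 0 if the
 * CPU was not (or no longer) halted, in which case we must quit. */
static int try_hash_halted(_Atomic int *state)
{
    int expected = vcpu_halted;

    /* models cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) */
    return atomic_compare_exchange_strong(state, &expected, vcpu_hashed);
}
```

The compare-and-exchange either claims the halted CPU atomically or observes the concurrent wakeup and leaves the state alone, which is exactly the "quit if the transition failed" behavior the patch comments describe.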
[Xen-devel] [PATCH v15 02/15] qspinlock, x86: Enable x86-64 to use queue spinlock
This patch makes the necessary changes at the x86 architecture specific layer to enable the use of queue spinlock for x86-64. As x86-32 machines are typically not multi-socket. The benefit of queue spinlock may not be apparent. So queue spinlock is not enabled. Currently, there is some incompatibilities between the para-virtualized spinlock code (which hard-codes the use of ticket spinlock) and the queue spinlock. Therefore, the use of queue spinlock is disabled when the para-virtualized spinlock is enabled. The arch/x86/include/asm/qspinlock.h header file includes some x86 specific optimization which will make the queue spinlock code perform better than the generic implementation. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- arch/x86/Kconfig |1 + arch/x86/include/asm/qspinlock.h | 20 arch/x86/include/asm/spinlock.h |5 + arch/x86/include/asm/spinlock_types.h |4 4 files changed, 30 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/qspinlock.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index b7d31ca..49fecb1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -125,6 +125,7 @@ config X86 select MODULES_USE_ELF_RELA if X86_64 select CLONE_BACKWARDS if X86_32 select ARCH_USE_BUILTIN_BSWAP + select ARCH_USE_QUEUE_SPINLOCK select ARCH_USE_QUEUE_RWLOCK select OLD_SIGSUSPEND3 if X86_32 || IA32_EMULATION select OLD_SIGACTION if X86_32 diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h new file mode 100644 index 000..222995b --- /dev/null +++ b/arch/x86/include/asm/qspinlock.h @@ -0,0 +1,20 @@ +#ifndef _ASM_X86_QSPINLOCK_H +#define _ASM_X86_QSPINLOCK_H + +#include asm-generic/qspinlock_types.h + +#definequeue_spin_unlock queue_spin_unlock +/** + * queue_spin_unlock - release a queue spinlock + * @lock : Pointer to queue spinlock structure + * + * A smp_store_release() on the least-significant byte. 
+ */ +static inline void queue_spin_unlock(struct qspinlock *lock) +{ + smp_store_release((u8 *)lock, 0); +} + +#include asm-generic/qspinlock.h + +#endif /* _ASM_X86_QSPINLOCK_H */ diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index cf87de3..a9c01fd 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -42,6 +42,10 @@ extern struct static_key paravirt_ticketlocks_enabled; static __always_inline bool static_key_false(struct static_key *key); +#ifdef CONFIG_QUEUE_SPINLOCK +#include asm/qspinlock.h +#else + #ifdef CONFIG_PARAVIRT_SPINLOCKS static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) @@ -196,6 +200,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) cpu_relax(); } } +#endif /* CONFIG_QUEUE_SPINLOCK */ /* * Read-write spinlocks, allowing multiple readers diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index 5f9d757..5d654a1 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -23,6 +23,9 @@ typedef u32 __ticketpair_t; #define TICKET_SHIFT (sizeof(__ticket_t) * 8) +#ifdef CONFIG_QUEUE_SPINLOCK +#include asm-generic/qspinlock_types.h +#else typedef struct arch_spinlock { union { __ticketpair_t head_tail; @@ -33,6 +36,7 @@ typedef struct arch_spinlock { } arch_spinlock_t; #define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } } +#endif /* CONFIG_QUEUE_SPINLOCK */ #include asm-generic/qrwlock_types.h -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
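The reason a plain byte store can release the lock on x86, as queue_spin_unlock() does with smp_store_release((u8 *)lock, 0), is that the locked byte is the least-significant byte of the 32-bit lock word, so storing 0 to it leaves the pending and tail bits untouched. A small model, assuming a little-endian host and using a C11 release fence in place of smp_store_release():

```c
#include <assert.h>
#include <stdint.h>
#include <stdatomic.h>

/* Illustrative layout of the 32-bit lock word (little-endian). */
union qlock_word {
    uint32_t val;
    struct {
        uint8_t  locked;        /* least-significant byte */
        uint8_t  pending;
        uint16_t tail;
    };
};

static void queue_spin_unlock_model(union qlock_word *l)
{
    /* models smp_store_release((u8 *)lock, 0): release ordering
     * followed by a single byte store, no atomic RMW needed */
    atomic_thread_fence(memory_order_release);
    l->locked = 0;
}
```

Clearing only the low byte means a waiter's tail code written concurrently into the upper bytes is never clobbered by the unlock.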
[Xen-devel] [PATCH v15 06/15] qspinlock: Use a simple write to grab the lock
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head doesn't need to reset the tail code. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that both the lock and pending bits are not set. The current pending bit waiting code will ensure that the bit will not be set as soon as the tail code in the lock is set.

With that change, there is some slight improvement in the performance of the queue spinlock in the 5M loop micro-benchmark run on a 4-socket Westmere-EX machine, as shown in the tables below.

		[Standalone/Embedded - same node]
  # of tasks	Before patch	After patch	%Change
  ----------	------------	-----------	-------
       3	 2324/2321	 2248/2265	 -3%/-2%
       4	 2890/2896	 2819/2831	 -2%/-2%
       5	 3611/3595	 3522/3512	 -2%/-2%
       6	 4281/4276	 4173/4160	 -3%/-3%
       7	 5018/5001	 4875/4861	 -3%/-3%
       8	 5759/5750	 5563/5568	 -3%/-3%

		[Standalone/Embedded - different nodes]
  # of tasks	Before patch	After patch	%Change
  ----------	------------	-----------	-------
       3	12242/12237	12087/12093	 -1%/-1%
       4	10688/10696	10507/10521	 -2%/-2%

It was also found that this change produced a much bigger performance improvement in the newer IvyBridge-EX chip and essentially closed the performance gap between the ticket spinlock and queue spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users on a 3.14 based kernel. The results of the test runs were:

		AIM7 XFS Disk Test
  kernel	JPM	Real Time	Sys Time	Usr Time
  ------	---	---------	--------	--------
  ticketlock	5678233	3.17		96.61		5.81
  qspinlock	5750799	3.13		94.83		5.97

		AIM7 EXT4 Disk Test
  kernel	JPM	Real Time	Sys Time	Usr Time
  ------	---	---------	--------	--------
  ticketlock	1114551	16.15		509.72		7.11
  qspinlock	2184466	8.24		232.99		6.01

The ext4 filesystem run had a much higher spinlock contention than the xfs filesystem run.

The ebizzy -m test was also run with the following results:

  kernel	records/s	Real Time	Sys Time	Usr Time
  ------	---------	---------	--------	--------
  ticketlock	2075		10.00		216.35		3.49
  qspinlock	3023		10.00		198.20		4.80

Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
---
 kernel/locking/qspinlock.c | 66 ++++++++++++++++++++++++++++++++----------------
 1 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index bcc99e6..99503ef 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -105,24 +105,37 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
  * By using the whole 2nd least significant byte for the pending bit, we
  * can allow better optimization of the lock acquisition for the pending
  * bit holder.
+ *
+ * This internal structure is also used by the set_locked function which
+ * is not restricted to _Q_PENDING_BITS == 8.
  */
-#if _Q_PENDING_BITS == 8
-
 struct __qspinlock {
 	union {
 		atomic_t val;
-		struct {
 #ifdef __LITTLE_ENDIAN
+		struct {
+			u8	locked;
+			u8	pending;
+		};
+		struct {
 			u16	locked_pending;
 			u16	tail;
+		};
 #else
+		struct {
 			u16	tail;
 			u16	locked_pending;
-#endif
 		};
+		struct {
+			u8	reserved[2];
+			u8	pending;
+			u8	locked;
+		};
+#endif
 	};
 };
 
+#if _Q_PENDING_BITS == 8
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queue spinlock structure
@@ -195,6 +208,19 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 #endif /* _Q_PENDING_BITS == 8 */
 
 /**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	struct __qspinlock *l = (void *)lock;
+
+	WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
+}
+
+/**
 *
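The acquisition step this patch optimizes can be sketched as follows: once the locked and pending bytes are both clear, the queue head takes the lock with a full compare-and-swap only when it is the last waiter (and must also clear the tail); when more waiters are queued, the tail cannot change under it, so a simple byte write suffices. This is a heavily simplified, single-threaded model with illustrative names, assuming a little-endian layout:

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

union qlock_model {
    uint32_t val;
    struct {
        uint8_t  locked;
        uint8_t  pending;
        uint16_t tail;
    };
};

/* my_tail: tail code identifying this CPU's queue node.
 * Caller has already observed locked == 0 and pending == 0.
 * Returns the final lock word; *used_cmpxchg reports which path ran. */
static uint32_t head_acquire_model(union qlock_model *l, uint16_t my_tail,
                                   bool *used_cmpxchg)
{
    if (l->tail == my_tail) {
        /* last waiter: must clear the tail and set locked in one shot,
         * modeled here as a whole-word store standing in for the
         * atomic_cmpxchg() the real code still needs */
        l->val = 1;
        *used_cmpxchg = true;
    } else {
        /* more waiters queued: the tail cannot change under us, so a
         * simple write to the locked byte is enough (set_locked()) */
        l->locked = 1;
        *used_cmpxchg = false;
    }
    return l->val;
}
```

The common contended case (another waiter behind the head) thus avoids the atomic RMW entirely, which is where the measured speedup comes from.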
[Xen-devel] [PATCH v15 10/15] pvqspinlock: Implement the paravirt qspinlock for x86
From: Peter Zijlstra (Intel) pet...@infradead.org We use the regular paravirt call patching to switch between: native_queue_spin_lock_slowpath() __pv_queue_spin_lock_slowpath() native_queue_spin_unlock()__pv_queue_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a movb $0,%arg1 if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/Kconfig |2 +- arch/x86/include/asm/paravirt.h | 28 +++- arch/x86/include/asm/paravirt_types.h | 10 ++ arch/x86/include/asm/qspinlock.h | 25 - arch/x86/kernel/paravirt-spinlocks.c | 24 +++- arch/x86/kernel/paravirt_patch_32.c | 22 ++ arch/x86/kernel/paravirt_patch_64.c | 22 ++ 7 files changed, 121 insertions(+), 12 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 49fecb1..a0946e7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -661,7 +661,7 @@ config PARAVIRT_DEBUG config PARAVIRT_SPINLOCKS bool Paravirtualization layer for spinlocks depends on PARAVIRT SMP - select UNINLINE_SPIN_UNLOCK + select UNINLINE_SPIN_UNLOCK if !QUEUE_SPINLOCK ---help--- Paravirtualized spinlocks allow a pvops backend to replace the spinlock implementation with something virtualization-friendly diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 965c47d..dd40269 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -712,6 +712,30 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, #if defined(CONFIG_SMP) defined(CONFIG_PARAVIRT_SPINLOCKS) +#ifdef CONFIG_QUEUE_SPINLOCK + +static __always_inline void 
pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + PVOP_VCALL2(pv_lock_ops.queue_spin_lock_slowpath, lock, val); +} + +static __always_inline void pv_queue_spin_unlock(struct qspinlock *lock) +{ + PVOP_VCALLEE1(pv_lock_ops.queue_spin_unlock, lock); +} + +static __always_inline void pv_wait(u8 *ptr, u8 val) +{ + PVOP_VCALL2(pv_lock_ops.wait, ptr, val); +} + +static __always_inline void pv_kick(int cpu) +{ + PVOP_VCALL1(pv_lock_ops.kick, cpu); +} + +#else /* !CONFIG_QUEUE_SPINLOCK */ + static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, __ticket_t ticket) { @@ -724,7 +748,9 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } -#endif +#endif /* CONFIG_QUEUE_SPINLOCK */ + +#endif /* SMP PARAVIRT_SPINLOCKS */ #ifdef CONFIG_X86_32 #define PV_SAVE_REGS pushl %ecx; pushl %edx; diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b..f6acaea 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -333,9 +333,19 @@ struct arch_spinlock; typedef u16 __ticket_t; #endif +struct qspinlock; + struct pv_lock_ops { +#ifdef CONFIG_QUEUE_SPINLOCK + void (*queue_spin_lock_slowpath)(struct qspinlock *lock, u32 val); + struct paravirt_callee_save queue_spin_unlock; + + void (*wait)(u8 *ptr, u8 val); + void (*kick)(int cpu); +#else /* !CONFIG_QUEUE_SPINLOCK */ struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket); +#endif /* !CONFIG_QUEUE_SPINLOCK */ }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index 64c925e..c8290db 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -3,6 +3,7 @@ #include asm/cpufeature.h #include asm-generic/qspinlock_types.h +#include asm/paravirt.h 
#definequeue_spin_unlock queue_spin_unlock /** @@ -11,11 +12,33 @@ * * A smp_store_release() on the least-significant byte. */ -static inline void queue_spin_unlock(struct qspinlock *lock) +static inline void native_queue_spin_unlock(struct qspinlock *lock) { smp_store_release((u8 *)lock, 0); } +#ifdef CONFIG_PARAVIRT_SPINLOCKS +extern void native_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val); +extern void __pv_init_lock_hash(void); +extern void __pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val); +extern void
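What the pv_lock_ops plumbing above achieves can be approximated with an ops table that boot code fills in with either the native or the paravirt implementation; the real kernel goes further and patches the call sites themselves (replacing the indirect call with a direct call or a `movb $0,%arg1`). Names below are illustrative only:

```c
#include <assert.h>

/* Rough model of pv_lock_ops: one slot per patchable lock operation. */
struct pv_lock_ops_model {
    void (*queue_spin_unlock)(unsigned int *lock);
};

static int pv_unlocks;          /* counts hypervisor-assisted unlocks */

static void native_unlock(unsigned int *lock) { *lock = 0; }
static void pv_unlock(unsigned int *lock)     { *lock = 0; pv_unlocks++; }

/* defaults to the native implementation, like the kernel's table */
static struct pv_lock_ops_model pv_lock_ops_model = { native_unlock };

/* like xen_init_spinlocks(): redirect the slot when running paravirt */
static void init_for_pv_model(void)
{
    pv_lock_ops_model.queue_spin_unlock = pv_unlock;
}

/* every call site dispatches through the table */
static void queue_spin_unlock_call(unsigned int *lock)
{
    pv_lock_ops_model.queue_spin_unlock(lock);
}
```

On bare metal the slot is never redirected, so after call-site patching the dispatch cost disappears entirely; this model keeps the indirection for clarity.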
[Xen-devel] [PATCH v15 08/15] lfsr: a simple binary Galois linear feedback shift register
This patch is based on the code sent out by Peter Zijlstra as part of his queue spinlock patch to provide a hashing function with open addressing.

The lfsr() function can be used to return a sequence of numbers that cycle through all the bit patterns (2^n - 1) of a given bit width n, except the value 0, in a somewhat random fashion depending on the LFSR taps value being used. Callers can provide their own taps value or use the default.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 include/linux/lfsr.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/lfsr.h

diff --git a/include/linux/lfsr.h b/include/linux/lfsr.h
new file mode 100644
index 000..f570819
--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,80 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary Galois Linear Feedback Shift Register
+ *
+ * http://en.wikipedia.org/wiki/Linear_feedback_shift_register
+ *
+ * This function currently supports only bits values of 4-30. Callers
+ * that don't pass in a constant bits value can optionally define
+ * LFSR_MIN_BITS and LFSR_MAX_BITS before including the lfsr.h header file
+ * to reduce the size of the jump table in the compiled code, if desired.
+ */
+#ifndef LFSR_MIN_BITS
+#define LFSR_MIN_BITS	4
+#endif
+
+#ifndef LFSR_MAX_BITS
+#define LFSR_MAX_BITS	30
+#endif
+
+static __always_inline u32 lfsr_taps(int bits)
+{
+	BUG_ON((bits < LFSR_MIN_BITS) || (bits > LFSR_MAX_BITS));
+	BUILD_BUG_ON((LFSR_MIN_BITS < 4) || (LFSR_MAX_BITS > 30));
+
+#define _IF_BITS_EQ(x)	\
+	if (((x) >= LFSR_MIN_BITS) && ((x) <= LFSR_MAX_BITS) && ((x) == bits))
+
+	/*
+	 * Feedback terms copied from
+	 * http://users.ece.cmu.edu/~koopman/lfsr/index.html
+	 */
+	_IF_BITS_EQ(4)	return 0x0009;
+	_IF_BITS_EQ(5)	return 0x0012;
+	_IF_BITS_EQ(6)	return 0x0021;
+	_IF_BITS_EQ(7)	return 0x0041;
+	_IF_BITS_EQ(8)	return 0x008E;
+	_IF_BITS_EQ(9)	return 0x0108;
+	_IF_BITS_EQ(10)	return 0x0204;
+	_IF_BITS_EQ(11)	return 0x0402;
+	_IF_BITS_EQ(12)	return 0x0829;
+	_IF_BITS_EQ(13)	return 0x100D;
+	_IF_BITS_EQ(14)	return 0x2015;
+	_IF_BITS_EQ(15)	return 0x4122;
+	_IF_BITS_EQ(16)	return 0x8112;
+	_IF_BITS_EQ(17)	return 0x102C9;
+	_IF_BITS_EQ(18)	return 0x20195;
+	_IF_BITS_EQ(19)	return 0x403FE;
+	_IF_BITS_EQ(20)	return 0x80637;
+	_IF_BITS_EQ(21)	return 0x100478;
+	_IF_BITS_EQ(22)	return 0x20069E;
+	_IF_BITS_EQ(23)	return 0x4004B2;
+	_IF_BITS_EQ(24)	return 0x800B87;
+	_IF_BITS_EQ(25)	return 0x10004F3;
+	_IF_BITS_EQ(26)	return 0x200072D;
+	_IF_BITS_EQ(27)	return 0x40006AE;
+	_IF_BITS_EQ(28)	return 0x80009E3;
+	_IF_BITS_EQ(29)	return 0x1583;
+	_IF_BITS_EQ(30)	return 0x2C92;
+#undef _IF_BITS_EQ
+
+	/* Unreachable */
+	return 0;
+}
+
+/*
+ * Please note that LFSR doesn't work with a start state of 0.
+ */
+static inline u32 lfsr(u32 val, int bits, u32 taps)
+{
+	u32 bit = val & 1;
+
+	val >>= 1;
+	if (bit)
+		val ^= taps ? taps : lfsr_taps(bits);
+	return val;
+}
+
+#endif /* _LINUX_LFSR_H */
-- 
1.7.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
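The step function above can be exercised standalone. A maximal n-bit Galois LFSR visits all 2^n - 1 non-zero states before repeating, which is what makes it usable as a probe sequence for an open-addressing hash table: starting from any non-zero slot, it eventually visits every other slot. This sketch uses the 4-bit taps value 0x9 from the table above:

```c
#include <assert.h>
#include <stdint.h>

/* Same Galois step as the patch's lfsr(), with the taps passed in. */
static uint32_t lfsr_step(uint32_t val, uint32_t taps)
{
    uint32_t bit = val & 1;

    val >>= 1;
    if (bit)
        val ^= taps;
    return val;
}

/* Number of steps until `start` recurs; checks that 0 never appears
 * (0 is a fixed point of the recurrence, so it must stay unreachable). */
static int lfsr_period(uint32_t start, uint32_t taps)
{
    uint32_t v = start;
    int n = 0;

    do {
        v = lfsr_step(v, taps);
        n++;
        assert(v != 0);
    } while (v != start);
    return n;
}
```

With the 4-bit taps 0x9 the sequence starting at 1 runs 1 -> 9 -> 13 -> 15 -> ... and returns to 1 only after all 15 non-zero values, i.e. the maximal period.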
[Xen-devel] [PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen
This patch adds the necessary Xen specific code to allow Xen to support the CPU halting and kicking operations needed by the queue spinlock PV code.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 arch/x86/xen/spinlock.c | 63 +++++++++++++++++++++++++++++++++++++++++++-----
 kernel/Kconfig.locks    |  2 +-
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 956374c..728b45b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,6 +17,55 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(char *, irq_name);
+static bool xen_pvspin = true;
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include <asm/qspinlock.h>
+
+static void xen_qlock_kick(int cpu)
+{
+	xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+}
+
+/*
+ * Halt the current CPU & release it back to the host
+ */
+static void xen_qlock_wait(u8 *byte, u8 val)
+{
+	int irq = __this_cpu_read(lock_kicker_irq);
+
+	/* If kicker interrupts not initialized yet, just spin */
+	if (irq == -1)
+		return;
+
+	/* clear pending */
+	xen_clear_irq_pending(irq);
+
+	/*
+	 * We check the byte value after clearing pending IRQ to make sure
+	 * that we won't miss a wakeup event because of the clearing.
+	 *
+	 * The sync_clear_bit() call in xen_clear_irq_pending() is atomic.
+	 * So it is effectively a memory barrier for x86.
+	 */
+	if (READ_ONCE(*byte) != val)
+		return;
+
+	/*
+	 * If an interrupt happens here, it will leave the wakeup irq
+	 * pending, which will cause xen_poll_irq() to return
+	 * immediately.
+	 */
+
+	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
+	xen_poll_irq(irq);
+}
+
+#else /* CONFIG_QUEUE_SPINLOCK */
+
 enum xen_contention_stat {
 	TAKEN_SLOW,
 	TAKEN_SLOW_PICKUP,
@@ -100,12 +149,9 @@ struct xen_lock_waiting {
 	__ticket_t want;
 };
 
-static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-static DEFINE_PER_CPU(char *, irq_name);
 static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
 static cpumask_t waiting_cpus;
 
-static bool xen_pvspin = true;
 __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
 	int irq = __this_cpu_read(lock_kicker_irq);
@@ -217,6 +263,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 		}
 	}
 }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -280,8 +327,16 @@ void __init xen_init_spinlocks(void)
 		return;
 	}
 	printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
+#ifdef CONFIG_QUEUE_SPINLOCK
+	__pv_init_lock_hash();
+	pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+	pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+	pv_lock_ops.wait = xen_qlock_wait;
+	pv_lock_ops.kick = xen_qlock_kick;
+#else
 	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
+#endif
 }
 
 /*
@@ -310,7 +365,7 @@ static __init int xen_parse_nopvspin(char *arg)
 }
 early_param("xen_nopvspin", xen_parse_nopvspin);
 
-#ifdef CONFIG_XEN_DEBUG_FS
+#if defined(CONFIG_XEN_DEBUG_FS) && !defined(CONFIG_QUEUE_SPINLOCK)
 static struct dentry *d_spin_debug;
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 537b13e..0b42933 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK
 config QUEUE_SPINLOCK
 	def_bool y if ARCH_USE_QUEUE_SPINLOCK
-	depends on SMP && (!PARAVIRT_SPINLOCKS || !XEN)
+	depends on SMP
 
 config ARCH_USE_QUEUE_RWLOCK
 	bool
-- 
1.7.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
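The ordering xen_qlock_wait() relies on (clear the pending event first, then re-read the lock byte, and only block if it still holds the expected value) is the classic lost-wakeup avoidance pattern. A single-threaded model with illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool irq_pending;
static int  blocked;            /* times we actually "halted" */

/* models xen_poll_irq(): returns immediately if an event is already
 * pending, otherwise the vCPU would halt until one arrives */
static void poll_irq_model(void)
{
    if (!irq_pending)
        blocked++;
    irq_pending = false;
}

static void qlock_wait_model(uint8_t *byte, uint8_t val)
{
    irq_pending = false;        /* xen_clear_irq_pending() */

    /* re-check AFTER the clear: a kick that raced with the clear left
     * the lock byte changed, so we must not go to sleep */
    if (*byte != val)
        return;

    poll_irq_model();           /* block (or consume a pending kick) */
}
```

If the re-check were done before the clear, a kick arriving between the two would be wiped out and the vCPU would sleep with no one left to wake it; checking after the clear makes that window safe.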
[Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
Provide a separate (second) version of the spin_lock_slowpath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv version. The actual MCS queue can use extra storage in the mcs_nodes[] array to keep track of state and therefore uses directed wakeups. The head contender has no such storage directly visible to the unlocker. So the unlocker searches a hash table with open addressing using a simple binary Galois linear feedback shift register. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 69 - kernel/locking/qspinlock_paravirt.h | 321 +++ 2 files changed, 389 insertions(+), 1 deletions(-) create mode 100644 kernel/locking/qspinlock_paravirt.h diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index fc2e5ab..33b3f54 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -18,6 +18,9 @@ * Authors: Waiman Long waiman.l...@hp.com * Peter Zijlstra pet...@infradead.org */ + +#ifndef _GEN_PV_LOCK_SLOWPATH + #include linux/smp.h #include linux/bug.h #include linux/cpumask.h @@ -65,13 +68,21 @@ #include mcs_spinlock.h +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define MAX_NODES 8 +#else +#define MAX_NODES 4 +#endif + /* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. * * Exactly fits one 64-byte cacheline on a 64-bit architecture. + * + * PV doubles the storage and uses the second cacheline for PV state. 
*/ -static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]); +static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[MAX_NODES]); /* * We must be able to distinguish between no-tail and the tail at 0:0, @@ -220,6 +231,33 @@ static __always_inline void set_locked(struct qspinlock *lock) WRITE_ONCE(l-locked, _Q_LOCKED_VAL); } + +/* + * Generate the native code for queue_spin_unlock_slowpath(); provide NOPs for + * all the PV callbacks. + */ + +static __always_inline void __pv_init_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { } +static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { } + +static __always_inline void __pv_wait_head(struct qspinlock *lock, + struct mcs_spinlock *node) { } + +#define pv_enabled() false + +#define pv_init_node __pv_init_node +#define pv_wait_node __pv_wait_node +#define pv_kick_node __pv_kick_node + +#define pv_wait_head __pv_wait_head + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define queue_spin_lock_slowpath native_queue_spin_lock_slowpath +#endif + +#endif /* _GEN_PV_LOCK_SLOWPATH */ + /** * queue_spin_lock_slowpath - acquire the queue spinlock * @lock: Pointer to queue spinlock structure @@ -249,6 +287,9 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS = (1U _Q_TAIL_CPU_BITS)); + if (pv_enabled()) + goto queue; + if (virt_queue_spin_lock(lock)) return; @@ -325,6 +366,7 @@ queue: node += idx; node-locked = 0; node-next = NULL; + pv_init_node(node); /* * We touched a (possibly) cold cacheline in the per-cpu queue node; @@ -350,6 +392,7 @@ queue: prev = decode_tail(old); WRITE_ONCE(prev-next, node); + pv_wait_node(node); arch_mcs_spin_lock_contended(node-locked); } @@ -365,6 +408,7 @@ queue: * does not imply a full barrier. 
* */ + pv_wait_head(lock, node); while ((val = smp_load_acquire(lock-val.counter)) _Q_LOCKED_PENDING_MASK) cpu_relax(); @@ -397,6 +441,7 @@ queue: cpu_relax(); arch_mcs_spin_unlock_contended(next-locked); + pv_kick_node(next); release: /* @@ -405,3 +450,25 @@ release: this_cpu_dec(mcs_nodes[0].count); } EXPORT_SYMBOL(queue_spin_lock_slowpath); + +/* + * Generate the paravirt code for queue_spin_unlock_slowpath(). + */ +#if !defined(_GEN_PV_LOCK_SLOWPATH) defined(CONFIG_PARAVIRT_SPINLOCKS) +#define _GEN_PV_LOCK_SLOWPATH + +#undef pv_enabled +#define pv_enabled() true + +#undef pv_init_node +#undef pv_wait_node +#undef pv_kick_node +#undef pv_wait_head + +#undef queue_spin_lock_slowpath +#define queue_spin_lock_slowpath __pv_queue_spin_lock_slowpath + +#include qspinlock_paravirt.h +#include qspinlock.c + +#endif diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h new file mode 100644 index 000..49dbd39 --- /dev/null +++ b/kernel/locking/qspinlock_paravirt.h @@ -0,0 +1,321 @@
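The unlocker's hash lookup described above probes buckets in an order produced by a simple binary Galois LFSR, which visits every non-zero state exactly once per period. A toy 4-bit version (my own parameters, not the kernel's) shows why that makes a usable full-period probe sequence for a power-of-two table:

```c
#include <assert.h>
#include <stdint.h>

/* Step a 4-bit Galois LFSR with taps 0x9 (primitive polynomial
 * x^4 + x + 1). From any non-zero seed it cycles through all 15
 * non-zero 4-bit values, so it can serve as the probe order for an
 * open-addressing hash table of 16 buckets (bucket 0 aside). */
static uint32_t lfsr4_next(uint32_t x)
{
    uint32_t lsb = x & 1;

    x >>= 1;
    if (lsb)
        x ^= 0x9;       /* feedback taps */
    return x;
}

/* Count how many states the LFSR visits before returning to the seed. */
static int lfsr4_period(uint32_t seed)
{
    uint32_t x = seed;
    int n = 0;

    do {
        x = lfsr4_next(x);
        n++;
    } while (x != seed);
    return n;
}
```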
[Xen-devel] [PATCH v15 15/15] pvqspinlock: Add debug code to check for PV lock hash sanity
The current code for PV lock hash table processing will panic the system if pv_hash_find() can't find the desired hash bucket. However, there is no check for more than one entry for a given lock, which should never happen. This patch adds a pv_hash_check_duplicate() function to do that check; it is only enabled if CONFIG_DEBUG_SPINLOCK is defined because of the performance overhead it introduces.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 kernel/locking/qspinlock_paravirt.h | 58 +++
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index a9fe10d..4d39c8b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -107,6 +107,63 @@ static inline u32 hash_align(u32 hash)
 }
 
 /*
+ * Hash table debugging code
+ */
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+#define _NODE_IDX(pn)	((((unsigned long)pn) & (SMP_CACHE_BYTES - 1)) /\
+			sizeof(struct mcs_spinlock))
+/*
+ * Check if there are additional hash buckets with the same lock, which
+ * should not happen.
+ */
+static inline void pv_hash_check_duplicate(struct qspinlock *lock)
+{
+	struct pv_hash_bucket *hb, *end, *hb1 = NULL;
+	int count = 0, used = 0;
+
+	end = &pv_lock_hash[1 << pv_lock_hash_bits];
+	for (hb = pv_lock_hash; hb < end; hb++) {
+		struct qspinlock *l = READ_ONCE(hb->lock);
+		struct pv_node *pn;
+
+		if (l)
+			used++;
+		if (l != lock)
+			continue;
+		if (++count == 1) {
+			hb1 = hb;
+			continue;
+		}
+		WARN_ON(count == 2);
+		if (hb1) {
+			pn = READ_ONCE(hb1->node);
+			printk(KERN_ERR "PV lock hash error: duplicated entry "
+			       "#%d - hash %ld, node %ld, cpu %d\n", 1,
+			       hb1 - pv_lock_hash, _NODE_IDX(pn),
+			       pn ? pn->cpu : -1);
+			hb1 = NULL;
+		}
+		pn = READ_ONCE(hb->node);
+		printk(KERN_ERR "PV lock hash error: duplicated entry #%d - "
+		       "hash %ld, node %ld, cpu %d\n", count, hb - pv_lock_hash,
+		       _NODE_IDX(pn), pn ? pn->cpu : -1);
+	}
+	/*
+	 * Warn if more than half of the buckets are used
+	 */
+	if (used > (1 << (pv_lock_hash_bits - 1)))
+		printk(KERN_WARNING "PV lock hash warning: "
+		       "%d hash entries used!\n", used);
+}
+
+#else /* CONFIG_DEBUG_SPINLOCK */
+
+static inline void pv_hash_check_duplicate(struct qspinlock *lock) {}
+
+#endif /* CONFIG_DEBUG_SPINLOCK */
+
+/*
  * Set up an entry in the lock hash table
  * This is not inlined to reduce size of generated code as it is included
  * twice and is used only in the slowest path of handling CPU halting.
@@ -141,6 +198,7 @@ pv_hash(struct qspinlock *lock, struct pv_node *node)
 	}
 
 done:
+	pv_hash_check_duplicate(lock);
 	return hb->lock;
 }
-- 
1.7.1
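The duplicate check in this patch boils down to a linear scan that counts buckets holding the same lock pointer. A self-contained miniature of that scan (the names and table size are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of pv_hash_check_duplicate(): walk every bucket of a small
 * open-addressing table and count entries for one lock pointer. In the
 * real patch anything > 1 is reported, and the scan is compiled in only
 * under CONFIG_DEBUG_SPINLOCK because of its cost. */
#define NBUCKETS 8

struct bucket {
    void *lock;
};

static struct bucket table[NBUCKETS];

static int count_entries(void *lock)
{
    int count = 0;

    for (struct bucket *hb = table; hb < &table[NBUCKETS]; hb++)
        if (hb->lock == lock)
            count++;
    return count;   /* > 1 means a corrupted hash table */
}

int demo_duplicate_check(void)
{
    static int a_lock;

    table[2].lock = &a_lock;
    table[5].lock = &a_lock;   /* deliberately corrupt: duplicated entry */
    return count_entries(&a_lock);
}
```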
[Xen-devel] [xen-4.4-testing test] 50333: trouble: blocked/broken/fail/pass
flight 50333 xen-4.4-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/50333/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf 3 host-install(3) broken REGR. vs. 50266 Regressions which are regarded as allowable (not blocking): test-amd64-i386-freebsd10-i386 14 guest-localmigrate/x10 fail like 50266 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 36776 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf 1 build-check(1) blocked n/a test-armhf-armhf-xl-arndale 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) blocked n/a test-armhf-armhf-xl-sedf-pin 1 build-check(1) blocked n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) blocked n/a test-armhf-armhf-xl-credit2 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-armhf-armhf-xl 1 build-check(1) blocked n/a test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass build-amd64-rumpuserxen 6 xen-buildfail never pass build-i386-rumpuserxen6 xen-buildfail never pass test-amd64-i386-xend-winxpsp3 17 leak-check/check fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass 
build-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass version targeted for testing: xen 6b09a29ced2e7fc449a39f513e1d8c2b10d2af6d baseline version: xen fc6fe18f1511d4b393057c60a2e6b05ccd963e90 People who touched revisions under test: Andrew Cooper andrew.coop...@citrix.com Ian Campbell ian.campb...@citrix.com Ian Jackson ian.jack...@eu.citrix.com Jan Beulich jbeul...@suse.com Konrad Rzeszutek Wilk konrad.w...@oracle.com jobs: build-amd64-xend pass build-i386-xend pass build-amd64 pass build-armhf broken build-i386 pass build-amd64-libvirt pass build-armhf-libvirt blocked build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-xl pass test-armhf-armhf-xl blocked test-amd64-i386-xl pass test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
[Xen-devel] [PATCH] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk
In old X-Gene Storm firmware and DT, secure mode addresses have been mentioned in the GICv2 node. In this case the maintenance interrupt is used instead of the EOI HW method. This patch checks the GIC Distributor base address to enable the EOI quirk for old firmware.

Ref: http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html

Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
---
 xen/arch/arm/platforms/xgene-storm.c | 37 +-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c
index eee650e..dd7cbfc 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -22,6 +22,7 @@
 #include <asm/platform.h>
 #include <xen/stdbool.h>
 #include <xen/vmap.h>
+#include <xen/device_tree.h>
 #include <asm/io.h>
 #include <asm/gic.h>
@@ -35,9 +36,41 @@ static u64 reset_addr, reset_size;
 static u32 reset_mask;
 static bool reset_vals_valid = false;
 
+#define XGENE_SEC_GICV2_DIST_ADDR    0x7801
+static u32 quirk_guest_pirq_need_eoi;
+
+static void xgene_check_pirq_eoi(void)
+{
+    struct dt_device_node *node;
+    int res;
+    paddr_t dbase;
+
+    dt_for_each_device_node( dt_host, node )
+    {
+        if ( !dt_get_property(node, "interrupt-controller", NULL) )
+            continue;
+
+        res = dt_device_get_address(node, 0, &dbase, NULL);
+        if ( !dbase )
+            panic("%s: Cannot find a valid address for the "
+                  "distributor", __func__);
+
+        /*
+         * In old X-Gene Storm firmware and DT, secure mode addresses have
+         * been mentioned in GICv2 node. We have to use maintenance interrupt
+         * instead of EOI HW in this case. We check the GIC Distributor Base
+         * Address to maintain compatibility with older firmware.
+         */
+        if ( dbase == XGENE_SEC_GICV2_DIST_ADDR )
+            quirk_guest_pirq_need_eoi = PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+        else
+            quirk_guest_pirq_need_eoi = 0;
+    }
+}
+
 static uint32_t xgene_storm_quirks(void)
 {
-    return PLATFORM_QUIRK_GIC_64K_STRIDE|PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+    return PLATFORM_QUIRK_GIC_64K_STRIDE | quirk_guest_pirq_need_eoi;
 }
 
 static int map_one_mmio(struct domain *d, const char *what,
@@ -216,6 +249,8 @@ static int xgene_storm_init(void)
     reset_mask = XGENE_RESET_MASK;
     reset_vals_valid = true;
 
+    xgene_check_pirq_eoi();
+
     return 0;
 }
-- 
1.7.9.5
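The quirk selection above reduces to comparing the distributor base address found in the device tree against the known secure-mode address shipped by old firmware, and enabling the EOI quirk only in that case. A hedged standalone sketch (the constants and names below are placeholders, not Xen's):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants: a stand-in secure-mode distributor address and
 * two quirk bits modelled after the platform quirk flags in the patch. */
#define SEC_GICV2_DIST_ADDR          0x7801u
#define QUIRK_GUEST_PIRQ_NEED_EOI    (1u << 0)
#define QUIRK_GIC_64K_STRIDE         (1u << 1)

static uint32_t platform_quirks(uint64_t dist_base)
{
    uint32_t quirks = QUIRK_GIC_64K_STRIDE;   /* always set on this board */

    /* Old firmware exposes the secure distributor address, so HW EOI is
     * unavailable and the maintenance-interrupt quirk must be enabled. */
    if (dist_base == SEC_GICV2_DIST_ADDR)
        quirks |= QUIRK_GUEST_PIRQ_NEED_EOI;

    return quirks;
}
```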
Re: [Xen-devel] [PATCH 09/19] xen: arm: Annotate registers trapped by HSR_EL1.TIDCP
Hi Ian, Subject: s/HSR/HCR/ On 31/03/2015 12:07, Ian Campbell wrote: This traps variety of implementation defined registers, so add a note to the default case of the respective handler. Signed-off-by: Ian Campbell ian.campb...@citrix.com Other than the typo in the subject: Reviewed-by: Julien Grall julien.gr...@citrix.com Regards, --- xen/arch/arm/traps.c | 16 1 file changed, 16 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index ca43f79..e26e673 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1698,6 +1698,14 @@ static void do_cp15_32(struct cpu_user_regs *regs, */ return handle_raz_wi(regs, r, cp32.read, hsr, 1); +/* + * HCR_EL2.TIDCP + * + * ARMv7 (DDI 0406C.b): B1.14.3 + * ARMv8 (DDI 0487A.d): D1-1501 Table D1-43 + * + * And all other unknown registers. + */ default: gdprintk(XENLOG_ERR, %s p15, %d, r%d, cr%d, cr%d, %d @ 0x%PRIregister\n, @@ -1948,6 +1956,14 @@ static void do_sysreg(struct cpu_user_regs *regs, dprintk(XENLOG_WARNING, Emulation of sysreg ICC_SGI0R_EL1/ASGI1R_EL1 not supported\n); return inject_undef64_exception(regs, hsr.len); + +/* + * HCR_EL2.TIDCP + * + * ARMv8 (DDI 0487A.d): D1-1501 Table D1-43 + * + * And all other unknown registers. + */ default: { const struct hsr_sysreg sysreg = hsr.sysreg; -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/19] xen: arm: implement handling of ACTLR_EL1 trap
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: While annotating ACTLR I noticed that we don't appear to handle the 64-bit version of this trap. Do so and annotate everything. While Linux doesn't use ACTLR_EL1 on aarch64, another OS may use it. I'm not sure if we should consider it as a possible security issue as at least the Cortex A53 implements the register RES0. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/traps.c | 20 xen/include/asm-arm/sysregs.h |1 + 2 files changed, 21 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 70e1b4d..ca43f79 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1647,6 +1647,13 @@ static void do_cp15_32(struct cpu_user_regs *regs, if ( !vtimer_emulate(regs, hsr) ) return inject_undef_exception(regs, hsr); break; + +/* + * HSR_EL2.TASC / HSR.TAC I don't find any TASC in the ARMv8 doc. Did you intend to say TACR? Also it's not HSR but HCR. + * + * ARMv7 (DDI 0406C.b): B1.14.6 + * ARMv8 (DDI 0487A.d): G6.2.1 + */ case HSR_CPREG32(ACTLR): if ( psr_mode_is_user(regs) ) return inject_undef_exception(regs, hsr); @@ -1849,9 +1856,22 @@ static void do_sysreg(struct cpu_user_regs *regs, const union hsr hsr) { register_t *x = select_user_reg(regs, hsr.sysreg.reg); +struct vcpu *v = current; switch ( hsr.bits HSR_SYSREG_REGS_MASK ) { +/* + * HSR_EL2.TASC Same question here for TASC. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 1/3] xen/arm: smmu: Rename arm_smmu_xen_device with device_iommu_info
On Friday 27 March 2015 11:30 PM, Jaggi, Manish wrote: From: Julien Grall julien.gr...@linaro.org Sent: Friday, March 27, 2015 7:05 PM To: Jaggi, Manish; Xen Devel; Stefano Stabellini; Ian Campbell; prasun.kap...@cavium.com; Kumar, Vijaya Subject: Re: [PATCH v1 1/3] xen/arm: smmu: Rename arm_smmu_xen_device with, device_iommu_info On 27/03/15 13:21, Jaggi, Manish wrote: Regards, Manish Jaggi Could you please try to configure you email client correctly? It's rather confusing the regards, Manish Jaggi at the beginning of the mail. [manish] Fixed. Thanks for pointing out From: Julien Grall julien.gr...@linaro.org Sent: Friday, March 27, 2015 6:29 PM To: Jaggi, Manish; Xen Devel; Stefano Stabellini; Ian Campbell; prasun.kap...@cavium.com; Kumar, Vijaya Subject: Re: [PATCH v1 1/3] xen/arm: smmu: Rename arm_smmu_xen_device with, device_iommu_info Hi Manish, On 27/03/15 07:20, Manish Jaggi wrote: arm_smmu_xen_device is not an intuitive name for a datastructure which represents device-archdata.iommu. Rename arm_smmu_xen_device with device_iommu_info device_iommu_info is not more intuitive... At least arm_smmu_xen_device shows that it's a specific Xen structure and not coming from the Linux drivers. [manish] But that is not a valid reason for a non intuitive naming. It is really hard to keep us readability of the code with arm_smmu_xen_device. It is not clear that it is referring to a device attached to smmu or smmu itself. There is another data structure arm_smmu_device as well. Did you read the comment explaining the structure arm_smmu_xen_device? It's just above the definition. arm_smmu is the prefix for any structure within this file. xen means it's a structure added for Xen. device means it's data stored for a device. Please choose another name I can take it but arm_smmu_xen_device is really confusing I won't choose a name myself for a name that I think valid... If you really want to change the name, you have to put at least arm_smmu_xen_ in the name. 
[manish] what about device_archdata_priv, this is denoting what it is. Regards, As per Ians mail in other thread, %s/arm_smmu_xen_device/arch_smm_xen_device/g is ok with you ? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 13/19] xen: arm: Annotate registers trapped by MDCR_EL2.TDRA
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/traps.c | 32 xen/include/asm-arm/cpregs.h |4 xen/include/asm-arm/sysregs.h |1 + 3 files changed, 37 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 21bef01..7c37cec 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1790,6 +1790,17 @@ static void do_cp14_32(struct cpu_user_regs *regs, const union hsr hsr) switch ( hsr.bits HSR_CP32_REGS_MASK ) { +/* + * MDCR_EL2.TDRA + * + * ARMv7 (DDI 0406C.b): B1.14.15 + * ARMv8 (DDI 0487A.d): D1-1508 Table D1-57 + * + * Unhandled: + *DBGDRAR + *DBGDSAR + */ + Why did you put the comment here? For AArch32, only DBGDRAR and DBGSAR are trapped with this bit. I think this should be moved above the label default. case HSR_CPREG32(DBGDIDR): /* * Read-only register. Accessible by EL0 if DBGDSCRext.UDCCdis @@ -1840,6 +1851,8 @@ static void do_cp14_32(struct cpu_user_regs *regs, const union hsr hsr) * * ARMv7 (DDI 0406C.b): B1.14.16 * ARMv8 (DDI 0487A.d): D1-1507 Table D1-54 + * + * And all other unknown registers. */ default: gdprintk(XENLOG_ERR, @@ -1870,6 +1883,17 @@ static void do_cp14_64(struct cpu_user_regs *regs, const union hsr hsr) * * ARMv7 (DDI 0406C.b): B1.14.16 * ARMv8 (DDI 0487A.d): D1-1507 Table D1-54 + * + * MDCR_EL2.TDRA + * + * ARMv7 (DDI 0406C.b): B1.14.15 + * ARMv8 (DDI 0487A.d): D1-1508 Table D1-57 + * + * Unhandled: + *DBGDRAR64 + *DBGDSAR64 This is confusing. The real name of the register is DBGDRAR. I would say DBGDRAR 64-bit. Furthermore, this is the only registers not handled on AArch32 for this bit. This is rather strange to list them while you didn't do it for the trace registers. + * + * And all other unknown registers. For consistency, I would have add this part of the comment in patch #10 (where the comment has been added). Anyway, the patch is already written so I'm fine with it. 
*/ gdprintk(XENLOG_ERR, %s p14, %d, r%d, r%d, cr%d @ 0x%PRIregister\n, @@ -1936,6 +1960,14 @@ static void do_sysreg(struct cpu_user_regs *regs, *x = v-arch.actlr; break; +/* + * MDCR_EL2.TDRA + * + * ARMv8 (DDI 0487A.d): D1-1508 Table D1-57 + */ +case HSR_SYSREG_MDRAR_EL1: +return handle_ro_raz(regs, x, hsr.sysreg.read, hsr, 1); This change should be in a separate patch or mention in the commit message. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 3/7] libxl/libxl_domain_info: Log if domain not found.
On Fri, Apr 03, 2015 at 11:12:15PM +0100, Ian Murray wrote: On 03/04/15 21:02, Konrad Rzeszutek Wilk wrote: If we cannot find the domain - log an error (and still continue returning an error). Forgive me if I am misunderstanding the effect of this patch (I tried to find the original rationale but failed). If the effect is that commands such as xl domid will cause a log entry when the specified domain doesn't exist, I would suggest that's going to be a problem for people It would. that use that or similar commands to tell if a domain is present or still alive. I use it as part of a back-up script to make sure a domain has shut down before the script continues. I suspect many other people will be doing something similar. But won't 'xl domid' give you a return of 0 if it exists and 1 if it does not? Ah, it does this (if it can't find the domain):

6195     fprintf(stderr, "Can't get domid of domain name '%s', maybe this domain does not exist.\n", domname);
6196     return 1;

If you are using 'xl list domid' it also prints:

4739     if (rc == ERROR_DOMAIN_NOTFOUND) {
4740         fprintf(stderr, "Error: Domain \'%s\' does not exist.\n",
4741                 argv[optind]);
4742         return -rc;

(Previously it would also print this.) Either way the data is already presented to the user. With this patch it is presented twice - which is repetitive. Ian C, thoughts? Just ditch this patch? (The patchset can go in without this one.) Apologies if I have the wrong end of the stick! There is never a wrong end! Thanks, Ian.

Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/libxl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c0e9cfe..8753e27 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -698,8 +698,10 @@ int libxl_domain_info(libxl_ctx *ctx, libxl_dominfo *info_r,
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain info list");
         return ERROR_FAIL;
     }
-    if (ret==0 || xcinfo.domain != domid) return ERROR_DOMAIN_NOTFOUND;
-
+    if (ret==0 || xcinfo.domain != domid) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Domain %d not found!", domid);
+        return ERROR_DOMAIN_NOTFOUND;
+    }
     if (info_r)
         xcinfo2xlinfo(ctx, xcinfo, info_r);
     return 0;
Re: [Xen-devel] [RFC PATCH 3/7] xen: psr: reserve an RMID for each core
On Sat, Apr 04, 2015 at 04:14:41AM +0200, Dario Faggioli wrote: This allows for a new item to be passed as part of the psr= boot option: percpu_cmt. If that is specified, Xen tries, at boot time, to associate an RMID to each core.

XXX This all looks rather straightforward, if it weren't for the fact that it is, apparently, more common than I thought to run out of RMIDs. For example, on a dev box we have in Cambridge, there are 144 pCPUs and only 71 RMIDs. In this preliminary version, nothing particularly smart happens if we run out of RMIDs: we just fail attaching the remaining cores and that's it. In future, I'd probably like to:
 + check whether the operation has any chance to succeed up front (by comparing the number of pCPUs with the available RMIDs)
 + on unexpected failure, roll back everything... it seems to make more sense to me than just leaving the system half configured for per-cpu CMT
Thoughts?

XXX Another idea I just have is to allow the user to somehow specify a different 'granularity'. Something like allowing 'percpu_cmt'|'percore_cmt'|'persocket_cmt' with the following meaning:
 + 'percpu_cmt': as in this patch
 + 'percore_cmt': same RMID to hthreads of the same core
 + 'persocket_cmt': same RMID to all cores of the same socket.
'percore_cmt' would only allow gathering info on a per-core basis... still better than nothing if we do not have enough RMIDs for each pCPU. Could we allocate nr_online_cpus() / nr_pmids() and have some CPUs share the same PMIDs? 'persocket_cmt' would basically only allow tracking the amount of free L3 on each socket (by subtracting the monitored value from the total). Again, still better than nothing, would use very few RMIDs, and I could think of ways of using this information in a few places in the scheduler... Again, thoughts?

XXX Finally, when a domain with its own RMID executes on a core that also has its own RMID, domain monitoring just overrides per-CPU monitoring.
That means the cache occupancy reported fo that pCPU is not accurate. For reasons why this situation is difficult to deal with properly, see the document in the cover letter. Ideas on how to deal with this, either about how to make it work or how to handle the thing from a 'policying' perspective (i.e., which one mechanism should be disabled or penalized?), are very welcome Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- xen/arch/x86/psr.c| 72 - xen/include/asm-x86/psr.h | 11 ++- 2 files changed, 67 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 0f2a6ce..a71391c 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -26,10 +26,13 @@ struct psr_assoc { struct psr_cmt *__read_mostly psr_cmt; static bool_t __initdata opt_psr; +static bool_t __initdata opt_cpu_cmt; static unsigned int __initdata opt_rmid_max = 255; static uint64_t rmid_mask; static DEFINE_PER_CPU(struct psr_assoc, psr_assoc); +DEFINE_PER_CPU(unsigned int, pcpu_rmid); + static void __init parse_psr_param(char *s) { char *ss, *val_str; @@ -57,6 +60,8 @@ static void __init parse_psr_param(char *s) val_str); } } +else if ( !strcmp(s, percpu_cmt) ) +opt_cpu_cmt = 1; else if ( val_str !strcmp(s, rmid_max) ) opt_rmid_max = simple_strtoul(val_str, NULL, 0); @@ -94,8 +99,8 @@ static void __init init_psr_cmt(unsigned int rmid_max) } psr_cmt-rmid_max = min(psr_cmt-rmid_max, psr_cmt-l3.rmid_max); -psr_cmt-rmid_to_dom = xmalloc_array(domid_t, psr_cmt-rmid_max + 1UL); -if ( !psr_cmt-rmid_to_dom ) +psr_cmt-rmids = xmalloc_array(domid_t, psr_cmt-rmid_max + 1UL); +if ( !psr_cmt-rmids ) { xfree(psr_cmt); psr_cmt = NULL; @@ -107,56 +112,86 @@ static void __init init_psr_cmt(unsigned int rmid_max) * with it. To reduce the waste of RMID, reserve RMID 0 for all CPUs that * have no domain being monitored. 
*/ -psr_cmt-rmid_to_dom[0] = DOMID_XEN; +psr_cmt-rmids[0] = DOMID_XEN; for ( rmid = 1; rmid = psr_cmt-rmid_max; rmid++ ) -psr_cmt-rmid_to_dom[rmid] = DOMID_INVALID; +psr_cmt-rmids[rmid] = DOMID_INVALID; printk(XENLOG_INFO Cache Monitoring Technology enabled, RMIDs: %u\n, psr_cmt-rmid_max); } -/* Called with domain lock held, no psr specific lock needed */ -int psr_alloc_rmid(struct domain *d) +static int _psr_alloc_rmid(unsigned int *trmid, unsigned int id) { unsigned int rmid; ASSERT(psr_cmt_enabled()); -if ( d-arch.psr_rmid 0 ) +if ( *trmid 0 ) return
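The 'granularity' idea floated in the XXX notes above - one RMID per CPU, per core, or per socket - is just a choice of divisor when mapping CPUs to RMIDs. A minimal illustration (my own helper, not part of the patch; RMID 0 stays reserved for unmonitored CPUs, as in init_psr_cmt()):

```c
#include <assert.h>

/* Map a CPU to an RMID, letting threads_per_group CPUs share one RMID.
 * threads_per_group = 1 is the per-CPU mode of this patch; 2 would be
 * per-core on an SMT2 machine; a larger divisor gives per-socket.
 * Returns -1 when we run out of RMIDs, matching the RFC's behaviour of
 * failing to attach the remaining cores. */
static int cpu_to_rmid(int cpu, int threads_per_group, int nr_rmids)
{
    int group = cpu / threads_per_group;

    if (group >= nr_rmids)
        return -1;          /* out of RMIDs: attach fails */
    return group + 1;       /* RMID 0 is reserved for "unmonitored" */
}
```

With the numbers from the cover letter (144 pCPUs, 71 RMIDs), per-CPU mode runs out at CPU 71, while per-core sharing on an SMT2 box covers all 144.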
[Xen-devel] [linux-linus test] 50329: tolerable FAIL - PUSHED
flight 50329 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/50329/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-amd64-i386-freebsd10-i386 7 freebsd-install fail like 50276 test-amd64-i386-freebsd10-amd64 7 freebsd-install fail like 50276 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-xl-xsm 9 guest-start fail never pass test-amd64-i386-libvirt-xsm 9 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-amd64-i386-xl-xsm9 guest-start fail never pass test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 9 guest-start fail never pass test-armhf-armhf-xl-xsm 5 xen-boot fail never pass test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf 10 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 5 xen-boot fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-armhf-armhf-xl-credit2 10 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail 
never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-armhf-armhf-xl-arndale 10 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass version targeted for testing: linux1cced5015b171415169d938fb179c44fe060dc15 baseline version: linux6c310bc1acdd02110182a2ec6efa3e7571a3b80c People who touched revisions under test: Ahmed S. Darwish ahmed.darw...@valeo.com Alex Deucher alexander.deuc...@amd.com Alex Williamson alex.william...@redhat.com Alexey Bogoslavsky ale...@swortex.com Alexey Kodanev alexey.koda...@oracle.com Andi Kleen a...@linux.intel.com Andre Przywara andre.przyw...@arm.com Andreas Werner ker...@andy89.org Andri Yngvason andri.yngva...@marel.com Andy Gospodarek go...@cumulusnetworks.com Andy Lutomirski l...@kernel.org Anton Nayshtut an...@swortex.com Ard Biesheuvel ard.biesheu...@linaro.org Arend van Spriel ar...@broadcom.com Ariel Elior ariel.el...@qlogic.com Axel Lin axel@ingics.com Baptiste Reynal b.rey...@virtualopensystems.com Ben Hutchings ben.hutchi...@codethink.co.uk Benjamin Herrenschmidt b...@kernel.crashing.org Bjørn Mork bj...@mork.no Borislav Petkov b...@suse.de Charlie Mooney charliemoo...@chromium.org Chris Wilson ch...@chris-wilson.co.uk Christian Hesse m...@eworm.de Christian König christian.koe...@amd.com Christoph Hellwig h...@lst.de Cliff Clark cliff_cl...@selinc.com Colin Ian King 
colin.k...@canonical.com Cong Wang xiyou.wangc...@gmail.com D.S. Ljungmark ljungm...@modio.se Daniel Stone dani...@collabora.com Daniel Vetter daniel.vet...@ffwll.ch Daniel Vetter daniel.vet...@intel.com Dave Airlie airl...@redhat.com David Disseldorp
Re: [Xen-devel] [PATCH 17/19] xen: arm: Remove CNTPCT_EL0 trap handling.
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: We set CNTHCTL_EL2.EL1PCTEN and therefore according to ARMv8 (DDI 0487A.d) D1-1510 Table D1-60 we are not trapping this. Signed-off-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Julien Grall julien.gr...@citrix.com Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH 1/7] x86: improve psr scheduling code
On Sat, Apr 04, 2015 at 04:14:24AM +0200, Dario Faggioli wrote: From: Chao Peng chao.p.p...@linux.intel.com Switching the RMID from the previous vcpu to the next one only needs to write MSR_IA32_PSR_ASSOC once. Writing it with the value of the next vcpu is enough; there is no need to write '0' first. The idle domain has its RMID set to 0 and, because the MSR is already updated lazily, it is simply switched like any other. Also move the initialization of the per-CPU variable used for the lazy update from context switch to CPU starting. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- xen/arch/x86/domain.c | 7 +--- xen/arch/x86/psr.c | 89 +++-- xen/include/asm-x86/psr.h | 3 +- 3 files changed, 73 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 393aa26..73f5d7f 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1443,8 +1443,6 @@ static void __context_switch(void) { memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES); vcpu_save_fpu(p); -if ( psr_cmt_enabled() ) -psr_assoc_rmid(0); p->arch.ctxt_switch_from(p); } @@ -1469,11 +1467,10 @@ static void __context_switch(void) } vcpu_restore_fpu_eager(n); n->arch.ctxt_switch_to(n); - -if ( psr_cmt_enabled() && n->domain->arch.psr_rmid > 0 ) -psr_assoc_rmid(n->domain->arch.psr_rmid); } +psr_ctxt_switch_to(n->domain); + gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) : per_cpu(compat_gdt_table, cpu); if ( need_full_gdt(n) ) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 2ef83df..c902625 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -22,7 +22,6 @@ struct psr_assoc { uint64_t val; -bool_t initialized; }; struct psr_cmt *__read_mostly psr_cmt; @@ -115,14 +114,6 @@ static void __init init_psr_cmt(unsigned int rmid_max) printk(XENLOG_INFO "Cache Monitoring Technology enabled\n"); } -static int __init init_psr(void) -{ -if ( (opt_psr & PSR_CMT) && opt_rmid_max ) -init_psr_cmt(opt_rmid_max); -return 0; -} -__initcall(init_psr); - /* Called with domain lock held, no psr specific lock needed */ int psr_alloc_rmid(struct domain *d) { @@ -168,26 +159,84 @@ void psr_free_rmid(struct domain *d) d->arch.psr_rmid = 0; } -void psr_assoc_rmid(unsigned int rmid) +static inline void psr_assoc_init(void) { -uint64_t val; -uint64_t new_val; struct psr_assoc *psra = &this_cpu(psr_assoc); -if ( !psra->initialized ) -{ +if ( psr_cmt_enabled() ) rdmsrl(MSR_IA32_PSR_ASSOC, psra->val); -psra->initialized = 1; +} + +static inline void psr_assoc_reg_read(struct psr_assoc *psra, uint64_t *reg) +{ +*reg = psra->val; +} + +static inline void psr_assoc_reg_write(struct psr_assoc *psra, uint64_t reg) +{ +if ( reg != psra->val ) +{ +wrmsrl(MSR_IA32_PSR_ASSOC, reg); +psra->val = reg; } -val = psra->val; +} + +static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid) +{ +*reg = (*reg & ~rmid_mask) | (rmid & rmid_mask); +} + +void psr_ctxt_switch_to(struct domain *d) +{ +uint64_t reg; +struct psr_assoc *psra = &this_cpu(psr_assoc); + +psr_assoc_reg_read(psra, &reg); -new_val = (val & ~rmid_mask) | (rmid & rmid_mask); -if ( val != new_val ) +if ( psr_cmt_enabled() ) +psr_assoc_rmid(&reg, d->arch.psr_rmid); + +psr_assoc_reg_write(psra, reg); +} + +static void psr_cpu_init(unsigned int cpu) +{ +psr_assoc_init(); +} + +static int cpu_callback( +struct notifier_block *nfb, unsigned long action, void *hcpu) +{ +unsigned int cpu = (unsigned long)hcpu; + +switch ( action ) +{ +case CPU_STARTING: +psr_cpu_init(cpu); +break; +} You could just make it "if ( action == CPU_STARTING ) psr_assoc_init(); return NOTIFY_DONE;" instead of this big switch statement with casting and such. Though, oddly enough, your psr_assoc_init() figures out the CPU by running it with 'this_cpu'. Why not make psr_assoc_init() accept the CPU value? + +return NOTIFY_DONE; +} + +static struct notifier_block cpu_nfb = { .notifier_call = cpu_callback }; + +static int __init psr_presmp_init(void) +{ +if ( (opt_psr & PSR_CMT) && opt_rmid_max ) +init_psr_cmt(opt_rmid_max); + +if ( psr_cmt_enabled() ) Extra space. { -wrmsrl(MSR_IA32_PSR_ASSOC, new_val); -psra->val = new_val; +psr_cpu_init(smp_processor_id()); +register_cpu_notifier(&cpu_nfb); } + +return 0; } +presmp_initcall(psr_presmp_init); /* * Local variables: diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h index
Re: [Xen-devel] [PATCH 19/19] xen: arm: Annotate source of ICC SGI register trapping
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: I was unable to find an ARMv8 ARM reference to this, so refer to the GIC Architecture Specification instead. ARMv8 ARM does cover other ways of trapping these accesses via ICH_HCR_EL2 but we don't use those and they trap additional registers as well. Signed-off-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Julien Grall julien.gr...@citrix.com Regards, -- Julien Grall
Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop
On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote: [cc: Boris and Konrad. Whoops] On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org wrote: We don't use irq_enable_sysexit on 64-bit kernels any more. Remove Is there an commit (or name of patch) that explains why 32-bit-user-space-on-64-bit kernels is unsavory? Thank you! all the paravirt and Xen machinery to support it on 64-bit kernels. Signed-off-by: Andy Lutomirski l...@kernel.org --- I haven't actually tested this on Xen, but it builds for me. arch/x86/ia32/ia32entry.S | 6 -- arch/x86/include/asm/paravirt_types.h | 7 --- arch/x86/kernel/asm-offsets.c | 2 ++ arch/x86/kernel/paravirt.c| 4 +++- arch/x86/kernel/paravirt_patch_64.c | 1 - arch/x86/xen/enlighten.c | 3 ++- arch/x86/xen/xen-asm_64.S | 16 arch/x86/xen/xen-ops.h| 2 ++ 8 files changed, 13 insertions(+), 28 deletions(-) diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 5d8f987a340d..eb1eb7b70f4b 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -77,12 +77,6 @@ ENTRY(native_usergs_sysret32) swapgs sysretl ENDPROC(native_usergs_sysret32) - -ENTRY(native_irq_enable_sysexit) - swapgs - sti - sysexit -ENDPROC(native_irq_enable_sysexit) #endif /* diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b369e4..38a0ff9ef06e 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -160,13 +160,14 @@ struct pv_cpu_ops { u64 (*read_pmc)(int counter); unsigned long long (*read_tscp)(unsigned int *aux); +#ifdef CONFIG_X86_32 /* * Atomically enable interrupts and return to userspace. This -* is only ever used to return to 32-bit processes; in a -* 64-bit kernel, it's used for 32-on-64 compat processes, but -* never native 64-bit processes. (Jump, not call.) +* is only used in 32-bit kernels. 64-bit kernels use +* usergs_sysret32 instead. 
*/ void (*irq_enable_sysexit)(void); +#endif /* * Switch to usermode gs and return to 64-bit usermode using diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 9f6b9341950f..2d27ebf0aed8 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -49,7 +49,9 @@ void common(void) { OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable); OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable); OFFSET(PV_CPU_iret, pv_cpu_ops, iret); +#ifdef CONFIG_X86_32 OFFSET(PV_CPU_irq_enable_sysexit, pv_cpu_ops, irq_enable_sysexit); +#endif OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0); OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2); #endif diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 548d25f00c90..7563114d9c3a 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -154,7 +154,9 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf, ret = paravirt_patch_ident_64(insnbuf, len); else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) || +#ifdef CONFIG_X86_32 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) || +#endif type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) || type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64)) /* If operation requires a jmp, then jmp */ @@ -371,7 +373,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = { .load_sp0 = native_load_sp0, -#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION) +#if defined(CONFIG_X86_32) .irq_enable_sysexit = native_irq_enable_sysexit, #endif #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c index a1da6737ba5b..0de21c62c348 100644 --- a/arch/x86/kernel/paravirt_patch_64.c +++ b/arch/x86/kernel/paravirt_patch_64.c @@ -49,7 +49,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf, PATCH_SITE(pv_irq_ops, save_fl); PATCH_SITE(pv_irq_ops, irq_enable); PATCH_SITE(pv_irq_ops, irq_disable); - PATCH_SITE(pv_cpu_ops, irq_enable_sysexit); PATCH_SITE(pv_cpu_ops, 
usergs_sysret32); PATCH_SITE(pv_cpu_ops, usergs_sysret64); PATCH_SITE(pv_cpu_ops, swapgs); diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 81665c9f2132..3797b6b31f95 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1267,10 +1267,11 @@ static const struct
Re: [Xen-devel] [PATCH 04/19] xen: arm: provide and use a handle_raz_wi helper
On 02/04/2015 18:19, Ian Campbell wrote: On Thu, 2015-04-02 at 17:01 +0100, Ian Campbell wrote: On Thu, 2015-04-02 at 16:50 +0100, Ian Campbell wrote: Writing to the bottom half (e.g. w0) of a register implicitly clears the top half, IIRC, so I think a kernel is unlikely to want to do this, even if it could (which I'm not quite convinced of). That said, I'll see if I can make something work with the handle_* taking the reg number instead of a pointer and calling select_user_reg in each. Actually don't even need that, I think the following does what is needed. I'm not 100% convinced it is needed though, but it's simple enough, and I can't find anything in the ARM ARM right now which rules out what you are suggesting, even if it is unlikely. The paragraph "Pseudocode description of registers in AArch64 state" in section B1.2.1 (ARMv8 DDI 0487 A.d) confirms your previous mail, i.e. writing to the bottom half (e.g. w0) of a register implicitly clears the top half. I think it may be worth mentioning that paragraph somewhere in the patch. Regards, -- Julien Grall
Re: [Xen-devel] [PATCH 12/19] xen: arm: Annotate the handlers for HSTR_EL2.Tx
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/traps.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index ba120e5..21bef01 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1709,6 +1709,11 @@ static void do_cp15_32(struct cpu_user_regs *regs, * ARMv7 (DDI 0406C.b): B1.14.12 * ARMv8 (DDI 0487A.d): N/A * + * HSTR_EL2.Tx I would prefer if you use T15 instead of Tx. This is less confusing as we only trap c15 and the bit T14 exists on ARMv8 (even though it's RES0). Regards, -- Julien Grall
Re: [Xen-devel] [PATCH 11/19] xen: arm: Annotate handlers for PCTR_EL2.Tx
Hi Ian, Subject: s/PCTR/CPTR/ On 31/03/2015 12:07, Ian Campbell wrote: Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/traps.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 9cdbda8..ba120e5 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1704,6 +1704,11 @@ static void do_cp15_32(struct cpu_user_regs *regs, * ARMv7 (DDI 0406C.b): B1.14.3 * ARMv8 (DDI 0487A.d): D1-1501 Table D1-43 * + * CPTR_EL2.T{0..9,12..13} + * + * ARMv7 (DDI 0406C.b): B1.14.12 + * ARMv8 (DDI 0487A.d): N/A I would also update the comment on top of WRITE_SYSREG(..., CPTR_EL2) to make clear that CP0..CP9 and CP12..CP13 are only trapped on ARMv7. Regards, -- Julien Grall
Re: [Xen-devel] [PATCH 10/19] xen: arm: Annotate registers trapped by CPTR_EL2.TTA
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: Add explicit handler for 64-bit CP14 accesses, with more relevant debug message (as per other handlers) and to provide a place for the comment. It's a bit strange to name the patch "Annotate..." while the main change is handling 64-bit CP14 accesses. AFAICT, this was a bug in the Xen implementation, although I'm not sure whether the current platforms we support have Trace registers (maybe the Arndale?). Regards, -- Julien Grall
Re: [Xen-devel] [RFC PATCH 2/7] Xen: x86: print max usable RMID during init
On Sat, Apr 04, 2015 at 04:14:33AM +0200, Dario Faggioli wrote: Just print it. Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- xen/arch/x86/psr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index c902625..0f2a6ce 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -111,7 +111,8 @@ static void __init init_psr_cmt(unsigned int rmid_max) for ( rmid = 1; rmid <= psr_cmt->rmid_max; rmid++ ) psr_cmt->rmid_to_dom[rmid] = DOMID_INVALID; -printk(XENLOG_INFO "Cache Monitoring Technology enabled\n"); +printk(XENLOG_INFO "Cache Monitoring Technology enabled, RMIDs: %u\n", max RMID: ? + psr_cmt->rmid_max); } /* Called with domain lock held, no psr specific lock needed */
Re: [Xen-devel] [PATCH 18/19] xen: arm: Annotate registers trapped when CNTHCTL_EL2.EL1PCEN == 0
Hi Ian, On 31/03/2015 12:07, Ian Campbell wrote: Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/traps.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 1c9cf21..cc5b8dd 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1642,6 +1642,12 @@ static void do_cp15_32(struct cpu_user_regs *regs, switch ( hsr.bits & HSR_CP32_REGS_MASK ) { +/* + * !CNTHCTL_EL2.EL1PCEN / !CNTHCTL.PL1PCEN I will be picky: the listing here is ARMv8 (AArch64) / ARMv7, but below it's ARMv7 / ARMv8. + * + * ARMv7 (DDI 0406C.b): B4.1.22 + * ARMv8 (DDI 0487A.d): D1-1510 Table D1-60 + */ case HSR_CPREG32(CNTP_CTL): case HSR_CPREG32(CNTP_TVAL): if ( !vtimer_emulate(regs, hsr) ) @@ -1757,6 +1763,12 @@ static void do_cp15_64(struct cpu_user_regs *regs, switch ( hsr.bits & HSR_CP64_REGS_MASK ) { +/* + * !CNTHCTL_EL2.EL1PCEN / !CNTHCTL.PL1PCEN + * + * ARMv7 (DDI 0406C.b): B4.1.22 + * ARMv8 (DDI 0487A.d): D1-1510 Table D1-60 + */ case HSR_CPREG64(CNTP_CVAL): if ( !vtimer_emulate(regs, hsr) ) return inject_undef_exception(regs, hsr); @@ -2120,14 +2132,18 @@ static void do_sysreg(struct cpu_user_regs *regs, */ return handle_raz_wi(regs, x, hsr.sysreg.read, hsr, 1); -/* Write only, Write ignore registers: */ - This comment should have been dropped in patch #14. Regards, -- Julien Grall
Re: [Xen-devel] [PATCH v1 3/3] xen/arm: smmu: Renaming struct iommu_domain *domain to, struct iommu_domain *iommu_domain
Hi Ian, On 01/04/2015 10:30, Ian Campbell wrote: On Tue, 2015-03-31 at 17:48 +0100, Stefano Stabellini wrote: If it helps we could add a couple of comments on top of the structs in smmu.c to explain the meaning of the fields, like: /* iommu_domain, not to be confused with a Xen domain */ I was going to suggest something similar but more expansive, i.e. a table of them all in one place (i.e. at the top of the file) for ease of referencing:

Struct Name            What              Wherefrom   Normally found in
-----------            ----              ---------   -----------------
iommu_domain           IOMMU Context     Linux       d->arch.blah
arch_smmu_xen_device   Device specific   Xen         device->arch.blurg

The actual name of the structure is arm_smmu_xen_device, not arch_smmu_xen_device. Did you mean to suggest renaming it? Regards, -- Julien Grall
Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop
On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote: On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote: [cc: Boris and Konrad. Whoops] On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org wrote: We don't use irq_enable_sysexit on 64-bit kernels any more. Remove Is there an commit (or name of patch) that explains why 32-bit-user-space-on-64-bit kernels is unsavory? sysexit never tasted very good :-p We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're trying an unconventional approach to making the code faster and less scary. As a result, 64-bit kernels won't use sysexit any more. Hopefully Xen is okay with the slightly sneaky thing we're doing. AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I don't expect there to be any problem. See: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asmid=4214a16b02971c60960afd675d03544e109e0d75 and https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asmid=47091e3c5b072daca29a15d2a3caf40359b0d140 --Andy Thank you! all the paravirt and Xen machinery to support it on 64-bit kernels. Signed-off-by: Andy Lutomirski l...@kernel.org --- I haven't actually tested this on Xen, but it builds for me. 
arch/x86/ia32/ia32entry.S | 6 -- arch/x86/include/asm/paravirt_types.h | 7 --- arch/x86/kernel/asm-offsets.c | 2 ++ arch/x86/kernel/paravirt.c| 4 +++- arch/x86/kernel/paravirt_patch_64.c | 1 - arch/x86/xen/enlighten.c | 3 ++- arch/x86/xen/xen-asm_64.S | 16 arch/x86/xen/xen-ops.h| 2 ++ 8 files changed, 13 insertions(+), 28 deletions(-) diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 5d8f987a340d..eb1eb7b70f4b 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -77,12 +77,6 @@ ENTRY(native_usergs_sysret32) swapgs sysretl ENDPROC(native_usergs_sysret32) - -ENTRY(native_irq_enable_sysexit) - swapgs - sti - sysexit -ENDPROC(native_irq_enable_sysexit) #endif /* diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b369e4..38a0ff9ef06e 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -160,13 +160,14 @@ struct pv_cpu_ops { u64 (*read_pmc)(int counter); unsigned long long (*read_tscp)(unsigned int *aux); +#ifdef CONFIG_X86_32 /* * Atomically enable interrupts and return to userspace. This -* is only ever used to return to 32-bit processes; in a -* 64-bit kernel, it's used for 32-on-64 compat processes, but -* never native 64-bit processes. (Jump, not call.) +* is only used in 32-bit kernels. 64-bit kernels use +* usergs_sysret32 instead. 
*/ void (*irq_enable_sysexit)(void); +#endif /* * Switch to usermode gs and return to 64-bit usermode using diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 9f6b9341950f..2d27ebf0aed8 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -49,7 +49,9 @@ void common(void) { OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable); OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable); OFFSET(PV_CPU_iret, pv_cpu_ops, iret); +#ifdef CONFIG_X86_32 OFFSET(PV_CPU_irq_enable_sysexit, pv_cpu_ops, irq_enable_sysexit); +#endif OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0); OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2); #endif diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 548d25f00c90..7563114d9c3a 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -154,7 +154,9 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf, ret = paravirt_patch_ident_64(insnbuf, len); else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) || +#ifdef CONFIG_X86_32 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) || +#endif type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) || type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64)) /* If operation requires a jmp, then jmp */ @@ -371,7 +373,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = { .load_sp0 = native_load_sp0, -#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION) +#if defined(CONFIG_X86_32) .irq_enable_sysexit = native_irq_enable_sysexit, #endif #ifdef CONFIG_X86_64 diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c index a1da6737ba5b..0de21c62c348 100644 ---
[Xen-devel] [qemu-upstream-4.5-testing test] 50330: regressions - FAIL
flight 50330 qemu-upstream-4.5-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50330/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 36517

Tests which are failing intermittently (not blocking):
 test-amd64-i386-freebsd10-i386 11 guest-localmigrate fail pass in 50313
 test-armhf-armhf-libvirt 9 guest-start fail pass in 50313
 test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-localmigrate/x10 fail pass in 50313
 test-amd64-amd64-libvirt 9 guest-start fail in 50313 pass in 50330
 test-amd64-i386-freebsd10-i386 14 guest-localmigrate/x10 fail in 50313 pass in 50283

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass
 test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 5 xen-boot fail never pass
 test-amd64-i386-libvirt 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 10 migrate-support-check fail never pass
 test-armhf-armhf-xl 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-armhf-armhf-libvirt 10 migrate-support-check fail in 50313 never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail in 50313 never pass

version targeted for testing:
 qemuu c9ac5f816bf3a8b56f836b078711dcef6e5c90b8
baseline version:
 qemuu 0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4

People who touched revisions under test:
 Ian Campbell ian.campb...@citrix.com
 Jan Beulich jbeul...@suse.com

jobs:
 build-amd64                                  pass
 build-armhf                                  pass
 build-i386                                   pass
 build-amd64-libvirt                          pass
 build-armhf-libvirt                          pass
 build-i386-libvirt                           pass
 build-amd64-pvops                            pass
 build-armhf-pvops                            pass
 build-i386-pvops                             pass
 test-amd64-amd64-xl                          pass
 test-armhf-armhf-xl                          pass
 test-amd64-i386-xl                           pass
 test-amd64-amd64-xl-pvh-amd                  fail
 test-amd64-i386-rhel6hvm-amd                 pass
 test-amd64-i386-qemut-rhel6hvm-amd           pass
 test-amd64-i386-qemuu-rhel6hvm-amd           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64    pass
 test-amd64-i386-xl-qemut-debianhvm-amd64     pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64     pass
 test-amd64-i386-freebsd10-amd64              pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64
[Xen-devel] remove entry in shadow table
Hi,

I want to remove the entry for a given page in the shadow page table so that the next time the guest accesses the page there is a page fault. Here is what I try to do:

1. I have a timer which wakes up every 30 seconds and removes the entry in the shadow by calling sh_remove_all_mappings(d->vcpu[0], _mfn(page_to_mfn(page))), where d is the domain and page is the page that I want to remove from the shadow page table.
2. In the function sh_page_fault() I get the gmfn and compare it with the mfn of the page that I removed earlier from the shadow page table.

Is this method correct? I also get this error: sh error: sh_remove_all_mappings(): can't find all mappings of mfn

Thank you