Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop

2015-04-06 Thread Boris Ostrovsky


On 04/06/2015 01:44 PM, Andrew Cooper wrote:

On 06/04/2015 16:29, Andy Lutomirski wrote:

On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk
konrad.w...@oracle.com wrote:

On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:

[cc: Boris and Konrad.  Whoops]

On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org wrote:

We don't use irq_enable_sysexit on 64-bit kernels any more.  Remove

Is there a commit (or name of patch) that explains why 
32-bit-user-space-on-64-bit
kernels is unsavory?

sysexit never tasted very good :-p

We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're
trying an unconventional approach to making the code faster and less
scary.  As a result, 64-bit kernels won't use sysexit any more.
Hopefully Xen is okay with the slightly sneaky thing we're doing.
AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I
don't expect there to be any problem.

64bit PV kernels must bounce through Xen to switch from the kernel to
the user pagetables (since kernel and userspace are both actually
running in ring3 with user pages).

As a result, exit to userspace ends up as a hypercall into Xen which has
an effect very similar to an `iret`, but with some extra fixup in the
background.

I can't foresee any Xen issues as a result of this patch.



I ran tip plus this patch (plus another patch that fixes an unrelated 
Xen regression in tip) through our test suite and it completed without 
problems.


I also ran some very simple 32-bit programs in a 64-bit PV guest and 
didn't see any problems there either.


-boris




Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop

2015-04-06 Thread Andrew Cooper
On 06/04/2015 16:29, Andy Lutomirski wrote:
 On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk
 konrad.w...@oracle.com wrote:
 On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
 [cc: Boris and Konrad.  Whoops]

 On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org wrote:
 We don't use irq_enable_sysexit on 64-bit kernels any more.  Remove
 Is there a commit (or name of patch) that explains why 
 32-bit-user-space-on-64-bit
 kernels is unsavory?
 sysexit never tasted very good :-p

 We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're
 trying an unconventional approach to making the code faster and less
 scary.  As a result, 64-bit kernels won't use sysexit any more.
 Hopefully Xen is okay with the slightly sneaky thing we're doing.
 AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I
 don't expect there to be any problem.

64bit PV kernels must bounce through Xen to switch from the kernel to
the user pagetables (since kernel and userspace are both actually
running in ring3 with user pages).

As a result, exit to userspace ends up as a hypercall into Xen which has
an effect very similar to an `iret`, but with some extra fixup in the
background.

I can't foresee any Xen issues as a result of this patch.

~Andrew



Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop

2015-04-06 Thread Andy Lutomirski
On Mon, Apr 6, 2015 at 11:30 AM, Boris Ostrovsky
boris.ostrov...@oracle.com wrote:

 On 04/06/2015 01:44 PM, Andrew Cooper wrote:

 On 06/04/2015 16:29, Andy Lutomirski wrote:

 On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk
 konrad.w...@oracle.com wrote:

 On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:

 [cc: Boris and Konrad.  Whoops]

 On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org
 wrote:

 We don't use irq_enable_sysexit on 64-bit kernels any more.  Remove

 Is there a commit (or name of patch) that explains why
 32-bit-user-space-on-64-bit
 kernels is unsavory?

 sysexit never tasted very good :-p

 We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're
 trying an unconventional approach to making the code faster and less
 scary.  As a result, 64-bit kernels won't use sysexit any more.
 Hopefully Xen is okay with the slightly sneaky thing we're doing.
 AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I
 don't expect there to be any problem.

 64bit PV kernels must bounce through Xen to switch from the kernel to
 the user pagetables (since kernel and userspace are both actually
 running in ring3 with user pages).

 As a result, exit to userspace ends up as a hypercall into Xen which has
 an effect very similar to an `iret`, but with some extra fixup in the
 background.

 I can't foresee any Xen issues as a result of this patch.



 I ran tip plus this patch (plus another patch that fixes an unrelated Xen
 regression in tip) through our test suite and it completed without problems.

 I also ran some very simple 32-bit programs in a 64-bit PV guest and didn't
 see any problems there either.

At the risk of redundancy, did you test on Intel hardware?  At least
on native systems, the code in question never executes on AMD systems.

--Andy


 -boris




-- 
Andy Lutomirski
AMA Capital Management, LLC



Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop

2015-04-06 Thread Boris Ostrovsky


On 04/06/2015 04:03 PM, Andy Lutomirski wrote:

On Mon, Apr 6, 2015 at 11:30 AM, Boris Ostrovsky
boris.ostrov...@oracle.com wrote:

On 04/06/2015 01:44 PM, Andrew Cooper wrote:

On 06/04/2015 16:29, Andy Lutomirski wrote:

On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk
konrad.w...@oracle.com wrote:

On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:

[cc: Boris and Konrad.  Whoops]

On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org
wrote:

We don't use irq_enable_sysexit on 64-bit kernels any more.  Remove

Is there a commit (or name of patch) that explains why
32-bit-user-space-on-64-bit
kernels is unsavory?

sysexit never tasted very good :-p

We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're
trying an unconventional approach to making the code faster and less
scary.  As a result, 64-bit kernels won't use sysexit any more.
Hopefully Xen is okay with the slightly sneaky thing we're doing.
AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I
don't expect there to be any problem.

64bit PV kernels must bounce through Xen to switch from the kernel to
the user pagetables (since kernel and userspace are both actually
running in ring3 with user pages).

As a result, exit to userspace ends up as a hypercall into Xen which has
an effect very similar to an `iret`, but with some extra fixup in the
background.

I can't foresee any Xen issues as a result of this patch.



I ran tip plus this patch (plus another patch that fixes an unrelated Xen
regression in tip) through our test suite and it completed without problems.

I also ran some very simple 32-bit programs in a 64-bit PV guest and didn't
see any problems there either.

At the risk of redundancy, did you test on Intel hardware?  At least
on native systems, the code in question never executes on AMD systems.


Yes, the tests ran on Intel. I left them scheduled for overnight runs 
too, and those will run on both AMD and Intel.


-boris



Re: [Xen-devel] [PATCH] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk

2015-04-06 Thread Julien Grall

Hi Pranav,

Thank you for the patch.

On 06/04/2015 10:54, Pranavkumar Sawargaonkar wrote:

In old X-Gene Storm firmware and DT, secure mode addresses have been
specified in the GICv2 node. In this case the maintenance interrupt is used
instead of the EOI HW method.

This patch checks the GIC Distributor Base Address to enable EOI quirk
for old firmware.

Ref:
http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html

Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
---
  xen/arch/arm/platforms/xgene-storm.c |   37 +-
  1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c 
b/xen/arch/arm/platforms/xgene-storm.c
index eee650e..dd7cbfc 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -22,6 +22,7 @@
  #include <asm/platform.h>
  #include <xen/stdbool.h>
  #include <xen/vmap.h>
+#include <xen/device_tree.h>
  #include <asm/io.h>
  #include <asm/gic.h>

@@ -35,9 +36,41 @@ static u64 reset_addr, reset_size;
  static u32 reset_mask;
  static bool reset_vals_valid = false;

+#define XGENE_SEC_GICV2_DIST_ADDR    0x7801
+static u32 quirk_guest_pirq_need_eoi;


This variable will mostly be read. So, I would add __read_mostly.
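For instance (an illustrative one-liner, not part of the submitted patch):

static u32 __read_mostly quirk_guest_pirq_need_eoi;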


+
+static void xgene_check_pirq_eoi(void)


If I'm not mistaken, this function is only called during Xen 
initialization. So, I would add __init.
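I.e. something along these lines (illustration only):

static void __init xgene_check_pirq_eoi(void)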



+{
+struct dt_device_node *node;
+int res;
+paddr_t dbase;
+
+dt_for_each_device_node( dt_host, node )
+{


It would be better to create a new callback for platform specific GIC 
initialization and use dt_interrupt_controller.


This would avoid having this loop and relying on there always being only one
interrupt controller in the DT.
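A rough sketch of that suggestion (untested; it assumes the GIC driver has
already filled in dt_interrupt_controller by the time this runs):

static void __init xgene_check_pirq_eoi(void)
{
    paddr_t dbase;

    /* The interrupt controller node was already located by the GIC code. */
    if ( dt_device_get_address(dt_interrupt_controller, 0, &dbase, NULL) )
        panic("%s: Cannot find a valid address for the distributor",
              __func__);

    if ( dbase == XGENE_SEC_GICV2_DIST_ADDR )
        quirk_guest_pirq_need_eoi = PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
}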



+if ( !dt_get_property(node, "interrupt-controller", NULL) )
+continue;
+
+res = dt_device_get_address(node, 0, &dbase, NULL);
+if ( !dbase )
+panic("%s: Cannot find a valid address for the "
+  "distributor", __func__);
+
+/*
+ * In old X-Gene Storm firmware and DT, secure mode addresses have
+ * been mentioned in GICv2 node. We have to use maintenance interrupt
+ * instead of EOI HW in this case. We check the GIC Distributor Base
+ * Address to maintain compatibility with older firmware.
+ */
+ if (dbase == XGENE_SEC_GICV2_DIST_ADDR)


Coding style:

if ( ... )


+ quirk_guest_pirq_need_eoi = PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+ else
+ quirk_guest_pirq_need_eoi = 0;


I would print a warning in order to notify the user that his platform 
would be slow/buggy...
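For example (illustrative only):

printk(XENLOG_WARNING
       "%s: secure GICv2 addresses in DT, falling back to PIRQ EOI quirk\n",
       __func__);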



+}
+}
+
  static uint32_t xgene_storm_quirks(void)
  {
-return PLATFORM_QUIRK_GIC_64K_STRIDE|PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+return PLATFORM_QUIRK_GIC_64K_STRIDE| quirk_guest_pirq_need_eoi;


This function is called every time Xen injects a physical IRQ to a guest 
(i.e. very often). It might be better to create a variable 'quirks' which 
will contain PLATFORM_QUIRK_GIC_64K_STRIDE and, when necessary, 
PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI.


That would avoid the OR at each call.
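Roughly (sketch only, names are illustrative):

static uint32_t __read_mostly xgene_quirks = PLATFORM_QUIRK_GIC_64K_STRIDE;

/* Called once at init time, e.g. from xgene_check_pirq_eoi(). */
static void __init xgene_enable_pirq_eoi_quirk(void)
{
    xgene_quirks |= PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
}

static uint32_t xgene_storm_quirks(void)
{
    return xgene_quirks;    /* no OR on every injected IRQ */
}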

Regards,

--
Julien Grall



[Xen-devel] [xen-4.3-testing test] 50332: regressions - FAIL

2015-04-06 Thread osstest service user
flight 50332 xen-4.3-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50332/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386  8 guest-start fail REGR. vs. 36755

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 36755

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 debian-hvm-install fail never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 debian-hvm-install  fail never pass
 test-armhf-armhf-xl-arndale   5 xen-boot fail   never pass
 test-armhf-armhf-xl-cubietruck  5 xen-boot fail never pass
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 test-armhf-armhf-libvirt  5 xen-boot fail   never pass
 test-armhf-armhf-xl-sedf-pin  5 xen-boot fail   never pass
 test-armhf-armhf-xl-multivcpu  5 xen-boot fail  never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-armhf-armhf-xl-sedf  5 xen-boot fail   never pass
 test-armhf-armhf-xl   5 xen-boot fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 xen  46ed0083a76efa82713ea979b312fa69250380b2
baseline version:
 xen  c58b16ef1572176cf2f6a424b527b5ed4bb73f17


People who touched revisions under test:
  Andrew Cooper andrew.coop...@citrix.com
  Ian Campbell ian.campb...@citrix.com
  Ian Jackson ian.jack...@eu.citrix.com
  Jan Beulich jbeul...@suse.com
  Konrad Rzeszutek Wilk konrad.w...@oracle.com


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  fail
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 

[Xen-devel] [PATCH v6 5/5] libxl: Add interface for querying hypervisor about PCI topology

2015-04-06 Thread Boris Ostrovsky
.. and use this new interface to display it along with CPU topology
and NUMA information when 'xl info -n' command is issued

The output will look like
...
cpu_topology   :
cpu:    core    socket    node
  0:       0         0       0
...
device topology:
device           node
0000:00:00.0      0
0000:00:01.0      0
...

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
---

Changes in v6:
* xc_pcitopoinfo() now has to deal with the hypercall not finishing processing the whole
  array (due to changes in patch 2): do_sysctl() is now called in a loop.

 tools/libxc/include/xenctrl.h |3 ++
 tools/libxc/xc_misc.c |   44 ++
 tools/libxl/libxl.c   |   42 +
 tools/libxl/libxl.h   |   12 +++
 tools/libxl/libxl_freebsd.c   |   12 +++
 tools/libxl/libxl_internal.h  |5 +++
 tools/libxl/libxl_linux.c |   69 +
 tools/libxl/libxl_netbsd.c|   12 +++
 tools/libxl/libxl_types.idl   |7 
 tools/libxl/libxl_utils.c |8 +
 tools/libxl/xl_cmdimpl.c  |   40 +++
 11 files changed, 247 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 4cf8daf..787c29d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1229,6 +1229,7 @@ typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 typedef xen_sysctl_meminfo_t xc_meminfo_t;
+typedef xen_sysctl_pcitopoinfo_t xc_pcitopoinfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1242,6 +1243,8 @@ int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
 xc_meminfo_t *meminfo, uint32_t *distance);
+int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs,
+   physdev_pci_device_t *devs, uint32_t *nodes);
 
 int xc_sched_id(xc_interface *xch,
 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 97cbe63..6e1d50e 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -244,6 +244,50 @@ out:
 return ret;
 }
 
+int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs,
+   physdev_pci_device_t *devs,
+   uint32_t *nodes)
+{
+int ret = 0;
+DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(devs, num_devs * sizeof(*devs),
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+DECLARE_HYPERCALL_BOUNCE(nodes, num_devs * sizeof(*nodes),
+ XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+
+if ( (ret = xc_hypercall_bounce_pre(xch, devs)) )
+goto out;
+if ( (ret = xc_hypercall_bounce_pre(xch, nodes)) )
+goto out;
+
+sysctl.u.pcitopoinfo.first_dev = 0;
+sysctl.u.pcitopoinfo.num_devs = num_devs;
+set_xen_guest_handle(sysctl.u.pcitopoinfo.devs, devs);
+set_xen_guest_handle(sysctl.u.pcitopoinfo.nodes, nodes);
+
+sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
+
+while ( sysctl.u.pcitopoinfo.first_dev < num_devs )
+{
+if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
+{
+/*
+ * node[] is set to XEN_INVALID_NODE_ID for invalid devices,
+ * we can just skip those entries.
+ */
+if ( errno == ENODEV )
+errno = ret = 0;
+else
+break;
+}
+}
+
+ out:
+xc_hypercall_bounce_post(xch, devs);
+xc_hypercall_bounce_post(xch, nodes);
+
+return ret;
+}
 
 int xc_sched_id(xc_interface *xch,
 int *sched_id)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 45cd318..5b3423d 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5137,6 +5137,48 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx 
*ctx, int *nb_cpu_out)
 return ret;
 }
 
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
+{
+GC_INIT(ctx);
+physdev_pci_device_t *devs;
+uint32_t *nodes;
+libxl_pcitopology *ret = NULL;
+int i;
+
+*num_devs = libxl__pci_numdevs(gc);
+if (*num_devs < 0) {
+LOG(ERROR, "Unable to determine number of PCI devices");
+goto out;
+}
+
+devs = libxl__zalloc(gc, sizeof(*devs) * *num_devs);
+nodes = libxl__zalloc(gc, sizeof(*nodes) * *num_devs);
+
+if (libxl__pci_topology_init(gc, devs, *num_devs)) {
+LOGE(ERROR, "Cannot initialize PCI hypercall structure");
+goto out;
+}
+
+if (xc_pcitopoinfo(ctx->xch, *num_devs, devs, nodes) != 0) {
+LOGE(ERROR, "PCI topology info hypercall failed");
+goto out;
+}
+
+ret = libxl__zalloc(NOGC, sizeof(libxl_pcitopology) * *num_devs);
+
+for (i = 0; i < *num_devs; i++) {
+ret[i].seg = devs[i].seg;
+ret[i].bus = 

[Xen-devel] [PATCH v6 2/5] sysctl: Add sysctl interface for querying PCI topology

2015-04-06 Thread Boris Ostrovsky
Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
---

Changes in v6:
* Dropped continuations, the sysctl now returns after 64 iterations if necessary
* -ENODEV returned if device is not found
* sysctl's first_dev is now expected to be used by userspace to continue the 
query
* Added XSM hooks

 docs/misc/xsm-flask.txt |1 +
 xen/common/sysctl.c |   58 +++
 xen/include/public/sysctl.h |   30 ++
 xen/xsm/flask/hooks.c   |1 +
 xen/xsm/flask/policy/access_vectors |1 +
 5 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/docs/misc/xsm-flask.txt b/docs/misc/xsm-flask.txt
index 90a2aef..4e0f14f 100644
--- a/docs/misc/xsm-flask.txt
+++ b/docs/misc/xsm-flask.txt
@@ -121,6 +121,7 @@ __HYPERVISOR_sysctl (xen/include/public/sysctl.h)
  * XEN_SYSCTL_cpupool_op
  * XEN_SYSCTL_scheduler_op
  * XEN_SYSCTL_coverage_op
+ * XEN_SYSCTL_pcitopoinfo
 
 __HYPERVISOR_memory_op (xen/include/public/memory.h)
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index d75440e..449ff70 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -399,6 +399,64 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) 
u_sysctl)
 break;
 #endif
 
+#ifdef HAS_PCI
+case XEN_SYSCTL_pcitopoinfo:
+{
+xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
+unsigned dev_cnt = 0;
+
+if ( guest_handle_is_null(ti->devs) ||
+ guest_handle_is_null(ti->nodes) ||
+ (ti->first_dev > ti->num_devs) )
+{
+ret = -EINVAL;
+break;
+}
+
+while ( ti->first_dev < ti->num_devs )
+{
+physdev_pci_device_t dev;
+uint32_t node;
+struct pci_dev *pdev;
+
+if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
+{
+ret = -EFAULT;
+break;
+}
+
+spin_lock(&pcidevs_lock);
+pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
+if ( !pdev )
+{
+ret = -ENODEV;
+node = XEN_INVALID_NODE_ID;
+}
+else if ( pdev->node == NUMA_NO_NODE )
+node = XEN_INVALID_NODE_ID;
+else
+node = pdev->node;
+spin_unlock(&pcidevs_lock);
+
+if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
+{
+ret = -EFAULT;
+break;
+}
+
+ti->first_dev++;
+
+if ( (++dev_cnt > 0x3f) && hypercall_preempt_check() )
+break;
+}
+
+if ( (ret != -EFAULT) &&
+ __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
+ret = -EFAULT;
+}
+break;
+#endif
+
 default:
 ret = arch_do_sysctl(op, u_sysctl);
 copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5aa3708..877b661 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -33,6 +33,7 @@
 
 #include "xen.h"
 #include "domctl.h"
+#include "physdev.h"
 
 #define XEN_SYSCTL_INTERFACE_VERSION 0x000C
 
@@ -668,6 +669,33 @@ struct xen_sysctl_psr_cmt_op {
 typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t);
 
+/* XEN_SYSCTL_pcitopoinfo */
+struct xen_sysctl_pcitopoinfo {
+/* IN: Number of elements in 'pcitopo' and 'nodes' arrays. */
+uint32_t num_devs;
+
+/*
+ * IN/OUT:
+ *   IN: First element of pcitopo array that needs to be processed by
+ *   the hypervisor.
+ *  OUT: Index of the first still unprocessed element of pcitopo array.
+ */
+uint32_t first_dev;
+
+/* IN: list of devices for which node IDs are requested. */
+XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
+
+/*
+ * OUT: node identifier for each device.
+ * If information for a particular device is not available then set
+ * to XEN_INVALID_NODE_ID. In addition, if device is not known to the
+ * hypervisor, sysctl will stop further processing and return -ENODEV.
+ */
+XEN_GUEST_HANDLE_64(uint32) nodes;
+};
+typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
+
 struct xen_sysctl {
 uint32_t cmd;
 #define XEN_SYSCTL_readconsole1
@@ -690,12 +718,14 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op  19
 #define XEN_SYSCTL_coverage_op   20
 #define XEN_SYSCTL_psr_cmt_op21
+#define XEN_SYSCTL_pcitopoinfo   22
 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
 union {
 struct xen_sysctl_readconsole   readconsole;
 struct xen_sysctl_tbuf_op   tbuf_op;
 struct xen_sysctl_physinfo  physinfo;
 struct xen_sysctl_cputopoinfo   

[Xen-devel] [PATCH v6 1/5] sysctl: Make XEN_SYSCTL_numainfo a little more efficient

2015-04-06 Thread Boris Ostrovsky
A number of changes to XEN_SYSCTL_numainfo interface:

* Make sysctl NUMA topology query use fewer copies by combining some
  fields into a single structure and copying distances for each node
  in a single copy.
* NULL meminfo and distance handles are a request for maximum number
  of nodes (num_nodes). If those handles are valid and num_nodes is
  smaller than the number of nodes in the system then -ENOBUFS is
  returned (and correct num_nodes is provided; see the usage sketch below)
* Instead of using max_node_index for passing number of nodes keep this
  value in num_nodes: almost all uses of max_node_index required adding
  or subtracting one to eventually get to number of nodes anyway.
* Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add
  XEN_INVALID_NODE_DIST.
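
A minimal usage sketch of this convention (illustrative only, names and error
handling are mine; it mirrors the libxl change further down):

static int query_numainfo(xc_interface *xch)
{
    xc_numainfo_t ninfo;
    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
    DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
    int rc = -1;

    /* NULL handles: the sysctl only reports the number of nodes. */
    set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
    set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
    if ( xc_numainfo(xch, &ninfo) )
        return rc;

    meminfo = xc_hypercall_buffer_alloc(xch, meminfo,
                                        sizeof(*meminfo) * ninfo.num_nodes);
    distance = xc_hypercall_buffer_alloc(xch, distance,
                                         sizeof(*distance) *
                                         ninfo.num_nodes * ninfo.num_nodes);
    if ( !meminfo || !distance )
        goto out;

    set_xen_guest_handle(ninfo.meminfo, meminfo);
    set_xen_guest_handle(ninfo.distance, distance);

    /* Fails with -ENOBUFS (and a corrected num_nodes) if the system has
     * more nodes than the buffers can hold. */
    rc = xc_numainfo(xch, &ninfo);

 out:
    xc_hypercall_buffer_free(xch, meminfo);
    xc_hypercall_buffer_free(xch, distance);
    return rc;
}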

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---

Changes in v6:
* updated uint32 to unsigned in sysctl
* updated syctl.h comment to reflect right logic for meminfo/distance test
* declared distance[] array static to move it off the stack in sysctl
* Fixed loop control variable initialization in sysctl

 tools/libxl/libxl.c   |   66 ++-
 tools/python/xen/lowlevel/xc/xc.c |   58 +---
 xen/common/sysctl.c   |   78 +
 xen/include/public/sysctl.h   |   53 ++---
 4 files changed, 131 insertions(+), 124 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2a735b3..b7d6bb0 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5154,65 +5154,59 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int 
*nr)
 {
 GC_INIT(ctx);
 xc_numainfo_t ninfo;
-DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize);
-DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree);
-DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists);
+DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
+DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
 libxl_numainfo *ret = NULL;
-int i, j, max_nodes;
+int i, j;
 
-max_nodes = libxl_get_max_nodes(ctx);
-if (max_nodes < 0)
-{
+set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
+set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
+if (xc_numainfo(ctx->xch, &ninfo) != 0) {
 LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
 ret = NULL;
 goto out;
 }
 
-memsize = xc_hypercall_buffer_alloc
-(ctx->xch, memsize, sizeof(*memsize) * max_nodes);
-memfree = xc_hypercall_buffer_alloc
-(ctx->xch, memfree, sizeof(*memfree) * max_nodes);
-node_dists = xc_hypercall_buffer_alloc
-(ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes);
-if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) {
+meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
+sizeof(*meminfo) * ninfo.num_nodes);
+distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
+ sizeof(*distance) *
+ ninfo.num_nodes * ninfo.num_nodes);
+if ((meminfo == NULL) || (distance == NULL)) {
 LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
 "Unable to allocate hypercall arguments");
 goto fail;
 }
 
-set_xen_guest_handle(ninfo.node_to_memsize, memsize);
-set_xen_guest_handle(ninfo.node_to_memfree, memfree);
-set_xen_guest_handle(ninfo.node_to_node_distance, node_dists);
-ninfo.max_node_index = max_nodes - 1;
+set_xen_guest_handle(ninfo.meminfo, meminfo);
+set_xen_guest_handle(ninfo.distance, distance);
 if (xc_numainfo(ctx->xch, &ninfo) != 0) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
 goto fail;
 }
 
-if (ninfo.max_node_index < max_nodes - 1)
-max_nodes = ninfo.max_node_index + 1;
+*nr = ninfo.num_nodes;
 
-*nr = max_nodes;
+ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * ninfo.num_nodes);
+for (i = 0; i < ninfo.num_nodes; i++)
+ret[i].dists = libxl__calloc(NOGC, ninfo.num_nodes, sizeof(*distance));
 
-ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes);
-for (i = 0; i < max_nodes; i++)
-ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists));
-
-for (i = 0; i < max_nodes; i++) {
-#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \
-LIBXL_NUMAINFO_INVALID_ENTRY : mem[i]
-ret[i].size = V(memsize, i);
-ret[i].free = V(memfree, i);
-ret[i].num_dists = max_nodes;
-for (j = 0; j < ret[i].num_dists; j++)
-ret[i].dists[j] = V(node_dists, i * max_nodes + j);
+for (i = 0; i < ninfo.num_nodes; i++) {
+#define V(val, invalid) (val == invalid) ? \
+   LIBXL_NUMAINFO_INVALID_ENTRY : val
+ret[i].size = V(meminfo[i].memsize, XEN_INVALID_MEM_SZ);
+

[Xen-devel] [PATCH v6 4/5] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc

2015-04-06 Thread Boris Ostrovsky
xc_numainfo() is not expected to be used on a hot path and therefore
hypercall buffer management can be pushed into libxc. This will simplify
life for callers.

Also update error logging macros.

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
---

Changes in v6:
* Dropped separate NULL buffer tests in xc_numainfo()
* Moved test for buffers validity (either both or neither are NULL) to be the
  the first thing to do in xc_numainfo()

 tools/libxc/include/xenctrl.h |4 ++-
 tools/libxc/xc_misc.c |   36 +-
 tools/libxl/libxl.c   |   51 
 tools/python/xen/lowlevel/xc/xc.c |   38 ++-
 4 files changed, 63 insertions(+), 66 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f298702..4cf8daf 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
+typedef xen_sysctl_meminfo_t xc_meminfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1239,7 +1240,8 @@ typedef uint32_t xc_node_to_node_dist_t;
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
 int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
xc_cputopo_t *cputopo);
-int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
+int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
+xc_meminfo_t *meminfo, uint32_t *distance);
 
 int xc_sched_id(xc_interface *xch,
 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 630a86c..97cbe63 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -204,22 +204,44 @@ out:
 return ret;
 }
 
-int xc_numainfo(xc_interface *xch,
-xc_numainfo_t *put_info)
+int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
+xc_meminfo_t *meminfo, uint32_t *distance)
 {
 int ret;
 DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+DECLARE_HYPERCALL_BOUNCE(distance,
+ *max_nodes * *max_nodes * sizeof(*distance),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( !!meminfo ^ !!distance )
+{
+errno = EINVAL;
+return -1;
+}
+
+if ( (ret = xc_hypercall_bounce_pre(xch, meminfo)) )
+goto out;
+if ((ret = xc_hypercall_bounce_pre(xch, distance)) )
+goto out;
+
+sysctl.u.numainfo.num_nodes = *max_nodes;
+set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
+set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
 
 sysctl.cmd = XEN_SYSCTL_numainfo;
 
-memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
+if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
+goto out;
 
-if ((ret = do_sysctl(xch, &sysctl)) != 0)
-return ret;
+*max_nodes = sysctl.u.numainfo.num_nodes;
 
-memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
+out:
+xc_hypercall_bounce_post(xch, meminfo);
+xc_hypercall_bounce_post(xch, distance);
 
-return 0;
+return ret;
 }
 
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 697c86d..45cd318 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5140,61 +5140,44 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx 
*ctx, int *nb_cpu_out)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
 GC_INIT(ctx);
-xc_numainfo_t ninfo;
-DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
-DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
+xc_meminfo_t *meminfo;
+uint32_t *distance;
 libxl_numainfo *ret = NULL;
 int i, j;
+unsigned num_nodes;
 
-set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
-set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
-if (xc_numainfo(ctx->xch, &ninfo) != 0) {
-LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
-ret = NULL;
+if (xc_numainfo(ctx->xch, &num_nodes, NULL, NULL)) {
+LOGEV(ERROR, errno, "Unable to determine number of nodes");
 goto out;
 }
 
-meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
-sizeof(*meminfo) * ninfo.num_nodes);
-distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
- sizeof(*distance) *
- ninfo.num_nodes * ninfo.num_nodes);
-if ((meminfo == NULL) || (distance == NULL)) {
-LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-"Unable to allocate hypercall arguments");
-goto fail;
-}
+meminfo = libxl__zalloc(gc, 

[Xen-devel] [PATCH v6 0/5] Display IO topology when PXM data is available (plus some cleanup)

2015-04-06 Thread Boris Ostrovsky
Changes in v6:
* PCI topology interface changes: no continuations, userspace will be dealing
  with unfinished sysctl (patches 2 and 5)
* Unknown device will cause ENODEV in sysctl
* No NULL tests in libxc
* Loop control initialization fix (similar to commit 26da081ac91a)
* Other minor changes (see per-patch notes)

Changes in v5:
* Make CPU topology and NUMA info sysctls behave more like 
XEN_DOMCTL_get_vcpu_msrs
  when passed NULL buffers. This required toolstack changes as well
* Don't use 8-bit data types in interfaces
* Fold interface version update into patch#3

Changes in v4:
* Split cputopology and NUMA info changes into separate patches
* Added patch#1 (partly because patch#4 needs to know when distance is 
invalid,
  i.e. NUMA_NO_DISTANCE)
* Split sysctl version update into a separate patch
* Other changes are listed in each patch
* NOTE: I did not test python's xc changes since I don't think I know how.

Changes in v3:
* Added patch #1 to more consistently define nodes as a u8 and properly
  use NUMA_NO_NODE.
* Make changes to xen_sysctl_numainfo, similar to those made to
  xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same
  patch #3 to avoid bumping interface version twice. Perhaps it's better
  to split it into two?)
* Instead of copying data for each loop index allocate a buffer and copy
  once for all three queries in sysctl.c.
* Move hypercall buffer management from libxl to libxc (as requested by
  Dario, patches #5 and #6).
* Report topology info for offlined CPUs as well
* Added LIBXL_HAVE_PCITOPO macro

Changes in v2:
* Split topology sysctls into two --- one for CPU topology and the other
  for devices
* Avoid long loops in the hypervisor by using continuations. (I am not
  particularly happy about using first_dev in the interface, suggestions
  for a better interface would be appreciated)
* Use proper libxl conventions for interfaces
* Avoid hypervisor stack corruption when copying PXM data from guest


A few patches that add interface for querying hypervisor about device
topology and allow 'xl info -n' display this information if PXM object
is provided by ACPI.

This series also makes some optimizations and cleanup of current CPU
topology and NUMA sysctl queries.

Boris Ostrovsky (5):
  sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  sysctl: Add sysctl interface for querying PCI topology
  libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer
management to libxc
  libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management
to libxc
  libxl: Add interface for querying hypervisor about PCI topology

 docs/misc/xsm-flask.txt |1 +
 tools/libxc/include/xenctrl.h   |   12 ++-
 tools/libxc/xc_misc.c   |  103 +++---
 tools/libxl/libxl.c |  160 ++-
 tools/libxl/libxl.h |   12 +++
 tools/libxl/libxl_freebsd.c |   12 +++
 tools/libxl/libxl_internal.h|5 +
 tools/libxl/libxl_linux.c   |   69 +++
 tools/libxl/libxl_netbsd.c  |   12 +++
 tools/libxl/libxl_types.idl |7 ++
 tools/libxl/libxl_utils.c   |8 ++
 tools/libxl/xl_cmdimpl.c|   40 +++--
 tools/misc/xenpm.c  |   51 +--
 tools/python/xen/lowlevel/xc/xc.c   |   74 ++--
 xen/common/sysctl.c |  136 ++
 xen/include/public/sysctl.h |   83 +-
 xen/xsm/flask/hooks.c   |1 +
 xen/xsm/flask/policy/access_vectors |1 +
 18 files changed, 554 insertions(+), 233 deletions(-)




[Xen-devel] [PATCH v6 3/5] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc

2015-04-06 Thread Boris Ostrovsky
xc_cputopoinfo() is not expected to be used on a hot path and therefore
hypercall buffer management can be pushed into libxc. This will simplify
life for callers.

Also update error reporting macros.

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
---
Changes in v6:
* Dropped NULL buffer test in xc_cputopoinfo()

 tools/libxc/include/xenctrl.h |5 ++-
 tools/libxc/xc_misc.c |   23 +++-
 tools/libxl/libxl.c   |   37 --
 tools/misc/xenpm.c|   51 -
 tools/python/xen/lowlevel/xc/xc.c |   20 ++
 5 files changed, 61 insertions(+), 75 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 552ace8..f298702 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
+typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,8 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
+   xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index be68291..630a86c 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,22 +177,31 @@ int xc_physinfo(xc_interface *xch,
 return 0;
 }
 
-int xc_cputopoinfo(xc_interface *xch,
-   xc_cputopoinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
+   xc_cputopo_t *cputopo)
 {
 int ret;
 DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(cputopo, *max_cpus * sizeof(*cputopo),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-sysctl.cmd = XEN_SYSCTL_cputopoinfo;
+if ( (ret = xc_hypercall_bounce_pre(xch, cputopo)) )
+goto out;
 
-memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
+sysctl.u.cputopoinfo.num_cpus = *max_cpus;
+set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo, cputopo);
+
+sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
 if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
-return ret;
+goto out;
 
-memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
+*max_cpus = sysctl.u.cputopoinfo.num_cpus;
 
-return 0;
+out:
+xc_hypercall_bounce_post(xch, cputopo);
+
+return ret;
 }
 
 int xc_numainfo(xc_interface *xch,
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b7d6bb0..697c86d 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5100,37 +5100,28 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
 GC_INIT(ctx);
-xc_cputopoinfo_t tinfo;
-DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+xc_cputopo_t *cputopo;
 libxl_cputopology *ret = NULL;
 int i;
+unsigned num_cpus;
 
-/* Setting buffer to NULL makes the hypercall return number of CPUs */
-set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
-if (xc_cputopoinfo(ctx->xch, &tinfo) != 0)
+/* Setting buffer to NULL makes the call return number of CPUs */
+if (xc_cputopoinfo(ctx->xch, &num_cpus, NULL))
 {
-LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
-ret = NULL;
+LOGEV(ERROR, errno, "Unable to determine number of CPUS");
 goto out;
 }
 
-cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
-sizeof(*cputopo) * tinfo.num_cpus);
-if (cputopo == NULL) {
-LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-"Unable to allocate hypercall arguments");
-goto fail;
-}
-set_xen_guest_handle(tinfo.cputopo, cputopo);
+cputopo = libxl__zalloc(gc, sizeof(*cputopo) * num_cpus);
 
-if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
-LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
-goto fail;
+if (xc_cputopoinfo(ctx->xch, &num_cpus, cputopo)) {
+LOGEV(ERROR, errno, "CPU topology info hypercall failed");
+goto out;
 }
 
-ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * tinfo.num_cpus);
+ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * num_cpus);
 
-for (i = 0; i < tinfo.num_cpus; i++) {
+for (i = 0; i < num_cpus; i++) {
 #define V(map, i, invalid) ( cputopo[i].map == invalid) ? \
LIBXL_CPUTOPOLOGY_INVALID_ENTRY : cputopo[i].map
 

[Xen-devel] [PATCH v15 01/15] qspinlock: A simple generic 4-byte queue spinlock

2015-04-06 Thread Waiman Long
This patch introduces a new generic queue spinlock implementation that
can serve as an alternative to the default ticket spinlock. Compared
with the ticket spinlock, this queue spinlock should be almost as fair
as the ticket spinlock. It has about the same speed in single-thread
and it can be much faster in high contention situations especially when
the spinlock is embedded within the data structure to be protected.

Only in light to moderate contention where the average queue depth
is around 1-3 will this queue spinlock be potentially a bit slower
due to the higher slowpath overhead.

This queue spinlock is especially suited to NUMA machines with a large
number of cores as the chance of spinlock contention is much higher
in those machines. The cost of contention is also higher because of
slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled,
the process will not be migrated to another CPU while it is trying
to get a spinlock. Ignoring interrupt handling, a CPU can only be
contending in one spinlock at any one time. Counting soft IRQ, hard
IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
activities.  By allocating a set of per-cpu queue nodes and using them
to form a waiting queue, we can encode the queue node address into a
much smaller 24-bit size (including CPU number and queue node index)
leaving one byte for the lock.

Please note that the queue node is only needed when waiting for the
lock. Once the lock is acquired, the queue node can be released to
be used later.
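
To illustrate the encoding described above (a sketch consistent with the
_Q_TAIL_* constants this patch adds; the actual helpers live in
kernel/locking/qspinlock.c below):

static inline u32 encode_tail(int cpu, int idx)
{
	u32 tail;

	tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;	/* 0 means "no tail" */
	tail |= idx << _Q_TAIL_IDX_OFFSET;		/* per-cpu node index */

	return tail;
}

static inline struct mcs_spinlock *decode_tail(u32 tail)
{
	int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
	int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;

	return per_cpu_ptr(&mcs_nodes[idx], cpu);
}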

Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
---
 include/asm-generic/qspinlock.h   |  132 +
 include/asm-generic/qspinlock_types.h |   58 +
 kernel/Kconfig.locks  |7 +
 kernel/locking/Makefile   |1 +
 kernel/locking/mcs_spinlock.h |1 +
 kernel/locking/qspinlock.c|  209 +
 6 files changed, 408 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/qspinlock.h
 create mode 100644 include/asm-generic/qspinlock_types.h
 create mode 100644 kernel/locking/qspinlock.c

diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
new file mode 100644
index 000..315d6dc
--- /dev/null
+++ b/include/asm-generic/qspinlock.h
@@ -0,0 +1,132 @@
+/*
+ * Queue spinlock
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long waiman.l...@hp.com
+ */
+#ifndef __ASM_GENERIC_QSPINLOCK_H
+#define __ASM_GENERIC_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+
+/**
+ * queue_spin_is_locked - is the spinlock locked?
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if it is locked, 0 otherwise
+ */
+static __always_inline int queue_spin_is_locked(struct qspinlock *lock)
+{
+   return atomic_read(&lock->val);
+}
+
+/**
+ * queue_spin_value_unlocked - is the spinlock structure unlocked?
+ * @lock: queue spinlock structure
+ * Return: 1 if it is unlocked, 0 otherwise
+ *
+ * N.B. Whenever there are tasks waiting for the lock, it is considered
+ *  locked wrt the lockref code to avoid lock stealing by the lockref
+ *  code and change things underneath the lock. This also allows some
+ *  optimizations to be applied without conflict with lockref.
+ */
+static __always_inline int queue_spin_value_unlocked(struct qspinlock lock)
+{
+   return !atomic_read(&lock.val);
+}
+
+/**
+ * queue_spin_is_contended - check if the lock is contended
+ * @lock : Pointer to queue spinlock structure
+ * Return: 1 if lock contended, 0 otherwise
+ */
+static __always_inline int queue_spin_is_contended(struct qspinlock *lock)
+{
+   return atomic_read(&lock->val) & ~_Q_LOCKED_MASK;
+}
+/**
+ * queue_spin_trylock - try to acquire the queue spinlock
+ * @lock : Pointer to queue spinlock structure
+ * Return: 1 if lock acquired, 0 if failed
+ */
+static __always_inline int queue_spin_trylock(struct qspinlock *lock)
+{
+   if (!atomic_read(&lock->val) &&
+  (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0))
+   return 1;
+   return 0;
+}
+
+extern void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+
+/**
+ * queue_spin_lock - acquire a queue spinlock
+ * @lock: Pointer to queue spinlock structure
+ */
+static __always_inline void queue_spin_lock(struct qspinlock *lock)
+{
+   u32 

[Xen-devel] [PATCH v15 11/15] pvqspinlock, x86: Enable PV qspinlock for KVM

2015-04-06 Thread Waiman Long
This patch adds the necessary KVM specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 arch/x86/kernel/kvm.c |   43 +++
 kernel/Kconfig.locks  |2 +-
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e354cc6..4bb42c0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -584,6 +584,39 @@ static void kvm_kick_cpu(int cpu)
kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
 }
 
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include <asm/qspinlock.h>
+
+static void kvm_wait(u8 *ptr, u8 val)
+{
+   unsigned long flags;
+
+   if (in_nmi())
+   return;
+
+   local_irq_save(flags);
+
+   if (READ_ONCE(*ptr) != val)
+   goto out;
+
+   /*
+* halt until it's our turn and kicked. Note that we do safe halt
+* for irq enabled case to avoid hang when lock info is overwritten
+* in irq spinlock slowpath and no spurious interrupt occur to save us.
+*/
+   if (arch_irqs_disabled_flags(flags))
+   halt();
+   else
+   safe_halt();
+
+out:
+   local_irq_restore(flags);
+}
+
+#else /* !CONFIG_QUEUE_SPINLOCK */
+
 enum kvm_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
@@ -817,6 +850,8 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, 
__ticket_t ticket)
}
 }
 
+#endif /* !CONFIG_QUEUE_SPINLOCK */
+
 /*
  * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
  */
@@ -828,8 +863,16 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+   __pv_init_lock_hash();
+   pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+   pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+   pv_lock_ops.wait = kvm_wait;
+   pv_lock_ops.kick = kvm_kick_cpu;
+#else /* !CONFIG_QUEUE_SPINLOCK */
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
pv_lock_ops.unlock_kick = kvm_unlock_kick;
+#endif
 }
 
 static __init int kvm_spinlock_init_jump(void)
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index c6a8f7c..537b13e 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK
 
 config QUEUE_SPINLOCK
def_bool y if ARCH_USE_QUEUE_SPINLOCK
-   depends on SMP && !PARAVIRT_SPINLOCKS
+   depends on SMP && (!PARAVIRT_SPINLOCKS || !XEN)
 
 config ARCH_USE_QUEUE_RWLOCK
bool
-- 
1.7.1




[Xen-devel] [PATCH v15 04/15] qspinlock: Extract out code snippets for the next patch

2015-04-06 Thread Waiman Long
This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.

 1) the logic for the exchange of new and previous tail code words
into a new xchg_tail() function.
 2) the logic for clearing the pending bit and setting the locked bit
into a new clear_pending_set_locked() function.

This patch also simplifies the trylock operation before queuing by
calling queue_spin_trylock() directly.

Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
---
 include/asm-generic/qspinlock_types.h |2 +
 kernel/locking/qspinlock.c|   79 -
 2 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h 
b/include/asm-generic/qspinlock_types.h
index 9c3f5c2..ef36613 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -58,6 +58,8 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS   (32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_MASK   (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
+
 #define _Q_LOCKED_VAL  (1U  _Q_LOCKED_OFFSET)
 #define _Q_PENDING_VAL (1U  _Q_PENDING_OFFSET)
 
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 0351f78..11f6ad9 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -97,6 +97,42 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
 /**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+   atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word  retrieve previous one
+ * @lock : Pointer to queue spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+   u32 old, new, val = atomic_read(&lock->val);
+
+   for (;;) {
+   new = (val & _Q_LOCKED_PENDING_MASK) | tail;
+   old = atomic_cmpxchg(&lock->val, val, new);
+   if (old == val)
+   break;
+
+   val = old;
+   }
+   return old;
+}
+
+/**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
  * @val: Current value of the queue spinlock 32-bit word
@@ -178,15 +214,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 *
 * *,1,0 -> *,0,1
 */
-   for (;;) {
-   new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
-
-   old = atomic_cmpxchg(&lock->val, val, new);
-   if (old == val)
-   break;
-
-   val = old;
-   }
+   clear_pending_set_locked(lock);
return;
 
/*
@@ -203,37 +231,26 @@ queue:
node->next = NULL;
 
/*
-* We have already touched the queueing cacheline; don't bother with
-* pending stuff.
-*
-* trylock || xchg(lock, node)
-*
-* 0,0,0 -> 0,0,1 ; no tail, not locked -> no tail, locked.
-* p,y,x -> n,y,x ; tail was p -> tail is n; preserving locked.
+* We touched a (possibly) cold cacheline in the per-cpu queue node;
+* attempt the trylock once more in the hope someone let go while we
+* weren't watching.
 */
-   for (;;) {
-   new = _Q_LOCKED_VAL;
-   if (val)
-   new = tail | (val & _Q_LOCKED_PENDING_MASK);
-
-   old = atomic_cmpxchg(&lock->val, val, new);
-   if (old == val)
-   break;
-
-   val = old;
-   }
+   if (queue_spin_trylock(lock))
+   goto release;
 
/*
-* we won the trylock; forget about queueing.
+* We have already touched the queueing cacheline; don't bother with
+* pending stuff.
+*
+* p,*,* -> n,*,*
 */
-   if (new == _Q_LOCKED_VAL)
-   goto release;
+   old = xchg_tail(lock, tail);
 
/*
 * if there was a previous node; link it and wait until reaching the
 * head of the waitqueue.
 */
-   if (old & ~_Q_LOCKED_PENDING_MASK) {
+   if (old & _Q_TAIL_MASK) {
 prev = decode_tail(old);
 WRITE_ONCE(prev->next, node);
 
-- 
1.7.1




[Xen-devel] [PATCH v15 07/15] qspinlock: Revert to test-and-set on hypervisors

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) pet...@infradead.org

When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
---
 arch/x86/include/asm/qspinlock.h |   14 ++
 include/asm-generic/qspinlock.h  |7 +++
 kernel/locking/qspinlock.c   |3 +++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 222995b..64c925e 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_QSPINLOCK_H
 #define _ASM_X86_QSPINLOCK_H
 
+#include <asm/cpufeature.h>
 #include <asm-generic/qspinlock_types.h>
 
 #define queue_spin_unlock queue_spin_unlock
@@ -15,6 +16,19 @@ static inline void queue_spin_unlock(struct qspinlock *lock)
smp_store_release((u8 *)lock, 0);
 }
 
+#define virt_queue_spin_lock virt_queue_spin_lock
+
+static inline bool virt_queue_spin_lock(struct qspinlock *lock)
+{
+   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+   cpu_relax();
+
+   return true;
+}
+
 #include <asm-generic/qspinlock.h>
 
 #endif /* _ASM_X86_QSPINLOCK_H */
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 315d6dc..bcbbc5e 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -111,6 +111,13 @@ static inline void queue_spin_unlock_wait(struct qspinlock 
*lock)
cpu_relax();
 }
 
+#ifndef virt_queue_spin_lock
+static __always_inline bool virt_queue_spin_lock(struct qspinlock *lock)
+{
+   return false;
+}
+#endif
+
 /*
  * Initializier
  */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 99503ef..fc2e5ab 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -249,6 +249,9 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 
 BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   if (virt_queue_spin_lock(lock))
+   return;
+
/*
 * wait for in-progress pending-locked hand-overs
 *
-- 
1.7.1




[Xen-devel] [PATCH v15 03/15] qspinlock: Add pending bit

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) pet...@infradead.org

Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]); add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.

It is possible to observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.

In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
---
 include/asm-generic/qspinlock_types.h |   12 +++-
 kernel/locking/qspinlock.c|  119 +++--
 2 files changed, 107 insertions(+), 24 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h 
b/include/asm-generic/qspinlock_types.h
index c9348d8..9c3f5c2 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -36,8 +36,9 @@ typedef struct qspinlock {
  * Bitfields in the atomic value:
  *
  *  0- 7: locked byte
- *  8- 9: tail index
- * 10-31: tail cpu (+1)
+ * 8: pending
+ *  9-10: tail index
+ * 11-31: tail cpu (+1)
  */
 #define _Q_SET_MASK(type)   (((1U << _Q_ ## type ## _BITS) - 1)\
   << _Q_ ## type ## _OFFSET)
@@ -45,7 +46,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_BITS 8
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
-#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#define _Q_PENDING_BITS1
+#define _Q_PENDING_MASK_Q_SET_MASK(PENDING)
+
+#define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
 #define _Q_TAIL_IDX_BITS   2
 #define _Q_TAIL_IDX_MASK   _Q_SET_MASK(TAIL_IDX)
 
@@ -54,5 +59,6 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
 #define _Q_LOCKED_VAL  (1U << _Q_LOCKED_OFFSET)
+#define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET)
 
 #endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 3456819..0351f78 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -94,24 +94,28 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 return per_cpu_ptr(&mcs_nodes[idx], cpu);
 }
 
+#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
  * @val: Current value of the queue spinlock 32-bit word
  *
- * (queue tail, lock value)
- *
- *              fast      :    slow                                  :    unlock
- *                        :                                          :
- * uncontended  (0,0)   --:--> (0,1) --------------------------------:--> (*,0)
- *                        :       | ^--------.                    /  :
- *                        :       v           \                   |  :
- * uncontended            :    (n,x) --+--> (n,0)                 |  :
- *   queue                :       | ^--'                          |  :
- *                        :       v                               |  :
- * contended              :    (*,x) --+--> (*,0) -----> (*,1) ---'  :
- *   queue                :         ^--'                             :
+ * (queue tail, pending bit, lock value)
  *
+ *              fast     :    slow                                  :    unlock
+ *                       :                                          :
+ * uncontended  (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0)
+ *                       :       | ^--------.------.             /  :
+ *                       :       v           \      \            |  :
+ * pending               :    (0,1,1) +--> (0,1,0)   \           |  :
+ *                       :       | ^--'              |           |  :
+ *                       :       v                   |           |  :
+ * uncontended           :    (n,x,y) +--> (n,0,0) --'            |  :
+ *   queue               :       | ^--'                           |  :
+ *                       :       v                                |  :
+ * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
+ *   queue               :         ^--'                             :
  */
 void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 {
@@ -121,6 +125,75 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 
	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+	/*
+	 * wait for in-progress pending-locked hand-overs
+	 *
+	 * 0,1,0 -> 0,0,1
+	 */
+	if (val == _Q_PENDING_VAL) {
+		while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL)
+ 

[Xen-devel] [PATCH v15 05/15] qspinlock: Optimize for smaller NR_CPUS

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) pet...@infradead.org

When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated compxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
---
 include/asm-generic/qspinlock_types.h |   13 ++
 kernel/locking/qspinlock.c|   69 -
 2 files changed, 81 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/qspinlock_types.h 
b/include/asm-generic/qspinlock_types.h
index ef36613..f01b55d 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -35,6 +35,14 @@ typedef struct qspinlock {
 /*
  * Bitfields in the atomic value:
  *
+ * When NR_CPUS < 16K
+ *  0- 7: locked byte
+ * 8: pending
+ *  9-15: not used
+ * 16-17: tail index
+ * 18-31: tail cpu (+1)
+ *
+ * When NR_CPUS >= 16K
  *  0- 7: locked byte
  * 8: pending
  *  9-10: tail index
@@ -47,7 +55,11 @@ typedef struct qspinlock {
 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED)
 
 #define _Q_PENDING_OFFSET  (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS)
+#if CONFIG_NR_CPUS < (1U << 14)
+#define _Q_PENDING_BITS8
+#else
 #define _Q_PENDING_BITS1
+#endif
 #define _Q_PENDING_MASK_Q_SET_MASK(PENDING)
 
 #define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS)
@@ -58,6 +70,7 @@ typedef struct qspinlock {
 #define _Q_TAIL_CPU_BITS   (32 - _Q_TAIL_CPU_OFFSET)
 #define _Q_TAIL_CPU_MASK   _Q_SET_MASK(TAIL_CPU)
 
+#define _Q_TAIL_OFFSET _Q_TAIL_IDX_OFFSET
 #define _Q_TAIL_MASK   (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK)
 
 #define _Q_LOCKED_VAL  (1U  _Q_LOCKED_OFFSET)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 11f6ad9..bcc99e6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -24,6 +24,7 @@
 #include linux/percpu.h
 #include linux/hardirq.h
 #include linux/mutex.h
+#include asm/byteorder.h
 #include asm/qspinlock.h
 
 /*
@@ -56,6 +57,10 @@
  * node; whereby avoiding the need to carry a node from lock to unlock, and
  * preserving existing lock API. This also makes the unlock code simpler and
  * faster.
+ *
+ * N.B. The current implementation only supports architectures that allow
+ *  atomic operations on smaller 8-bit and 16-bit data types.
+ *
  */
 
 #include mcs_spinlock.h
@@ -96,6 +101,62 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
 
 #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
 
+/*
+ * By using the whole 2nd least significant byte for the pending bit, we
+ * can allow better optimization of the lock acquisition for the pending
+ * bit holder.
+ */
+#if _Q_PENDING_BITS == 8
+
+struct __qspinlock {
+   union {
+   atomic_t val;
+   struct {
+#ifdef __LITTLE_ENDIAN
+   u16 locked_pending;
+   u16 tail;
+#else
+   u16 tail;
+   u16 locked_pending;
+#endif
+   };
+   };
+};
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ *
+ * Lock stealing is not allowed if this function is used.
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+	WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
+}
+
+/*
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queue spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+   struct __qspinlock *l = (void *)lock;
+
+	return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+}
+
+#else /* _Q_PENDING_BITS == 8 */
+
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queue spinlock structure
@@ -131,6 +192,7 @@ static __always_inline u32 xchg_tail(struct qspinlock 
*lock, u32 tail)
}
return old;
 }
+#endif /* _Q_PENDING_BITS == 8 */
 
 

Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running

2015-04-06 Thread Wu, Feng


 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
 Sent: Friday, April 03, 2015 9:37 PM
 To: Wu, Feng
 Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org;
 jbeul...@suse.com
 Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification 
 event
 when vCPU is running
 
 On Fri, Apr 03, 2015 at 02:00:24AM +, Wu, Feng wrote:
 
 
   -Original Message-
   From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
   Sent: Friday, April 03, 2015 3:15 AM
   To: Tian, Kevin
   Cc: Wu, Feng; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org;
   jbeul...@suse.com
   Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification
 event
   when vCPU is running
  
   On Thu, Apr 02, 2015 at 06:08:12AM +, Tian, Kevin wrote:
 From: Wu, Feng
 Sent: Friday, March 27, 2015 12:58 PM



  -Original Message-
  From: Zhang, Yang Z
  Sent: Friday, March 27, 2015 12:44 PM
  To: Wu, Feng; xen-devel@lists.xen.org
  Cc: jbeul...@suse.com; k...@xen.org; Tian, Kevin
  Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event
 when
 vCPU
  is running
 
  Wu, Feng wrote on 2015-03-27:
  
  
   Zhang, Yang Z wrote on 2015-03-25:
   when vCPU is running
  
   Wu, Feng wrote on 2015-03-25:
   When a vCPU is running in Root mode and a notification event has
   been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
   current cpu, so the pending interrupt in PIRR will be synced to
   vIRR before
  
   This would imply that we had VMEXIT-ed due to pending interrupt? And we
   end up calling 'do_IRQ'? If so then the DPCI_SOFTIRQ ends up being set
   and you stll end up calling the softirq code?
 
  No.
 
  Here is the scenario for the description of this patch:
 
  When vCPU is running in root-mode (such as via hypercall, or any other
  reasons which can result in VM-Exit), and before vCPU is back to non-root,
  external interrupts happen. Notice that the VM-exit is not caused by this
  external interrupt.
 
 Thank you for the explanation. You might want to add that in the commit
 along with the explanation of the code flow below!

Good idea! Thank you!

Thanks,
Feng

 
 
  Thanks,
  Feng
 
  
   VM-Exit in time.
  
   Shouldn't the pending interrupt be synced unconditionally before
 next
   vmentry? What happens if we didn't set the softirq?
  
   If we didn't set the softirq in the notification handler, the
   interrupts happened exactly before VM-entry cannot be delivered to
   guest at this time. Please see the following code fragments from
   xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
  
   .Lvmx_do_vmentry
  
   ..
 /* If Vt-d engine issues a notification event here,
* it cannot be delivered to guest during this VM-entry
* without raising the softirq in notification handler. */
   cmp  %ecx,(%rdx,%rax,1)
   jnz  .Lvmx_process_softirqs
   ..
  
   je   .Lvmx_launch
   ..
  
  
   .Lvmx_process_softirqs:
   sti
   call do_softirq
   jmp  .Lvmx_do_vmentry
 
  You are right! This helps me to recall why raise the softirq when
 delivering
 the
  PI.

 Yes, __vmx_deliver_posted_interrupt() is the software way to deliver 
 PI,
 it
   sets
 the
 softirq for this purpose, however, when VT-d HW delivers PI, we have 
 no
 control to
 the HW itself, hence we need to set this softirq in the Notification 
 Event
 handler.

   
could you include this information in the comment so others can easily
understand this requirement? from code you only mentioned VCPU_KICK
_SOFTIRQ is required, but how it leads to PIRR-VIRR sync is not 
explained.
   
Thanks
Kevin
   
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Nested Virt - Xen 4.4 through 4.6 - Hyper-V; Can't boot after enabling Hyper-V

2015-04-06 Thread mailing lists
Hi --

I've been trying to get nested virtualization working with Xen so that I
could boot Windows and use Hyper-V related features, however I have not had
much success.  Using Windows 8.1 or Windows 2012r2, I'm able to install
Windows, select and install Hyper-V features, and start rebooting.
However, at that point, the Windows VM only partially boots, then drops me
to a screen stating:

Your PC needs to restart.
Please hold down the power button.
Error Code: 0x001E
Parameters:
0xC096
0xF80315430485
0x
0x


Restarting does not yield any different results.

I've set up Xen in accordance with the notes for patches and config options
here:

http://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xen

Trying Xen 4.4.2 stable, 4.5.1 staging, and 4.6 staging.  I applied the
patch labeled (2/2) from the wiki link above, compiled, and used the three
options provided for the DomU running Windows (hap, nestedhvm, and cpuid
mask).  Windows installs and allows me to turn on HyperV features on all
versions of Xen listed above, however all give the same or similar message
on reboot... I'm never able to get to a running state.

I've tried this on two separate systems.  One has an Intel E5-1620 v2, and
the other is an E5-1650 (original, v1 I guess).  All the virtualization
options are enabled in the BIOS.

If the cpuid mask is removed from the DomU config, Windows boots, however
I'm unable to start any virtual machines (there was a message in the
Windows event log about a component not being started in regards to Hyper
V).

Has anyone else run into similar issues?  Any thoughts on next steps?
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v15 14/15] pvqspinlock: Improve slowpath performance by avoiding cmpxchg

2015-04-06 Thread Waiman Long
In the pv_scan_next() function, the slow cmpxchg atomic operation is
performed even if the other CPU is not even close to being halted. This
extra cmpxchg can harm slowpath performance.

This patch introduces the new mayhalt flag to indicate if the other
spinning CPU is close to being halted or not. The current threshold
for x86 is 2k cpu_relax() calls. If this flag is not set, the other
spinning CPU will have at least 2k more cpu_relax() calls before
it can enter the halt state. This should give enough time for the
setting of the locked flag in struct mcs_spinlock to propagate to
that CPU without using atomic op.
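
(For concreteness: assuming the usual x86 SPIN_THRESHOLD of (1 << 15) --
an assumption, not something this patch changes -- the threshold added
below works out as follows:)

	#define SPIN_THRESHOLD		(1 << 15)		/* 32768 spins */
	#define MAYHALT_THRESHOLD	(SPIN_THRESHOLD >> 4)	/* 2048 ~ "2k" */

so a waiter flags mayhalt when 2048 cpu_relax() iterations remain before
it can call pv_wait() and halt.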

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 kernel/locking/qspinlock_paravirt.h |   28 +---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index a210061..a9fe10d 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -16,7 +16,8 @@
  * native_queue_spin_unlock().
  */
 
-#define _Q_SLOW_VAL		(3U << _Q_LOCKED_OFFSET)
+#define _Q_SLOW_VAL		(3U << _Q_LOCKED_OFFSET)
+#define MAYHALT_THRESHOLD	(SPIN_THRESHOLD >> 4)
 
 /*
  * The vcpu_hashed is a special state that is set by the new lock holder on
@@ -36,6 +37,7 @@ struct pv_node {
 
int cpu;
u8  state;
+   u8  mayhalt;
 };
 
 /*
@@ -187,6 +189,7 @@ static void pv_init_node(struct mcs_spinlock *node)
 
 	pn->cpu = smp_processor_id();
 	pn->state = vcpu_running;
+	pn->mayhalt = false;
 }
 
 /*
@@ -203,17 +206,27 @@ static void pv_wait_node(struct mcs_spinlock *node)
 	for (loop = SPIN_THRESHOLD; loop; loop--) {
 		if (READ_ONCE(node->locked))
 			return;
+		if (loop == MAYHALT_THRESHOLD)
+			xchg(&pn->mayhalt, true);
cpu_relax();
}
 
/*
-	 * Order pn->state vs pn->locked thusly:
+	 * Order pn->state/pn->mayhalt vs pn->locked thusly:
 	 *
-	 * [S] pn->state = vcpu_halted	  [S] next->locked = 1
+	 * [S] pn->mayhalt = 1		  [S] next->locked = 1
+	 *     MB, delay		      barrier()
+	 * [S] pn->state = vcpu_halted	  [L] pn->mayhalt
 	 *     MB			      MB
 	 * [L] pn->locked		[RmW] pn->state = vcpu_hashed
 	 *
 	 * Matches the cmpxchg() from pv_scan_next().
+	 *
+	 * As the new lock holder may quit (when pn->mayhalt is not
+	 * set) without memory barrier, a sufficiently long delay is
+	 * inserted between the setting of pn->mayhalt and pn->state
+	 * to ensure that there is enough time for the new pn->locked
+	 * value to be propagated here to be checked below.
 	 */
 	(void)xchg(&pn->state, vcpu_halted);
 
@@ -226,6 +239,7 @@ static void pv_wait_node(struct mcs_spinlock *node)
 * needs to move on to pv_wait_head().
 */
 	(void)cmpxchg(&pn->state, vcpu_halted, vcpu_running);
+	pn->mayhalt = false;
}
 
/*
@@ -246,6 +260,14 @@ static void pv_scan_next(struct qspinlock *lock, struct 
mcs_spinlock *node)
struct __qspinlock *l = (void *)lock;
 
/*
+	 * If mayhalt is not set, there is enough time for the just set value
+	 * in pn->locked to be propagated to the other CPU before it is time
+	 * to halt.
+	 */
+	if (!READ_ONCE(pn->mayhalt))
+		return;
+
+   /*
 * Transition CPU state: halted = hashed
 * Quit if the transition failed.
 */
-- 
1.7.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support

2015-04-06 Thread Waiman Long
v14-v15:
 - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV
   qspinlock code by dynamically allocating the hash table as well
   as some other performance optimization.
 - Simplified the Xen PV qspinlock code as suggested by David Vrabel
   david.vra...@citrix.com.
 - Add benchmarking data for 3.19 kernel to compare the performance
   of a spinlock heavy test with and without the qspinlock patch
   under different cpufreq drivers and scaling governors.

v13-v14:
 - Patches 1  2: Add queue_spin_unlock_wait() to accommodate commit
   78bff1c86 from Oleg Nesterov.
 - Fix the system hang problem when using PV qspinlock in an
   over-committed guest due to a racing condition in the
   pv_set_head_in_tail() function.
 - Increase the MAYHALT_THRESHOLD from 10 to 1024.
 - Change kick_cpu into a regular function pointer instead of a
   callee-saved function.
 - Change lock statistics code to use separate bits for different
   statistics.

v12-v13:
 - Change patch 9 to generate separate versions of the
   queue_spin_lock_slowpath functions for bare metal and PV guest. This
   reduces the performance impact of the PV code on bare metal systems.

v11-v12:
 - Based on PeterZ's version of the qspinlock patch
   (https://lkml.org/lkml/2014/6/15/63).
 - Incorporated many of the review comments from Konrad Wilk and
   Paolo Bonzini.
 - The pvqspinlock code is largely from my previous version with
   PeterZ's way of going from queue tail to head and his idea of
   using callee saved calls to KVM and XEN codes.

v10-v11:
  - Use a simple test-and-set unfair lock to simplify the code,
but performance may suffer a bit for large guest with many CPUs.
  - Take out Raghavendra KT's test results as the unfair lock changes
may render some of his results invalid.
  - Add PV support without increasing the size of the core queue node
structure.
  - Other minor changes to address some of the feedback comments.

v9-v10:
  - Make some minor changes to qspinlock.c to accommodate review feedback.
  - Change author to PeterZ for 2 of the patches.
  - Include Raghavendra KT's test results in patch 18.

v8-v9:
  - Integrate PeterZ's version of the queue spinlock patch with some
modification:
http://lkml.kernel.org/r/20140310154236.038181...@infradead.org
  - Break the more complex patches into smaller ones to ease review effort.
  - Fix a racing condition in the PV qspinlock code.

v7-v8:
  - Remove one unneeded atomic operation from the slowpath, thus
improving performance.
  - Simplify some of the codes and add more comments.
  - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable
unfair lock.
  - Reduce unfair lock slowpath lock stealing frequency depending
on its distance from the queue head.
  - Add performance data for IvyBridge-EX CPU.

v6-v7:
  - Remove an atomic operation from the 2-task contending code
  - Shorten the names of some macros
  - Make the queue waiter to attempt to steal lock when unfair lock is
enabled.
  - Remove lock holder kick from the PV code and fix a race condition
  - Run the unfair lock  PV code on overcommitted KVM guests to collect
performance data.

v5-v6:
 - Change the optimized 2-task contending code to make it fairer at the
   expense of a bit of performance.
 - Add a patch to support unfair queue spinlock for Xen.
 - Modify the PV qspinlock code to follow what was done in the PV
   ticketlock.
 - Add performance data for the unfair lock as well as the PV
   support code.

v4-v5:
 - Move the optimized 2-task contending code to the generic file to
   enable more architectures to use it without code duplication.
 - Address some of the style-related comments by PeterZ.
 - Allow the use of unfair queue spinlock in a real para-virtualized
   execution environment.
 - Add para-virtualization support to the qspinlock code by ensuring
   that the lock holder and queue head stay alive as much as possible.

v3-v4:
 - Remove debugging code and fix a configuration error
 - Simplify the qspinlock structure and streamline the code to make it
   perform a bit better
 - Add an x86 version of asm/qspinlock.h for holding x86 specific
   optimization.
 - Add an optimized x86 code path for 2 contending tasks to improve
   low contention performance.

v2-v3:
 - Simplify the code by using numerous mode only without an unfair option.
 - Use the latest smp_load_acquire()/smp_store_release() barriers.
 - Move the queue spinlock code to kernel/locking.
 - Make the use of queue spinlock the default for x86-64 without user
   configuration.
 - Additional performance tuning.

v1-v2:
 - Add some more comments to document what the code does.
 - Add a numerous CPU mode to support >= 16K CPUs
 - Add a configuration option to allow lock stealing which can further
   improve performance in many cases.
 - Enable wakeup of queue head CPU at unlock time for non-numerous
   CPU mode.

This patch set has 3 different sections:
 1) Patches 1-6: Introduces a queue-based 

[Xen-devel] [PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time

2015-04-06 Thread Waiman Long
Before this patch, a CPU may have been kicked twice before getting
the lock - one before it becomes queue head and once before it gets
the lock. All these CPU kicking and halting (VMEXIT) can be expensive
and slow down system performance, especially in an overcommitted guest.

This patch adds a new vCPU state (vcpu_hashed) which enables the code
to delay CPU kicking until at unlock time. Once this state is set,
the new lock holder will set _Q_SLOW_VAL and fill in the hash table
on behalf of the halted queue head vCPU.
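
(The corresponding unlock side, roughly -- a sketch using the names from
this series, not the exact patch text:)

	__visible void __pv_queue_spin_unlock(struct qspinlock *lock)
	{
		struct __qspinlock *l = (void *)lock;
		struct pv_node *node;

		/* Fast path: nobody stored _Q_SLOW_VAL, so nothing is hashed. */
		if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
			return;

		/* Slow path: the lock holder hashed the queue head on its
		 * behalf; look it up, release the lock and kick the vCPU.   */
		node = pv_hash_find(lock);
		smp_store_release(&l->locked, 0);
		pv_kick(node->cpu);
	}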

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 kernel/locking/qspinlock.c  |   10 ++--
 kernel/locking/qspinlock_paravirt.h |   76 +--
 2 files changed, 59 insertions(+), 27 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 33b3f54..b9ba83b 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -239,8 +239,8 @@ static __always_inline void set_locked(struct qspinlock 
*lock)
 
 static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
 static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { }
-
+static __always_inline void __pv_scan_next(struct qspinlock *lock,
+  struct mcs_spinlock *node) { }
 static __always_inline void __pv_wait_head(struct qspinlock *lock,
   struct mcs_spinlock *node) { }
 
@@ -248,7 +248,7 @@ static __always_inline void __pv_wait_head(struct qspinlock 
*lock,
 
 #define pv_init_node   __pv_init_node
 #define pv_wait_node   __pv_wait_node
-#define pv_kick_node   __pv_kick_node
+#define pv_scan_next   __pv_scan_next
 
 #define pv_wait_head   __pv_wait_head
 
@@ -441,7 +441,7 @@ queue:
cpu_relax();
 
 	arch_mcs_spin_unlock_contended(&next->locked);
-	pv_kick_node(next);
+	pv_scan_next(lock, next);
 
 release:
/*
@@ -462,7 +462,7 @@ EXPORT_SYMBOL(queue_spin_lock_slowpath);
 
 #undef pv_init_node
 #undef pv_wait_node
-#undef pv_kick_node
+#undef pv_scan_next
 #undef pv_wait_head
 
 #undef  queue_spin_lock_slowpath
diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index 49dbd39..a210061 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -18,9 +18,16 @@
 
 #define _Q_SLOW_VAL		(3U << _Q_LOCKED_OFFSET)
 
+/*
+ * The vcpu_hashed is a special state that is set by the new lock holder on
+ * the new queue head to indicate that _Q_SLOW_VAL is set and hash entry
+ * filled. With this state, the queue head CPU will always be kicked even
+ * if it is not halted to avoid potential racing condition.
+ */
 enum vcpu_state {
vcpu_running = 0,
vcpu_halted,
+   vcpu_hashed
 };
 
 struct pv_node {
@@ -97,7 +104,13 @@ static inline u32 hash_align(u32 hash)
	return hash & ~(PV_HB_PER_LINE - 1);
 }
 
-static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
+/*
+ * Set up an entry in the lock hash table
+ * This is not inlined to reduce size of generated code as it is included
+ * twice and is used only in the slowest path of handling CPU halting.
+ */
+static noinline struct qspinlock **
+pv_hash(struct qspinlock *lock, struct pv_node *node)
 {
unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
struct pv_hash_bucket *hb, *end;
@@ -178,7 +191,8 @@ static void pv_init_node(struct mcs_spinlock *node)
 
 /*
  * Wait for node-locked to become true, halt the vcpu after a short spin.
- * pv_kick_node() is used to wake the vcpu again.
+ * pv_scan_next() is used to set _Q_SLOW_VAL and fill in hash table on its
+ * behalf.
  */
 static void pv_wait_node(struct mcs_spinlock *node)
 {
@@ -189,7 +203,6 @@ static void pv_wait_node(struct mcs_spinlock *node)
for (loop = SPIN_THRESHOLD; loop; loop--) {
 		if (READ_ONCE(node->locked))
return;
-
cpu_relax();
}
 
@@ -198,17 +211,21 @@ static void pv_wait_node(struct mcs_spinlock *node)
 *
 	 * [S] pn->state = vcpu_halted	  [S] next->locked = 1
 	 *     MB			      MB
-	 * [L] pn->locked		[RmW] pn->state = vcpu_running
+	 * [L] pn->locked		[RmW] pn->state = vcpu_hashed
 	 *
-	 * Matches the xchg() from pv_kick_node().
+	 * Matches the cmpxchg() from pv_scan_next().
 	 */
 	(void)xchg(&pn->state, vcpu_halted);

 	if (!READ_ONCE(node->locked))
 		pv_wait(&pn->state, vcpu_halted);
 
-	/* Make sure that state is correct for spurious wakeup */
-	WRITE_ONCE(pn->state, vcpu_running);
+   /*
+   

[Xen-devel] [PATCH v15 02/15] qspinlock, x86: Enable x86-64 to use queue spinlock

2015-04-06 Thread Waiman Long
This patch makes the necessary changes at the x86 architecture
specific layer to enable the use of queue spinlock for x86-64. As
x86-32 machines are typically not multi-socket, the benefit of queue
spinlock may not be apparent, so queue spinlock is not enabled.

Currently, there are some incompatibilities between the para-virtualized
spinlock code (which hard-codes the use of ticket spinlock) and the
queue spinlock. Therefore, the use of queue spinlock is disabled when
the para-virtualized spinlock is enabled.

The arch/x86/include/asm/qspinlock.h header file includes some x86
specific optimization which will make the queue spinlock code perform
better than the generic implementation.

Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
---
 arch/x86/Kconfig  |1 +
 arch/x86/include/asm/qspinlock.h  |   20 
 arch/x86/include/asm/spinlock.h   |5 +
 arch/x86/include/asm/spinlock_types.h |4 
 4 files changed, 30 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/qspinlock.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..49fecb1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,6 +125,7 @@ config X86
select MODULES_USE_ELF_RELA if X86_64
select CLONE_BACKWARDS if X86_32
select ARCH_USE_BUILTIN_BSWAP
+   select ARCH_USE_QUEUE_SPINLOCK
select ARCH_USE_QUEUE_RWLOCK
select OLD_SIGSUSPEND3 if X86_32 || IA32_EMULATION
select OLD_SIGACTION if X86_32
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
new file mode 100644
index 000..222995b
--- /dev/null
+++ b/arch/x86/include/asm/qspinlock.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_X86_QSPINLOCK_H
+#define _ASM_X86_QSPINLOCK_H
+
+#include asm-generic/qspinlock_types.h
+
+#define	queue_spin_unlock queue_spin_unlock
+/**
+ * queue_spin_unlock - release a queue spinlock
+ * @lock : Pointer to queue spinlock structure
+ *
+ * A smp_store_release() on the least-significant byte.
+ */
+static inline void queue_spin_unlock(struct qspinlock *lock)
+{
+   smp_store_release((u8 *)lock, 0);
+}
+
+#include asm-generic/qspinlock.h
+
+#endif /* _ASM_X86_QSPINLOCK_H */
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index cf87de3..a9c01fd 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -42,6 +42,10 @@
 extern struct static_key paravirt_ticketlocks_enabled;
 static __always_inline bool static_key_false(struct static_key *key);
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+#include asm/qspinlock.h
+#else
+
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
@@ -196,6 +200,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t 
*lock)
cpu_relax();
}
 }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 /*
  * Read-write spinlocks, allowing multiple readers
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 5f9d757..5d654a1 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -23,6 +23,9 @@ typedef u32 __ticketpair_t;
 
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+#include asm-generic/qspinlock_types.h
+#else
 typedef struct arch_spinlock {
union {
__ticketpair_t head_tail;
@@ -33,6 +36,7 @@ typedef struct arch_spinlock {
 } arch_spinlock_t;
 
 #define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 #include asm-generic/qrwlock_types.h
 
-- 
1.7.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v15 06/15] qspinlock: Use a simple write to grab the lock

2015-04-06 Thread Waiman Long
Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head doesn't need to reset the tail code. For that case,
a simple write to set the lock bit is enough as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are not set. The current pending-bit
waiting code will ensure that the pending bit will not be set once the
tail code in the lock is set.
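
For reference, the queue-head acquire step then becomes roughly the
following (a condensed sketch, not the diff itself):

	for (;;) {
		if (val != tail) {		/* someone is queued behind us */
			set_locked(lock);	/* a plain byte store suffices */
			break;
		}
		/*
		 * We are the only one in the queue, so we also have to clear
		 * the tail code: fall back to a cmpxchg of the whole word.
		 */
		old = atomic_cmpxchg(&lock->val, val, _Q_LOCKED_VAL);
		if (old == val)
			goto release;		/* no contention */
		val = old;
	}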

With that change, there is a slight improvement in the performance
of the queue spinlock in the 5M loop micro-benchmark run on a 4-socket
Westmere-EX machine, as shown in the tables below.

[Standalone/Embedded - same node]
  # of tasks    Before patch    After patch    %Change
  ----------    ------------    -----------    -------
       3          2324/2321      2248/2265     -3%/-2%
       4          2890/2896      2819/2831     -2%/-2%
       5          3611/3595      3522/3512     -2%/-2%
       6          4281/4276      4173/4160     -3%/-3%
       7          5018/5001      4875/4861     -3%/-3%
       8          5759/5750      5563/5568     -3%/-3%

[Standalone/Embedded - different nodes]
  # of tasks    Before patch    After patch    %Change
  ----------    ------------    -----------    -------
       3         12242/12237    12087/12093    -1%/-1%
       4         10688/10696    10507/10521    -2%/-2%

It was also found that this change produced a much bigger performance
improvement on the newer IvyBridge-EX chip and essentially closed the
performance gap between the ticket spinlock and the queue spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14 based kernel. The results of the test runs were:

AIM7 XFS Disk Test
  kernel          JPM       Real Time   Sys Time    Usr Time
  ------          ---       ---------   --------    --------
  ticketlock    5678233       3.17        96.61       5.81
  qspinlock     5750799       3.13        94.83       5.97

AIM7 EXT4 Disk Test
  kernel          JPM       Real Time   Sys Time    Usr Time
  ------          ---       ---------   --------    --------
  ticketlock    1114551      16.15       509.72       7.11
  qspinlock     2184466       8.24       232.99       6.01

The ext4 filesystem run had a much higher spinlock contention than
the xfs filesystem run.

The ebizzy -m test was also run with the following results:

  kernel       records/s   Real Time   Sys Time    Usr Time
  ------       ---------   ---------   --------    --------
  ticketlock      2075       10.00      216.35       3.49
  qspinlock       3023       10.00      198.20       4.80

Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
---
 kernel/locking/qspinlock.c |   66 +--
 1 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index bcc99e6..99503ef 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -105,24 +105,37 @@ static inline struct mcs_spinlock *decode_tail(u32 tail)
  * By using the whole 2nd least significant byte for the pending bit, we
  * can allow better optimization of the lock acquisition for the pending
  * bit holder.
+ *
+ * This internal structure is also used by the set_locked function which
+ * is not restricted to _Q_PENDING_BITS == 8.
  */
-#if _Q_PENDING_BITS == 8
-
 struct __qspinlock {
union {
atomic_t val;
-   struct {
 #ifdef __LITTLE_ENDIAN
+   struct {
+   u8  locked;
+   u8  pending;
+   };
+   struct {
u16 locked_pending;
u16 tail;
+   };
 #else
+   struct {
u16 tail;
u16 locked_pending;
-#endif
};
+   struct {
+   u8  reserved[2];
+   u8  pending;
+   u8  locked;
+   };
+#endif
};
 };
 
+#if _Q_PENDING_BITS == 8
 /**
  * clear_pending_set_locked - take ownership and clear the pending bit.
  * @lock: Pointer to queue spinlock structure
@@ -195,6 +208,19 @@ static __always_inline u32 xchg_tail(struct qspinlock 
*lock, u32 tail)
 #endif /* _Q_PENDING_BITS == 8 */
 
 /**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queue spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+	WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
+}
+
+/**
  * 

[Xen-devel] [PATCH v15 10/15] pvqspinlock: Implement the paravirt qspinlock for x86

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) pet...@infradead.org

We use the regular paravirt call patching to switch between:

  native_queue_spin_lock_slowpath()	__pv_queue_spin_lock_slowpath()
  native_queue_spin_unlock()		__pv_queue_spin_unlock()

We use a callee saved call for the unlock function which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
movb $0,%arg1 if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Signed-off-by: Waiman Long waiman.l...@hp.com
---
 arch/x86/Kconfig  |2 +-
 arch/x86/include/asm/paravirt.h   |   28 +++-
 arch/x86/include/asm/paravirt_types.h |   10 ++
 arch/x86/include/asm/qspinlock.h  |   25 -
 arch/x86/kernel/paravirt-spinlocks.c  |   24 +++-
 arch/x86/kernel/paravirt_patch_32.c   |   22 ++
 arch/x86/kernel/paravirt_patch_64.c   |   22 ++
 7 files changed, 121 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 49fecb1..a0946e7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -661,7 +661,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
bool Paravirtualization layer for spinlocks
	depends on PARAVIRT && SMP
-   select UNINLINE_SPIN_UNLOCK
+   select UNINLINE_SPIN_UNLOCK if !QUEUE_SPINLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 965c47d..dd40269 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,6 +712,30 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+static __always_inline void pv_queue_spin_lock_slowpath(struct qspinlock 
*lock, u32 val)
+{
+   PVOP_VCALL2(pv_lock_ops.queue_spin_lock_slowpath, lock, val);
+}
+
+static __always_inline void pv_queue_spin_unlock(struct qspinlock *lock)
+{
+   PVOP_VCALLEE1(pv_lock_ops.queue_spin_unlock, lock);
+}
+
+static __always_inline void pv_wait(u8 *ptr, u8 val)
+{
+   PVOP_VCALL2(pv_lock_ops.wait, ptr, val);
+}
+
+static __always_inline void pv_kick(int cpu)
+{
+   PVOP_VCALL1(pv_lock_ops.kick, cpu);
+}
+
+#else /* !CONFIG_QUEUE_SPINLOCK */
+
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
@@ -724,7 +748,9 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
-#endif
+#endif /* CONFIG_QUEUE_SPINLOCK */
+
+#endif /* SMP && PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
 #define PV_SAVE_REGS pushl %ecx; pushl %edx;
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 7549b8b..f6acaea 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -333,9 +333,19 @@ struct arch_spinlock;
 typedef u16 __ticket_t;
 #endif
 
+struct qspinlock;
+
 struct pv_lock_ops {
+#ifdef CONFIG_QUEUE_SPINLOCK
+   void (*queue_spin_lock_slowpath)(struct qspinlock *lock, u32 val);
+   struct paravirt_callee_save queue_spin_unlock;
+
+   void (*wait)(u8 *ptr, u8 val);
+   void (*kick)(int cpu);
+#else /* !CONFIG_QUEUE_SPINLOCK */
struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
+#endif /* !CONFIG_QUEUE_SPINLOCK */
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 64c925e..c8290db 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -3,6 +3,7 @@
 
 #include asm/cpufeature.h
 #include asm-generic/qspinlock_types.h
+#include asm/paravirt.h
 
 #define	queue_spin_unlock queue_spin_unlock
 /**
@@ -11,11 +12,33 @@
  *
  * A smp_store_release() on the least-significant byte.
  */
-static inline void queue_spin_unlock(struct qspinlock *lock)
+static inline void native_queue_spin_unlock(struct qspinlock *lock)
 {
smp_store_release((u8 *)lock, 0);
 }
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+extern void native_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_init_lock_hash(void);
+extern void __pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void 

[Xen-devel] [PATCH v15 08/15] lfsr: a simple binary Galois linear feedback shift register

2015-04-06 Thread Waiman Long
This patch is based on the code sent out by Peter Zijstra as part
of his queue spinlock patch to provide a hashing function with open
addressing.  The lfsr() function can be used to return a sequence of
numbers that cycle through all the bit patterns (2^n -1) of a given
bit width n except the value 0 in a somewhat random fashion depending
on the LFSR taps that is being used. Callers can provide their own
taps value or use the default.
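
A hypothetical caller (not part of this patch) could walk all non-zero
n-bit patterns like this, e.g. for n = 4 with the default taps (0x9):

	u32 v = 1;

	do {
		pr_info("lfsr state: %u\n", v);
		v = lfsr(v, 4, 0);	/* taps == 0 selects lfsr_taps(4) */
	} while (v != 1);		/* cycle length is 2^4 - 1 = 15   */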

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 include/linux/lfsr.h |   80 ++
 1 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/lfsr.h

diff --git a/include/linux/lfsr.h b/include/linux/lfsr.h
new file mode 100644
index 000..f570819
--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,80 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary Galois Linear Feedback Shift Register
+ *
+ * http://en.wikipedia.org/wiki/Linear_feedback_shift_register
+ *
+ * This function currently supports only bits values of 4-30. Callers
+ * that don't pass in a constant bits value can optionally define
+ * LFSR_MIN_BITS and LFSR_MAX_BITS before including the lfsr.h header file
+ * to reduce the size of the jump table in the compiled code, if desired.
+ */
+#ifndef LFSR_MIN_BITS
+#define LFSR_MIN_BITS  4
+#endif
+
+#ifndef LFSR_MAX_BITS
+#define LFSR_MAX_BITS  30
+#endif
+
+static __always_inline u32 lfsr_taps(int bits)
+{
+	BUG_ON((bits < LFSR_MIN_BITS) || (bits > LFSR_MAX_BITS));
+	BUILD_BUG_ON((LFSR_MIN_BITS < 4) || (LFSR_MAX_BITS > 30));
+
+#define _IF_BITS_EQ(x) \
+	if (((x) >= LFSR_MIN_BITS) && ((x) <= LFSR_MAX_BITS) && ((x) == bits))
+
+   /*
+* Feedback terms copied from
+* http://users.ece.cmu.edu/~koopman/lfsr/index.html
+*/
+   _IF_BITS_EQ(4)  return 0x0009;
+   _IF_BITS_EQ(5)  return 0x0012;
+   _IF_BITS_EQ(6)  return 0x0021;
+   _IF_BITS_EQ(7)  return 0x0041;
+   _IF_BITS_EQ(8)  return 0x008E;
+   _IF_BITS_EQ(9)  return 0x0108;
+   _IF_BITS_EQ(10) return 0x0204;
+   _IF_BITS_EQ(11) return 0x0402;
+   _IF_BITS_EQ(12) return 0x0829;
+   _IF_BITS_EQ(13) return 0x100D;
+   _IF_BITS_EQ(14) return 0x2015;
+   _IF_BITS_EQ(15) return 0x4122;
+   _IF_BITS_EQ(16) return 0x8112;
+   _IF_BITS_EQ(17) return 0x102C9;
+   _IF_BITS_EQ(18) return 0x20195;
+   _IF_BITS_EQ(19) return 0x403FE;
+   _IF_BITS_EQ(20) return 0x80637;
+   _IF_BITS_EQ(21) return 0x100478;
+   _IF_BITS_EQ(22) return 0x20069E;
+   _IF_BITS_EQ(23) return 0x4004B2;
+   _IF_BITS_EQ(24) return 0x800B87;
+   _IF_BITS_EQ(25) return 0x10004F3;
+   _IF_BITS_EQ(26) return 0x200072D;
+   _IF_BITS_EQ(27) return 0x40006AE;
+   _IF_BITS_EQ(28) return 0x80009E3;
+   _IF_BITS_EQ(29) return 0x1583;
+   _IF_BITS_EQ(30) return 0x2C92;
+#undef _IF_BITS_EQ
+
+   /* Unreachable */
+   return 0;
+}
+
+/*
+ * Please note that LFSR doesn't work with a start state of 0.
+ */
+static inline u32 lfsr(u32 val, int bits, u32 taps)
+{
+	u32 bit = val & 1;
+
+	val >>= 1;
+	if (bit)
+		val ^= taps ? taps : lfsr_taps(bits);
+   return val;
+}
+
+#endif /* _LINUX_LFSR_H */
-- 
1.7.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen

2015-04-06 Thread Waiman Long
This patch adds the necessary Xen specific code to allow Xen to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 arch/x86/xen/spinlock.c |   63 ---
 kernel/Kconfig.locks|2 +-
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 956374c..728b45b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,6 +17,55 @@
 #include xen-ops.h
 #include debugfs.h
 
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(char *, irq_name);
+static bool xen_pvspin = true;
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include asm/qspinlock.h
+
+static void xen_qlock_kick(int cpu)
+{
+   xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+}
+
+/*
+ * Halt the current CPU & release it back to the host
+ */
+static void xen_qlock_wait(u8 *byte, u8 val)
+{
+   int irq = __this_cpu_read(lock_kicker_irq);
+
+   /* If kicker interrupts not initialized yet, just spin */
+   if (irq == -1)
+   return;
+
+   /* clear pending */
+   xen_clear_irq_pending(irq);
+
+   /*
+* We check the byte value after clearing pending IRQ to make sure
+* that we won't miss a wakeup event because of the clearing.
+*
+* The sync_clear_bit() call in xen_clear_irq_pending() is atomic.
+* So it is effectively a memory barrier for x86.
+*/
+   if (READ_ONCE(*byte) != val)
+   return;
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
+   /* Block until irq becomes pending (or perhaps a spurious wakeup) */
+   xen_poll_irq(irq);
+}
+
+#else /* CONFIG_QUEUE_SPINLOCK */
+
 enum xen_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
@@ -100,12 +149,9 @@ struct xen_lock_waiting {
__ticket_t want;
 };
 
-static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-static DEFINE_PER_CPU(char *, irq_name);
 static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
 static cpumask_t waiting_cpus;
 
-static bool xen_pvspin = true;
 __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
int irq = __this_cpu_read(lock_kicker_irq);
@@ -217,6 +263,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
}
}
 }
+#endif /* CONFIG_QUEUE_SPINLOCK */
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -280,8 +327,16 @@ void __init xen_init_spinlocks(void)
return;
}
 	printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
+#ifdef CONFIG_QUEUE_SPINLOCK
+   __pv_init_lock_hash();
+   pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+   pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+   pv_lock_ops.wait = xen_qlock_wait;
+   pv_lock_ops.kick = xen_qlock_kick;
+#else
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
+#endif
 }
 
 /*
@@ -310,7 +365,7 @@ static __init int xen_parse_nopvspin(char *arg)
 }
 early_param(xen_nopvspin, xen_parse_nopvspin);
 
-#ifdef CONFIG_XEN_DEBUG_FS
+#if defined(CONFIG_XEN_DEBUG_FS) && !defined(CONFIG_QUEUE_SPINLOCK)
 
 static struct dentry *d_spin_debug;
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 537b13e..0b42933 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -240,7 +240,7 @@ config ARCH_USE_QUEUE_SPINLOCK
 
 config QUEUE_SPINLOCK
def_bool y if ARCH_USE_QUEUE_SPINLOCK
-	depends on SMP && (!PARAVIRT_SPINLOCKS || !XEN)
+   depends on SMP
 
 config ARCH_USE_QUEUE_RWLOCK
bool
-- 
1.7.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-06 Thread Waiman Long
Provide a separate (second) version of the spin_lock_slowpath for
paravirt along with a special unlock path.

The second slowpath is generated by adding a few pv hooks to the
normal slowpath, but where those will compile away for the native
case, they expand into special wait/wake code for the pv version.

The actual MCS queue can use extra storage in the mcs_nodes[] array to
keep track of state and therefore uses directed wakeups.

The head contender has no such storage directly visible to the
unlocker.  So the unlocker searches a hash table with open addressing
using a simple binary Galois linear feedback shift register.
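
Roughly, the insert side of that hash table looks like this (a sketch
using the names from the patch, not the patch text itself):

	hash = hash_ptr(lock, pv_lock_hash_bits);
	if (!hash)
		hash = 1;			/* an LFSR cannot start at 0 */

	for (;;) {
		hb = &pv_lock_hash[hash_align(hash)];
		for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
			if (!cmpxchg(&hb->lock, NULL, lock)) {
				WRITE_ONCE(hb->node, node);
				return &hb->lock;	/* claimed a bucket */
			}
		}
		hash = lfsr(hash, pv_lock_hash_bits, 0);	/* next probe */
	}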

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 kernel/locking/qspinlock.c  |   69 -
 kernel/locking/qspinlock_paravirt.h |  321 +++
 2 files changed, 389 insertions(+), 1 deletions(-)
 create mode 100644 kernel/locking/qspinlock_paravirt.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index fc2e5ab..33b3f54 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -18,6 +18,9 @@
  * Authors: Waiman Long waiman.l...@hp.com
  *  Peter Zijlstra pet...@infradead.org
  */
+
+#ifndef _GEN_PV_LOCK_SLOWPATH
+
 #include linux/smp.h
 #include linux/bug.h
 #include linux/cpumask.h
@@ -65,13 +68,21 @@
 
 #include mcs_spinlock.h
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define MAX_NODES  8
+#else
+#define MAX_NODES  4
+#endif
+
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
  * contexts: task, softirq, hardirq, nmi.
  *
  * Exactly fits one 64-byte cacheline on a 64-bit architecture.
+ *
+ * PV doubles the storage and uses the second cacheline for PV state.
  */
-static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]);
+static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[MAX_NODES]);
 
 /*
  * We must be able to distinguish between no-tail and the tail at 0:0,
@@ -220,6 +231,33 @@ static __always_inline void set_locked(struct qspinlock 
*lock)
WRITE_ONCE(l-locked, _Q_LOCKED_VAL);
 }
 
+
+/*
+ * Generate the native code for queue_spin_unlock_slowpath(); provide NOPs for
+ * all the PV callbacks.
+ */
+
+static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { }
+
+static __always_inline void __pv_wait_head(struct qspinlock *lock,
+  struct mcs_spinlock *node) { }
+
+#define pv_enabled()   false
+
+#define pv_init_node   __pv_init_node
+#define pv_wait_node   __pv_wait_node
+#define pv_kick_node   __pv_kick_node
+
+#define pv_wait_head   __pv_wait_head
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define queue_spin_lock_slowpath   native_queue_spin_lock_slowpath
+#endif
+
+#endif /* _GEN_PV_LOCK_SLOWPATH */
+
 /**
  * queue_spin_lock_slowpath - acquire the queue spinlock
  * @lock: Pointer to queue spinlock structure
@@ -249,6 +287,9 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 
	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+   if (pv_enabled())
+   goto queue;
+
if (virt_queue_spin_lock(lock))
return;
 
@@ -325,6 +366,7 @@ queue:
node += idx;
 	node->locked = 0;
 	node->next = NULL;
+	pv_init_node(node);
 
/*
 * We touched a (possibly) cold cacheline in the per-cpu queue node;
@@ -350,6 +392,7 @@ queue:
prev = decode_tail(old);
 	WRITE_ONCE(prev->next, node);

+	pv_wait_node(node);
 	arch_mcs_spin_lock_contended(&node->locked);
}
 
@@ -365,6 +408,7 @@ queue:
 * does not imply a full barrier.
 *
 */
+   pv_wait_head(lock, node);
	while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_PENDING_MASK)
		cpu_relax();
 
@@ -397,6 +441,7 @@ queue:
cpu_relax();
 
 	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(next);
 
 release:
/*
@@ -405,3 +450,25 @@ release:
this_cpu_dec(mcs_nodes[0].count);
 }
 EXPORT_SYMBOL(queue_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for queue_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef  pv_enabled
+#define pv_enabled()   true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head
+
+#undef  queue_spin_lock_slowpath
+#define queue_spin_lock_slowpath   __pv_queue_spin_lock_slowpath
+
+#include qspinlock_paravirt.h
+#include qspinlock.c
+
+#endif
diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
new file mode 100644
index 000..49dbd39
--- /dev/null
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -0,0 +1,321 @@

[Xen-devel] [PATCH v15 15/15] pvqspinlock: Add debug code to check for PV lock hash sanity

2015-04-06 Thread Waiman Long
The current code for PV lock hash table processing will panic the
system if pv_hash_find() can't find the desired hash bucket. However,
there is no check to see if there is more than one entry for a given
lock which should never happen.

This patch adds a pv_hash_check_duplicate() function to do that which
will only be enabled if CONFIG_DEBUG_SPINLOCK is defined because of
the performance overhead it introduces.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
 kernel/locking/qspinlock_paravirt.h |   58 +++
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index a9fe10d..4d39c8b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -107,6 +107,63 @@ static inline u32 hash_align(u32 hash)
 }
 
 /*
+ * Hash table debugging code
+ */
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+#define _NODE_IDX(pn)	((((unsigned long)pn) & (SMP_CACHE_BYTES - 1)) /\
+			sizeof(struct mcs_spinlock))
+/*
+ * Check if there is additional hash buckets with the same lock which
+ * should not happen.
+ */
+static inline void pv_hash_check_duplicate(struct qspinlock *lock)
+{
+   struct pv_hash_bucket *hb, *end, *hb1 = NULL;
+   int count = 0, used = 0;
+
+	end = &pv_lock_hash[1 << pv_lock_hash_bits];
+	for (hb = pv_lock_hash; hb < end; hb++) {
+		struct qspinlock *l = READ_ONCE(hb->lock);
+   struct pv_node *pn;
+
+   if (l)
+   used++;
+   if (l != lock)
+   continue;
+   if (++count == 1) {
+   hb1 = hb;
+   continue;
+   }
+   WARN_ON(count == 2);
+   if (hb1) {
+			pn = READ_ONCE(hb1->node);
+			printk(KERN_ERR "PV lock hash error: duplicated entry "
+			       "#%d - hash %ld, node %ld, cpu %d\n", 1,
+			       hb1 - pv_lock_hash, _NODE_IDX(pn),
+			       pn ? pn->cpu : -1);
+			hb1 = NULL;
+		}
+		pn = READ_ONCE(hb->node);
+		printk(KERN_ERR "PV lock hash error: duplicated entry #%d - "
+		       "hash %ld, node %ld, cpu %d\n", count, hb - pv_lock_hash,
+		       _NODE_IDX(pn), pn ? pn->cpu : -1);
+   }
+	/*
+	 * Warn if more than half of the buckets are used
+	 */
+	if (used > (1 << (pv_lock_hash_bits - 1)))
+		printk(KERN_WARNING "PV lock hash warning: "
+		       "%d hash entries used!\n", used);
+}
+
+#else /* CONFIG_DEBUG_SPINLOCK */
+
+static inline void pv_hash_check_duplicate(struct qspinlock *lock) {}
+
+#endif /* CONFIG_DEBUG_SPINLOCK */
+
+/*
  * Set up an entry in the lock hash table
  * This is not inlined to reduce size of generated code as it is included
  * twice and is used only in the slowest path of handling CPU halting.
@@ -141,6 +198,7 @@ pv_hash(struct qspinlock *lock, struct pv_node *node)
}
 
 done:
+   pv_hash_check_duplicate(lock);
 	return &hb->lock;
 }
 
-- 
1.7.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 50333: trouble: blocked/broken/fail/pass

2015-04-06 Thread osstest service user
flight 50333 xen-4.4-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50333/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   3 host-install(3) broken REGR. vs. 50266

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-freebsd10-i386 14 guest-localmigrate/x10   fail like 50266
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 36776

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-sedf-pin  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass

version targeted for testing:
 xen  6b09a29ced2e7fc449a39f513e1d8c2b10d2af6d
baseline version:
 xen  fc6fe18f1511d4b393057c60a2e6b05ccd963e90


People who touched revisions under test:
  Andrew Cooper andrew.coop...@citrix.com
  Ian Campbell ian.campb...@citrix.com
  Ian Jackson ian.jack...@eu.citrix.com
  Jan Beulich jbeul...@suse.com
  Konrad Rzeszutek Wilk konrad.w...@oracle.com


jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  broken  
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  blocked 
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  blocked 
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 

[Xen-devel] [PATCH] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk

2015-04-06 Thread Pranavkumar Sawargaonkar
In old X-Gene Storm firmware and DT, secure mode addresses have been
mentioned in GICv2 node. In this case maintenance interrupt is used
instead of EOI HW method.

This patch checks the GIC Distributor Base Address to enable EOI quirk
for old firmware.

Ref:
http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html

Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org
---
 xen/arch/arm/platforms/xgene-storm.c |   37 +-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c 
b/xen/arch/arm/platforms/xgene-storm.c
index eee650e..dd7cbfc 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -22,6 +22,7 @@
 #include asm/platform.h
 #include xen/stdbool.h
 #include xen/vmap.h
+#include xen/device_tree.h
 #include asm/io.h
 #include asm/gic.h
 
@@ -35,9 +36,41 @@ static u64 reset_addr, reset_size;
 static u32 reset_mask;
 static bool reset_vals_valid = false;
 
+#define XGENE_SEC_GICV2_DIST_ADDR0x7801
+static u32 quirk_guest_pirq_need_eoi;
+
+static void xgene_check_pirq_eoi(void)
+{
+struct dt_device_node *node;
+int res;
+paddr_t dbase;
+
+dt_for_each_device_node( dt_host, node )
+{
+        if ( !dt_get_property(node, "interrupt-controller", NULL) )
+continue;
+
+res = dt_device_get_address(node, 0, dbase, NULL);
+        if ( !dbase )
+            panic("%s: Cannot find a valid address for the "
+                  "distributor", __func__);
+
+/* 
+ * In old X-Gene Storm firmware and DT, secure mode addresses have
+ * been mentioned in GICv2 node. We have to use maintenance interrupt
+ * instead of EOI HW in this case. We check the GIC Distributor Base
+ * Address to maintain compatibility with older firmware.
+ */
+ if (dbase == XGENE_SEC_GICV2_DIST_ADDR)
+ quirk_guest_pirq_need_eoi = PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+ else
+ quirk_guest_pirq_need_eoi = 0;
+}
+}
+
 static uint32_t xgene_storm_quirks(void)
 {
-return PLATFORM_QUIRK_GIC_64K_STRIDE|PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+return PLATFORM_QUIRK_GIC_64K_STRIDE| quirk_guest_pirq_need_eoi;
 }
 
 static int map_one_mmio(struct domain *d, const char *what,
@@ -216,6 +249,8 @@ static int xgene_storm_init(void)
 reset_mask = XGENE_RESET_MASK;
 
 reset_vals_valid = true;
+xgene_check_pirq_eoi();
+
 return 0;
 }
 
-- 
1.7.9.5


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 09/19] xen: arm: Annotate registers trapped by HSR_EL1.TIDCP

2015-04-06 Thread Julien Grall

Hi Ian,

Subject: s/HSR/HCR/

On 31/03/2015 12:07, Ian Campbell wrote:

This traps a variety of implementation defined registers, so add a note
to the default case of the respective handler.

Signed-off-by: Ian Campbell ian.campb...@citrix.com


Other than the typo in the subject:

Reviewed-by: Julien Grall julien.gr...@citrix.com

Regards,


---
  xen/arch/arm/traps.c |   16 
  1 file changed, 16 insertions(+)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ca43f79..e26e673 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1698,6 +1698,14 @@ static void do_cp15_32(struct cpu_user_regs *regs,
   */
  return handle_raz_wi(regs, r, cp32.read, hsr, 1);

+/*
+ * HCR_EL2.TIDCP
+ *
+ * ARMv7 (DDI 0406C.b): B1.14.3
+ * ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
+ *
+ * And all other unknown registers.
+ */
  default:
  gdprintk(XENLOG_ERR,
   %s p15, %d, r%d, cr%d, cr%d, %d @ 0x%PRIregister\n,
@@ -1948,6 +1956,14 @@ static void do_sysreg(struct cpu_user_regs *regs,
  dprintk(XENLOG_WARNING,
  Emulation of sysreg ICC_SGI0R_EL1/ASGI1R_EL1 not 
supported\n);
  return inject_undef64_exception(regs, hsr.len);
+
+/*
+ * HCR_EL2.TIDCP
+ *
+ * ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
+ *
+ * And all other unknown registers.
+ */
  default:
  {
  const struct hsr_sysreg sysreg = hsr.sysreg;



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/19] xen: arm: implement handling of ACTLR_EL1 trap

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

While annotating ACTLR I noticed that we don't appear to handle the
64-bit version of this trap. Do so and annotate everything.


While Linux doesn't use ACTLR_EL1 on aarch64, another OS may use it.

I'm not sure if we should consider it as a possible security issue, as at 
least the Cortex A53 implements the register as RES0.



Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
  xen/arch/arm/traps.c  |   20 
  xen/include/asm-arm/sysregs.h |1 +
  2 files changed, 21 insertions(+)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 70e1b4d..ca43f79 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1647,6 +1647,13 @@ static void do_cp15_32(struct cpu_user_regs *regs,
  if ( !vtimer_emulate(regs, hsr) )
  return inject_undef_exception(regs, hsr);
  break;
+
+/*
+ * HSR_EL2.TASC / HSR.TAC


I don't find any TASC in the ARMv8 doc. Did you intend to say TACR?

Also it's not HSR but HCR.


+ *
+ * ARMv7 (DDI 0406C.b): B1.14.6
+ * ARMv8 (DDI 0487A.d): G6.2.1
+ */
  case HSR_CPREG32(ACTLR):
  if ( psr_mode_is_user(regs) )
  return inject_undef_exception(regs, hsr);
@@ -1849,9 +1856,22 @@ static void do_sysreg(struct cpu_user_regs *regs,
const union hsr hsr)
  {
  register_t *x = select_user_reg(regs, hsr.sysreg.reg);
+struct vcpu *v = current;

  switch ( hsr.bits  HSR_SYSREG_REGS_MASK )
  {
+/*
+ * HSR_EL2.TASC


Same question here for TASC.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 1/3] xen/arm: smmu: Rename arm_smmu_xen_device with, device_iommu_info

2015-04-06 Thread Manish Jaggi


On Friday 27 March 2015 11:30 PM, Jaggi, Manish wrote:


From: Julien Grall julien.gr...@linaro.org
Sent: Friday, March 27, 2015 7:05 PM
To: Jaggi, Manish; Xen Devel; Stefano Stabellini; Ian Campbell; 
prasun.kap...@cavium.com; Kumar, Vijaya
Subject: Re: [PATCH v1 1/3] xen/arm: smmu: Rename arm_smmu_xen_device with, 
device_iommu_info

On 27/03/15 13:21, Jaggi, Manish wrote:


Regards,
Manish Jaggi

Could you please try to configure your email client correctly? It's
rather confusing to have the "Regards, Manish Jaggi" at the beginning of the mail.

[manish] Fixed. Thanks for pointing out



From: Julien Grall julien.gr...@linaro.org
Sent: Friday, March 27, 2015 6:29 PM
To: Jaggi, Manish; Xen Devel; Stefano Stabellini; Ian Campbell; 
prasun.kap...@cavium.com; Kumar, Vijaya
Subject: Re: [PATCH v1 1/3] xen/arm: smmu: Rename arm_smmu_xen_device with, 
device_iommu_info

Hi Manish,

On 27/03/15 07:20, Manish Jaggi wrote:

arm_smmu_xen_device is not an intuitive name for a datastructure which
represents
device-archdata.iommu. Rename arm_smmu_xen_device with device_iommu_info

device_iommu_info is not more intuitive... At least arm_smmu_xen_device
shows that it's a specific Xen structure and not coming from the Linux
drivers.

[manish] But that is not a valid reason for a non-intuitive name. It is 
really hard to keep up the readability of the code with arm_smmu_xen_device. It is 
not clear whether it is referring to a device attached to the smmu or to the smmu itself. 
There is another data structure arm_smmu_device as well.

Did you read the comment explaining the structure arm_smmu_xen_device?
It's just above the definition.

arm_smmu is the prefix for any structure within this file.
xen means it's a structure added for Xen.
device means it's data stored for a device.


Please choose another name; I can take it, but arm_smmu_xen_device is really 
confusing

I won't choose a name myself for a name that I think valid...

If you really want to change the name, you have to put at least
arm_smmu_xen_ in the name.

[manish] What about device_archdata_priv? It denotes what it is.


Regards,
As per Ian's mail in the other thread, is 
%s/arm_smmu_xen_device/arch_smm_xen_device/g ok with you?

Regards,

--
Julien Grall



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 13/19] xen: arm: Annotate registers trapped by MDCR_EL2.TDRA

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
  xen/arch/arm/traps.c  |   32 
  xen/include/asm-arm/cpregs.h  |4 
  xen/include/asm-arm/sysregs.h |1 +
  3 files changed, 37 insertions(+)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 21bef01..7c37cec 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1790,6 +1790,17 @@ static void do_cp14_32(struct cpu_user_regs *regs, const 
union hsr hsr)

  switch ( hsr.bits  HSR_CP32_REGS_MASK )
  {
+/*
+ * MDCR_EL2.TDRA
+ *
+ * ARMv7 (DDI 0406C.b): B1.14.15
+ * ARMv8 (DDI 0487A.d): D1-1508 Table D1-57
+ *
+ * Unhandled:
+ *DBGDRAR
+ *DBGDSAR
+ */
+


Why did you put the comment here? For AArch32, only DBGDRAR and DBGDSAR 
are trapped with this bit.


I think this should be moved above the label default.


  case HSR_CPREG32(DBGDIDR):
  /*
   * Read-only register. Accessible by EL0 if DBGDSCRext.UDCCdis
@@ -1840,6 +1851,8 @@ static void do_cp14_32(struct cpu_user_regs *regs, const 
union hsr hsr)
   *
   * ARMv7 (DDI 0406C.b): B1.14.16
   * ARMv8 (DDI 0487A.d): D1-1507 Table D1-54
+ *
+ * And all other unknown registers.
   */
  default:
  gdprintk(XENLOG_ERR,
@@ -1870,6 +1883,17 @@ static void do_cp14_64(struct cpu_user_regs *regs, const 
union hsr hsr)
   *
   * ARMv7 (DDI 0406C.b): B1.14.16
   * ARMv8 (DDI 0487A.d): D1-1507 Table D1-54
+ *
+ * MDCR_EL2.TDRA
+ *
+ * ARMv7 (DDI 0406C.b): B1.14.15
+ * ARMv8 (DDI 0487A.d): D1-1508 Table D1-57
+ *
+ * Unhandled:
+ *DBGDRAR64
+ *DBGDSAR64


This is confusing. The real name of the register is DBGDRAR. I would say 
DBGDRAR 64-bit.


Furthermore, these are the only registers not handled on AArch32 for this 
bit. It is rather strange to list them while you didn't do it for the 
trace registers.



+ *
+ * And all other unknown registers.


For consistency, I would have added this part of the comment in patch #10 
(where the comment has been added).


Anyway, the patch is already written so I'm fine with it.

   */
  gdprintk(XENLOG_ERR,
   %s p14, %d, r%d, r%d, cr%d @ 0x%PRIregister\n,
@@ -1936,6 +1960,14 @@ static void do_sysreg(struct cpu_user_regs *regs,
 *x = v-arch.actlr;
  break;

+/*
+ * MDCR_EL2.TDRA
+ *
+ * ARMv8 (DDI 0487A.d): D1-1508 Table D1-57
+ */
+case HSR_SYSREG_MDRAR_EL1:
+return handle_ro_raz(regs, x, hsr.sysreg.read, hsr, 1);


This change should be in a separate patch or mention in the commit message.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 3/7] libxl/libxl_domain_info: Log if domain not found.

2015-04-06 Thread Konrad Rzeszutek Wilk
On Fri, Apr 03, 2015 at 11:12:15PM +0100, Ian Murray wrote:
 On 03/04/15 21:02, Konrad Rzeszutek Wilk wrote:
  If we cannot find the domain - log an error (and still
  continue returning an error).
 Forgive me if I am misunderstanding the effect of this patch (I tried to
 find the original rationale but failed). If the effect is that commands
 such as xl domid will cause a log entry when the specified domain
 doesn't exist, I would suggest that's going to be a problem for people

It would.
 that use that or similar commands to tell if a domain is present or
 still alive. I use it as part of a back-up script to make sure a domain
 shutdown before the script continues. I suspect many other people will
 be doing something similar.

But won't 'xl domid' give you a return code of 0 if it exists and 1 if it does not?

Ah it does this (if it can't find the domain):

6195 fprintf(stderr, "Can't get domid of domain name '%s', maybe this domain does not exist.\n", domname);
6196 return 1;
 


If you are using 'xl list domid' it also prints:

4739 if (rc == ERROR_DOMAIN_NOTFOUND) {
4740 fprintf(stderr, "Error: Domain '%s' does not exist.\n",
4741 argv[optind]);
4742 return -rc;

(Previously it would also print this).

Either way the data is already presented to the user. With this
patch it is presented twice - which is repetitive.


Ian C, thoughts? Just ditch this patch? (The patchset can go in without
this one).

 
 Apologies if I have the wrong end of the stick!

There is never a wrong end!
 
 Thanks,
 
 Ian.
 
 
 
  Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  Acked-by: Ian Campbell ian.campb...@citrix.com
  ---
   tools/libxl/libxl.c | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)
 
  diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
  index c0e9cfe..8753e27 100644
  --- a/tools/libxl/libxl.c
  +++ b/tools/libxl/libxl.c
  @@ -698,8 +698,10 @@ int libxl_domain_info(libxl_ctx *ctx, libxl_dominfo *info_r,
   LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain info list");
   return ERROR_FAIL;
   }
  -if (ret==0 || xcinfo.domain != domid) return ERROR_DOMAIN_NOTFOUND;
  -
  +if (ret==0 || xcinfo.domain != domid) {
  +LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Domain %d not found!", domid);
  +return ERROR_DOMAIN_NOTFOUND;
  +}
   if (info_r)
   xcinfo2xlinfo(ctx, &xcinfo, info_r);
   return 0;
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 3/7] xen: psr: reserve an RMID for each core

2015-04-06 Thread Konrad Rzeszutek Wilk
On Sat, Apr 04, 2015 at 04:14:41AM +0200, Dario Faggioli wrote:
 This allows for a new item to be passed as part of the psr=
 boot option: percpu_cmt. If that is specified, Xen tries,
 at boot time, to associate an RMID to each core.
 
 XXX This all looks rather straightforward, if it weren't
 for the fact that it is, apparently, more common than
 I though to run out of RMID. For example, on a dev box
 we have in Cambridge, there are 144 pCPUs and only 71
 RMIDs.
 
 In this preliminary version, nothing particularly smart
 happens if we run out of RMIDs, we just fail attaching
 the remaining cores and that's it. In future, I'd
 probably like to:
  + check whether the operation has any chance to
succeed up front (by comparing number of pCPUs with
available RMIDs)
  + on unexpected failure, rollback everything... it
seems to make more sense to me than just leaving
the system half configured for per-cpu CMT
 
 Thoughts?
 
 XXX Another idea I just have is to allow the user to
 somehow specify a different 'granularity'. Something
 like allowing 'percpu_cmt'|'percore_cmt'|'persocket_cmt'
 with the following meaning:
  + 'percpu_cmt': as in this patch
  + 'percore_cmt': same RMID to hthreads of the same core
  + 'persocket_cmt': same RMID to all cores of the same
 socket.
 
 'percore_cmt' would only allow gathering info on a
 per-core basis... still better than nothing if we
 do not have enough RMIDs for each pCPUs.

Could we allocate nr_online_cpus() / nr_rmids() and have
some CPUs share the same RMIDs?
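
Something like the below, purely as an illustrative sketch (pcpu_to_rmid()
is a made-up helper, not part of the patch), assuming RMID 0 stays reserved
for unmonitored CPUs as in this series:

/* Map a pCPU onto the usable RMIDs 1..rmid_max, wrapping around when
 * there are more CPUs than RMIDs, so several CPUs share one RMID. */
static unsigned int pcpu_to_rmid(unsigned int cpu)
{
    return 1 + (cpu % psr_cmt->rmid_max);
}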

 
 'persocket_cmt' would basically only allow to track the
 amount of free L3 on each socket (by subtracting the
 monitored value from the total). Again, still better
 than nothing, would use very few RMIDs, and I could
 think of ways of using this information in a few
 places in the scheduler...
 
 Again, thought?
 
 XXX Finally, when a domain with its own RMID executes on
 a core that also has its own RMID, domain monitoring
 just overrides per-CPU monitoring. That means the
 cache occupancy reported fo that pCPU is not accurate.
 
 For reasons why this situation is difficult to deal
 with properly, see the document in the cover letter.
 
 Ideas on how to deal with this, either about how to
 make it work or how to handle the thing from a
 'policying' perspective (i.e., which one mechanism
 should be disabled or penalized?), are very welcome
 
 Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
 ---
  xen/arch/x86/psr.c|   72 
 -
  xen/include/asm-x86/psr.h |   11 ++-
  2 files changed, 67 insertions(+), 16 deletions(-)
 
 diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
 index 0f2a6ce..a71391c 100644
 --- a/xen/arch/x86/psr.c
 +++ b/xen/arch/x86/psr.c
 @@ -26,10 +26,13 @@ struct psr_assoc {
  
  struct psr_cmt *__read_mostly psr_cmt;
  static bool_t __initdata opt_psr;
 +static bool_t __initdata opt_cpu_cmt;
  static unsigned int __initdata opt_rmid_max = 255;
  static uint64_t rmid_mask;
  static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
  
 +DEFINE_PER_CPU(unsigned int, pcpu_rmid);
 +
  static void __init parse_psr_param(char *s)
  {
  char *ss, *val_str;
 @@ -57,6 +60,8 @@ static void __init parse_psr_param(char *s)
  val_str);
  }
  }
 +else if ( !strcmp(s, percpu_cmt) )
 +opt_cpu_cmt = 1;
  else if ( val_str  !strcmp(s, rmid_max) )
  opt_rmid_max = simple_strtoul(val_str, NULL, 0);
  
 @@ -94,8 +99,8 @@ static void __init init_psr_cmt(unsigned int rmid_max)
  }
  
  psr_cmt->rmid_max = min(psr_cmt->rmid_max, psr_cmt->l3.rmid_max);
 -psr_cmt->rmid_to_dom = xmalloc_array(domid_t, psr_cmt->rmid_max + 1UL);
 -if ( !psr_cmt->rmid_to_dom )
 +psr_cmt->rmids = xmalloc_array(domid_t, psr_cmt->rmid_max + 1UL);
 +if ( !psr_cmt->rmids )
  {
  xfree(psr_cmt);
  psr_cmt = NULL;
 @@ -107,56 +112,86 @@ static void __init init_psr_cmt(unsigned int rmid_max)
   * with it. To reduce the waste of RMID, reserve RMID 0 for all CPUs that
   * have no domain being monitored.
   */
 -psr_cmt->rmid_to_dom[0] = DOMID_XEN;
 +psr_cmt->rmids[0] = DOMID_XEN;
  for ( rmid = 1; rmid <= psr_cmt->rmid_max; rmid++ )
 -psr_cmt->rmid_to_dom[rmid] = DOMID_INVALID;
 +psr_cmt->rmids[rmid] = DOMID_INVALID;
  
  printk(XENLOG_INFO "Cache Monitoring Technology enabled, RMIDs: %u\n",
 psr_cmt->rmid_max);
  }
  
 -/* Called with domain lock held, no psr specific lock needed */
 -int psr_alloc_rmid(struct domain *d)
 +static int _psr_alloc_rmid(unsigned int *trmid, unsigned int id)
  {
  unsigned int rmid;
  
  ASSERT(psr_cmt_enabled());
  
 -if ( d->arch.psr_rmid > 0 )
 +if ( *trmid > 0 )
  return 

[Xen-devel] [linux-linus test] 50329: tolerable FAIL - PUSHED

2015-04-06 Thread osstest service user
flight 50329 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50329/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-freebsd10-i386  7 freebsd-install  fail like 50276
 test-amd64-i386-freebsd10-amd64  7 freebsd-install fail like 50276

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never 
pass
 test-amd64-amd64-xl-xsm   9 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm   9 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never 
pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never 
pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never 
pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-xsm9 guest-start  fail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm  9 guest-start  fail   never pass
 test-armhf-armhf-xl-xsm   5 xen-boot fail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 10 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-xsm  5 xen-boot fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-armhf-armhf-xl-credit2  10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-armhf-armhf-xl-arndale  10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 linux  1cced5015b171415169d938fb179c44fe060dc15
baseline version:
 linux  6c310bc1acdd02110182a2ec6efa3e7571a3b80c


People who touched revisions under test:
  Ahmed S. Darwish ahmed.darw...@valeo.com
  Alex Deucher alexander.deuc...@amd.com
  Alex Williamson alex.william...@redhat.com
  Alexey Bogoslavsky ale...@swortex.com
  Alexey Kodanev alexey.koda...@oracle.com
  Andi Kleen a...@linux.intel.com
  Andre Przywara andre.przyw...@arm.com
  Andreas Werner ker...@andy89.org
  Andri Yngvason andri.yngva...@marel.com
  Andy Gospodarek go...@cumulusnetworks.com
  Andy Lutomirski l...@kernel.org
  Anton Nayshtut an...@swortex.com
  Ard Biesheuvel ard.biesheu...@linaro.org
  Arend van Spriel ar...@broadcom.com
  Ariel Elior ariel.el...@qlogic.com
  Axel Lin axel@ingics.com
  Baptiste Reynal b.rey...@virtualopensystems.com
  Ben Hutchings ben.hutchi...@codethink.co.uk
  Benjamin Herrenschmidt b...@kernel.crashing.org
  Bjørn Mork bj...@mork.no
  Borislav Petkov b...@suse.de
  Charlie Mooney charliemoo...@chromium.org
  Chris Wilson ch...@chris-wilson.co.uk
  Christian Hesse m...@eworm.de
  Christian König christian.koe...@amd.com
  Christoph Hellwig h...@lst.de
  Cliff Clark cliff_cl...@selinc.com
  Colin Ian King colin.k...@canonical.com
  Cong Wang xiyou.wangc...@gmail.com
  D.S. Ljungmark ljungm...@modio.se
  Daniel Stone dani...@collabora.com
  Daniel Vetter daniel.vet...@ffwll.ch
  Daniel Vetter daniel.vet...@intel.com
  Dave Airlie airl...@redhat.com
  David Disseldorp 

Re: [Xen-devel] [PATCH 17/19] xen: arm: Remove CNTPCT_EL0 trap handling.

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

We set CNTHCTL_EL2.EL1PCTEN and therefore according to ARMv8 (DDI
0487A.d) D1-1510 Table D1-60 we are not trapping this.

Signed-off-by: Ian Campbell ian.campb...@citrix.com

Reviewed-by: Julien Grall julien.gr...@citrix.com

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 1/7] x86: improve psr scheduling code

2015-04-06 Thread Konrad Rzeszutek Wilk
On Sat, Apr 04, 2015 at 04:14:24AM +0200, Dario Faggioli wrote:
 From: Chao Peng chao.p.p...@linux.intel.com
 
 Switching RMID from previous vcpu to next vcpu only needs to write
 MSR_IA32_PSR_ASSOC once. Write it with the value of next vcpu is enough,
 no need to write '0' first. Idle domain has RMID set to 0 and because MSR
 is already updated lazily, so just switch it as it does.
 
 Also move the initialization of per-CPU variable which used for lazy
 update from context switch to CPU starting.
 
 Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
 ---
  xen/arch/x86/domain.c |7 +---
  xen/arch/x86/psr.c|   89 
 +++--
  xen/include/asm-x86/psr.h |3 +-
  3 files changed, 73 insertions(+), 26 deletions(-)
 
 diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
 index 393aa26..73f5d7f 100644
 --- a/xen/arch/x86/domain.c
 +++ b/xen/arch/x86/domain.c
 @@ -1443,8 +1443,6 @@ static void __context_switch(void)
  {
  memcpy(p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
  vcpu_save_fpu(p);
 -if ( psr_cmt_enabled() )
 -psr_assoc_rmid(0);
  p->arch.ctxt_switch_from(p);
  }
  
 @@ -1469,11 +1467,10 @@ static void __context_switch(void)
  }
  vcpu_restore_fpu_eager(n);
  n->arch.ctxt_switch_to(n);
 -
 -if ( psr_cmt_enabled() && n->domain->arch.psr_rmid > 0 )
 -psr_assoc_rmid(n->domain->arch.psr_rmid);
  }
  
 +psr_ctxt_switch_to(n->domain);
 +
  gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
per_cpu(compat_gdt_table, cpu);
  if ( need_full_gdt(n) )
 diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
 index 2ef83df..c902625 100644
 --- a/xen/arch/x86/psr.c
 +++ b/xen/arch/x86/psr.c
 @@ -22,7 +22,6 @@
  
  struct psr_assoc {
  uint64_t val;
 -bool_t initialized;
  };
  
  struct psr_cmt *__read_mostly psr_cmt;
 @@ -115,14 +114,6 @@ static void __init init_psr_cmt(unsigned int rmid_max)
  printk(XENLOG_INFO Cache Monitoring Technology enabled\n);
  }
  
 -static int __init init_psr(void)
 -{
 -if ( (opt_psr & PSR_CMT) && opt_rmid_max )
 -init_psr_cmt(opt_rmid_max);
 -return 0;
 -}
 -__initcall(init_psr);
 -
  /* Called with domain lock held, no psr specific lock needed */
  int psr_alloc_rmid(struct domain *d)
  {
 @@ -168,26 +159,84 @@ void psr_free_rmid(struct domain *d)
  d->arch.psr_rmid = 0;
  }
  
 -void psr_assoc_rmid(unsigned int rmid)
 +static inline void psr_assoc_init(void)
  {
 -uint64_t val;
 -uint64_t new_val;
  struct psr_assoc *psra = this_cpu(psr_assoc);
  
 -if ( !psra-initialized )
 -{
 +if ( psr_cmt_enabled() )
  rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
 -psra-initialized = 1;
 +}
 +
 +static inline void psr_assoc_reg_read(struct psr_assoc *psra, uint64_t *reg)
 +{
 +*reg = psra->val;
 +}
 +
 +static inline void psr_assoc_reg_write(struct psr_assoc *psra, uint64_t reg)
 +{
 +if ( reg != psra->val )
 +{
 +wrmsrl(MSR_IA32_PSR_ASSOC, reg);
 +psra->val = reg;
  }
 -val = psra->val;
 +}
 +
 +static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid)
 +{
 +*reg = (*reg & ~rmid_mask) | (rmid & rmid_mask);
 +}
 +
 +void psr_ctxt_switch_to(struct domain *d)
 +{
 +uint64_t reg;
 +struct psr_assoc *psra = this_cpu(psr_assoc);
 +
 +psr_assoc_reg_read(psra, &reg);
  
 -new_val = (val & ~rmid_mask) | (rmid & rmid_mask);
 -if ( val != new_val )
 +if ( psr_cmt_enabled() )
 +psr_assoc_rmid(&reg, d->arch.psr_rmid);
 +
 +psr_assoc_reg_write(psra, reg);
 +}
 +
 +static void psr_cpu_init(unsigned int cpu)
 +{
 +psr_assoc_init();
 +}
 +
 +static int cpu_callback(
 +struct notifier_block *nfb, unsigned long action, void *hcpu)
 +{
 +unsigned int cpu = (unsigned long)hcpu;
 +
 +switch ( action )
 +{
 +case CPU_STARTING:
 +psr_cpu_init(cpu);
 +break;
 +}

You could just make it

if ( action == CPU_STARTING )
psr_assoc_init();

return NOTIFY_DONE;

Instead of this big switch statement with casting and such. Though
oddly enough, your psr_assoc_init figures out the CPU by running
it with 'this_cpu'. Why not make psr_assoc_init() accept the CPU value?


 +
 +return NOTIFY_DONE;
 +}
 +
 +static struct notifier_block cpu_nfb = {
 +.notifier_call = cpu_callback
 +};
 +
 +static int __init psr_presmp_init(void)
 +{
 +if ( (opt_psr  PSR_CMT)  opt_rmid_max )
 +init_psr_cmt(opt_rmid_max);
 +
 +if (  psr_cmt_enabled() )

Extra space.
  {
 -wrmsrl(MSR_IA32_PSR_ASSOC, new_val);
 -psra->val = new_val;
 +psr_cpu_init(smp_processor_id());
 +register_cpu_notifier(cpu_nfb);
  }
 +
 +return 0;
  }
 +presmp_initcall(psr_presmp_init);
  
  /*
   * Local variables:
 diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
 index 

Re: [Xen-devel] [PATCH 19/19] xen: arm: Annotate source of ICC SGI register trapping

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

I was unable to find an ARMv8 ARM reference to this, so refer to the
GIC Architecture Specification instead.

ARMv8 ARM does cover other ways of trapping these accesses via
ICH_HCR_EL2 but we don't use those and they trap additional registers
as well.

Signed-off-by: Ian Campbell ian.campb...@citrix.com


Reviewed-by: Julien Grall julien.gr...@citrix.com

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop

2015-04-06 Thread Konrad Rzeszutek Wilk
On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
 [cc: Boris and Konrad.  Whoops]
 
 On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org wrote:
  We don't use irq_enable_sysexit on 64-bit kernels any more.  Remove

Is there an commit (or name of patch) that explains why 
32-bit-user-space-on-64-bit
kernels is unsavory?

Thank you!
  all the paravirt and Xen machinery to support it on 64-bit kernels.
 
  Signed-off-by: Andy Lutomirski l...@kernel.org
  ---
 
  I haven't actually tested this on Xen, but it builds for me.
 
   arch/x86/ia32/ia32entry.S |  6 --
   arch/x86/include/asm/paravirt_types.h |  7 ---
   arch/x86/kernel/asm-offsets.c |  2 ++
   arch/x86/kernel/paravirt.c|  4 +++-
   arch/x86/kernel/paravirt_patch_64.c   |  1 -
   arch/x86/xen/enlighten.c  |  3 ++-
   arch/x86/xen/xen-asm_64.S | 16 
   arch/x86/xen/xen-ops.h|  2 ++
   8 files changed, 13 insertions(+), 28 deletions(-)
 
  diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
  index 5d8f987a340d..eb1eb7b70f4b 100644
  --- a/arch/x86/ia32/ia32entry.S
  +++ b/arch/x86/ia32/ia32entry.S
  @@ -77,12 +77,6 @@ ENTRY(native_usergs_sysret32)
  swapgs
  sysretl
   ENDPROC(native_usergs_sysret32)
  -
  -ENTRY(native_irq_enable_sysexit)
  -   swapgs
  -   sti
  -   sysexit
  -ENDPROC(native_irq_enable_sysexit)
   #endif
 
   /*
  diff --git a/arch/x86/include/asm/paravirt_types.h 
  b/arch/x86/include/asm/paravirt_types.h
  index 7549b8b369e4..38a0ff9ef06e 100644
  --- a/arch/x86/include/asm/paravirt_types.h
  +++ b/arch/x86/include/asm/paravirt_types.h
  @@ -160,13 +160,14 @@ struct pv_cpu_ops {
  u64 (*read_pmc)(int counter);
  unsigned long long (*read_tscp)(unsigned int *aux);
 
  +#ifdef CONFIG_X86_32
  /*
   * Atomically enable interrupts and return to userspace.  This
  -* is only ever used to return to 32-bit processes; in a
  -* 64-bit kernel, it's used for 32-on-64 compat processes, but
  -* never native 64-bit processes.  (Jump, not call.)
  +* is only used in 32-bit kernels.  64-bit kernels use
  +* usergs_sysret32 instead.
   */
  void (*irq_enable_sysexit)(void);
  +#endif
 
  /*
   * Switch to usermode gs and return to 64-bit usermode using
  diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
  index 9f6b9341950f..2d27ebf0aed8 100644
  --- a/arch/x86/kernel/asm-offsets.c
  +++ b/arch/x86/kernel/asm-offsets.c
  @@ -49,7 +49,9 @@ void common(void) {
  OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
  OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable);
  OFFSET(PV_CPU_iret, pv_cpu_ops, iret);
  +#ifdef CONFIG_X86_32
  OFFSET(PV_CPU_irq_enable_sysexit, pv_cpu_ops, irq_enable_sysexit);
  +#endif
  OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0);
  OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2);
   #endif
  diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
  index 548d25f00c90..7563114d9c3a 100644
  --- a/arch/x86/kernel/paravirt.c
  +++ b/arch/x86/kernel/paravirt.c
  @@ -154,7 +154,9 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, 
  void *insnbuf,
  ret = paravirt_patch_ident_64(insnbuf, len);
 
  else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
  +#ifdef CONFIG_X86_32
   type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
  +#endif
   type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
   type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64))
  /* If operation requires a jmp, then jmp */
  @@ -371,7 +373,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
 
  .load_sp0 = native_load_sp0,
 
  -#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
  +#if defined(CONFIG_X86_32)
  .irq_enable_sysexit = native_irq_enable_sysexit,
   #endif
   #ifdef CONFIG_X86_64
  diff --git a/arch/x86/kernel/paravirt_patch_64.c 
  b/arch/x86/kernel/paravirt_patch_64.c
  index a1da6737ba5b..0de21c62c348 100644
  --- a/arch/x86/kernel/paravirt_patch_64.c
  +++ b/arch/x86/kernel/paravirt_patch_64.c
  @@ -49,7 +49,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
  PATCH_SITE(pv_irq_ops, save_fl);
  PATCH_SITE(pv_irq_ops, irq_enable);
  PATCH_SITE(pv_irq_ops, irq_disable);
  -   PATCH_SITE(pv_cpu_ops, irq_enable_sysexit);
  PATCH_SITE(pv_cpu_ops, usergs_sysret32);
  PATCH_SITE(pv_cpu_ops, usergs_sysret64);
  PATCH_SITE(pv_cpu_ops, swapgs);
  diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
  index 81665c9f2132..3797b6b31f95 100644
  --- a/arch/x86/xen/enlighten.c
  +++ b/arch/x86/xen/enlighten.c
  @@ -1267,10 +1267,11 @@ static const struct 

Re: [Xen-devel] [PATCH 04/19] xen: arm: provide and use a handle_raz_wi helper

2015-04-06 Thread Julien Grall



On 02/04/2015 18:19, Ian Campbell wrote:

On Thu, 2015-04-02 at 17:01 +0100, Ian Campbell wrote:

On Thu, 2015-04-02 at 16:50 +0100, Ian Campbell wrote:


Writing to the bottom half (e.g. w0) of a register implicitly clears the
top half, IIRC, so I think a kernel is unlikely to want to do this, even
if it could (which I'm not quite convinced of).


That said, I'll see if I can make something work with the handle_*
taking the reg number instead of a pointer and calling select_user_reg
in each.


Actually don't even need that, I think the following does what is
needed. I'm not 100% convinced it is needed though, but it's simple
enough, and I can't find anything in the ARM ARM right now which rules
out what you are suggesting, even if it is unlikely.


The paragraph "Pseudocode description of registers in AArch64 state" in 
section B1.2.1 (ARMv8 DDI0487 A.d) confirms your previous mail, i.e. 
writing to the bottom half (e.g. w0) of a register implicitly clears 
the top half.
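
(Purely as an illustration, not the patch code: when Xen emulates such an
access, storing the value through a 32-bit type gives the same
zero-extension into the saved 64-bit register; write_wreg and regidx are
made-up names.)

static void write_wreg(struct cpu_user_regs *regs, int regidx, uint32_t val)
{
    register_t *r = select_user_reg(regs, regidx);

    *r = val;    /* the upper 32 bits of the saved register become zero */
}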


I think it may be worth to mention the paragraph somewhere in the patch.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 12/19] xen: arm: Annotate the handlers for HSTR_EL2.Tx

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
  xen/arch/arm/traps.c |   10 ++
  1 file changed, 10 insertions(+)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ba120e5..21bef01 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1709,6 +1709,11 @@ static void do_cp15_32(struct cpu_user_regs *regs,
   * ARMv7 (DDI 0406C.b): B1.14.12
   * ARMv8 (DDI 0487A.d): N/A
   *
+ * HSTR_EL2.Tx


I would prefer if you use T15 instead of Tx. This is less confusing as 
we only trap c15 and the bit T14 exists on ARMv8 (even though it's RES0).


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 11/19] xen: arm: Annotate handlers for PCTR_EL2.Tx

2015-04-06 Thread Julien Grall

Hi Ian,

Subject: s/PCTR/CPTR/

On 31/03/2015 12:07, Ian Campbell wrote:

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
  xen/arch/arm/traps.c |   14 ++
  1 file changed, 14 insertions(+)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 9cdbda8..ba120e5 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1704,6 +1704,11 @@ static void do_cp15_32(struct cpu_user_regs *regs,
   * ARMv7 (DDI 0406C.b): B1.14.3
   * ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
   *
+ * CPTR_EL2.T{0..9,12..13}
+ *
+ * ARMv7 (DDI 0406C.b): B1.14.12
+ * ARMv8 (DDI 0487A.d): N/A


I would also update the comment on top of WRITE_SYSREG(..., CPTR_EL2) to 
make clear that CP0..CP9 and CP12..CP13 are only trapped for ARMv7.


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 10/19] xen: arm: Annotate registers trapped by CPTR_EL2.TTA

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

Add explicit handler for 64-bit CP14 accesses, with more relevant
debug message (as per other handlers) and to provide a place for the
comment.


It's a bit strange to name the patch Annotate... while the main change 
is 64-bit CP14 accesses.


AFAICT, this was a bug in the Xen implementation. Although, I'm not sure if 
the current platforms we support have Trace registers (maybe the Arndale?).


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 2/7] Xen: x86: print max usable RMID during init

2015-04-06 Thread Konrad Rzeszutek Wilk
On Sat, Apr 04, 2015 at 04:14:33AM +0200, Dario Faggioli wrote:
 Just print it.
 
 Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
 ---
  xen/arch/x86/psr.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
 index c902625..0f2a6ce 100644
 --- a/xen/arch/x86/psr.c
 +++ b/xen/arch/x86/psr.c
 @@ -111,7 +111,8 @@ static void __init init_psr_cmt(unsigned int rmid_max)
  for ( rmid = 1; rmid <= psr_cmt->rmid_max; rmid++ )
  psr_cmt->rmid_to_dom[rmid] = DOMID_INVALID;
  
 -printk(XENLOG_INFO "Cache Monitoring Technology enabled\n");
 +printk(XENLOG_INFO "Cache Monitoring Technology enabled, RMIDs: %u\n",

max RMID: ?

 +   psr_cmt->rmid_max);
  }
  
  /* Called with domain lock held, no psr specific lock needed */
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 18/19] xen: arm: Annotate registers trapped when CNTHCTL_EL2.EL1PCEN == 0

2015-04-06 Thread Julien Grall

Hi Ian,

On 31/03/2015 12:07, Ian Campbell wrote:

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
  xen/arch/arm/traps.c |   20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 1c9cf21..cc5b8dd 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1642,6 +1642,12 @@ static void do_cp15_32(struct cpu_user_regs *regs,

  switch ( hsr.bits  HSR_CP32_REGS_MASK )
  {
+/*
+ * !CNTHCTL_EL2.EL1PCEN / !CNTHCTL.PL1PCEN


I will be picky. The listing here is ARMv8 (AArch64) / ARMv7, but below 
it's ARMv7 / ARMv8.



+ *
+ * ARMv7 (DDI 0406C.b): B4.1.22
+ * ARMv8 (DDI 0487A.d): D1-1510 Table D1-60
+ */
  case HSR_CPREG32(CNTP_CTL):
  case HSR_CPREG32(CNTP_TVAL):
  if ( !vtimer_emulate(regs, hsr) )
@@ -1757,6 +1763,12 @@ static void do_cp15_64(struct cpu_user_regs *regs,

  switch ( hsr.bits  HSR_CP64_REGS_MASK )
  {
+/*
+ * !CNTHCTL_EL2.EL1PCEN / !CNTHCTL.PL1PCEN
+ *
+ * ARMv7 (DDI 0406C.b): B4.1.22
+ * ARMv8 (DDI 0487A.d): D1-1510 Table D1-60
+ */
  case HSR_CPREG64(CNTP_CVAL):
  if ( !vtimer_emulate(regs, hsr) )
  return inject_undef_exception(regs, hsr);
@@ -2120,14 +2132,18 @@ static void do_sysreg(struct cpu_user_regs *regs,
   */
  return handle_raz_wi(regs, x, hsr.sysreg.read, hsr, 1);

-/* Write only, Write ignore registers: */
-


This comment should have been dropped in patch #14.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 3/3] xen/arm: smmu: Renaming struct iommu_domain *domain to, struct iommu_domain *iommu_domain

2015-04-06 Thread Julien Grall

Hi Ian,

On 01/04/2015 10:30, Ian Campbell wrote:

On Tue, 2015-03-31 at 17:48 +0100, Stefano Stabellini wrote:

If it helps we could add a couple of comments on top of the structs in
smmu.c to explain the meaning of the fields, like:


/* iommu_domain, not to be confused with a Xen domain */


I was going to suggest something similar but more expansive, i.e. a
table of them all in one place (i.e. at the top of the file) for ease of
referencing:

Struct Name            What              Wherefrom  Normally found in
----------------------------------------------------------------------
iommu_domain           IOMMU Context     Linux      d->arch.blah
arch_smmu_xen_device   Device specific   Xen        device->arch.blurg


The actual name of the structure is arm_smmu_xen_device, not 
arch_smmu_xen_device. Are you suggesting to rename it?


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86, paravirt, xen: Remove the 64-bit irq_enable_sysexit pvop

2015-04-06 Thread Andy Lutomirski
On Mon, Apr 6, 2015 at 7:10 AM, Konrad Rzeszutek Wilk
konrad.w...@oracle.com wrote:
 On Fri, Apr 03, 2015 at 03:52:30PM -0700, Andy Lutomirski wrote:
 [cc: Boris and Konrad.  Whoops]

 On Fri, Apr 3, 2015 at 3:51 PM, Andy Lutomirski l...@kernel.org wrote:
  We don't use irq_enable_sysexit on 64-bit kernels any more.  Remove

 Is there an commit (or name of patch) that explains why 
 32-bit-user-space-on-64-bit
 kernels is unsavory?

sysexit never tasted very good :-p

We're (hopefully) not breaking 32-bit-user-space-on-64-bit, but we're
trying an unconventional approach to making the code faster and less
scary.  As a result, 64-bit kernels won't use sysexit any more.
Hopefully Xen is okay with the slightly sneaky thing we're doing.
AFAICT Xen thinks of sysretl and sysexit as slightly funny irets, so I
don't expect there to be any problem.

See:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=4214a16b02971c60960afd675d03544e109e0d75

and

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=47091e3c5b072daca29a15d2a3caf40359b0d140

--Andy


 Thank you!
  all the paravirt and Xen machinery to support it on 64-bit kernels.
 
  Signed-off-by: Andy Lutomirski l...@kernel.org
  ---
 
  I haven't actually tested this on Xen, but it builds for me.
 
   arch/x86/ia32/ia32entry.S |  6 --
   arch/x86/include/asm/paravirt_types.h |  7 ---
   arch/x86/kernel/asm-offsets.c |  2 ++
   arch/x86/kernel/paravirt.c|  4 +++-
   arch/x86/kernel/paravirt_patch_64.c   |  1 -
   arch/x86/xen/enlighten.c  |  3 ++-
   arch/x86/xen/xen-asm_64.S | 16 
   arch/x86/xen/xen-ops.h|  2 ++
   8 files changed, 13 insertions(+), 28 deletions(-)
 
  diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
  index 5d8f987a340d..eb1eb7b70f4b 100644
  --- a/arch/x86/ia32/ia32entry.S
  +++ b/arch/x86/ia32/ia32entry.S
  @@ -77,12 +77,6 @@ ENTRY(native_usergs_sysret32)
  swapgs
  sysretl
   ENDPROC(native_usergs_sysret32)
  -
  -ENTRY(native_irq_enable_sysexit)
  -   swapgs
  -   sti
  -   sysexit
  -ENDPROC(native_irq_enable_sysexit)
   #endif
 
   /*
  diff --git a/arch/x86/include/asm/paravirt_types.h 
  b/arch/x86/include/asm/paravirt_types.h
  index 7549b8b369e4..38a0ff9ef06e 100644
  --- a/arch/x86/include/asm/paravirt_types.h
  +++ b/arch/x86/include/asm/paravirt_types.h
  @@ -160,13 +160,14 @@ struct pv_cpu_ops {
  u64 (*read_pmc)(int counter);
  unsigned long long (*read_tscp)(unsigned int *aux);
 
  +#ifdef CONFIG_X86_32
  /*
   * Atomically enable interrupts and return to userspace.  This
  -* is only ever used to return to 32-bit processes; in a
  -* 64-bit kernel, it's used for 32-on-64 compat processes, but
  -* never native 64-bit processes.  (Jump, not call.)
  +* is only used in 32-bit kernels.  64-bit kernels use
  +* usergs_sysret32 instead.
   */
  void (*irq_enable_sysexit)(void);
  +#endif
 
  /*
   * Switch to usermode gs and return to 64-bit usermode using
  diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
  index 9f6b9341950f..2d27ebf0aed8 100644
  --- a/arch/x86/kernel/asm-offsets.c
  +++ b/arch/x86/kernel/asm-offsets.c
  @@ -49,7 +49,9 @@ void common(void) {
  OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
  OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable);
  OFFSET(PV_CPU_iret, pv_cpu_ops, iret);
  +#ifdef CONFIG_X86_32
  OFFSET(PV_CPU_irq_enable_sysexit, pv_cpu_ops, irq_enable_sysexit);
  +#endif
  OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0);
  OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2);
   #endif
  diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
  index 548d25f00c90..7563114d9c3a 100644
  --- a/arch/x86/kernel/paravirt.c
  +++ b/arch/x86/kernel/paravirt.c
  @@ -154,7 +154,9 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, 
  void *insnbuf,
  ret = paravirt_patch_ident_64(insnbuf, len);
 
  else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
  +#ifdef CONFIG_X86_32
   type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
  +#endif
   type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
   type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64))
  /* If operation requires a jmp, then jmp */
  @@ -371,7 +373,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
 
  .load_sp0 = native_load_sp0,
 
  -#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
  +#if defined(CONFIG_X86_32)
  .irq_enable_sysexit = native_irq_enable_sysexit,
   #endif
   #ifdef CONFIG_X86_64
  diff --git a/arch/x86/kernel/paravirt_patch_64.c 
  b/arch/x86/kernel/paravirt_patch_64.c
  index a1da6737ba5b..0de21c62c348 100644
  --- 

[Xen-devel] [qemu-upstream-4.5-testing test] 50330: regressions - FAIL

2015-04-06 Thread osstest service user
flight 50330 qemu-upstream-4.5-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50330/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 36517

Tests which are failing intermittently (not blocking):
 test-amd64-i386-freebsd10-i386 11 guest-localmigratefail pass in 50313
 test-armhf-armhf-libvirt  9 guest-start fail pass in 50313
 test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-localmigrate/x10 fail pass in 50313
 test-amd64-amd64-libvirt  9 guest-startfail in 50313 pass in 50330
 test-amd64-i386-freebsd10-i386 14 guest-localmigrate/x10 fail in 50313 pass in 
50283

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 10 migrate-support-checkfail never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-armhf-armhf-libvirt 10 migrate-support-check fail in 50313 never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop  fail in 50313 never pass

version targeted for testing:
 qemuu  c9ac5f816bf3a8b56f836b078711dcef6e5c90b8
baseline version:
 qemuu  0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4


People who touched revisions under test:
  Ian Campbell ian.campb...@citrix.com
  Jan Beulich jbeul...@suse.com


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64 

[Xen-devel] remove entry in shadow table

2015-04-06 Thread HANNAS YAYA Issa

Hi
I want to remove the entry of a given page in the shadow page table so that 
the next time the guest accesses the page there is a page fault.

Here is what I am trying to do:

1. I have a timer which wakes up every 30 seconds and removes the entry in 
the shadow by calling (a rough sketch of this flow is below):

sh_remove_all_mappings(d->vcpu[0], _mfn(page_to_mfn(page)))

here d is the domain and page is the page that I want to remove 
from the shadow page table.
2. In the function sh_page_fault() I get the gmfn and compare it with 
the mfn of the page that I removed earlier from the shadow page table.

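A rough sketch of what I mean (purely illustrative: watched_page is a
placeholder for however the page is actually chosen, locking and error
handling are omitted, and it assumes the code lives inside the shadow code
where sh_remove_all_mappings() is visible):

static struct page_info *watched_page;   /* placeholder: page to unmap */
static struct timer unmap_timer;

static void unmap_timer_fn(void *data)
{
    struct domain *d = data;

    /* drop all shadow mappings so the next guest access faults */
    sh_remove_all_mappings(d->vcpu[0], _mfn(page_to_mfn(watched_page)));

    /* re-arm for another 30 seconds */
    set_timer(&unmap_timer, NOW() + SECONDS(30));
}

/* once, during setup:
 *   init_timer(&unmap_timer, unmap_timer_fn, d, smp_processor_id());
 *   set_timer(&unmap_timer, NOW() + SECONDS(30));
 */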

Is this method correct?

I also get this error: sh error: sh_remove_all_mappings(): can't find 
all mappings of mfn


Thank you

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel