Re: [Xen-devel] [PATCH] VTd/dmar: Tweak how the DMAR table is clobbered

2015-04-09 Thread Andrew Cooper
On 09/04/15 09:51, David Vrabel wrote:
 On 08/04/15 20:44, Andrew Cooper wrote:
 Instead of clobbering DMAR -> XMAR and back, clobber to RMAD instead.  This
 means that changing the signature does not alter the checksum, which allows
 the clobbering/unclobbering to be performed atomically and idempotently, which
 is an advantage on the kexec path, which can re-enter acpi_dmar_reinstate().
 Could RMAD be specified as a real table in the future?  Does the
 clobbered name have to start with X to avoid future conflicts?

 David

I am not aware of any restrictions imposed by the ACPI spec.  Any
clobbered signature is potentially a real table in the future.
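
To spell out why the anagram trick works: an ACPI table is valid when the
byte sum of the entire table is zero mod 256, so overwriting the signature
with a permutation of the same four bytes leaves the checksum intact.  A
minimal sketch (illustrative only, not the actual patch):

    #include <stdint.h>
    #include <string.h>

    /* Toggle the DMAR signature to/from its anagram.  Both spellings
     * contain the same bytes, so the table checksum is unchanged and no
     * separate fixup write is needed -- a single idempotent store. */
    static void toggle_dmar_signature(uint8_t *sig)
    {
        if ( memcmp(sig, "DMAR", 4) == 0 )
            memcpy(sig, "RMAD", 4);          /* clobber */
        else if ( memcmp(sig, "RMAD", 4) == 0 )
            memcpy(sig, "DMAR", 4);          /* unclobber */
    }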

This DMAR clobbering was introduced by
83904107a33c9badc34ecdd1f8ca0f9271e5e370 which claims that the dom0 VT-d
driver was capable of playing with the IOMMU(s) while Xen was also using
them.  An alternative approach might be to leave the DMAR table alone
and sprinkle some iomem_deny_access() around to forcibly prevent dom0
from playing.
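
(i.e. something like the following in the dom0 builder -- a sketch only;
the DRHD register ranges would have to be enumerated from the DMAR first,
and 'base' here is a hypothetical per-unit register base:)

    /* Deny dom0 access to each IOMMU register page instead of hiding
     * the DMAR table itself. */
    rc |= iomem_deny_access(d, paddr_to_pfn(base),
                            paddr_to_pfn(base + PAGE_SIZE - 1));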

~Andrew



[Xen-devel] [PATCH V9 0/6] xen: Clean-up of mem_event subsystem

2015-04-09 Thread Tamas K Lengyel
This patch series aims to clean up the mem_event subsystem within Xen. The
original use-case for this system was to allow external helper applications
running in privileged domains to control various memory operations performed
by Xen. Amongst these were paging, sharing and access control. The subsystem
has since been extended to also deliver non-memory-related events, namely
various HVM debugging events (INT3, MTF, MOV-TO-CR, MOV-TO-MSR). The structures
and naming of related functions, however, have not caught up with these new
use-cases, leaving many ambiguities in the code. Furthermore, future
use-cases envisioned for this subsystem include PV domains and ARM domains,
so there is a need to establish a common base to build on.

Each patch in the series has been build-tested on x86 and ARM, both with
and without XSM enabled.

This PATCH series is also available at:
https://github.com/tklengyel/xen/tree/mem_event_cleanup9

Tamas K Lengyel (6):
  xen: Introduce monitor_op domctl
  xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag
  xen/vm_event: Decouple vm_event and mem_access.
  xen/vm_event: Relocate memop checks
  xen/xsm: Split vm_event_op into three separate labels
  xen/vm_event: Add RESUME option to vm_event_op domctl

 MAINTAINERS |   1 +
 tools/libxc/Makefile|   1 +
 tools/libxc/include/xenctrl.h   |  49 +++--
 tools/libxc/xc_domain.c |  28 +-
 tools/libxc/xc_mem_access.c |  56 +--
 tools/libxc/xc_mem_paging.c |  12 ++-
 tools/libxc/xc_memshr.c |  15 ++-
 tools/libxc/xc_monitor.c| 137 +
 tools/libxc/xc_private.h|   2 +-
 tools/libxc/xc_vm_event.c   |  11 +-
 tools/tests/xen-access/xen-access.c |  40 
 xen/arch/x86/Makefile   |   1 +
 xen/arch/x86/hvm/emulate.c  |   2 +-
 xen/arch/x86/hvm/event.c|  82 ---
 xen/arch/x86/hvm/hvm.c  |  35 +--
 xen/arch/x86/hvm/vmx/vmcs.c |   7 +-
 xen/arch/x86/hvm/vmx/vmx.c  |   2 +-
 xen/arch/x86/mm/mem_paging.c|  41 ++--
 xen/arch/x86/mm/mem_sharing.c   | 160 ++---
 xen/arch/x86/mm/p2m.c   |  74 --
 xen/arch/x86/monitor.c  | 196 
 xen/arch/x86/x86_64/compat/mm.c |  26 +
 xen/arch/x86/x86_64/mm.c|  24 +
 xen/common/Makefile |  18 ++--
 xen/common/domctl.c |   9 ++
 xen/common/mem_access.c |  51 ++
 xen/common/vm_event.c   | 181 +
 xen/include/asm-arm/monitor.h   |  35 +++
 xen/include/asm-arm/p2m.h   |  18 +++-
 xen/include/asm-x86/domain.h|  22 +++-
 xen/include/asm-x86/hvm/domain.h|   1 -
 xen/include/asm-x86/mem_paging.h|   2 +-
 xen/include/asm-x86/mem_sharing.h   |   4 +-
 xen/include/asm-x86/monitor.h   |  31 ++
 xen/include/asm-x86/p2m.h   |  37 +--
 xen/include/public/domctl.h |  80 +++
 xen/include/public/hvm/params.h |   9 +-
 xen/include/public/memory.h |  18 ++--
 xen/include/public/vm_event.h   |   3 +-
 xen/include/xen/mem_access.h|  14 ++-
 xen/include/xen/vm_event.h  |  59 +--
 xen/include/xsm/dummy.h |  20 +++-
 xen/include/xsm/xsm.h   |  33 +-
 xen/xsm/dummy.c |  13 ++-
 xen/xsm/flask/hooks.c   |  64 
 xen/xsm/flask/policy/access_vectors |  12 ++-
 46 files changed, 1106 insertions(+), 630 deletions(-)
 create mode 100644 tools/libxc/xc_monitor.c
 create mode 100644 xen/arch/x86/monitor.c
 create mode 100644 xen/include/asm-arm/monitor.h
 create mode 100644 xen/include/asm-x86/monitor.h

-- 
2.1.4




Re: [Xen-devel] [PATCH v2 1/2] osstest: update FreeBSD guests to 10.1

2015-04-09 Thread Ian Jackson
Roger Pau Monne writes ([PATCH v2 1/2] osstest: update FreeBSD guests to 
10.1):
 Update FreeBSD guests in OSSTest to FreeBSD 10.1. The following images
 should be placed in the osstest images folder:

Thanks for the quick turnaround.  I have pushed this and 2/2 to
osstest pretest (with my acks) and will keep an eye on it.

Ian.



Re: [Xen-devel] [PATCH 2/6] x86/numa: Correct the extern of cpu_to_node

2015-04-09 Thread Andrew Cooper
On 09/04/15 16:00, Tim Deegan wrote:
 At 18:26 +0100 on 07 Apr (1428431176), Andrew Cooper wrote:
 --- a/xen/include/asm-x86/numa.h
 +++ b/xen/include/asm-x86/numa.h
 @@ -9,7 +9,7 @@
  
  extern int srat_rev;
  
 -extern unsigned char cpu_to_node[];
 +extern nodeid_t  cpu_to_node[NR_CPUS];
 Does the compiler do anything useful with the array size here?

Specifying the size allows ARRAY_SIZE(cpu_to_node) to work in other
translation units.  It also allows static analysers to perform bounds
checks, should they wish.

 In particular does it check that it matches the size at the definition?

It will complain if they are mismatched.
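
A quick illustration (not from the patch):

    /* header: */
    extern nodeid_t cpu_to_node[NR_CPUS];

    /* definition in another translation unit -- a mismatched bound, e.g.
     *     nodeid_t cpu_to_node[NR_CPUS + 1];
     * is rejected with "error: conflicting types for 'cpu_to_node'",
     * and ARRAY_SIZE(cpu_to_node) now works wherever the header is seen. */
    nodeid_t cpu_to_node[NR_CPUS];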

~Andrew



Re: [Xen-devel] [PATCH 1/6] x86/link: Discard the alternatives .discard sections

2015-04-09 Thread Tim Deegan
At 18:26 +0100 on 07 Apr (1428431175), Andrew Cooper wrote:
 This appears to have been missed when porting the alternatives framework from
 Linux, and saves us a section which is otherwise loaded into memory.
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org



Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-09 Thread Prashant Sreedharan
On Thu, 2015-04-09 at 12:11 +0100, Ian Jackson wrote:
 Prashant Sreedharan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 
 3 more messages]):
  On Wed, 2015-04-08 at 14:59 +0100, Ian Jackson wrote:
   Ian Jackson writes (Re: tg3 NIC driver bug in 3.14.x under Xen):
The value for dropped increases steadily.  This particular box is on
a network with a lot of other stuff, so it will be constantly
receiving broadcasts of various kinds even when I am not trying to
address it directly.
   
  Based on the stats, the issue seems to be with the bridge rather than
  tg3. Do you have any filters enabled on xenbr0?
 
 No.  I can try to repro the problem without the bridge, if it would
 help.
yes please do





Re: [Xen-devel] [PATCH RFC] xen/pvh: use a custom IO bitmap for PVH hardware domains

2015-04-09 Thread Andrew Cooper
On 08/04/15 13:57, Roger Pau Monne wrote:
 Since a PVH hardware domain has access to the physical hardware, create a
 custom, more permissive IO bitmap. The permissions set on the bitmap are
 populated based on the contents of the ioports rangeset.

 Also add the IO ports of the serial console used by Xen to the list of
 inaccessible IO ports.

Thank you for looking into this.  I think it is the correct general
direction, but I do have some questions/thoughts about this area.

I know that the current implementation is that dom0 is whitelisted and
can play with everything, but is this actually the best API? 
Conceptually, a better approach would be for dom0 to start with no
permissions, and explicitly request access (After all, PV and PVH
domains are expected to know exactly what they are doing under Xen). 
This has an extra advantage in that dom0 can't accidentally grant
permissions for resources it doesn't know about to domUs.

Instead of adding to a growing blacklist in construct_dom0, it might be
better to maintain a global rangeset (or a few) for resources which are
used by Xen and not permitted to be used by any other domains.  This
would allow the ioports_deny_access()/etc calls to move into the correct
drivers, instead of having to extern things like the uart ports.  It is
also far more likely to be kept up to date.  (On that note, we could
probably do with an audit of the currently denied resources.  I highly
doubt there is a PIT driver which could function with access to only
some of the ports).
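
As a sketch of that suggestion (the helper and rangeset names here are
invented; rangeset_add_range() is the existing primitive):

    /* A single Xen-owned rangeset, populated by drivers as they claim
     * ports, and consulted once when building the hardware domain. */
    static struct rangeset *xen_reserved_ioports;

    /* Hypothetical call a driver (e.g. ns16550) would make instead of
     * exporting its port numbers for construct_dom0 to deny. */
    int ioports_reserve_for_xen(unsigned int s, unsigned int e)
    {
        return rangeset_add_range(xen_reserved_ioports, s, e);
    }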

In addition, some specific review...


 Signed-off-by: Roger Pau Monné roger@citrix.com
 Cc: Jan Beulich jbeul...@suse.com
 Cc: Andrew Cooper andrew.coop...@citrix.com
 Cc: Boris Ostrovsky boris.ostrov...@oracle.com
 Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
 Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
 Cc: Jun Nakajima jun.nakaj...@intel.com
 Cc: Eddie Dong eddie.d...@intel.com
 Cc: Kevin Tian kevin.t...@intel.com
 ---
  xen/arch/x86/domain_build.c  | 10 ++
  xen/arch/x86/hvm/hvm.c   | 11 +++
  xen/arch/x86/hvm/svm/vmcb.c  |  3 ++-
  xen/arch/x86/hvm/vmx/vmcs.c  |  6 --
  xen/arch/x86/hvm/vmx/vmx.c   |  1 +
  xen/drivers/char/ns16550.c   | 10 ++
  xen/include/asm-x86/hvm/domain.h |  2 ++
  xen/include/asm-x86/hvm/hvm.h|  1 +
  xen/include/xen/serial.h |  4 
  9 files changed, 45 insertions(+), 3 deletions(-)

 diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
 index e5c845c..d0365fe 100644
 --- a/xen/arch/x86/domain_build.c
 +++ b/xen/arch/x86/domain_build.c
 @@ -22,6 +22,7 @@
 #include <xen/compat.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/serial.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
 @@ -1541,6 +1542,11 @@ int __init construct_dom0(
  rc |= ioports_deny_access(d, 0x40, 0x43);
  /* PIT Channel 2 / PC Speaker Control. */
  rc |= ioports_deny_access(d, 0x61, 0x61);
 +/* Serial console. */
 +if ( uart_ioport1 != 0 )
 +rc |= ioports_deny_access(d, uart_ioport1, uart_ioport1 + 7);
 +if ( uart_ioport2 != 0 )
 +rc |= ioports_deny_access(d, uart_ioport2, uart_ioport2 + 7);
  /* ACPI PM Timer. */
  if ( pmtmr_ioport )
  rc |= ioports_deny_access(d, pmtmr_ioport, pmtmr_ioport + 3);
 @@ -1618,6 +1624,10 @@ int __init construct_dom0(
  
  pvh_map_all_iomem(d, nr_pages);
  pvh_setup_e820(d, nr_pages);
 +
+for ( i = 0; i < 0x10000; i++ )
+if ( ioports_access_permitted(d, i, i) )
+__clear_bit(i, hvm_hw_io_bitmap);

(There is surely a more efficient way of doing this?  If there isn't,
there probably should be)
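
One more efficient shape (a sketch, assuming the existing
rangeset_report_ranges() iterator and the d->arch.ioport_caps rangeset):

    /* Clear bitmap bits for each permitted range in one pass, rather
     * than querying the rangeset once per port. */
    static int clear_ioport_range(unsigned long s, unsigned long e, void *bm)
    {
        unsigned long i;

        for ( i = s; i <= e; i++ )
            __clear_bit(i, (unsigned long *)bm);

        return 0;
    }

    /* ... in construct_dom0(): */
    rangeset_report_ranges(d->arch.ioport_caps, 0, 0xffff,
                           clear_ioport_range, hvm_hw_io_bitmap);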

There is also a boundary issue between VT-x and SVM.

For VT-x, the IO bitmap is 2 pages.  For SVM, it is 2 pages and 3 bits.
I suspect the difference is to do with the handling of a 4-byte write to
port 0xffff.  I think you might need to check i < 0x10003 instead.

  }
  
  if ( d->domain_id == hardware_domid )
 diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
 index 3ff87c6..6de89b2 100644
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -82,6 +82,10 @@ struct hvm_function_table hvm_funcs __read_mostly;
 unsigned long __attribute__ ((__section__ (".bss.page_aligned")))
 hvm_io_bitmap[3*PAGE_SIZE/BYTES_PER_LONG];
 
+/* I/O permission bitmap for HVM hardware domain */
+unsigned long __attribute__ ((__section__ (".bss.page_aligned")))
+hvm_hw_io_bitmap[3*PAGE_SIZE/BYTES_PER_LONG];
 +
  /* Xen command-line option to enable HAP */
  static bool_t __initdata opt_hap_enabled = 1;
  boolean_param(hap, opt_hap_enabled);
 @@ -162,6 +166,7 @@ static int __init hvm_enable(void)
   * often used for I/O delays, but the vmexits simply slow things down).
   */
  memset(hvm_io_bitmap, ~0, sizeof(hvm_io_bitmap));
 +memset(hvm_hw_io_bitmap, ~0, sizeof(hvm_hw_io_bitmap));
  if ( 

Re: [Xen-devel] [PATCH 09/10] log-dirty: Refine common code to support PML

2015-04-09 Thread Tim Deegan
At 10:35 +0800 on 27 Mar (1427452553), Kai Huang wrote:
 --- a/xen/arch/x86/mm/paging.c
 +++ b/xen/arch/x86/mm/paging.c
 @@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
  int i4, i3, i2;
  
  if ( !resuming )
 +{
  domain_pause(d);
 +
 +/*
 + * Only need to flush when not resuming, as domain was paused in
 + * resuming case therefore it's not possible to have any new dirty
 + * page.
 + */
+if ( d->arch.paging.log_dirty.flush_cached_dirty )
+d->arch.paging.log_dirty.flush_cached_dirty(d);

I think there are too many layers of indirection here. :)  How about:
 - don't add a flush_cached_dirty() function to the log_dirty ops.
 - just call p2m_flush_hardware_cached_dirty(d) here.

Would that work OK?
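
i.e. something like this in paging_log_dirty_op() (a sketch of the
suggestion, untested):

    if ( !resuming )
    {
        domain_pause(d);

        /* Flush only when not resuming: in the resuming case the domain
         * stayed paused, so no new dirty pages can have appeared. */
        p2m_flush_hardware_cached_dirty(d);
    }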

Cheers,

Tim.



[Xen-devel] [PATCH v4 11/12] tools: add tools support for Intel CAT

2015-04-09 Thread Chao Peng
These are the xc/xl changes to support Intel Cache Allocation
Technology (CAT). Two commands are introduced:
- xl psr-cat-cbm-set [-s socket] domain cbm
  Set cache capacity bitmasks (CBM) for a domain.
- xl psr-cat-show domain
  Show Cache Allocation Technology information.

Examples:
[root@vmm-psr]# xl psr-cat-cbm-set 0 0xff
[root@vmm-psr]# xl psr-cat-show
Socket ID   : 0
L3 Cache: 12288KB
Maximum COS : 15
CBM length  : 12
Default CBM : 0xfff
ID   NAME           CBM
0    Domain-0       0xff

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
Changes in v4:
* Add example output in commit message.
* Make libxl__count_physical_sockets private to libxl_psr.c.
* Set errno in several error cases.
* Change libxl_psr_cat_get_l3_info to return all sockets information.
* Remove unused libxl_domain_info call.
Changes in v3:
* Add manpage.
* libxl_psr_cat_set/get_domain_data = libxl_psr_cat_set/get_cbm.
* Move libxl_count_physical_sockets into separate patch.
* Support LIBXL_PSR_TARGET_ALL for libxl_psr_cat_set_cbm.
* Clean up the print codes.
---
 docs/man/xl.pod.1 |  31 
 tools/libxc/include/xenctrl.h |  15 
 tools/libxc/xc_psr.c  |  76 +++
 tools/libxl/libxl.h   |  26 +++
 tools/libxl/libxl_psr.c   | 168 --
 tools/libxl/libxl_types.idl   |  10 +++
 tools/libxl/xl.h  |   4 +
 tools/libxl/xl_cmdimpl.c  | 140 +++
 tools/libxl/xl_cmdtable.c |  12 +++
 9 files changed, 475 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index b016272..dfab921 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1492,6 +1492,37 @@ monitor types are:
 
 =back
 
+=head1 CACHE ALLOCATION TECHNOLOGY
+
+Intel Broadwell and later server platforms offer capabilities to configure and
+make use of the Cache Allocation Technology (CAT) mechanisms, which enable more
+cache resources (i.e. L3 cache) to be made available for high priority
+applications. In Xen's implementation, CAT is used to control cache allocation
+on a per-VM basis. To enforce cache allocation for a specific domain, just set
+capacity bitmasks (CBM) for the domain.
+
+=over 4
+
+=item B<psr-cat-cbm-set> [I<OPTIONS>] [I<domain-id>] [I<cbm>]
+
+Set cache capacity bitmasks (CBM) for a domain.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-s SOCKET>, B<--socket=SOCKET>
+
+Specify the socket to process, otherwise all sockets are processed.
+
+=back
+
+=item B<psr-cat-show> [I<domain-id>]
+
+Show CAT settings for a certain domain or all domains.
+
+=back
+
 =head1 TO BE DOCUMENTED
 
 We need better documentation for:
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index df18292..1373a46 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2692,6 +2692,12 @@ enum xc_psr_cmt_type {
 XC_PSR_CMT_LOCAL_MEM_COUNT,
 };
 typedef enum xc_psr_cmt_type xc_psr_cmt_type;
+
+enum xc_psr_cat_type {
+XC_PSR_CAT_L3_CBM = 1,
+};
+typedef enum xc_psr_cat_type xc_psr_cat_type;
+
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
@@ -2706,6 +2712,15 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t 
rmid, uint32_t cpu,
 uint32_t psr_cmt_type, uint64_t *monitor_data,
 uint64_t *tsc);
 int xc_psr_cmt_enabled(xc_interface *xch);
+
+int xc_psr_cat_set_domain_data(xc_interface *xch, uint32_t domid,
+   xc_psr_cat_type type, uint32_t target,
+   uint64_t data);
+int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t domid,
+   xc_psr_cat_type type, uint32_t target,
+   uint64_t *data);
+int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
+   uint32_t *cos_max, uint32_t *cbm_len);
 #endif
 
 #endif /* XENCTRL_H */
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index e367a80..d8b3a51 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -248,6 +248,82 @@ int xc_psr_cmt_enabled(xc_interface *xch)
 
 return 0;
 }
+int xc_psr_cat_set_domain_data(xc_interface *xch, uint32_t domid,
+   xc_psr_cat_type type, uint32_t target,
+   uint64_t data)
+{
+DECLARE_DOMCTL;
+uint32_t cmd;
+
+switch ( type )
+{
+case XC_PSR_CAT_L3_CBM:
+cmd = XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM;
+break;
+default:
+errno = EINVAL;
+return -1;
+}
+
+domctl.cmd = XEN_DOMCTL_psr_cat_op;
+domctl.domain = (domid_t)domid;
+domctl.u.psr_cat_op.cmd = cmd;
+domctl.u.psr_cat_op.target = target;
+domctl.u.psr_cat_op.data = data;
+
+return do_domctl(xch, domctl);
+}
+
+int 

[Xen-devel] [PATCH v5 p2 06/19] xen/dts: Provide a helper to get a DT node from a path provided by a guest

2015-04-09 Thread Julien Grall
From: Julien Grall julien.gr...@linaro.org

The maximum size of the copied string has been chosen based on the value
used by XSM in a similar case.

Furthermore, Linux seems to allow paths of up to 4096 characters, though
this could vary from one OS to another.

Signed-off-by: Julien Grall julien.gr...@linaro.org

---
Changes in v4:
- Drop DEVICE_TREE_MAX_PATHLEN
- Bump the value to PAGE_SIZE (i.e. 4096). It's used in XSM and
this value seems sensible for Linux
- Clarify how the maximum size has been chosen

Changes in v3:
- Use the new prototype of safe_copy_string_from_guest

Changes in v2:
- guest_copy_string_from_guest has been renamed into
safe_copy_string_from_guest
---
 xen/common/device_tree.c  | 18 ++
 xen/include/xen/device_tree.h | 14 ++
 2 files changed, 32 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 02cae91..31f169b 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -13,6 +13,7 @@
 #include <xen/config.h>
 #include <xen/types.h>
 #include <xen/init.h>
+#include <xen/guest_access.h>
 #include <xen/device_tree.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -23,6 +24,7 @@
 #include <xen/cpumask.h>
 #include <xen/ctype.h>
 #include <asm/setup.h>
+#include <xen/err.h>
 
 const void *device_tree_flattened;
 dt_irq_xlate_func dt_irq_xlate;
@@ -277,6 +279,22 @@ struct dt_device_node *dt_find_node_by_path(const char 
*path)
 return np;
 }
 
+int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen,
+  struct dt_device_node **node)
+{
+char *path;
+
+path = safe_copy_string_from_guest(u_path, u_plen, PAGE_SIZE);
+if ( IS_ERR(path) )
+return PTR_ERR(path);
+
+*node = dt_find_node_by_path(path);
+
+xfree(path);
+
+return (*node == NULL) ? -ESRCH : 0;
+}
+
 struct dt_device_node *dt_find_node_by_alias(const char *alias)
 {
 const struct dt_alias_prop *app;
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 57eb3ee..e187780 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -456,6 +456,20 @@ struct dt_device_node *dt_find_node_by_alias(const char 
*alias);
  */
 struct dt_device_node *dt_find_node_by_path(const char *path);
 
+
+/**
+ * dt_find_node_by_gpath - Same as dt_find_node_by_path but retrieve the
+ * path from the guest
+ *
+ * @u_path: Xen Guest handle to the buffer containing the path
+ * @u_plen: Length of the buffer
+ * @node: TODO
+ *
+ * Return 0 on success, otherwise -errno
+ */
+int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen,
+  struct dt_device_node **node);
+
 /**
  * dt_get_parent - Get a node's parent if any
  * @node: Node to get parent
-- 
2.1.4




Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-09 Thread Ian Jackson
Ian Jackson writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
messages]):
 Prashant Sreedharan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 
 3 more messages]):
  yes please do
 
 I will do so.

I did this test:
 - Linux 3.14.21
 - baremetal
 - `iommu=soft swiotlb=force' as suggested by Konrad
 - no bridge
 - manually added arp entries on both ends
   between target box and a server on same network

The results are:

On the test box, `ping 10.80.248.135' and `ping -s 500 10.80.248.135'
generate apparently-good ICMP echo requests which the server replies
to, but they don't seem to be received.

I ran
  tcpdump -pvvs500 -lnieth0 \! ether dst cc:cc:cc:cc:cc:cc and \! \
  ether dst 00:00:00:00:00:00 and \! ether dst 00:00:cc:cc:cc:cc and \
  \! ether dst 00:00:00:00:cc:cc and \! ether dst cc:cc:00:00:00:00
on the test box while pinging it from the server (-s500 and the
default).  No relevant packets matched the tcpdump filter.

However, as time goes by more and more packets with apparently random
data in their address fields start turning up, so I have to keep adding
more MAC addresses to be filtered out.

root@bedbug:~# ethtool -S eth0 | grep -v ': 0$'
NIC statistics:
 rx_octets: 8196868
 rx_ucast_packets: 633
 rx_mcast_packets: 1
 rx_bcast_packets: 123789
 tx_octets: 42854
 tx_ucast_packets: 9
 tx_mcast_packets: 8
 tx_bcast_packets: 603
root@bedbug:~# ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:13:72:14:c0:51  
  inet addr:10.80.249.102  Bcast:10.80.251.255
  Mask:255.255.252.0
  inet6 addr: fe80::213:72ff:fe14:c051/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:124774 errors:0 dropped:88921 overruns:0 frame:0
  TX packets:620 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:8222158 (7.8 MiB)  TX bytes:42854 (41.8 KiB)
  Interrupt:17 

root@bedbug:~#

It appears therefore that packets are being corrupted on the receive
path, and the kernel then drops them (as misaddressed).

I also tried under Xen (rather than with baremetal and Konrad's
iommu/swiotlb kernel options), but that seems to be a less effective
repro.  Under Xen, without the bridge, I got ~6-8% packet loss,
compared to ~25-30% with the bridge.  I didn't investigate that
configuration in detail.

Thanks,
Ian.



[Xen-devel] [PATCH 8/8] Refactor package dependency checking and installation

2015-04-09 Thread George Dunlap
First, create a new global variable, PKGTYPE.  At the moment deb and
rpm are supported.

Define _check-package-$PKGTYPE which returns true if the package is
installed, false otherwise, and _install-package-$PKGTYPE which will
install a list of packages.

Define check-package(), which will take a list of packages, and check
to see if they're installed.  Any missing packages will be added to an
array called missing.

Change _${COMPONENT}_install_dependencies to
${COMPONENT}_check_package.  Have these call check-package.

Don't call _${COMPONENT}_install_dependencies from ${COMPONENT}_build.

Define check-builddeps().  Define an empty missing array.  Call
check-package for raisin dependencies (like git and rpmbuild).  Then
call for_each_component check_package.

At this point we have an array with all missing packages.  If it's
empty, be happy.  If it's non-empty, and deps=true, try to install the
packages; otherwise print the missing packages and exit.

Add install-builddeps(), which is basically check-builddeps() with
deps=true by default.

Call check-builddeps from build() to close the loop.

Signed-off-by: George Dunlap george.dun...@eu.citrix.com
---
CC: Stefano Stabellini stefano.stabell...@citrix.com
---
 components/grub |  6 ++--
 components/libvirt  |  6 ++--
 components/xen  | 10 +++---
 lib/build.sh| 87 +++
 lib/common-functions.sh | 89 +++--
 5 files changed, 148 insertions(+), 50 deletions(-)

diff --git a/components/grub b/components/grub
index a5aa27d..839e001 100644
--- a/components/grub
+++ b/components/grub
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-function _grub_install_dependencies() {
+function grub_check_package() {
 local DEP_Debian_common="build-essential tar autoconf bison flex"
 local DEP_Debian_x86_32="$DEP_Debian_common"
 local DEP_Debian_x86_64="$DEP_Debian_common libc6-dev-i386"
@@ -18,8 +18,8 @@ function _grub_install_dependencies() {
 echo "grub is only supported on x86_32 and x86_64"
 return
 fi
-echo "installing Grub dependencies"
-eval install_dependencies \$DEP_$DISTRO_$ARCH
+echo "checking Grub dependencies"
+eval check-package \$DEP_$DISTRO_$ARCH
 }
 
 
diff --git a/components/libvirt b/components/libvirt
index 6602dcf..b106970 100644
--- a/components/libvirt
+++ b/components/libvirt
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-function _libvirt_install_dependencies() {
+function libvirt_check_package() {
 local DEP_Debian_common="build-essential libtool autoconf autopoint \
  xsltproc libxml2-utils pkg-config python-dev   \
  libxml-xpath-perl libyajl-dev libxml2-dev  \
@@ -18,8 +18,8 @@ function _libvirt_install_dependencies() {
 local DEP_Fedora_x86_32="$DEP_Fedora_common"
 local DEP_Fedora_x86_64="$DEP_Fedora_common"
 
-echo "installing Libvirt dependencies"
-eval install_dependencies \$DEP_$DISTRO_$ARCH
+echo "checking Libvirt dependencies"
+eval check-package \$DEP_$DISTRO_$ARCH
 }
 
 function libvirt_build() {
diff --git a/components/xen b/components/xen
index 7a9f22d..ce46e3d 100644
--- a/components/xen
+++ b/components/xen
@@ -1,6 +1,8 @@
 #!/usr/bin/env bash
 
-function _xen_install_dependencies() {
+function xen_check_package() {
+$requireargs DISTRO ARCH
+
 local DEP_Debian_common="build-essential python-dev gettext uuid-dev   \
  libncurses5-dev libyajl-dev libaio-dev pkg-config libglib2.0-dev  \
  libssl-dev libpixman-1-dev bridge-utils wget"
@@ -15,13 +17,11 @@ function _xen_install_dependencies() {
 local DEP_Fedora_x86_32="$DEP_Fedora_common dev86 acpica-tools texinfo"
 local DEP_Fedora_x86_64="$DEP_Fedora_x86_32 glibc-devel.i686"
 
-echo "installing Xen dependencies"
-eval install_dependencies \$DEP_$DISTRO_$ARCH
+echo "Checking Xen dependencies"
+eval check-package \$DEP_$DISTRO_$ARCH
 }
 
 function xen_build() {
-_xen_install_dependencies
-
 cd $BASEDIR
 git-checkout $XEN_UPSTREAM_URL $XEN_UPSTREAM_REVISION xen-dir
 cd xen-dir
diff --git a/lib/build.sh b/lib/build.sh
index ab1e087..a453874 100755
--- a/lib/build.sh
+++ b/lib/build.sh
@@ -2,32 +2,72 @@
 
 set -e
 
-_help() {
-echo Usage: ./build.sh options command
RAISIN_HELP+=("check-builddep   Check to make sure we have all dependencies installed")
+function check-builddep() {
+local -a missing
+
+$arg_parse
+
+default deps false ; $default_post
+
+$requireargs PKGTYPE DISTRO
+
+check-package git
+
+if [[ $DISTRO = Fedora ]] ; then
+check-package rpm-build
+fi
+
+for_each_component check_package
+
+if [[ -n "${missing[@]}" ]] ; then
+   echo "Missing packages: ${missing[@]}"
+   if $deps ; then
+   echo "Installing..."
+   install-package "${missing[@]}"
+   else
+   echo "Please install, or run ./raise install-builddep"
+   exit 1
+   

Re: [Xen-devel] [PATCH 5/6] x86/smp: Allocate pcpu stacks on their local numa node

2015-04-09 Thread Tim Deegan
At 18:26 +0100 on 07 Apr (1428431179), Andrew Cooper wrote:
 Previously, all pcpu stacks tended to be allocated on node 0.
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org



Re: [Xen-devel] domU jiffies not incrementing - timer issue? - Kernel 3.18.10 on Xen 4.5.0

2015-04-09 Thread Mark Chambers
On 31 March 2015 at 17:31, Mark Chambers m...@overnetdata.com wrote:


 On 31 March 2015 at 11:56, Mark Chambers m...@overnetdata.com wrote:


 It's nested under Hyper-V in the same manner as the problematic install.
 I was deliberately trying to replicate the issue, but the problem doesn't
 manifest.

 Mark



 Hi,

 I've got it booting.

 The machine without boot problems reports the use of emulated TSC:

 (XEN) TSC not marked as either constant or reliable, warp=575 (count=2)
 (XEN) dom109: mode=0,ofs=0x417376aa9c8c,khz=2633032,inc=1,vtsc count:
 3576850 kernel, 9534 user

 The machine with problems reports no domains having emulated TSC:

 (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
 warp=0 (count=3)
 (XEN) dom23: mode=0,ofs=0x41dc316839ac,khz=2208968,inc=1
 (XEN) No domains have emulated TSC

 I have nothing specified in the xl config for tsc_mode. If I set
 tsc_mode='native' and restart the DomU it boots without any problems.

 If I explicitly specify any of the other tsc_mode values it gets stuck
 with jiffies not incrementing as before.

 Mark




Hi all,

As Xen is reporting that "TSC has constant rate, deep Cstates possible,
so not reliable", it would seem risky to use native mode on the domU and
I would prefer to use emulated mode.

I added debug code to Xen to help understand why jiffies increment
correctly in a DomU on one system but not at all on another system with
an identical software setup.

I don't understand the mechanism for updating jiffies in a Linux PV guest
under Xen, but I have been looking at the x86 time and trap code inside
Xen and have gained a little insight.

System 1 and system 2 have identical software configurations. Both are
running Windows 2012 R2 Hyper-V, running a VM which contains Xen 4.5.0
running a Linux 3.18.10 dom0 running a PV domU. The biggest difference is
their CPUs.

System 1 has an AMD Athlon(tm) 7750 Dual-Core Processor
System 2 has an Intel(R) Xeon(R) CPU E5520

From what I can deduce, when using emulated TSC Xen should receive lots
of RDTSC traps (opcode 0x31). The system that isn't working doesn't
receive any RDTSC traps. I suspect this may be a bug in Xen or the PV
code in the Linux kernel.

i.e:

System 1 tsc_mode='always_emulate' - xen receives RDTSC traps
System 1 tsc_mode='native'         - xen doesn't receive RDTSC traps

System 2 tsc_mode='always_emulate' - xen doesn't receive RDTSC traps and
                                     DomU's jiffies do not increment.
System 2 tsc_mode='native'         - works, xen doesn't receive RDTSC traps

I don't know if this is useful but the tsc lines from cpuid on system 1
report:

  RDTSCP= false
  TscInvariant   = false
  MSR based TSC rate control= false

while on system 2:

  IA32_TSC_ADJUST MSR supported= false
  RDTSCP = false
  TscInvariant   = false


I'm trying to understand how the jiffies are updated in a PV DomU when
the TSC is emulated.

If anyone can point me in the right direction it'd be much appreciated.
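
(For reference, the usual mechanism: a PV guest derives time -- and hence
jiffies progression -- from RDTSC plus scaling parameters Xen publishes in
the shared vcpu_time_info structure, roughly as below. Field names are from
Xen's public headers; rdtsc()/rmb() are placeholder helpers and the 128-bit
multiply stands in for the real fixed-point helper, so treat this as an
illustrative sketch only.)

    static uint64_t pvclock_read_ns(volatile struct vcpu_time_info *t)
    {
        uint32_t ver;
        uint64_t tsc, delta, ns;
        int8_t shift;

        do {
            ver = t->version;            /* odd => Xen is mid-update */
            rmb();
            tsc   = rdtsc();             /* trapped iff TSC is emulated */
            delta = tsc - t->tsc_timestamp;
            shift = t->tsc_shift;
            delta = shift >= 0 ? delta << shift : delta >> -shift;
            ns    = t->system_time +
                    (uint64_t)(((__uint128_t)delta * t->tsc_to_system_mul) >> 32);
            rmb();
        } while ( (ver & 1) || ver != t->version );

        return ns;                       /* guest "system time" in ns */
    }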

Thanks for your time,

Mark


Re: [Xen-devel] [PATCH] qemu-trad: xenstore: use relative path for device-model node

2015-04-09 Thread Ian Jackson
Wei Liu writes (Re: [PATCH] qemu-trad: xenstore: use relative path for 
device-model node):
 On Thu, Apr 09, 2015 at 06:46:31PM +0100, Ian Jackson wrote:
  Right.  So that means that this patch needs to go in at the same time
  as the corresponding libxl change.
 
 I don't follow "go in at the same time". They are in two different
 trees, aren't they?

The commit id of the qemu-trad tree is in Config.mk in xen.git.  So it
is possible to update them simultaneously.  (Of course not every way
of building and deploying Xen will honour this, but if you don't
honour it you deserve what you get.)

  And the answer is that unless both libxl and qemu change at the same
  time, it would be a regression in -unstable ?
 
 It would be a regression because stubdom in -unstable is working now
 with Paul's workaround. So yes, both changes need to go in at the same
 time -- though I don't know how you would do that.

Right.  That's what the Config.mk update is for.

So if the libxl patch is otherwise ready, we can commit both at once.
I will commit and push to qemu-trad, update the libxl patch to contain
the Config.mk update as well, and push the result to xen.git.

We normally explain the need to do this in the commit message for the
patches, and cross reference the two commits.

Ian.



Re: [Xen-devel] [PATCH 3/6] x86/smp: Clean up use of memflags in cpu_smpboot_alloc()

2015-04-09 Thread Andrew Cooper
On 09/04/15 16:02, Tim Deegan wrote:
 At 18:26 +0100 on 07 Apr (1428431177), Andrew Cooper wrote:
 Hoist MEMF_node(cpu_to_node(cpu)) to the start of the function, and avoid
 passing (potentially bogus) memflags if node information is not available.

 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 As it happens, MEMF_node(NUMA_NO_NODE) is already == 0.

Only because of a masked overflow.  That is why "(potentially)" is in
brackets.
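
(The masked overflow in question -- definitions roughly as in
xen/include/xen/mm.h at the time; shown here for illustration, exact
masks may differ:)

    #define _MEMF_node   8
    #define MEMF_node(n) ((((n) + 1) & 0xff) << _MEMF_node)
    #define NUMA_NO_NODE 0xff

    /* MEMF_node(NUMA_NO_NODE) == (((0xff + 1) & 0xff) << 8) == 0: the +1
     * wraps the sentinel to 0 under the mask, which is why the flags
     * happen to come out as "no node specified". */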

 I'm not sure if that's by design or not, but this looks robuster. :)

Indeed.


 Reviewed-by: Tim Deegan t...@xen.org

~Andrew



[Xen-devel] [PATCH V9 5/6] xen/xsm: Split vm_event_op into three separate labels

2015-04-09 Thread Tamas K Lengyel
The XSM label vm_event_op has been used to control the three memops
governing mem_access, mem_paging and mem_sharing. While these subsystems
rely on vm_event, they are not vm_event operations themselves. Thus, in
this patch we introduce three separate labels, one for each of these memops.

Signed-off-by: Tamas K Lengyel tamas.leng...@zentific.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov
Acked-by: Tim Deegan t...@xen.org
---
 xen/arch/x86/mm/mem_paging.c|  2 +-
 xen/arch/x86/mm/mem_sharing.c   |  2 +-
 xen/common/mem_access.c |  2 +-
 xen/include/xsm/dummy.h | 20 +++-
 xen/include/xsm/xsm.h   | 33 ++---
 xen/xsm/dummy.c | 13 -
 xen/xsm/flask/hooks.c   | 33 ++---
 xen/xsm/flask/policy/access_vectors |  6 ++
 8 files changed, 100 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm/mem_paging.c b/xen/arch/x86/mm/mem_paging.c
index 17d2319..9ee3aba 100644
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -39,7 +39,7 @@ int 
mem_paging_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_paging_op_t) arg)
 if ( rc )
 return rc;
 
-rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_paging_op);
+rc = xsm_mem_paging(XSM_DM_PRIV, d);
 if ( rc )
 goto out;
 
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ff01378..78fb013 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1311,7 +1311,7 @@ int 
mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 if ( rc )
 return rc;
 
-rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_sharing_op);
+rc = xsm_mem_sharing(XSM_DM_PRIV, d);
 if ( rc )
 goto out;
 
diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index 511c8c5..aa00513 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -48,7 +48,7 @@ int mem_access_memop(unsigned long cmd,
 if ( !p2m_mem_access_sanity_check(d) )
 goto out;
 
-rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_access_op);
+rc = xsm_mem_access(XSM_DM_PRIV, d);
 if ( rc )
 goto out;
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 50ee929..16967ed 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -519,11 +519,29 @@ static XSM_INLINE int 
xsm_vm_event_control(XSM_DEFAULT_ARG struct domain *d, int
 return xsm_default_action(action, current->domain, d);
 }
 
-static XSM_INLINE int xsm_vm_event_op(XSM_DEFAULT_ARG struct domain *d, int op)
+#ifdef HAS_MEM_ACCESS
+static XSM_INLINE int xsm_mem_access(XSM_DEFAULT_ARG struct domain *d)
 {
 XSM_ASSERT_ACTION(XSM_DM_PRIV);
 return xsm_default_action(action, current->domain, d);
 }
+#endif
+
+#ifdef HAS_MEM_PAGING
+static XSM_INLINE int xsm_mem_paging(XSM_DEFAULT_ARG struct domain *d)
+{
+XSM_ASSERT_ACTION(XSM_DM_PRIV);
+return xsm_default_action(action, current->domain, d);
+}
+#endif
+
+#ifdef HAS_MEM_SHARING
+static XSM_INLINE int xsm_mem_sharing(XSM_DEFAULT_ARG struct domain *d)
+{
+XSM_ASSERT_ACTION(XSM_DM_PRIV);
+return xsm_default_action(action, current->domain, d);
+}
+#endif
 
 #ifdef CONFIG_X86
 static XSM_INLINE int xsm_do_mca(XSM_DEFAULT_VOID)
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index ca8371c..49f06c9 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -142,7 +142,18 @@ struct xsm_operations {
 int (*get_vnumainfo) (struct domain *d);
 
 int (*vm_event_control) (struct domain *d, int mode, int op);
-int (*vm_event_op) (struct domain *d, int op);
+
+#ifdef HAS_MEM_ACCESS
+int (*mem_access) (struct domain *d);
+#endif
+
+#ifdef HAS_MEM_PAGING
+int (*mem_paging) (struct domain *d);
+#endif
+
+#ifdef HAS_MEM_SHARING
+int (*mem_sharing) (struct domain *d);
+#endif
 
 #ifdef CONFIG_X86
 int (*do_mca) (void);
@@ -546,10 +557,26 @@ static inline int xsm_vm_event_control (xsm_default_t 
def, struct domain *d, int
 return xsm_ops-vm_event_control(d, mode, op);
 }
 
-static inline int xsm_vm_event_op (xsm_default_t def, struct domain *d, int op)
+#ifdef HAS_MEM_ACCESS
+static inline int xsm_mem_access (xsm_default_t def, struct domain *d)
 {
-return xsm_ops->vm_event_op(d, op);
+return xsm_ops->mem_access(d);
 }
+#endif
+
+#ifdef HAS_MEM_PAGING
+static inline int xsm_mem_paging (xsm_default_t def, struct domain *d)
+{
+return xsm_ops->mem_paging(d);
+}
+#endif
+
+#ifdef HAS_MEM_SHARING
+static inline int xsm_mem_sharing (xsm_default_t def, struct domain *d)
+{
+return xsm_ops->mem_sharing(d);
+}
+#endif
 
 #ifdef CONFIG_X86
 static inline int xsm_do_mca(xsm_default_t def)
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 6d12d32..3ddb4f6 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -119,7 +119,18 @@ void xsm_fixup_ops (struct 

Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-09 Thread Ian Jackson
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous 
operations):
 Yes, that would work, but an open loop approach like that can lead to
 frustratingly unreliable tests.   I think it would be best to make
 the test aware of the state of the helper - or even in control of it.
 That would allow us to wait for the helper to reach a particular state
 before killing it.

This is less bad than you might think because the helper's progress
messages to libxl are at fairly predictable progress points.

In any case, the helper (in general) runs concurrently with libxl, so
when libxl decides to stop the progress there will often be a race.
(Sometimes the helper has to stop and wait for libxl to confirm.)

Ian.



Re: [Xen-devel] [PATCH 6/6] x86/boot: Ensure the BSS is aligned on an 8 byte boundary

2015-04-09 Thread Tim Deegan
At 16:34 +0100 on 09 Apr (1428597298), Andrew Cooper wrote:
 On 09/04/15 16:15, Tim Deegan wrote:
  At 18:26 +0100 on 07 Apr (1428431180), Andrew Cooper wrote:
  --- a/xen/arch/x86/boot/head.S
  +++ b/xen/arch/x86/boot/head.S
  @@ -127,7 +127,8 @@ __start:
   mov $sym_phys(__bss_end),%ecx
   sub %edi,%ecx
   xor %eax,%eax
  -rep stosb
  +shr $2,%ecx
  +rep stosl
  Should this be shr $3 and stosq?  You are aligning to 8 bytes in the
  linker runes.
 
 It is still 32bit code here, so no stosq available.

Fair enough. :)

 I do however happen to know that the impending multiboot2 entry point is
 64bit and is able to clear the BSS with stosq.

OK.

   /* Interrogate CPU extended features via CPUID. */
   mov $0x80000000,%eax
  diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
  index 4699a04..b1926e3 100644
  --- a/xen/arch/x86/xen.lds.S
  +++ b/xen/arch/x86/xen.lds.S
  @@ -163,6 +163,7 @@ SECTIONS
 __init_end = .;
   
 .bss : { /* BSS */
  +   . = ALIGN(8);
  Here, we're already aligned to STACK_SIZE
 
 So we are - that should be fixed up.
 
 That alignment is not relevant to .init, but is relevant to .bss

Yeah, I'm not sure whether it's a problem if __init_end != .bss; if
not the alignment could just be moved down a bit.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 6/6] x86/boot: Ensure the BSS is aligned on an 8 byte boundary

2015-04-09 Thread Andrew Cooper
On 09/04/15 16:15, Tim Deegan wrote:
 At 18:26 +0100 on 07 Apr (1428431180), Andrew Cooper wrote:
 --- a/xen/arch/x86/boot/head.S
 +++ b/xen/arch/x86/boot/head.S
 @@ -127,7 +127,8 @@ __start:
  mov $sym_phys(__bss_end),%ecx
  sub %edi,%ecx
  xor %eax,%eax
 -rep stosb
 +shr $2,%ecx
 +rep stosl
 Should this be shr $3 and stosq?  You are aligning to 8 bytes in the
 linker runes.

It is still 32bit code here, so no stosq available.

I do however happen to know that the impending multiboot2 entry point is
64bit and is able to clear the BSS with stosq.


  
  /* Interrogate CPU extended features via CPUID. */
   mov $0x80000000,%eax
 diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
 index 4699a04..b1926e3 100644
 --- a/xen/arch/x86/xen.lds.S
 +++ b/xen/arch/x86/xen.lds.S
 @@ -163,6 +163,7 @@ SECTIONS
__init_end = .;
  
.bss : { /* BSS */
 +   . = ALIGN(8);
 Here, we're already aligned to STACK_SIZE

So we are - that should be fixed up.

That alignment is not relevant to .init, but is relevant to .bss

 , which the
 .bss.stack_aligned just below is relying on.  So on the one hand this
 new alignment comment is sort-of-harmless, but on the other hand it
 distracts from the larger and more important alignment.

I will see about fixing this up differently, but with the same overall
effect that stosl/stosq can be used.

~Andrew




Re: [Xen-devel] [Patch V2 02/15] xen: save linear p2m list address in shared info structure

2015-04-09 Thread David Vrabel
On 09/04/15 07:55, Juergen Gross wrote:
 The virtual address of the linear p2m list should be stored in the
 shared info structure read by the Xen tools to be able to support
 64 bit pv-domains larger than 512 GB. Additionally the linear p2m
 list interface includes a generation count which is changed prior
 to and after each mapping change of the p2m list. By reading the
 generation count, the Xen tools can detect changes of the mappings
 and re-read the p2m list as needed.
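
(To illustrate the consumer side: a seqlock-style read the tools could
perform, assuming the generation count is made odd while an update is in
flight and that it lives in the shared arch info -- a sketch, not the
actual tools code:)

    uint64_t gen;

    do {
        gen = shinfo->arch.p2m_generation;   /* field name assumed */
        rmb();
        read_linear_p2m_list();              /* hypothetical reader */
        rmb();
    } while ( (gen & 1) || gen != shinfo->arch.p2m_generation );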

Reviewed-by: David Vrabel david.vra...@citrix.com

David



Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use a higher value for the GIC phandle

2015-04-09 Thread Ian Jackson
Julien Grall writes ([PATCH v5 p2 16/19] tools/libxl: arm: Use a higher value
for the GIC phandle):
 The partial device tree may contain phandles. The Device Tree Compiler
 tends to allocate phandles from 1.

I have to say I have no idea what a phandle is...

 Reserve the ID 65000 for the GIC phandle. I think we can safely assume
 that the partial device tree will never contain such an ID.

Do we control the DT compiler?  What if it should change its
phandle allocation algorithm?

 +/*
 + * The device tree compiler (DTC) is allocating the phandle from 1 to
 + * onwards. Reserve a high value for the GIC phandle.
 + */

FYI this should read "The device tree compiler (DTC) allocates phandle
values from 1 onwards."

Thanks,
Ian.



[Xen-devel] [PATCH v20 08/13] x86/VPMU: When handling MSR accesses, leave fault injection to callers

2015-04-09 Thread Boris Ostrovsky
With this patch, a return value of 1 from vpmu_do_msr() indicates that an
error was encountered during MSR processing (instead of stating that the
access was to a VPMU register).

As part of this patch we also check the validity of certain MSR accesses
right when we determine which register is being written, as opposed to
postponing this until later.

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Acked-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com
Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com
---
 xen/arch/x86/hvm/svm/svm.c|  6 ++-
 xen/arch/x86/hvm/svm/vpmu.c   |  6 +--
 xen/arch/x86/hvm/vmx/vmx.c| 24 +---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 82 ++-
 4 files changed, 55 insertions(+), 63 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e523d12..4fe36e9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1709,7 +1709,8 @@ static int svm_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 case MSR_AMD_FAM15H_EVNTSEL3:
 case MSR_AMD_FAM15H_EVNTSEL4:
 case MSR_AMD_FAM15H_EVNTSEL5:
-vpmu_do_rdmsr(msr, msr_content);
+if ( vpmu_do_rdmsr(msr, msr_content) )
+goto gpf;
 break;
 
 case MSR_AMD64_DR0_ADDRESS_MASK:
@@ -1860,7 +1861,8 @@ static int svm_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 case MSR_AMD_FAM15H_EVNTSEL3:
 case MSR_AMD_FAM15H_EVNTSEL4:
 case MSR_AMD_FAM15H_EVNTSEL5:
-vpmu_do_wrmsr(msr, msr_content, 0);
+if ( vpmu_do_wrmsr(msr, msr_content, 0) )
+goto gpf;
 break;
 
 case MSR_IA32_MCx_MISC(4): /* Threshold register */
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 58a0dc4..474d0db 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -305,7 +305,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t 
msr_content,
 is_pmu_enabled(msr_content)  !vpmu_is_set(vpmu, VPMU_RUNNING) )
 {
 if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-return 1;
+return 0;
 vpmu_set(vpmu, VPMU_RUNNING);
 
 if ( has_hvm_container_vcpu(v)  is_msr_bitmap_on(vpmu) )
@@ -335,7 +335,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t 
msr_content,
 
 /* Write to hw counters */
 wrmsrl(msr, msr_content);
-return 1;
+return 0;
 }
 
 static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -353,7 +353,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t 
*msr_content)
 
 rdmsrl(msr, *msr_content);
 
-return 1;
+return 0;
 }
 
 static void amd_vpmu_destroy(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d71aa07..e31c38d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2133,12 +2133,17 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
 /* Perhaps vpmu will change some bits. */
+/* FALLTHROUGH */
+case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+case MSR_IA32_PEBS_ENABLE:
+case MSR_IA32_DS_AREA:
 if ( vpmu_do_rdmsr(msr, msr_content) )
-goto done;
+goto gp_fault;
 break;
 default:
-if ( vpmu_do_rdmsr(msr, msr_content) )
-break;
 if ( passive_domain_do_rdmsr(msr, msr_content) )
 goto done;
 switch ( long_mode_do_msr_read(msr, msr_content) )
@@ -2314,7 +2319,7 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 if ( msr_content  ~supported )
 {
 /* Perhaps some other bits are supported in vpmu. */
-if ( !vpmu_do_wrmsr(msr, msr_content, supported) )
+if ( vpmu_do_wrmsr(msr, msr_content, supported) )
 break;
 }
 if ( msr_content  IA32_DEBUGCTLMSR_LBR )
@@ -2342,9 +2347,16 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 if ( !nvmx_msr_write_intercept(msr, msr_content) )
 goto gp_fault;
 break;
+case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
+case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+case MSR_IA32_PEBS_ENABLE:
+case MSR_IA32_DS_AREA:
+ if ( vpmu_do_wrmsr(msr, msr_content, 0) )
+goto gp_fault;
+break;
 default:
-if ( vpmu_do_wrmsr(msr, msr_content, 0) )
-return 

[Xen-devel] [PATCH V9 2/6] xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag

2015-04-09 Thread Tamas K Lengyel
There are no use-cases for this flag.

Signed-off-by: Tamas K Lengyel tamas.leng...@zentific.com
Acked-by: Tim Deegan t...@xen.org
---
 xen/arch/x86/mm/mem_sharing.c | 3 ---
 xen/arch/x86/mm/p2m.c | 3 ---
 xen/common/mem_access.c   | 3 ---
 xen/include/public/vm_event.h | 1 -
 4 files changed, 10 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4e5477a..e6572af 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -606,9 +606,6 @@ int mem_sharing_sharing_resume(struct domain *d)
 continue;
 }
 
-if ( rsp.flags  VM_EVENT_FLAG_DUMMY )
-continue;
-
 /* Validate the vcpu_id in the response. */
 if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
 continue;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1d3356a..4032c62 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1312,9 +1312,6 @@ void p2m_mem_paging_resume(struct domain *d)
 continue;
 }
 
-if ( rsp.flags  VM_EVENT_FLAG_DUMMY )
-continue;
-
 /* Validate the vcpu_id in the response. */
 if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
 continue;
diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index f925ac7..7ed8a4e 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -44,9 +44,6 @@ void mem_access_resume(struct domain *d)
 continue;
 }
 
-if ( rsp.flags  VM_EVENT_FLAG_DUMMY )
-continue;
-
 /* Validate the vcpu_id in the response. */
 if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
 continue;
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index ed9105b..c7426de 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -47,7 +47,6 @@
 #define VM_EVENT_FLAG_VCPU_PAUSED (1 << 0)
 /* Flags to aid debugging mem_event */
 #define VM_EVENT_FLAG_FOREIGN (1 << 1)
-#define VM_EVENT_FLAG_DUMMY   (1 << 2)
 
 /*
  * Reasons for the vm event request
-- 
2.1.4




[Xen-devel] [PATCH v5 p2 13/19] tools/libxl: Create a per-arch function to map IRQ to a domain

2015-04-09 Thread Julien Grall
From: Julien Grall julien.gr...@linaro.org

ARM and x86 use a different hypercall to map an IRQ to a domain.

The hypercall to give IRQ permission to the domain has also been moved
to an x86-specific function, as an ARM guest won't be able to manage the
IRQ. We may want to support it later.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com

---
Changes in v5:
- Use the new function xc_domain_bind_pt_spi_irq
- Fix typos

Changes in v4:
- Patch added
---
 tools/libxl/libxl_arch.h   |  4 
 tools/libxl/libxl_arm.c|  5 +
 tools/libxl/libxl_create.c |  6 ++
 tools/libxl/libxl_x86.c| 13 +
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index cae64c0..77b1f2a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -39,4 +39,8 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
   uint32_t domid,
   libxl_domain_build_info *b_info,
   libxl__domain_build_state *state);
+
+/* arch specific irq map function */
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 5a5cb3f..aa302fd 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -742,6 +742,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state);
 }
 
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
+{
+return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e5a343f..15b464e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1205,11 +1205,9 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
 
 LOG(DEBUG, dom%d irq %d, domid, irq);
 
-ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, irq)
+ret = irq >= 0 ? libxl__arch_domain_map_irq(gc, domid, irq)
 : -EOVERFLOW;
-if (!ret)
-ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
-if (ret < 0) {
+if (ret) {
 LOGE(ERROR, failed give dom%d access to irq %d, domid, irq);
 ret = ERROR_FAIL;
 goto error_out;
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 5e9a8d2..ed2bd38 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -424,6 +424,19 @@ out:
 return rc;
 }
 
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
+{
+int ret;
+
+ret = xc_physdev_map_pirq(CTX->xch, domid, irq, irq);
+if (ret)
+return ret;
+
+ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
+
+return ret;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.1.4




[Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use a higher value for the GIC phandle

2015-04-09 Thread Julien Grall
From: Julien Grall julien.gr...@linaro.org

The partial device tree may contain phandles. The Device Tree Compiler
tends to allocate phandles from 1.

Reserve the ID 65000 for the GIC phandle. I think we can safely assume
that the partial device tree will never contain such an ID.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Ian Campbell ian.campb...@citrix.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com

---
To allocate the phandle dynamically, we would need to fill in
post-hoc (like we do with e.g. the initramfs location) the
#interrupt-parent in /. That would also require some refactoring
in the code to pass the phandle every time.

Defer this solution to a follow-up, as a phandle of 65000 would be
very unlikely.

Changes in v5:
- Add Ian's Ack.

Changes in v3:
- Patch added
---
 tools/libxl/libxl_arm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 2ce7e23..cf1379d 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -80,10 +80,11 @@ static struct arch_info {
 {xen-3.0-aarch64, arm,armv8-timer, arm,armv8 },
 };
 
-enum {
-PHANDLE_NONE = 0,
-PHANDLE_GIC,
-};
+/*
+ * The device tree compiler (DTC) is allocating the phandle from 1 to
+ * onwards. Reserve a high value for the GIC phandle.
+ */
+#define PHANDLE_GIC (65000)
 
 typedef uint32_t be32;
 typedef be32 gic_interrupt[3];
-- 
2.1.4




[Xen-devel] [PATCH v4 00/12] enable Cache Allocation Technology (CAT) for VMs

2015-04-09 Thread Chao Peng
Changes in v4:
* Address comments from Andrew and Ian (details in patch).
* Split COS/CBM management patch into 4 small patches.
* Add documentation xl-psr.markdown.
Changes in v3:
* Address comments from Jan and Ian (details in patch).
* Add xl sample output in cover letter.
Changes in v2:
* Address comments from Konrad and Jan (details in patch):
* Move all CAT-unrelated changes into the preparation patches.

This patch series enables the new Cache Allocation Technology (CAT) feature
found in Intel Broadwell and later server platforms. In Xen's implementation,
CAT is used to control cache allocation on a per-VM basis.

The detailed hardware spec can be found in section 17.15 of the Intel SDM [1].
The design for Xen can be found at [2].

patch1-2:  preparation.
patch3-11: real work for CAT.
patch12:   xl document for CMT/MBM/CAT.

[1] Intel SDM 
(http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf)
[2] CAT design for XEN( 
http://lists.xen.org/archives/html/xen-devel/2014-12/msg01382.html)

Chao Peng (12):
  x86: clean up psr boot parameter parsing
  x86: improve psr scheduling code
  x86: detect and initialize Intel CAT feature
  x86: maintain COS to CBM mapping for each socket
  x86: maintain socket CPU mask for CAT
  x86: add COS information for each domain
  x86: expose CBM length and COS number information
  x86: dynamically get/set CBM for a domain
  x86: add scheduling support for Intel CAT
  xsm: add CAT related xsm policies
  tools: add tools support for Intel CAT
  docs: add xl-psr.markdown

 docs/man/xl.pod.1|  38 +++
 docs/misc/xen-command-line.markdown  |  13 +-
 docs/misc/xl-psr.markdown| 111 +++
 tools/flask/policy/policy/modules/xen/xen.if |   2 +-
 tools/flask/policy/policy/modules/xen/xen.te |   4 +-
 tools/libxc/include/xenctrl.h|  15 +
 tools/libxc/xc_psr.c |  76 +
 tools/libxl/libxl.h  |  26 ++
 tools/libxl/libxl_psr.c  | 168 +-
 tools/libxl/libxl_types.idl  |  10 +
 tools/libxl/xl.h |   4 +
 tools/libxl/xl_cmdimpl.c | 140 +
 tools/libxl/xl_cmdtable.c|  12 +
 xen/arch/x86/domain.c|  13 +-
 xen/arch/x86/domctl.c|  18 ++
 xen/arch/x86/psr.c   | 446 ---
 xen/arch/x86/sysctl.c|  18 ++
 xen/include/asm-x86/cpufeature.h |   1 +
 xen/include/asm-x86/domain.h |   5 +-
 xen/include/asm-x86/msr-index.h  |   1 +
 xen/include/asm-x86/psr.h|  14 +-
 xen/include/public/domctl.h  |  12 +
 xen/include/public/sysctl.h  |  16 +
 xen/xsm/flask/hooks.c|   6 +
 xen/xsm/flask/policy/access_vectors  |   6 +
 25 files changed, 1120 insertions(+), 55 deletions(-)
 create mode 100644 docs/misc/xl-psr.markdown

-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 05/12] x86: maintain socket CPU mask for CAT

2015-04-09 Thread Chao Peng
Some CAT resources/registers exist at socket level and must be
accessed from a CPU of the corresponding socket. It's common to pick
an arbitrary CPU from the socket. To make the picking easy, it's useful
to maintain a reference to the cpu_core_mask, which contains all the
siblings of a CPU in the same socket. The reference needs to be
kept in sync with CPU up/down events.
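With the reference in place, the picking reduces to something like the
following sketch (the real helper appears in a later patch of this series;
simplified here):

    /* Pick an arbitrary CPU on the given socket, or nr_cpu_ids if the
     * socket currently has no online CPU. */
    static unsigned int get_socket_cpu(unsigned int socket)
    {
        cpumask_t *mask = cat_socket_info[socket].socket_cpu_mask;

        return mask ? cpumask_any(mask) : nr_cpu_ids;
    }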

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 xen/arch/x86/psr.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 4aff5f6..7de2504 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -32,6 +32,7 @@ struct psr_cat_socket_info {
 unsigned int cbm_len;
 unsigned int cos_max;
 struct psr_cat_cbm *cos_cbm_map;
+cpumask_t *socket_cpu_mask;
 };
 
 struct psr_assoc {
@@ -234,6 +235,8 @@ static void cat_cpu_init(unsigned int cpu)
 ASSERT(socket < nr_sockets);
 
 info = cat_socket_info + socket;
+if ( info->socket_cpu_mask == NULL )
+info->socket_cpu_mask = per_cpu(cpu_core_mask, cpu);
 
 /* Avoid initializing more than once for the same socket. */
 if ( test_and_set_bool(info->initialized) )
@@ -274,6 +277,24 @@ static void psr_cpu_init(unsigned int cpu)
 psr_assoc_init(cpu);
 }
 
+static void psr_cpu_fini(unsigned int cpu)
+{
+unsigned int socket, next;
+cpumask_t *cpu_mask;
+
+if ( cat_socket_info )
+{
+socket = cpu_to_socket(cpu);
+cpu_mask = cat_socket_info[socket].socket_cpu_mask;
+
+if ( (next = cpumask_cycle(cpu, cpu_mask)) == cpu )
+cat_socket_info[socket].socket_cpu_mask = NULL;
+else
+cat_socket_info[socket].socket_cpu_mask =
+per_cpu(cpu_core_mask, next);
+}
+}
+
 static int cpu_callback(
 struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -284,6 +305,9 @@ static int cpu_callback(
 case CPU_STARTING:
 psr_cpu_init(cpu);
 break;
+case CPU_DYING:
+psr_cpu_fini(cpu);
+break;
 }
 
 return NOTIFY_DONE;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 08/12] x86: dynamically get/set CBM for a domain

2015-04-09 Thread Chao Peng
For CAT, the COS is maintained in the hypervisor only, while the CBM is
exposed to user space directly to allow getting/setting a domain's cache
capacity. For each specified CBM, the hypervisor will either reuse an
existing COS that has the same CBM or allocate a new one if no matching CBM
is found. If the allocation fails because no COS is available, an error is
returned. Getting/setting always operates on a specified socket; on a
multi-socket system the interface may be called several times.
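A hypothetical multi-socket caller would look roughly like this (the libxc
wrapper name and signature are taken from the tools patch later in this
series and are assumptions here, not part of this patch):

    /* Set the same L3 CBM for a domain on each socket of a 2-socket host. */
    for ( socket = 0; socket < 2; socket++ )
    {
        rc = xc_psr_cat_set_domain_data(xch, domid, XC_PSR_CAT_L3_CBM,
                                        socket, 0x3f);
        if ( rc )
            break;
    }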

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 xen/arch/x86/domctl.c   |  18 ++
 xen/arch/x86/psr.c  | 126 
 xen/include/asm-x86/msr-index.h |   1 +
 xen/include/asm-x86/psr.h   |   2 +
 xen/include/public/domctl.h |  12 
 5 files changed, 159 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index d4f6ccf..89a6b33 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1326,6 +1326,24 @@ long arch_do_domctl(
 }
 break;
 
+case XEN_DOMCTL_psr_cat_op:
+switch ( domctl-u.psr_cat_op.cmd )
+{
+case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM:
+ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
+ domctl->u.psr_cat_op.data);
+break;
+case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
+ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
+ &domctl->u.psr_cat_op.data);
+copyback = 1;
+break;
+default:
+ret = -EOPNOTSUPP;
+break;
+}
+break;
+
 default:
 ret = iommu_do_domctl(domctl, d, u_domctl);
 break;
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index e390fd9..5247bcd 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -56,6 +56,17 @@ static unsigned int get_socket_count(void)
 return DIV_ROUND_UP(nr_cpu_ids, cpus_per_socket);
 }
 
+static unsigned int get_socket_cpu(unsigned int socket)
+{
+if ( socket < nr_sockets )
+{
+cpumask_t *cpu_mask = cat_socket_info[socket].socket_cpu_mask;
+ASSERT(cpu_mask != NULL);
+return cpumask_any(cpu_mask);
+}
+return nr_cpu_ids;
+}
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -252,6 +263,121 @@ int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
 return 0;
 }
 
+int psr_get_l3_cbm(struct domain *d, unsigned int socket, uint64_t *cbm)
+{
+unsigned int cos;
+struct psr_cat_socket_info *info;
+int ret = get_cat_socket_info(socket, &info);
+
+if ( ret )
+return ret;
+
+cos = d->arch.psr_cos_ids[socket];
+*cbm = info->cos_cbm_map[cos].cbm;
+return 0;
+}
+
+static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
+{
+unsigned int first_bit, zero_bit;
+
+/* Set bits should only be in the range of [0, cbm_len). */
+if ( cbm & (~0ull << cbm_len) )
+return 0;
+
+first_bit = find_first_bit(&cbm, cbm_len);
+zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
+
+/* Set bits should be contiguous. */
+if ( zero_bit < cbm_len &&
+ find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
+return 0;
+
+return 1;
+}
+
+struct cos_cbm_info
+{
+unsigned int cos;
+uint64_t cbm;
+};
+
+static void do_write_l3_cbm(void *data)
+{
+struct cos_cbm_info *info = data;
+wrmsrl(MSR_IA32_PSR_L3_MASK(info->cos), info->cbm);
+}
+
+static int write_l3_cbm(unsigned int socket, unsigned int cos, uint64_t cbm)
+{
+struct cos_cbm_info info = { .cos = cos, .cbm = cbm };
+
+if ( socket == cpu_to_socket(smp_processor_id()) )
+do_write_l3_cbm(&info);
+else
+{
+unsigned int cpu = get_socket_cpu(socket);
+
+if ( cpu >= nr_cpu_ids )
+return -EBADSLT;
+on_selected_cpus(cpumask_of(cpu), do_write_l3_cbm, &info, 1);
+}
+
+return 0;
+}
+
+int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
+{
+unsigned int old_cos, cos;
+struct psr_cat_cbm *map, *find;
+struct psr_cat_socket_info *info;
+int ret = get_cat_socket_info(socket, &info);
+
+if ( ret )
+return ret;
+
+if ( !psr_check_cbm(info->cbm_len, cbm) )
+return -EINVAL;
+
+old_cos = d->arch.psr_cos_ids[socket];
+map = info->cos_cbm_map;
+find = NULL;
+
+for ( cos = 0; cos <= info->cos_max; cos++ )
+{
+/* If still not found, then keep unused one. */
+if ( !find && cos != 0 && map[cos].ref == 0 )
+find = map + cos;
+else if ( map[cos].cbm == cbm )
+{
+if ( unlikely(cos == old_cos) )
+return -EEXIST;
+find = map + cos;
+break;
+}
+}
+
+/* If old cos is referred only by the domain, then use it. */
+if ( !find && map[old_cos].ref == 1 )
+find = map + 

[Xen-devel] [PATCH v4 03/12] x86: detect and initialize Intel CAT feature

2015-04-09 Thread Chao Peng
Detect the Intel Cache Allocation Technology (CAT) feature and store the
CPUID information for later use. Currently only L3 cache allocation is
supported. The L3 CAT features may vary among sockets, so per-socket
feature information is stored. The initialization can happen either at
boot time or when CPUs are hot-plugged after booting.
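For orientation, the detection boils down to decoding CPUID leaf 0x10
(PSR_CPUID_LEVEL_CAT); a minimal sketch consistent with the Intel SDM and
the fields initialized in this patch:

    /* Subleaf 1 describes L3 CAT: EAX[4:0] = CBM length - 1,
     * EDX[15:0] = maximum COS supported. */
    cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
    cbm_len = (eax & 0x1f) + 1;
    cos_max = edx & 0xffff;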

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
Changes in v4:
* check X86_FEATURE_CAT available before doing initialization.
Changes in v3:
* Remove num_sockets boot option instead calculate it at boot time.
* Name hardcoded CAT cpuid leaf as PSR_CPUID_LEVEL_CAT.
Changes in v2:
* socket_num -> num_sockets and fix several documentation issues.
* refactor boot line parameter parsing into a standalone patch.
* set opt_num_sockets = NR_CPUS when opt_num_sockets > NR_CPUS.
* replace CPU_ONLINE with CPU_STARTING and integrate that into scheduling
  improvement patch.
* reimplement get_max_socket() with cpu_to_socket();
* cbm is still uint64 as there is a path forward for supporting long masks.
---
 docs/misc/xen-command-line.markdown | 13 +--
 xen/arch/x86/psr.c  | 68 +++--
 xen/include/asm-x86/cpufeature.h|  1 +
 xen/include/asm-x86/psr.h   |  3 ++
 4 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 1dda1f0..9ad8801 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1122,9 +1122,9 @@ This option can be specified more than once (up to 8 times at present).
  `= integer`
 
 ### psr (Intel)
- `= List of ( cmt:boolean | rmid_max:integer )`
+ `= List of ( cmt:boolean | rmid_max:integer | cat:boolean )`
 
- Default: `psr=cmt:0,rmid_max:255`
+ Default: `psr=cmt:0,rmid_max:255,cat:0`
 
 Platform Shared Resource(PSR) Services.  Intel Haswell and later server
 platforms offer information about the sharing of resources.
@@ -1134,6 +1134,11 @@ Monitoring ID(RMID) is used to bind the domain to corresponding shared
 resource.  RMID is a hardware-provided layer of abstraction between software
 and logical processors.
 
+To use the PSR cache allocation service for a certain domain, a capacity
+bitmask (CBM) is used to bind the domain to the corresponding shared resource.
+The CBM represents cache capacity and indicates the degree of overlap and
+isolation between domains.
+
 The following resources are available:
 
 * Cache Monitoring Technology (Haswell and later).  Information regarding the
@@ -1144,6 +1149,10 @@ The following resources are available:
   total/local memory bandwidth. Follow the same options with Cache Monitoring
   Technology.
 
+* Cache Allocation Technology (Broadwell and later).  Information regarding
+  the cache allocation.
+  * `cat` instructs Xen to enable/disable Cache Allocation Technology.
+
 ### reboot
 `= t[riple] | k[bd] | a[cpi] | p[ci] | P[ower] | e[fi] | n[o] [, [w]arm | [c]old]`
 
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 6119c6e..16c37dd 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -19,17 +19,36 @@
 #include <asm/psr.h>
 
 #define PSR_CMT(10)
+#define PSR_CAT(11)
+
+struct psr_cat_socket_info {
+bool_t initialized;
+bool_t enabled;
+unsigned int cbm_len;
+unsigned int cos_max;
+};
 
 struct psr_assoc {
 uint64_t val;
 };
 
 struct psr_cmt *__read_mostly psr_cmt;
+static struct psr_cat_socket_info *__read_mostly cat_socket_info;
+
 static unsigned int __initdata opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
+static unsigned int __read_mostly nr_sockets;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
+static unsigned int get_socket_count(void)
+{
+unsigned int cpus_per_socket = boot_cpu_data.x86_max_cores *
+   boot_cpu_data.x86_num_siblings;
+
+return DIV_ROUND_UP(nr_cpu_ids, cpus_per_socket);
+}
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -63,6 +82,7 @@ static void __init parse_psr_param(char *s)
 *val_str++ = '\0';
 
 parse_psr_bool(s, val_str, "cmt", PSR_CMT);
+parse_psr_bool(s, val_str, "cat", PSR_CAT);
 
 if ( val_str && !strcmp(s, "rmid_max") )
 opt_rmid_max = simple_strtoul(val_str, NULL, 0);
@@ -194,8 +214,49 @@ void psr_ctxt_switch_to(struct domain *d)
 }
 }
 
+static void cat_cpu_init(unsigned int cpu)
+{
+unsigned int eax, ebx, ecx, edx;
+struct psr_cat_socket_info *info;
+unsigned int socket;
+const struct cpuinfo_x86 *c = cpu_data + cpu;
+
+if ( !cpu_has(c, X86_FEATURE_CAT) )
+return;
+
+socket = cpu_to_socket(cpu);
+ASSERT(socket < nr_sockets);
+
+info = cat_socket_info + socket;
+
+/* Avoid initializing more than once for the same socket. */
+if ( test_and_set_bool(info->initialized) )
+return;
+
+

[Xen-devel] [PATCH v4 04/12] x86: maintain COS to CBM mapping for each socket

2015-04-09 Thread Chao Peng
For each socket, a COS-to-CBM mapping structure is maintained. The mapping
is indexed by COS and the value is the corresponding CBM. Different VMs may
use the same CBM; a reference count is used to indicate whether a CBM slot
is in use.
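The intended refcount life-cycle, as a simplified sketch (COS 0 is the
always-valid default and is never reallocated):

    /* When a domain on this socket moves from old_cos to new_cos: */
    map[new_cos].ref++;    /* the domain now references new_cos */
    map[old_cos].ref--;    /* drop the reference to the previous COS */
    /* Any slot with cos != 0 and map[cos].ref == 0 is free for reuse. */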

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 xen/arch/x86/psr.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 16c37dd..4aff5f6 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -21,11 +21,17 @@
 #define PSR_CMT(10)
 #define PSR_CAT(11)
 
+struct psr_cat_cbm {
+unsigned int ref;
+uint64_t cbm;
+};
+
 struct psr_cat_socket_info {
 bool_t initialized;
 bool_t enabled;
 unsigned int cbm_len;
 unsigned int cos_max;
+struct psr_cat_cbm *cos_cbm_map;
 };
 
 struct psr_assoc {
@@ -240,6 +246,14 @@ static void cat_cpu_init(unsigned int cpu)
 info->cbm_len = (eax & 0x1f) + 1;
 info->cos_max = (edx & 0xffff);
 
+info->cos_cbm_map = xzalloc_array(struct psr_cat_cbm,
+  info->cos_max + 1UL);
+if ( !info->cos_cbm_map )
+return;
+
+/* cos=0 is reserved as the default CBM (all ones). */
+info->cos_cbm_map[0].cbm = (1ull << info->cbm_len) - 1;
+
+info->enabled = 1;
+printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n",
+   socket, info->cos_max, info->cbm_len);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3/8] Move common-functions.sh and git-checkout.sh into lib

2015-04-09 Thread George Dunlap
"scripts" implies something which is designed to be run standalone.

"lib" implies that this is going to be sourced from another bash script.

Also change git-checkout to be a function rather than a script

Signed-off-by: George Dunlap george.dun...@eu.citrix.com
---
CC: Stefano Stabellini stefano.stabell...@citrix.com
---
 components/grub  |  2 +-
 components/libvirt   |  2 +-
 components/xen   |  2 +-
 {scripts => lib}/common-functions.sh |  0
 lib/git-checkout.sh  | 32 
 raise.sh |  3 ++-
 scripts/git-checkout.sh  | 30 --
 unraise.sh   |  2 +-
 8 files changed, 38 insertions(+), 35 deletions(-)
 rename {scripts => lib}/common-functions.sh (100%)
 create mode 100755 lib/git-checkout.sh
 delete mode 100755 scripts/git-checkout.sh

diff --git a/components/grub b/components/grub
index 5a42000..a5aa27d 100644
--- a/components/grub
+++ b/components/grub
@@ -29,7 +29,7 @@ function grub_build() {
 cd $BASEDIR
 rm -f memdisk.tar
 tar cf memdisk.tar -C data grub.cfg
-./scripts/git-checkout.sh $GRUB_UPSTREAM_URL $GRUB_UPSTREAM_REVISION 
grub-dir
+git-checkout $GRUB_UPSTREAM_URL $GRUB_UPSTREAM_REVISION grub-dir
 cd grub-dir
 ./autogen.sh
 ## GRUB32
diff --git a/components/libvirt b/components/libvirt
index e22996e..6602dcf 100644
--- a/components/libvirt
+++ b/components/libvirt
@@ -26,7 +26,7 @@ function libvirt_build() {
 _libvirt_install_dependencies
 
 cd $BASEDIR
-./scripts/git-checkout.sh $LIBVIRT_UPSTREAM_URL $LIBVIRT_UPSTREAM_REVISION 
libvirt-dir
+git-checkout $LIBVIRT_UPSTREAM_URL $LIBVIRT_UPSTREAM_REVISION libvirt-dir
 cd libvirt-dir
 CFLAGS=-I$INST_DIR/$PREFIX/include \
 LDFLAGS=-L$INST_DIR/$PREFIX/lib -Wl,-rpath-link=$INST_DIR/$PREFIX/lib \
diff --git a/components/xen b/components/xen
index a49a1d1..70b72b0 100644
--- a/components/xen
+++ b/components/xen
@@ -23,7 +23,7 @@ function xen_build() {
 _xen_install_dependencies
 
 cd $BASEDIR
-./scripts/git-checkout.sh $XEN_UPSTREAM_URL $XEN_UPSTREAM_REVISION xen-dir
+git-checkout $XEN_UPSTREAM_URL $XEN_UPSTREAM_REVISION xen-dir
 cd xen-dir
 ./configure --prefix=$PREFIX
 $MAKE
diff --git a/scripts/common-functions.sh b/lib/common-functions.sh
similarity index 100%
rename from scripts/common-functions.sh
rename to lib/common-functions.sh
diff --git a/lib/git-checkout.sh b/lib/git-checkout.sh
new file mode 100755
index 000..2ca8f25
--- /dev/null
+++ b/lib/git-checkout.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+
+function git-checkout() {
+if [[ $# -lt 3 ]]
+then
+   echo "Usage: $0 <tree> <tag> <dir>"
+   exit 1
+fi
+
+TREE=$1
+TAG=$2
+DIR=$3
+
+set -e
+
+if [[ ! -d $DIR-remote ]]
+then
+   rm -rf $DIR-remote $DIR-remote.tmp
+   mkdir -p $DIR-remote.tmp; rmdir $DIR-remote.tmp
+   $GIT clone $TREE $DIR-remote.tmp
+   if [[ $TAG ]]
+   then
+   cd $DIR-remote.tmp
+   $GIT branch -D dummy >/dev/null 2>&1 ||:
+   $GIT checkout -b dummy $TAG
+   cd ..
+   fi
+   mv $DIR-remote.tmp $DIR-remote
+fi
+rm -f $DIR
+ln -sf $DIR-remote $DIR
+}
diff --git a/raise.sh b/raise.sh
index 3c8281e..422fbe4 100755
--- a/raise.sh
+++ b/raise.sh
@@ -3,7 +3,8 @@
 set -e
 
 source config
-source scripts/common-functions.sh
+source lib/common-functions.sh
+source lib/git-checkout.sh
 
 _help() {
 echo "Usage: ./build.sh <options> <command>"
diff --git a/scripts/git-checkout.sh b/scripts/git-checkout.sh
deleted file mode 100755
index 912bfae..000
--- a/scripts/git-checkout.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/usr/bin/env bash
-
-if [[ $# -lt 3 ]]
-then
-   echo "Usage: $0 <tree> <tag> <dir>"
-   exit 1
-fi
-
-TREE=$1
-TAG=$2
-DIR=$3
-
-set -e
-
-if [[ ! -d $DIR-remote ]]
-then
-   rm -rf $DIR-remote $DIR-remote.tmp
-   mkdir -p $DIR-remote.tmp; rmdir $DIR-remote.tmp
-   $GIT clone $TREE $DIR-remote.tmp
-   if [[ $TAG ]]
-   then
-   cd $DIR-remote.tmp
-   $GIT branch -D dummy >/dev/null 2>&1 ||:
-   $GIT checkout -b dummy $TAG
-   cd ..
-   fi
-   mv $DIR-remote.tmp $DIR-remote
-fi
-rm -f $DIR
-ln -sf $DIR-remote $DIR
diff --git a/unraise.sh b/unraise.sh
index 2f08901..50ce310 100755
--- a/unraise.sh
+++ b/unraise.sh
@@ -3,7 +3,7 @@
 set -e
 
 source config
-source scripts/common-functions.sh
+source lib/common-functions.sh
 
 
 # start execution
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/8] Import raise.sh and unraise.sh into library

2015-04-09 Thread George Dunlap
Make as few changes as possible to begin with, just to separate code
motion from changes.

For now, remove raise.sh and unraise.sh from package creation, until
we can figure out what to do instead.

Signed-off-by: George Dunlap george.dun...@eu.citrix.com
---
CC: Stefano Stabellini stefano.stabell...@citrix.com
---
 raise.sh => lib/build.sh | 62 
 raise|  6 +
 scripts/mkdeb|  3 ++-
 scripts/mkrpm|  5 ++--
 unraise.sh   | 17 -
 5 files changed, 31 insertions(+), 62 deletions(-)
 rename raise.sh => lib/build.sh (77%)
 delete mode 100755 unraise.sh

diff --git a/raise.sh b/lib/build.sh
similarity index 77%
rename from raise.sh
rename to lib/build.sh
index 422fbe4..ab1e087 100755
--- a/raise.sh
+++ b/lib/build.sh
@@ -2,10 +2,6 @@
 
 set -e
 
-source config
-source lib/common-functions.sh
-source lib/git-checkout.sh
-
 _help() {
 echo "Usage: ./build.sh <options> <command>"
 echo where options are:
@@ -18,7 +14,9 @@ _help() {
 echo configureConfigure the system  (requires sudo)
 }
 
-_build() {
+build() {
+$arg_parse
+
 if [[ $YES != y ]]
 then
 echo "Do you want Raisin to automatically install build time dependencies for you? (y/n)"
@@ -50,7 +48,20 @@ _build() {
 build_package xen-system
 }
 
-_install() {
+unraise() {
+$arg_parse
+
+for_each_component clean
+
+uninstall_package xen-system
+for_each_component unconfigure
+
+rm -rf $INST_DIR
+}
+
+install() {
+$arg_parse
+
 # need single braces for filename matching expansion
 if [ ! -f xen-system*rpm ] && [ ! -f xen-system*deb ]
 then
@@ -60,7 +71,9 @@ _install() {
 install_package xen-system
 }
 
-_configure() {
+configure() {
+$arg_parse
+
 if [[ $YES != y ]]
 then
 echo "Proceeding we'll make changes to the running system,"
@@ -82,38 +95,3 @@ _configure() {
 for_each_component configure
 }
 
-# start execution
-common_init
-
-# parameters check
-export VERBOSE=0
-export YES=n
-export NO_DEPS=0
-while [[ $# -gt 1 ]]
-do
-  if [[ $1 = -v || $1 = --verbose ]]
-  then
-VERBOSE=1
-shift 1
-  elif [[ $1 = -y || $1 = --yes ]]
-  then
-YES=y
-shift 1
-  else
-_help
-exit 1
-  fi
-done
-
-case $1 in
-build | install | configure )
-COMMAND=$1
-;;
-*)
-_help
-exit 1
-;;
-esac
-
-_$COMMAND
-
diff --git a/raise b/raise
index 7f3faae..142956d 100755
--- a/raise
+++ b/raise
@@ -10,6 +10,12 @@ fi
 
 # Then as many as the sub-libraries as you need
 . ${RAISIN_PATH}/core.sh
+. ${RAISIN_PATH}/common-functions.sh
+. ${RAISIN_PATH}/git-checkout.sh
+. ${RAISIN_PATH}/build.sh
+
+# Set up basic functionality
+common_init
 
 # And do your own thing rather than running commands
 # I suggest defining a main function of your own and running it like this.
diff --git a/scripts/mkdeb b/scripts/mkdeb
index 46ade07..cb2a1b6 100755
--- a/scripts/mkdeb
+++ b/scripts/mkdeb
@@ -35,7 +35,8 @@ mkdir -p deb/opt/raisin
 cp -r data deb/opt/raisin
 cp -r components deb/opt/raisin
 cp -r scripts deb/opt/raisin
-cp config raise.sh unraise.sh deb/opt/raisin
+# FIXME
+#cp config raise.sh unraise.sh deb/opt/raisin
 
 
 # Debian doesn't use /usr/lib64 for 64-bit libraries
diff --git a/scripts/mkrpm b/scripts/mkrpm
index c530466..90d9bdc 100755
--- a/scripts/mkrpm
+++ b/scripts/mkrpm
@@ -48,8 +48,9 @@ cp -r $BASEDIR/data \$RPM_BUILD_ROOT/opt/raisin
 cp -r $BASEDIR/components \$RPM_BUILD_ROOT/opt/raisin
 cp -r $BASEDIR/scripts \$RPM_BUILD_ROOT/opt/raisin
 cp $BASEDIR/config \$RPM_BUILD_ROOT/opt/raisin
-cp $BASEDIR/raise.sh \$RPM_BUILD_ROOT/opt/raisin
-cp $BASEDIR/unraise.sh \$RPM_BUILD_ROOT/opt/raisin
+# FIXME
+# cp $BASEDIR/raise.sh \$RPM_BUILD_ROOT/opt/raisin
+# cp $BASEDIR/unraise.sh \$RPM_BUILD_ROOT/opt/raisin
 
 %clean
 
diff --git a/unraise.sh b/unraise.sh
deleted file mode 100755
index 50ce310..000
--- a/unraise.sh
+++ /dev/null
@@ -1,17 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-source config
-source lib/common-functions.sh
-
-
-# start execution
-common_init
-
-for_each_component clean
-
-uninstall_package xen-system
-for_each_component unconfigure
-
-rm -rf $INST_DIR
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V9 4/6] xen/vm_event: Relocate memop checks

2015-04-09 Thread Tamas K Lengyel
The memop handler functions for paging/sharing that are responsible for
calling XSM don't really have anything to do with vm_event, so this patch
relocates them into mem_paging_memop and mem_sharing_memop. This is already
the approach taken in mem_access_memop, so here we just make things
consistent.
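After the move, both memops follow the same shape (a condensed sketch of
the pattern, error handling trimmed; see the hunks below for the real code):

    if ( copy_from_guest(&op, arg, 1) )
        return -EFAULT;
    rc = rcu_lock_live_remote_domain_by_id(op.domain, &d);
    if ( rc )
        return rc;
    rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_paging_op);
    if ( !rc )
    {
        /* ... dispatch on op.op and copy results back if needed ... */
    }
    rcu_unlock_domain(d);
    return rc;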

Signed-off-by: Tamas K Lengyel tamas.leng...@zentific.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Reviewed-by: Tim Deegan t...@xen.org
---
v7: Minor fixes with returning error codes on rcu lock failure
v6: Don't pass superfluous cmd to the memops.
Unlock rcu's in sharing/paging
Style fixes
---
 xen/arch/x86/mm/mem_paging.c  |  41 ++---
 xen/arch/x86/mm/mem_sharing.c | 125 +-
 xen/arch/x86/x86_64/compat/mm.c   |  26 +---
 xen/arch/x86/x86_64/mm.c  |  24 +---
 xen/common/vm_event.c |  43 -
 xen/include/asm-x86/mem_paging.h  |   2 +-
 xen/include/asm-x86/mem_sharing.h |   3 +-
 xen/include/xen/vm_event.h|   1 -
 8 files changed, 124 insertions(+), 141 deletions(-)

diff --git a/xen/arch/x86/mm/mem_paging.c b/xen/arch/x86/mm/mem_paging.c
index e63d8c1..17d2319 100644
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -22,27 +22,45 @@
 
 
 #include <asm/p2m.h>
-#include <xen/vm_event.h>
+#include <xen/guest_access.h>
+#include <xsm/xsm.h>
 
-
-int mem_paging_memop(struct domain *d, xen_mem_paging_op_t *mpo)
+int mem_paging_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_paging_op_t) arg)
 {
-int rc = -ENODEV;
-if ( unlikely(!d->vm_event->paging.ring_page) )
+int rc;
+xen_mem_paging_op_t mpo;
+struct domain *d;
+bool_t copyback = 0;
+
+if ( copy_from_guest(&mpo, arg, 1) )
+return -EFAULT;
+
+rc = rcu_lock_live_remote_domain_by_id(mpo.domain, &d);
+if ( rc )
 return rc;
 
-switch( mpo->op )
+rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_paging_op);
+if ( rc )
+goto out;
+
+rc = -ENODEV;
+if ( unlikely(!d->vm_event->paging.ring_page) )
+goto out;
+
+switch( mpo.op )
 {
 case XENMEM_paging_op_nominate:
-rc = p2m_mem_paging_nominate(d, mpo->gfn);
+rc = p2m_mem_paging_nominate(d, mpo.gfn);
 break;
 
 case XENMEM_paging_op_evict:
-rc = p2m_mem_paging_evict(d, mpo->gfn);
+rc = p2m_mem_paging_evict(d, mpo.gfn);
 break;
 
 case XENMEM_paging_op_prep:
-rc = p2m_mem_paging_prep(d, mpo->gfn, mpo->buffer);
+rc = p2m_mem_paging_prep(d, mpo.gfn, mpo.buffer);
+if ( !rc )
+copyback = 1;
 break;
 
 default:
@@ -50,6 +68,11 @@ int mem_paging_memop(struct domain *d, xen_mem_paging_op_t *mpo)
 break;
 }
 
+if ( copyback && __copy_to_guest(arg, &mpo, 1) )
+rc = -EFAULT;
+
+out:
+rcu_unlock_domain(d);
 return rc;
 }
 
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4959407..ff01378 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -28,6 +28,7 @@
 #include <xen/grant_table.h>
 #include <xen/sched.h>
 #include <xen/rcupdate.h>
+#include <xen/guest_access.h>
 #include <xen/vm_event.h>
 #include <asm/page.h>
 #include <asm/string.h>
@@ -1293,39 +1294,66 @@ int relinquish_shared_pages(struct domain *d)
 return rc;
 }
 
-int mem_sharing_memop(struct domain *d, xen_mem_sharing_op_t *mec)
+int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
-int rc = 0;
+int rc;
+xen_mem_sharing_op_t mso;
+struct domain *d;
+
+rc = -EFAULT;
+if ( copy_from_guest(&mso, arg, 1) )
+return rc;
+
+if ( mso.op == XENMEM_sharing_op_audit )
+return mem_sharing_audit();
+
+rc = rcu_lock_live_remote_domain_by_id(mso.domain, &d);
+if ( rc )
+return rc;
+
+rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_sharing_op);
+if ( rc )
+goto out;
 
 /* Only HAP is supported */
+rc = -ENODEV;
 if ( !hap_enabled(d) || !d->arch.hvm_domain.mem_sharing_enabled )
- return -ENODEV;
+goto out;
 
-switch(mec->op)
+rc = -ENODEV;
+if ( unlikely(!d->vm_event->share.ring_page) )
+goto out;
+
+switch ( mso.op )
 {
 case XENMEM_sharing_op_nominate_gfn:
 {
-unsigned long gfn = mec->u.nominate.u.gfn;
+unsigned long gfn = mso.u.nominate.u.gfn;
 shr_handle_t handle;
+
+rc = -EINVAL;
 if ( !mem_sharing_enabled(d) )
-return -EINVAL;
+goto out;
+
 rc = mem_sharing_nominate_page(d, gfn, 0, &handle);
-mec->u.nominate.handle = handle;
+mso.u.nominate.handle = handle;
 }
 break;
 
 case XENMEM_sharing_op_nominate_gref:
 {
-grant_ref_t gref = mec->u.nominate.u.grant_ref;
+grant_ref_t gref = mso.u.nominate.u.grant_ref;
 unsigned long gfn;
 shr_handle_t 

[Xen-devel] [PATCH V9 3/6] xen/vm_event: Decouple vm_event and mem_access.

2015-04-09 Thread Tamas K Lengyel
The vm_event subsystem has been artificially tied to the presence of
mem_access. While mem_access does depend on vm_event, vm_event is an entirely
independent subsystem that can be used for arbitrary function-offloading to
helper apps in domains. This patch removes the requirement that mem_access be
supported in order to enable vm_event.

A new vm_event_resume function is introduced which pulls all responses off a
given ring and delegates handling to the appropriate helper functions (where
necessary). By default, vm_event_resume just pulls the response from the ring
and unpauses the corresponding vCPU. This approach reduces code duplication
and presents a single point of entry for the entire vm_event subsystem's
response handling mechanism.
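The default path therefore looks roughly like this (a sketch condensed from
the ring-draining loops this patch removes; per-reason handlers hook in
where the comment sits):

    void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
    {
        vm_event_response_t rsp;

        /* Pull all responses off the ring. */
        while ( vm_event_get_response(d, ved, &rsp) )
        {
            struct vcpu *v;

            if ( rsp.version != VM_EVENT_INTERFACE_VERSION )
                continue;

            /* Validate the vcpu_id in the response. */
            if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
                continue;

            v = d->vcpu[rsp.vcpu_id];

            /* ... delegate to paging/sharing/access helpers here ... */

            if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
                vm_event_vcpu_unpause(v);
        }
    }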

Signed-off-by: Tamas K Lengyel tamas.leng...@zentific.com
Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov
Acked-by: Tim Deegan t...@xen.org
---
v4: Consolidate resume routines into vm_event_resume
Style fixes
Sort xen/common/Makefile to be alphabetical
v3: Move ring processing out from mem_access.c to monitor.c in common
---
 xen/arch/x86/mm/mem_sharing.c   | 32 ++---
 xen/arch/x86/mm/p2m.c   | 62 
 xen/common/Makefile | 18 +-
 xen/common/mem_access.c | 31 +---
 xen/common/vm_event.c   | 72 +++--
 xen/include/asm-x86/mem_sharing.h   |  1 -
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/xen/mem_access.h| 14 ++--
 xen/include/xen/vm_event.h  | 58 ++
 xen/include/xsm/dummy.h |  2 --
 xen/include/xsm/xsm.h   |  4 ---
 xen/xsm/dummy.c |  2 --
 xen/xsm/flask/hooks.c   | 36 ---
 xen/xsm/flask/policy/access_vectors |  8 ++---
 14 files changed, 128 insertions(+), 214 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e6572af..4959407 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -591,35 +591,6 @@ unsigned int mem_sharing_get_nr_shared_mfns(void)
 return (unsigned int)atomic_read(nr_shared_mfns);
 }
 
-int mem_sharing_sharing_resume(struct domain *d)
-{
-vm_event_response_t rsp;
-
-/* Get all requests off the ring */
-while ( vm_event_get_response(d, &d->vm_event->share, &rsp) )
-{
-struct vcpu *v;
-
-if ( rsp.version != VM_EVENT_INTERFACE_VERSION )
-{
-printk(XENLOG_G_WARNING "vm_event interface version mismatch\n");
-continue;
-}
-
-/* Validate the vcpu_id in the response. */
-if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
-continue;
-
-v = d->vcpu[rsp.vcpu_id];
-
-/* Unpause domain/vcpu */
-if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
-vm_event_vcpu_unpause(v);
-}
-
-return 0;
-}
-
 /* Functions that change a page's type and ownership */
 static int page_make_sharable(struct domain *d, 
struct page_info *page, 
@@ -1470,7 +1441,8 @@ int mem_sharing_memop(struct domain *d, xen_mem_sharing_op_t *mec)
 {
 if ( !mem_sharing_enabled(d) )
 return -EINVAL;
-rc = mem_sharing_sharing_resume(d);
+
+vm_event_resume(d, &d->vm_event->share);
 }
 break;
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4032c62..6403172 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1279,13 +1279,13 @@ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
 }
 
 /**
- * p2m_mem_paging_resume - Resume guest gfn and vcpus
+ * p2m_mem_paging_resume - Resume guest gfn
  * @d: guest domain
- * @gfn: guest page in paging state
+ * @rsp: vm_event response received
+ *
+ * p2m_mem_paging_resume() will forward the p2mt of a gfn to ram_rw. It is
+ * called by the pager.
  *
- * p2m_mem_paging_resume() will forward the p2mt of a gfn to ram_rw and all
- * waiting vcpus will be unpaused again. It is called by the pager.
- * 
  * The gfn was previously either evicted and populated, or nominated and
  * populated. If the page was evicted the p2mt will be p2m_ram_paging_in. If
  * the page was just nominated the p2mt will be p2m_ram_paging_in_start because
@@ -1293,51 +1293,33 @@ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
  *
  * If the gfn was dropped the vcpu needs to be unpaused.
  */
-void p2m_mem_paging_resume(struct domain *d)
+
+void p2m_mem_paging_resume(struct domain *d, vm_event_response_t *rsp)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
-vm_event_response_t rsp;
 p2m_type_t p2mt;
 p2m_access_t a;
 mfn_t mfn;
 
-/* Pull all responses off the ring */
-while( vm_event_get_response(d, &d->vm_event->paging, &rsp) )
+/* Fix p2m entry if the page was not dropped */
+if ( 

[Xen-devel] [PATCH v20 06/13] x86/VPMU: Initialize PMU for PV(H) guests

2015-04-09 Thread Boris Ostrovsky
Code for initializing/tearing down PMU for PV guests

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Acked-by: Kevin Tian kevin.t...@intel.com
Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov
---
Changes in v20:
* Moved page freeing/unmapping from under vpmu_lock in pvpmu_init()/pvpmu_finish()
* Using is_hardware_domain() instead of open-coding
* Added comments to explain how vpmu_count is used.
* Don't test d-vcpu as it is covered by preceding d-max_vcpus check

 tools/flask/policy/policy/modules/xen/xen.te |   4 +
 xen/arch/x86/domain.c|   2 +
 xen/arch/x86/hvm/hvm.c   |   1 +
 xen/arch/x86/hvm/svm/svm.c   |   4 +-
 xen/arch/x86/hvm/svm/vpmu.c  |  44 ++---
 xen/arch/x86/hvm/vmx/vmx.c   |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c|  79 +++-
 xen/arch/x86/hvm/vpmu.c  | 131 ---
 xen/common/event_channel.c   |   1 +
 xen/include/asm-x86/hvm/vpmu.h   |   2 +
 xen/include/public/pmu.h |   2 +
 xen/include/public/xen.h |   1 +
 xen/include/xsm/dummy.h  |   3 +
 xen/xsm/flask/hooks.c|   4 +
 xen/xsm/flask/policy/access_vectors  |   2 +
 15 files changed, 232 insertions(+), 52 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 963ed44..c47369a 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -120,6 +120,10 @@ domain_comms(dom0_t, dom0_t)
 # Allow all domains to use (unprivileged parts of) the tmem hypercall
 allow domain_type xen_t:xen tmem_op;
 
+# Allow all domains to use PMU (but not to change its settings --- that's what
+# pmu_ctrl is for)
+allow domain_type xen_t:xen2 pmu_use;
+
 ###
 #
 # Domain creation
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9d5a527..dd10223 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -438,6 +438,8 @@ int vcpu_initialise(struct vcpu *v)
 vmce_init_vcpu(v);
 }
 
+spin_lock_init(&v->arch.vpmu.vpmu_lock);
+
 if ( has_hvm_container_domain(d) )
 {
 rc = hvm_vcpu_initialise(v);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3ff87c6..7fcbb3e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4914,6 +4914,7 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
 HYPERCALL(hvm_op),
 HYPERCALL(sysctl),
 HYPERCALL(domctl),
+HYPERCALL(xenpmu_op),
 [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation
 };
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b6e77cd..e523d12 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1166,7 +1166,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
 return rc;
 }
 
-vpmu_initialise(v);
+/* PVH's VPMU is initialized via hypercall */
+if ( is_hvm_vcpu(v) )
+vpmu_initialise(v);
 
 svm_guest_osvw_init(v);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index b60ca40..58a0dc4 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -360,17 +360,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
 {
 struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
-amd_vpmu_unset_msr_bitmap(v);
+if ( has_hvm_container_vcpu(v) )
+{
+if ( is_msr_bitmap_on(vpmu) )
+amd_vpmu_unset_msr_bitmap(v);
 
-xfree(vpmu->context);
-vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+if ( is_hvm_vcpu(v) )
+xfree(vpmu->context);
 
-if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-{
-vpmu_reset(vpmu, VPMU_RUNNING);
 release_pmu_ownship(PMU_OWNER_HVM);
 }
+
+vpmu->context = NULL;
+vpmu_clear(vpmu);
 }
 
 /* VPMU part of the 'q' keyhandler */
@@ -435,15 +437,19 @@ int svm_vpmu_initialise(struct vcpu *v)
 if ( !counters )
 return -EINVAL;
 
-ctxt = xzalloc_bytes(sizeof(*ctxt) +
- 2 * sizeof(uint64_t) * num_counters);
-if ( !ctxt )
+if ( is_hvm_vcpu(v) )
 {
-printk(XENLOG_G_WARNING "Insufficient memory for PMU, "
-"PMU feature is unavailable on domain %d vcpu %d.\n",
-   v->vcpu_id, v->domain->domain_id);
-return -ENOMEM;
+ctxt = xzalloc_bytes(sizeof(*ctxt) +
+ 2 * sizeof(uint64_t) * num_counters);
+if ( !ctxt )
+{
+printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, "
+"PMU feature is unavailable\n", v);
+return -ENOMEM;
+}
 }
+else
+ctxt = 

[Xen-devel] [PATCH v20 10/13] x86/VPMU: Handle PMU interrupts for PV(H) guests

2015-04-09 Thread Boris Ostrovsky
Add support for handling PMU interrupts for PV(H) guests.

The VPMU for the interrupted VCPU is unloaded until the guest issues the
XENPMU_flush hypercall. This allows the guest to access PMU MSR values that
are stored in the VPMU context, which is shared between the hypervisor and
the domain, thus avoiding traps to the hypervisor.
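From the guest's point of view the sequence is roughly (hypothetical PV
guest code; only the hypercall name comes from this series, the wrapper
call is illustrative):

    /* 1. Read the cached counter/MSR state from the shared xenpmu_data
     *    page without trapping to the hypervisor;
     * 2. re-arm by flushing, which clears PMU_CACHED and reloads the
     *    VPMU context. */
    rc = HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);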

Since the interrupt handler may now force a VPMU context save (i.e. set the
VPMU_CONTEXT_SAVE flag), we need to change amd_vpmu_save(), which until now
expected this flag to be set only when the counters were stopped.

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov
Acked-by: Jan Beulich jbeul...@suse.com
---
* Updated patch title (include PVH guests)
* vpmu_lvtpc_update() initializes curr at definition time
* Drop curr in do_xenpmu_op()'s XENPMU_lvtpc_set case.
* Declared domid as domid_t type

 xen/arch/x86/hvm/svm/vpmu.c   |  11 +-
 xen/arch/x86/hvm/vpmu.c   | 211 +++---
 xen/include/public/arch-x86/pmu.h |   6 ++
 xen/include/public/pmu.h  |   2 +
 xen/include/xsm/dummy.h   |   4 +-
 xen/xsm/flask/hooks.c |   2 +
 6 files changed, 215 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 474d0db..0997901 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -228,17 +228,12 @@ static int amd_vpmu_save(struct vcpu *v)
 struct vpmu_struct *vpmu = vcpu_vpmu(v);
 unsigned int i;
 
-/*
- * Stop the counters. If we came here via vpmu_save_force (i.e.
- * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
- */
+for ( i = 0; i  num_counters; i++ )
+wrmsrl(ctrls[i], 0);
+
 if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
 {
 vpmu_set(vpmu, VPMU_FROZEN);
-
-for ( i = 0; i  num_counters; i++ )
-wrmsrl(ctrls[i], 0);
-
 return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 5fbb799..37e612a 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -85,31 +85,56 @@ static void __init parse_vpmu_param(char *s)
 void vpmu_lvtpc_update(uint32_t val)
 {
 struct vpmu_struct *vpmu;
+struct vcpu *curr = current;
 
-if ( vpmu_mode == XENPMU_MODE_OFF )
+if ( likely(vpmu_mode == XENPMU_MODE_OFF) )
 return;
 
-vpmu = vcpu_vpmu(current);
+vpmu = vcpu_vpmu(curr);
 
 vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+/* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||
+ !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
-struct vpmu_struct *vpmu = vcpu_vpmu(current);
+struct vcpu *curr = current;
+struct vpmu_struct *vpmu;
 
 if ( vpmu_mode == XENPMU_MODE_OFF )
 return 0;
 
+vpmu = vcpu_vpmu(curr);
 if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+{
+int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+
+/*
+ * We may have received a PMU interrupt during WRMSR handling
+ * and since do_wrmsr may load VPMU context we should save
+ * (and unload) it again.
+ */
+if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
+ (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+{
+vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+}
+return ret;
+}
+
 return 0;
 }
 
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
-struct vpmu_struct *vpmu = vcpu_vpmu(current);
+struct vcpu *curr = current;
+struct vpmu_struct *vpmu;
 
 if ( vpmu_mode == XENPMU_MODE_OFF )
 {
@@ -117,24 +142,163 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 return 0;
 }
 
+vpmu = vcpu_vpmu(curr);
 if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+{
+int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
+ (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+{
+vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+}
+return ret;
+}
 else
 *msr_content = 0;
 
 return 0;
 }
 
+static inline struct vcpu *choose_hwdom_vcpu(void)
+{
+unsigned idx;
+
+if ( hardware_domain->max_vcpus == 0 )
+return NULL;

[Xen-devel] [PATCH v20 09/13] x86/VPMU: Add support for PMU register handling on PV guests

2015-04-09 Thread Boris Ostrovsky
Intercept accesses to PMU MSRs and process them in the VPMU module. If the
vpmu ops for a VCPU are not initialized (which is the case, for example, for
PV guests that are not VPMU-enlightened), accesses to these MSRs return
failure.

Dump VPMU state for all domains (HVM and PV) when requested.
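Conceptually the PV-side intercept is just a forwarding step (a simplified
sketch; the actual traps.c hunks cover the full PMU MSR ranges and error
paths):

    /* In the PV rdmsr emulation path: hand PMU MSRs to the VPMU layer.
     * With uninitialized vpmu ops this returns failure, as described. */
    if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
        goto fail;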

Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com
Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com
---
 xen/arch/x86/domain.c |  3 +--
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 +++--
 xen/arch/x86/hvm/vpmu.c   |  3 +++
 xen/arch/x86/traps.c  | 51 +--
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 03bcbd3..d9f48a3 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2066,8 +2066,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
 paging_dump_vcpu_info(v);
 
-if ( is_hvm_vcpu(v) )
-vpmu_dump(v);
+vpmu_dump(v);
 }
 
 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index a00d06c..fc89eb7 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -299,12 +300,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
 rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
 for ( i = 0; i < arch_pmc_cnt; i++ )
 rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+if ( !has_hvm_container_vcpu(v) )
+rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
 {
 struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
+if ( !has_hvm_container_vcpu(v) )
+wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
 if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
 return 0;
 
@@ -342,6 +349,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
 wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
 wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
 wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+if ( !has_hvm_container_vcpu(v) )
+{
+wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+core2_vpmu_cxt->global_ovf_ctrl = 0;
+wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+}
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -442,7 +456,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
uint64_t supported)
 {
-u64 global_ctrl;
 int i, tmp;
 int type = -1, index = -1;
 struct vcpu *v = current;
@@ -486,7 +499,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
 switch ( msr )
 {
 case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+if ( msr_content & ~(0xC000000000000000 |
+ (((1ULL << fixed_pmc_cnt) - 1) << 32) |
+ ((1ULL << arch_pmc_cnt) - 1)) )
+return 1;
 core2_vpmu_cxt->global_status &= ~msr_content;
+wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
 return 0;
 case MSR_CORE_PERF_GLOBAL_STATUS:
 gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -514,14 +532,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
 gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
 return 0;
 case MSR_CORE_PERF_GLOBAL_CTRL:
-global_ctrl = msr_content;
+core2_vpmu_cxt->global_ctrl = msr_content;
 break;
 case MSR_CORE_PERF_FIXED_CTR_CTRL:
 if ( msr_content &
  ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
 return 1;
 
-vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+if ( has_hvm_container_vcpu(v) )
+vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+   &core2_vpmu_cxt->global_ctrl);
+else
+rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
 *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
 if ( msr_content != 0 )
 {
@@ -546,7 +568,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
 if ( msr_content & (~((1ull << 32) - 1)) )
 return 1;
 
-vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+if ( has_hvm_container_vcpu(v) )
+vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+   

[Xen-devel] [PATCH v4 10/12] xsm: add CAT related xsm policies

2015-04-09 Thread Chao Peng
Add XSM policies for the Cache Allocation Technology (CAT) related
hypercalls, restricting their visibility to the control domain only.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
Acked-by:  Daniel De Graaf dgde...@tycho.nsa.gov
---
 tools/flask/policy/policy/modules/xen/xen.if | 2 +-
 tools/flask/policy/policy/modules/xen/xen.te | 4 +++-
 xen/xsm/flask/hooks.c| 6 ++
 xen/xsm/flask/policy/access_vectors  | 6 ++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index 2d32e1c..8bb081a 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -51,7 +51,7 @@ define(`create_domain_common', `
getaffinity setaffinity setvcpuextstate };
allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
set_max_evtchn set_vnumainfo get_vnumainfo cacheflush
-   psr_cmt_op configure_domain };
+   psr_cmt_op configure_domain psr_cat_op };
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index c0128aa..d431aaf 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -67,6 +67,7 @@ allow dom0_t xen_t:xen {
 allow dom0_t xen_t:xen2 {
 resource_op
 psr_cmt_op
+psr_cat_op
 };
 allow dom0_t xen_t:mmu memorymap;
 
@@ -80,7 +81,8 @@ allow dom0_t dom0_t:domain {
getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-   set_cpuid gettsc settsc setscheduler set_max_evtchn set_vnumainfo 
get_vnumainfo psr_cmt_op
+   set_cpuid gettsc settsc setscheduler set_max_evtchn set_vnumainfo
+   get_vnumainfo psr_cmt_op psr_cat_op
 };
 allow dom0_t dom0_t:resource { add remove };
 
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 05dafed..8964321 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -729,6 +729,9 @@ static int flask_domctl(struct domain *d, int cmd)
 case XEN_DOMCTL_psr_cmt_op:
 return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PSR_CMT_OP);
 
+case XEN_DOMCTL_psr_cat_op:
+return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PSR_CAT_OP);
+
 case XEN_DOMCTL_arm_configure_domain:
 return current_has_perm(d, SECCLASS_DOMAIN2, 
DOMAIN2__CONFIGURE_DOMAIN);
 
@@ -790,6 +793,9 @@ static int flask_sysctl(int cmd)
 case XEN_SYSCTL_psr_cmt_op:
 return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
 XEN2__PSR_CMT_OP, NULL);
+case XEN_SYSCTL_psr_cat_op:
+return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+XEN2__PSR_CAT_OP, NULL);
 
 default:
 printk(flask_sysctl: Unknown op %d\n, cmd);
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 8f44b9d..8cc1ef3 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -84,6 +84,9 @@ class xen2
 resource_op
 # XEN_SYSCTL_psr_cmt_op
 psr_cmt_op
+# XEN_SYSCTL_psr_cat_op
+psr_cat_op
+
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
@@ -221,6 +224,9 @@ class domain2
 psr_cmt_op
 # XEN_DOMCTL_configure_domain
 configure_domain
+# XEN_DOMCTL_psr_cat_op
+psr_cat_op
+
 }
 
# Similar to class domain, but primarily contains domctls related to HVM domains
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 12/12] docs: add xl-psr.markdown

2015-04-09 Thread Chao Peng
Add a document introducing the basic concepts and terms of the PSR family
of technologies and the xl/libxl interfaces.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 docs/man/xl.pod.1 |   7 +++
 docs/misc/xl-psr.markdown | 111 ++
 2 files changed, 118 insertions(+)
 create mode 100644 docs/misc/xl-psr.markdown

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index dfab921..b71d6e6 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1472,6 +1472,9 @@ occupancy monitoring share the same set of underlying 
monitoring service. Once
 a domain is attached to the monitoring service, monitoring data can be showed
 for any of these monitoring types.
 
+See L<http://xenbits.xen.org/docs/unstable/misc/xl-psr.html> for more
+information.
+
 =over 4
 
 =item Bpsr-cmt-attach [Idomain-id]
@@ -1501,6 +1504,9 @@ applications. In Xen implementation, CAT is used to 
control cache allocation
 on VM basis. To enforce cache on a specific domain, just set capacity bitmasks
 (CBM) for the domain.
 
+See L<http://xenbits.xen.org/docs/unstable/misc/xl-psr.html> for more
+information.
+
 =over 4
 
 =item Bpsr-cat-cbm-set [IOPTIONS] [Idomain-id] [Icbm]
@@ -1546,6 +1552,7 @@ And the following documents on the xen.org website:
 L<http://xenbits.xen.org/docs/unstable/misc/xl-network-configuration.html>
 L<http://xenbits.xen.org/docs/unstable/misc/xl-disk-configuration.txt>
 L<http://xenbits.xen.org/docs/unstable/misc/xsm-flask.txt>
+L<http://xenbits.xen.org/docs/unstable/misc/xl-psr.html>
 
 For systems that don't automatically bring CPU online:
 
diff --git a/docs/misc/xl-psr.markdown b/docs/misc/xl-psr.markdown
new file mode 100644
index 000..44f6f8c
--- /dev/null
+++ b/docs/misc/xl-psr.markdown
@@ -0,0 +1,111 @@
+# Intel Platform Shared Resource Monitoring/Control in xl/libxl
+
+This document introduces Intel Platform Shared Resource Monitoring/Control
+technologies, their basic concepts and the xl/libxl interfaces.
+
+## Cache Monitoring Technology (CMT)
+
+Cache Monitoring Technology (CMT) is a new feature available on Intel Haswell
+and later server platforms that allows an OS or Hypervisor/VMM to determine
+the usage of cache (currently only the L3 cache is supported) by applications
+running on the platform. A Resource Monitoring ID (RMID) is the abstraction of
+the application(s) that will be monitored for cache usage. The CMT hardware
+tracks cache utilization of memory accesses according to the RMID and reports
+monitored data via a counter register.
+
+For detailed information, please refer to Intel SDM chapter 17.14.
+
+In Xen's implementation, each domain in the system can be assigned an RMID
+independently, while RMID=0 is reserved for domains that don't have the CMT
+service enabled. The RMID is opaque to xl/libxl and is only used inside the
+hypervisor.
+
+### xl interfaces
+
+A domain is assigned an RMID implicitly by attaching it to the CMT service:
+
+xl psr-cmt-attach <domid>
+
+After that, cache usage for the domain can be shown by:
+
+xl psr-cmt-show cache_occupancy <domid>
+
+Once monitoring is no longer needed, the domain can be detached from the
+CMT service by:
+
+xl psr-cmt-detach <domid>
+
+Attaching may fail if no free RMID is available. In that case, unused
+RMID(s) can be freed by detaching the corresponding domains from the CMT
+service. The maximum RMID number in the system can also be obtained by:
+
+xl psr-cmt-show
+
+## Memory Bandwidth Monitoring (MBM)
+
+Memory Bandwidth Monitoring (MBM) is a new hardware feature available on Intel
+Broadwell and later server platforms which builds on the CMT infrastructure to
+allow monitoring of system memory bandwidth. It introduces two new monitoring
+event types for system total/local memory bandwidth. The same RMID can
+be used to monitor both cache usage and memory bandwidth at the same time.
+
+For detailed information, please refer to Intel SDM chapter 17.14.
+
+In Xen's implementation, MBM shares the same underlying monitoring service
+with CMT and can be used to monitor memory bandwidth on a per-domain basis.
+
+The xl/libxl interface is the same as that of CMT. The difference is that the
+monitor type is the corresponding memory monitoring type (local_mem_bandwidth/
+total_mem_bandwidth) instead of cache_occupancy.
+
+## Cache Allocation Technology (CAT)
+
+Cache Allocation Technology (CAT) is a new feature available on Intel
+Broadwell and later server platforms that allows an OS or Hypervisor/VMM to
+partition cache allocation (i.e. the L3 cache) based on application priority
+or Class of Service (COS). Each COS is configured using capacity bitmasks
+(CBM) which represent cache capacity and indicate the degree of overlap and
+isolation between classes. The system cache resource is divided into a number
+of minimum portions, subsets of which make up a cache partition. Each portion
+corresponds to a bit in the CBM, and a set bit means the corresponding cache
+portion is available to that class. For example, with a 12-bit CBM, the masks
+0x00f and 0xff0 describe two fully isolated partitions, while 0x0ff and 0xff0
+overlap in the middle four portions.
+
+For detailed information, please refer to Intel SDM 

[Xen-devel] [PATCH v4 02/12] x86: improve psr scheduling code

2015-04-09 Thread Chao Peng
Switching the RMID from the previous vcpu to the next vcpu only needs to
write MSR_IA32_PSR_ASSOC once. Writing it with the next vcpu's value is
enough; there is no need to write '0' first. The idle domain has its RMID
set to 0, and because the MSR is updated lazily, it can be switched just
like any other domain.

Also move the initialization of the per-CPU variable used for the lazy
update from context switch to CPU starting.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
Changes in v4:
* Move psr_assoc_reg_read/psr_assoc_reg_write into psr_ctxt_switch_to.
* Use 0 instead of smp_processor_id() for boot cpu.
* add cpu parameter to psr_assoc_init.
Changes in v2:
* Move initialization for psr_assoc from context switch to CPU_STARTING.
---
 xen/arch/x86/domain.c |  7 ++---
 xen/arch/x86/psr.c| 75 ++-
 xen/include/asm-x86/psr.h |  3 +-
 3 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 04c1898..695a2eb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1444,8 +1444,6 @@ static void __context_switch(void)
 {
 memcpy(p-arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
 vcpu_save_fpu(p);
-if ( psr_cmt_enabled() )
-psr_assoc_rmid(0);
 p->arch.ctxt_switch_from(p);
 }
 
@@ -1470,11 +1468,10 @@ static void __context_switch(void)
 }
 vcpu_restore_fpu_eager(n);
 n->arch.ctxt_switch_to(n);
-
-if ( psr_cmt_enabled() && n->domain->arch.psr_rmid > 0 )
-psr_assoc_rmid(n->domain->arch.psr_rmid);
 }
 
+psr_ctxt_switch_to(n->domain);
+
 gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
   per_cpu(compat_gdt_table, cpu);
 if ( need_full_gdt(n) )
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 344de3c..6119c6e 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -22,7 +22,6 @@
 
 struct psr_assoc {
 uint64_t val;
-bool_t initialized;
 };
 
 struct psr_cmt *__read_mostly psr_cmt;
@@ -122,14 +121,6 @@ static void __init init_psr_cmt(unsigned int rmid_max)
 printk(XENLOG_INFO Cache Monitoring Technology enabled\n);
 }
 
-static int __init init_psr(void)
-{
-if ( (opt_psr & PSR_CMT) && opt_rmid_max )
-init_psr_cmt(opt_rmid_max);
-return 0;
-}
-__initcall(init_psr);
-
 /* Called with domain lock held, no psr specific lock needed */
 int psr_alloc_rmid(struct domain *d)
 {
@@ -175,26 +166,70 @@ void psr_free_rmid(struct domain *d)
 d-arch.psr_rmid = 0;
 }
 
-void psr_assoc_rmid(unsigned int rmid)
+static inline void psr_assoc_init(unsigned int cpu)
+{
+struct psr_assoc *psra = &per_cpu(psr_assoc, cpu);
+
+if ( psr_cmt_enabled() )
+rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
+}
+
+static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid)
+{
+*reg = (*reg & ~rmid_mask) | (rmid & rmid_mask);
+}
+
+void psr_ctxt_switch_to(struct domain *d)
 {
-uint64_t val;
-uint64_t new_val;
 struct psr_assoc *psra = &this_cpu(psr_assoc);
+uint64_t reg = psra->val;
+
+if ( psr_cmt_enabled() )
+psr_assoc_rmid(&reg, d->arch.psr_rmid);
 
-if ( !psra->initialized )
+if ( reg != psra->val )
 {
-rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
-psra->initialized = 1;
+wrmsrl(MSR_IA32_PSR_ASSOC, reg);
+psra->val = reg;
 }
-val = psra->val;
+}
 
-new_val = (val & ~rmid_mask) | (rmid & rmid_mask);
-if ( val != new_val )
+static void psr_cpu_init(unsigned int cpu)
+{
+psr_assoc_init(cpu);
+}
+
+static int cpu_callback(
+struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+unsigned int cpu = (unsigned long)hcpu;
+
+switch ( action )
 {
-wrmsrl(MSR_IA32_PSR_ASSOC, new_val);
-psra->val = new_val;
+case CPU_STARTING:
+psr_cpu_init(cpu);
+break;
 }
+
+return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+.notifier_call = cpu_callback
+};
+
+static int __init psr_presmp_init(void)
+{
+if ( (opt_psr  PSR_CMT)  opt_rmid_max )
+init_psr_cmt(opt_rmid_max);
+
+psr_cpu_init(0);
+if ( psr_cmt_enabled() )
+register_cpu_notifier(cpu_nfb);
+
+return 0;
 }
+presmp_initcall(psr_presmp_init);
 
 /*
  * Local variables:
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index c6076e9..585350c 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -46,7 +46,8 @@ static inline bool_t psr_cmt_enabled(void)
 
 int psr_alloc_rmid(struct domain *d);
 void psr_free_rmid(struct domain *d);
-void psr_assoc_rmid(unsigned int rmid);
+
+void psr_ctxt_switch_to(struct domain *d);
 
 #endif /* __ASM_PSR_H__ */
 
-- 
1.9.1




[Xen-devel] [PATCH v4 07/12] x86: expose CBM length and COS number information

2015-04-09 Thread Chao Peng
General CAT information, such as the maximum COS and the CBM length, is
exposed to user space via a SYSCTL hypercall, to help user space construct
the CBM.
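
For illustration, this is roughly how a user-space caller could drive the new
sysctl (a sketch modelled on the existing libxc wrappers; the function name
xc_psr_cat_get_l3_info is an assumption, not part of this patch):

    /* Hypothetical libxc-style wrapper around XEN_SYSCTL_psr_cat_op. */
    int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                               uint32_t *cbm_len, uint32_t *cos_max)
    {
        int rc;
        DECLARE_SYSCTL;

        sysctl.cmd = XEN_SYSCTL_psr_cat_op;
        sysctl.u.psr_cat_op.cmd = XEN_SYSCTL_PSR_CAT_get_l3_info;
        sysctl.u.psr_cat_op.target = socket;

        rc = do_sysctl(xch, &sysctl);
        if ( !rc )
        {
            /* OUT fields filled in by Xen on success. */
            *cbm_len = sysctl.u.psr_cat_op.u.l3_info.cbm_len;
            *cos_max = sysctl.u.psr_cat_op.u.l3_info.cos_max;
        }

        return rc;
    }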

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 xen/arch/x86/psr.c  | 31 +++
 xen/arch/x86/sysctl.c   | 18 ++
 xen/include/asm-x86/psr.h   |  3 +++
 xen/include/public/sysctl.h | 16 
 4 files changed, 68 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 51faa70..e390fd9 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -221,6 +221,37 @@ void psr_ctxt_switch_to(struct domain *d)
 }
 }
 
+static int get_cat_socket_info(unsigned int socket,
+                               struct psr_cat_socket_info **info)
+{
+    if ( !cat_socket_info )
+        return -ENODEV;
+
+    if ( socket >= nr_sockets )
+        return -EBADSLT;
+
+    if ( !cat_socket_info[socket].enabled )
+        return -ENOENT;
+
+    *info = cat_socket_info + socket;
+    return 0;
+}
+
+int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
+                        uint32_t *cos_max)
+{
+    struct psr_cat_socket_info *info;
+    int ret = get_cat_socket_info(socket, &info);
+
+    if ( ret )
+        return ret;
+
+    *cbm_len = info->cbm_len;
+    *cos_max = info->cos_max;
+
+    return 0;
+}
+
 /* Called with domain lock held, no psr specific lock needed */
 static void psr_free_cos(struct domain *d)
 {
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 611a291..8a9e120 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -171,6 +171,24 @@ long arch_do_sysctl(
 
 break;
 
+case XEN_SYSCTL_psr_cat_op:
+        switch ( sysctl->u.psr_cat_op.cmd )
+        {
+        case XEN_SYSCTL_PSR_CAT_get_l3_info:
+            ret = psr_get_cat_l3_info(sysctl->u.psr_cat_op.target,
+                                      &sysctl->u.psr_cat_op.u.l3_info.cbm_len,
+                                      &sysctl->u.psr_cat_op.u.l3_info.cos_max);
+
+            if ( !ret && __copy_to_guest(u_sysctl, sysctl, 1) )
+                ret = -EFAULT;
+
+            break;
+        default:
+            ret = -EOPNOTSUPP;
+            break;
+        }
+        break;
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index 45392bf..3a8a406 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -52,6 +52,9 @@ void psr_free_rmid(struct domain *d);
 
 void psr_ctxt_switch_to(struct domain *d);
 
+int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
+uint32_t *cos_max);
+
 int psr_domain_init(struct domain *d);
 void psr_domain_free(struct domain *d);
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 8552dc6..91d90b8 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -656,6 +656,20 @@ struct xen_sysctl_psr_cmt_op {
 typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t);
 
+#define XEN_SYSCTL_PSR_CAT_get_l3_info   0
+struct xen_sysctl_psr_cat_op {
+uint32_t cmd;   /* IN: XEN_SYSCTL_PSR_CAT_* */
+uint32_t target;/* IN: socket to be operated on */
+union {
+struct {
+uint32_t cbm_len;   /* OUT: CBM length */
+uint32_t cos_max;   /* OUT: Maximum COS */
+} l3_info;
+} u;
+};
+typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t);
+
 struct xen_sysctl {
 uint32_t cmd;
 #define XEN_SYSCTL_readconsole        1
@@ -678,6 +692,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op  19
 #define XEN_SYSCTL_coverage_op   20
 #define XEN_SYSCTL_psr_cmt_op        21
+#define XEN_SYSCTL_psr_cat_op        22
 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
 union {
 struct xen_sysctl_readconsole   readconsole;
@@ -700,6 +715,7 @@ struct xen_sysctl {
 struct xen_sysctl_scheduler_op  scheduler_op;
 struct xen_sysctl_coverage_op   coverage_op;
     struct xen_sysctl_psr_cmt_op        psr_cmt_op;
+    struct xen_sysctl_psr_cat_op        psr_cat_op;
 uint8_t pad[128];
 } u;
 };
-- 
1.9.1




Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-09 Thread Waiman Long

On 04/09/2015 02:23 PM, Peter Zijlstra wrote:

On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote:

On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:

+#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
+static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
+{
+   unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
+   struct pv_hash_bucket *hb, *end;
+
+   if (!hash)
+   hash = 1;
+
+   init_hash = hash;
+   hb = &pv_lock_hash[hash_align(hash)];
+   for (;;) {
+   for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
+   if (!cmpxchg(&hb->lock, NULL, lock)) {
+   WRITE_ONCE(hb->node, node);
+   /*
+* We haven't set the _Q_SLOW_VAL yet. So
+* the order of writing doesn't matter.
+*/
+   smp_wmb(); /* matches rmb from pv_hash_find */
+   goto done;
+   }
+   }
+
+   hash = lfsr(hash, pv_lock_hash_bits, 0);

Since pv_lock_hash_bits is a variable, you end up running through that
massive if() forest to find the corresponding tap every single time. It
cannot compile-time optimize it.

Hence:
hash = lfsr(hash, pv_taps);

(I don't get the bits argument to the lfsr).

In any case, like I said before, I think we should try a linear probe
sequence first, the lfsr was over engineering from my side.


+   hb = &pv_lock_hash[hash_align(hash)];
  

So one thing this does -- and one of the reasons I figured I should
ditch the LFSR instead of fixing it -- is that you end up scanning each
bucket HB_PER_LINE times.


I was aware of that when I was trying to add the hash table debug code, 
but I wanted to get the code out for review and so haven't made any changes 
yet. I have just done some testing by adding debug code to check the 
hashing efficiency. With the kernel build workload, with over 1M calls 
to pv_hash(), all of them got an empty entry on the first try. Maybe the 
minimum hash table size of 256 helps.




The 'fix' would be to LFSR on cachelines instead of HBs but then you're
stuck with the 0-th cacheline.


This should not be a big problem. I just need to add a check at the end 
of the for loop that if hash is 0, change it to a certain non-0 value 
instead of calling lfsr().


As for ditching the lfsr idea, I am fine with that. So there will be 4 
entries (1 cacheline) for each hash value. If all the entries are full, 
we proceed to the next cacheline.  Right?
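
For the record, a minimal sketch of that per-cacheline linear probe (reusing
the pv_lock_hash/PV_HB_PER_LINE/hash_align definitions from the patch above;
untested, just to pin down the idea being discussed):

static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
{
	unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
	unsigned long mask = (1UL << pv_lock_hash_bits) - 1;
	struct pv_hash_bucket *hb, *end;

	/* Scan one cacheline worth of buckets, then move to the next line. */
	for (offset = 0; offset <= mask; offset += PV_HB_PER_LINE) {
		hb = &pv_lock_hash[hash_align((hash + offset) & mask)];
		for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
			if (!cmpxchg(&hb->lock, NULL, lock)) {
				WRITE_ONCE(hb->node, node);
				return &hb->lock;
			}
		}
	}
	/* With a load factor <= 0.75 this should be unreachable. */
	BUG();
}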


Cheers,
Longman



[Xen-devel] [PATCH v4 3/3] Revert x86/hvm: wait for at least one ioreq server to be enabled

2015-04-09 Thread Wei Liu
This reverts commit dd748d128d86996592afafea02e578cc7d4e6d42.

We don't need this workaround anymore since we have fixed the toolstack
interlock problem that affects stubdom.

Signed-off-by: Wei Liu wei.l...@citrix.com
Cc: Paul Durrant paul.durr...@citrix.com
Cc: Jan Beulich jbeul...@suse.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 xen/arch/x86/hvm/hvm.c   | 21 -
 xen/include/asm-x86/hvm/domain.h |  1 -
 2 files changed, 22 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bfde380..8b62296 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -893,13 +893,6 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s,
 
   done:
     spin_unlock(&s->lock);
-
-    /* This check is protected by the domain ioreq server lock. */
-    if ( d->arch.hvm_domain.ioreq_server.waiting )
-    {
-        d->arch.hvm_domain.ioreq_server.waiting = 0;
-        domain_unpause(d);
-    }
 }
 
 static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
@@ -1451,20 +1444,6 @@ int hvm_domain_initialise(struct domain *d)
 
     spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
     INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
-
-/*
- * In the case where a stub domain is providing emulation for
- * the guest, there is no interlock in the toolstack to prevent
- * the guest from running before the stub domain is ready.
- * Hence the domain must remain paused until at least one ioreq
- * server is created and enabled.
- */
-    if ( !is_pvh_domain(d) )
-    {
-        domain_pause(d);
-        d->arch.hvm_domain.ioreq_server.waiting = 1;
-    }
-
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 0702bf5..2757c7f 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -83,7 +83,6 @@ struct hvm_domain {
 struct {
 spinlock_t   lock;
 ioservid_t   id;
-bool_t   waiting;
 struct list_head list;
 } ioreq_server;
 struct hvm_ioreq_server *default_ioreq_server;
-- 
1.9.1




[Xen-devel] [PATCH v2] xen/pci: Try harder to get PXM information for Xen

2015-04-09 Thread Ross Lagerwall
If the device being added to Xen is not contained in the ACPI table,
walk the PCI device tree to find a parent that is contained in the ACPI
table before finding the PXM information from this device.

Previously, it would try to get a handle for the device, then the
device's bridge, then the physfn.  This changes the order so that it
tries to get a handle for the device, then the physfn, then walks up the
PCI device tree.

Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com
---
 drivers/xen/pci.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 95ee430..7494dbe 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -19,6 +19,7 @@
 
 #include linux/pci.h
 #include linux/acpi.h
+#include linux/pci-acpi.h
 #include xen/xen.h
 #include xen/interface/physdev.h
 #include xen/interface/xen.h
@@ -67,12 +68,22 @@ static int xen_add_device(struct device *dev)
 
 #ifdef CONFIG_ACPI
 	handle = ACPI_HANDLE(&pci_dev->dev);
-	if (!handle && pci_dev->bus->bridge)
-		handle = ACPI_HANDLE(pci_dev->bus->bridge);
 #ifdef CONFIG_PCI_IOV
 	if (!handle && pci_dev->is_virtfn)
 		handle = ACPI_HANDLE(physfn->bus->bridge);
 #endif
+	if (!handle) {
+		/*
+		 * This device was not listed in the ACPI name space at
+		 * all. Try to get acpi handle of parent pci bus.
+		 */
+		struct pci_bus *pbus;
+		for (pbus = pci_dev->bus; pbus; pbus = pbus->parent) {
+			handle = acpi_pci_get_bridge_handle(pbus);
+			if (handle)
+				break;
+		}
+	}
if (handle) {
acpi_status status;
 
-- 
2.1.0




Re: [Xen-devel] [PATCH 4/6] x86/link: Introduce and use __bss_end

2015-04-09 Thread Tim Deegan
At 18:26 +0100 on 07 Apr (1428431178), Andrew Cooper wrote:
 No functional change.
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org



Re: [Xen-devel] [PATCH V8 00/12] xen: Clean-up of mem_event subsystem

2015-04-09 Thread Tamas Lengyel
On Thu, Apr 9, 2015 at 1:03 PM, Tim Deegan t...@xen.org wrote:

 Hi,

 Sorry for the delay - I have been away.

 At 22:06 +0100 on 26 Mar (1427407612), Tamas K Lengyel wrote:
  Tamas K Lengyel (12):
xen/mem_event: Cleanup of mem_event structures
xen/mem_event: Cleanup mem_event names in rings, functions and domctls
xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
xen: Rename mem_event to vm_event
tools/tests: Clean-up tools/tests/xen-access
x86/hvm: factor out and rename vm_event related functions

 I have applied these six patches.

xen: Introduce monitor_op domctl

 This one no longer applies cleanly - looks like a conflict with a7511905
 (xen: Extend DOMCTL createdomain to support arch configuration)

 Can you rebase the second half of the series please?


Absolutely. Will be sending it shortly, thanks.

Tamas



 Cheers,

 Tim.



Re: [Xen-devel] [Patch V2 11/15] xen: check for initrd conflicting with e820 map

2015-04-09 Thread David Vrabel
On 09/04/2015 07:55, Juergen Gross wrote:
 Check whether the initrd is placed at a location which is conflicting
 with the target E820 map. If this is the case relocate it to a new
 area unused up to now and compliant to the E820 map.

Reviewed-by: David Vrabel david.vra...@citrix.com

David



Re: [Xen-devel] [PATCH 05/10] VMX: add help functions to support PML

2015-04-09 Thread Tim Deegan
At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
 +void vmx_vcpu_disable_pml(struct vcpu *v)
 +{
 +ASSERT(vmx_vcpu_pml_enabled(v));
 +

I think this function ought to call vmx_vcpu_flush_pml_buffer() before
disabling PML.  That way we don't need to worry about losing any
information if a guest vcpu is reset or offlined during migration. 
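
I.e. something along these lines (just a sketch; the existing teardown body
is elided):

void vmx_vcpu_disable_pml(struct vcpu *v)
{
    ASSERT(vmx_vcpu_pml_enabled(v));

    /*
     * Pull any logged GPAs into the dirty bitmap before tearing the buffer
     * down, so nothing is lost if the vcpu is reset or offlined while the
     * domain is in log-dirty mode.
     */
    vmx_vcpu_flush_pml_buffer(v);

    /* ... existing teardown of the PML buffer and VMCS controls ... */
}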

Cheers,

Tim.



Re: [Xen-devel] [PATCH 00/10] PML (Paging Modification Logging) support

2015-04-09 Thread Tim Deegan
At 10:24 +0100 on 07 Apr (1428402277), Tim Deegan wrote:
 Hi,
 
 At 16:30 +0800 on 07 Apr (1428424218), Kai Huang wrote:
  Hi Jan, Tim, other maintainers,
  
  Do you have comments? Or should I send out the v2 addressing Andrew's 
  comments, as it's been more than a week since this patch series were 
  sent out?
 
 I'm sorry, I was away last week so I haven't had a chance to review
 these patches.  I'll probably be able to look at them on Thursday.

Done.  They seem to be in good shape for a first cut!  I've commented
on the patches where there was anything I think needs improvement.

Cheers,

Tim.



Re: [Xen-devel] [Patch V2 10/15] xen: check pre-allocated page tables for conflict with memory map

2015-04-09 Thread Juergen Gross

On 04/09/2015 02:47 PM, David Vrabel wrote:

On 09/04/2015 07:55, Juergen Gross wrote:

Check whether the page tables built by the domain builder are at
memory addresses which are in conflict with the target memory map.
If this is the case just panic instead of running into problems
later.

Signed-off-by: Juergen Gross jgr...@suse.com
---
  arch/x86/xen/mmu.c | 19 ---
  arch/x86/xen/setup.c   |  6 ++
  arch/x86/xen/xen-ops.h |  1 +
  3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 1ca5197..41aeb1c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -116,6 +116,7 @@ static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
  DEFINE_PER_CPU(unsigned long, xen_cr3);/* cr3 stored as physaddr */
  DEFINE_PER_CPU(unsigned long, xen_current_cr3);/* actual vcpu cr3 */

+static phys_addr_t xen_pt_base, xen_pt_size;


These should be __init, but the use of globals in this way is confusing.


How else would you want to do it?





  /*
   * Just beyond the highest usermode address.  STACK_TOP_MAX has a
@@ -1998,7 +1999,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
check_pt_base(pt_base, pt_end, addr[i]);

/* Our (by three pages) smaller Xen pagetable that we are using */
-   memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
+   xen_pt_base = PFN_PHYS(pt_base);
+   xen_pt_size = (pt_end - pt_base) * PAGE_SIZE;
+   memblock_reserve(xen_pt_base, xen_pt_size);


Why not provide a xen_memblock_check_and_reserve() call that has the
xen_is_e820_reserved() check and the memblock_reserve() call?  This may
also be useful for patch #9 as well.


Uuh, not really. memblock_reserve() for those areas is called much
earlier than the e820 map is constructed.

Thinking more about it, I even have to modify patch 11 and 13:
relocation must be done _after_ doing the memblock_reserve() of all
pre-populated areas to avoid relocating to such an area.


Juergen




Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-09 Thread Peter Zijlstra
On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
 +++ b/kernel/locking/qspinlock_paravirt.h
 @@ -0,0 +1,321 @@
 +#ifndef _GEN_PV_LOCK_SLOWPATH
 +#error do not include this file
 +#endif
 +
 +/*
 + * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
 + * of spinning them.
 + *
 + * This relies on the architecture to provide two paravirt hypercalls:
 + *
 + *   pv_wait(u8 *ptr, u8 val) -- suspends the vcpu if *ptr == val
 + *   pv_kick(cpu) -- wakes a suspended vcpu
 + *
 + * Using these we implement __pv_queue_spin_lock_slowpath() and
 + * __pv_queue_spin_unlock() to replace native_queue_spin_lock_slowpath() and
 + * native_queue_spin_unlock().
 + */
 +
 +#define _Q_SLOW_VAL	(3U << _Q_LOCKED_OFFSET)
 +
 +enum vcpu_state {
 + vcpu_running = 0,
 + vcpu_halted,
 +};
 +
 +struct pv_node {
 + struct mcs_spinlock mcs;
 + struct mcs_spinlock __res[3];
 +
 + int cpu;
 + u8  state;
 +};
 +
 +/*
 + * Hash table using open addressing with an LFSR probe sequence.
 + *
 + * Since we should not be holding locks from NMI context (very rare indeed) the
 + * max load factor is 0.75, which is around the point where open addressing
 + * breaks down.
 + *
 + * Instead of probing just the immediate bucket we probe all buckets in the
 + * same cacheline.
 + *
 + * http://en.wikipedia.org/wiki/Hash_table#Open_addressing
 + *
 + * Dynamically allocate a hash table big enough to hold at least 4X the
 + * number of possible cpus in the system. Allocation is done on page
 + * granularity. So the minimum number of hash buckets should be at least
 + * 256 to fully utilize a 4k page.
 + */
 +#define LFSR_MIN_BITS	8
 +#define LFSR_MAX_BITS	(2 + NR_CPUS_BITS)
 +#if LFSR_MAX_BITS < LFSR_MIN_BITS
 +#undef  LFSR_MAX_BITS
 +#define LFSR_MAX_BITS	LFSR_MIN_BITS
 +#endif
 +
 +struct pv_hash_bucket {
 + struct qspinlock *lock;
 + struct pv_node   *node;
 +};
 +#define PV_HB_PER_LINE	(SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
 +#define HB_RESERVED	((struct qspinlock *)1)

This is unused.

 +
 +static struct pv_hash_bucket *pv_lock_hash;
 +static unsigned int pv_lock_hash_bits __read_mostly;

static unsigned int pv_taps __read_mostly;

 +
 +#include linux/hash.h
 +#include linux/lfsr.h
 +#include linux/bootmem.h
 +
 +/*
 + * Allocate memory for the PV qspinlock hash buckets
 + *
 + * This function should be called from the paravirt spinlock initialization
 + * routine.
 + */
 +void __init __pv_init_lock_hash(void)
 +{
 + int pv_hash_size = 4 * num_possible_cpus();
 +
 +	if (pv_hash_size < (1U << LFSR_MIN_BITS))
 +		pv_hash_size = (1U << LFSR_MIN_BITS);
 + /*
 +  * Allocate space from bootmem which should be page-size aligned
 +  * and hence cacheline aligned.
 +  */
 +	pv_lock_hash = alloc_large_system_hash("PV qspinlock",
 +					       sizeof(struct pv_hash_bucket),
 +					       pv_hash_size, 0, HASH_EARLY,
 +					       &pv_lock_hash_bits, NULL,
 +					       pv_hash_size, pv_hash_size);

pv_taps = lfsr_taps(pv_lock_hash_bits);

 +}
 +
 +static inline u32 hash_align(u32 hash)
 +{
 +	return hash & ~(PV_HB_PER_LINE - 1);
 +}
 +
 +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
 +{
 + unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
 + struct pv_hash_bucket *hb, *end;
 +
 + if (!hash)
 + hash = 1;
 +
 + init_hash = hash;
 +	hb = &pv_lock_hash[hash_align(hash)];
 +	for (;;) {
 +		for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
 +			if (!cmpxchg(&hb->lock, NULL, lock)) {
 +				WRITE_ONCE(hb->node, node);
 + /*
 +  * We haven't set the _Q_SLOW_VAL yet. So
 +  * the order of writing doesn't matter.
 +  */
 + smp_wmb(); /* matches rmb from pv_hash_find */

This doesn't make sense. Both sites do ->lock first and ->node second.
No amount of ordering can 'fix' that.

I think we can safely remove this wmb and the rmb below, because the
required ordering is already provided by setting/observing l-locked ==
SLOW.

 + goto done;
 + }
 + }
 +
 + hash = lfsr(hash, pv_lock_hash_bits, 0);

Since pv_lock_hash_bits is a variable, you end up running through that
massive if() forest to find the corresponding tap every single time. It
cannot compile-time optimize it.

Hence:
hash = lfsr(hash, pv_taps);

(I don't get the bits argument to the lfsr).

In any case, like I said before, I think we should try a linear probe
sequence first, the lfsr was over engineering from my side.


Re: [Xen-devel] [Patch V2 10/15] xen: check pre-allocated page tables for conflict with memory map

2015-04-09 Thread David Vrabel
On 09/04/2015 07:55, Juergen Gross wrote:
 Check whether the page tables built by the domain builder are at
 memory addresses which are in conflict with the target memory map.
 If this is the case just panic instead of running into problems
 later.
 
 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
  arch/x86/xen/mmu.c | 19 ---
  arch/x86/xen/setup.c   |  6 ++
  arch/x86/xen/xen-ops.h |  1 +
  3 files changed, 23 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
 index 1ca5197..41aeb1c 100644
 --- a/arch/x86/xen/mmu.c
 +++ b/arch/x86/xen/mmu.c
  @@ -116,6 +116,7 @@ static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
  DEFINE_PER_CPU(unsigned long, xen_cr3);   /* cr3 stored as physaddr */
  DEFINE_PER_CPU(unsigned long, xen_current_cr3);   /* actual vcpu cr3 */
  
 +static phys_addr_t xen_pt_base, xen_pt_size;

These should be __init, but the use of globals in this way is confusing.

  
  /*
   * Just beyond the highest usermode address.  STACK_TOP_MAX has a
  @@ -1998,7 +1999,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
   check_pt_base(pt_base, pt_end, addr[i]);
  
   /* Our (by three pages) smaller Xen pagetable that we are using */
 - memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
 + xen_pt_base = PFN_PHYS(pt_base);
 + xen_pt_size = (pt_end - pt_base) * PAGE_SIZE;
 + memblock_reserve(xen_pt_base, xen_pt_size);

Why not provide a xen_memblock_check_and_reserve() call that has the
xen_is_e820_reserved() check and the memblock_reserve() call?  This may
also be useful for patch #9 as well.
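
I.e. something like the below (only a sketch; the name and the error
handling are just a suggestion):

/* Check [start, start + size) against the E820 map, then reserve it. */
static void __init xen_memblock_check_and_reserve(phys_addr_t start,
						  phys_addr_t size,
						  const char *what)
{
	if (xen_chk_e820_reserved(start, size)) {
		xen_raw_console_write(what);
		xen_raw_console_write(" conflicts with E820 map\n");
		BUG();
	}

	memblock_reserve(start, size);
}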

 +void __init xen_pt_check_e820(void)
 +{
 +	if (xen_chk_e820_reserved(xen_pt_base, xen_pt_size)) {
 +		xen_raw_console_write("Xen hypervisor allocated page table memory conflicts with E820 map\n");
 +		BUG();
 +	}
 +}
 +
  static unsigned char dummy_mapping[PAGE_SIZE] __page_aligned_bss;

David



[Xen-devel] [PATCH v5 p2 03/19] xen/arm: Release IRQ routed to a domain when it's destroying

2015-04-09 Thread Julien Grall
From: Julien Grall julien.gr...@linaro.org

Xen has to release IRQs routed to a domain in order to reuse them later.
Currently only SPIs can be routed to the guest, so we only need to
browse the SPIs for a specific domain.

Furthermore, a guest can crash and leave the IRQ in an incorrect state
(i.e. it has not been EOIed). Xen will have to reset the IRQ in order to
be able to reuse it later.

Introduce 2 new functions for releasing an IRQ routed to a domain:
- release_guest_irq: upper level to retrieve the IRQ, call the GIC
code and release the action
- gic_remove_guest_irq: check if we can remove the IRQ, and reset
it if necessary

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Ian Campbell ian.campb...@citrix.com

---
Changes in v5:
- Typoes in the commit message
- Add Ian's Ack

Changes in v4:
- Reorder the code flow
- Typoes and coding style
- Use the newly helper spi_to_pending

Changes in v3:
- Take the vgic rank lock to protect p-desc
- Correctly check if the IRQ is disabled
- Extend the check on the virq in release_guest_irq
- Use vgic_get_target_vcpu to get the target vCPU
- Remove spurious change

Changes in v2:
- Drop the desc-handler = no_irq_type in release_irq as it's
buggy if the IRQ is routed to Xen
- Add release_guest_irq and gic_remove_guest_irq
---
 xen/arch/arm/gic.c| 45 +
 xen/arch/arm/irq.c| 46 ++
 xen/arch/arm/vgic.c   | 16 
 xen/include/asm-arm/gic.h |  4 
 xen/include/asm-arm/irq.h |  2 ++
 5 files changed, 113 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 5f34997..f023e4f 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -163,6 +163,51 @@ out:
 return res;
 }
 
+/* This function only works with SPIs for now */
+int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
+  struct irq_desc *desc)
+{
+    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
+    struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+    struct pending_irq *p = irq_to_pending(v_target, virq);
+    unsigned long flags;
+
+    ASSERT(spin_is_locked(&desc->lock));
+    ASSERT(test_bit(_IRQ_GUEST, &desc->status));
+    ASSERT(p->desc == desc);
+
+    vgic_lock_rank(v_target, rank, flags);
+
+    if ( d->is_dying )
+    {
+        desc->handler->shutdown(desc);
+
+        /* EOI the IRQ if it has not been done by the guest */
+        if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+            gic_hw_ops->deactivate_irq(desc);
+        clear_bit(_IRQ_INPROGRESS, &desc->status);
+    }
+    else
+    {
+        /*
+         * TODO: Handle eviction from LRs. For now, deny the
+         * removal if the IRQ is inflight or not disabled.
+         */
+        if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
+             !test_bit(_IRQ_DISABLED, &desc->status) )
+            return -EBUSY;
+    }
+
+    clear_bit(_IRQ_GUEST, &desc->status);
+    desc->handler = &no_irq_type;
+
+    p->desc = NULL;
+
+vgic_unlock_rank(v_target, rank, flags);
+
+return 0;
+}
+
 int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
   unsigned int *out_hwirq,
   unsigned int *out_type)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index b2ddf6b..376c9f2 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -513,6 +513,52 @@ free_info:
 return retval;
 }
 
+int release_guest_irq(struct domain *d, unsigned int virq)
+{
+    struct irq_desc *desc;
+    struct irq_guest *info;
+    unsigned long flags;
+    struct pending_irq *p;
+    int ret;
+
+    /* Only SPIs are supported */
+    if ( virq < NR_LOCAL_IRQS || virq >= vgic_num_irqs(d) )
+return -EINVAL;
+
+    p = spi_to_pending(d, virq);
+    if ( !p->desc )
+        return -EINVAL;
+
+    desc = p->desc;
+
+    spin_lock_irqsave(&desc->lock, flags);
+
+    ret = -EINVAL;
+    if ( !test_bit(_IRQ_GUEST, &desc->status) )
+        goto unlock;
+
+    info = irq_get_guest_info(desc);
+    ret = -EINVAL;
+    if ( d != info->d )
+        goto unlock;
+
+    ret = gic_remove_irq_from_guest(d, virq, desc);
+    if ( ret )
+        goto unlock;
+
+    spin_unlock_irqrestore(&desc->lock, flags);
+
+    release_irq(desc->irq, info);
+    xfree(info);
+
+    return 0;
+
+unlock:
+    spin_unlock_irqrestore(&desc->lock, flags);
+
+    return ret;
+}
+
 /*
  * pirq event channels. We don't use these on ARM, instead we use the
  * features of the GIC to inject virtualised normal interrupts.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 55c8927..05c5010 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -135,6 +135,22 @@ void register_vgic_ops(struct domain *d, const struct vgic_ops *ops)
 
 void domain_vgic_free(struct domain *d)
 {
+int i;
+int ret;
+
+for ( i = 0; i  

[Xen-devel] [PATCH v2 0/2] osstest: update FreeBSD guests and cleanup

2015-04-09 Thread Roger Pau Monne
The first patch in this series updates FreeBSD guests in OSSTest to use raw 
images instead of qcow2 (which are no longer provided by upstream). The 
second patch is a cleanup for ts-freebsd-install which should not change 
functionality.



[Xen-devel] [PATCH v2 2/2] osstest: clean ts-freebsd-install script

2015-04-09 Thread Roger Pau Monne
Remove some unused variables from ts-freebsd-install script. Also make the
third parameter of target_put_guest_image optional and fix both callers of
this function.

Signed-off-by: Roger Pau Monné roger@citrix.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
---
 Osstest/TestSupport.pm |  4 ++--
 ts-freebsd-install | 21 ++---
 2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 8754e22..96942bd 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -1554,7 +1554,7 @@ END
 return $cfgpath;
 }
 
-sub target_put_guest_image ($$$) {
+sub target_put_guest_image ($$;$) {
 my ($ho, $gho, $default) = @_;
     my $specimage = $r{"$gho->{Guest}_image"};
 $specimage = $default if !defined $specimage;
@@ -1574,7 +1574,7 @@ sub more_prepareguest_hvm (;@) {
     my @disks = "phy:$gho->{Lvdev},hda,w";
 
 if (!$xopts{NoCdromImage}) {
-   target_put_guest_image($ho, $gho, undef);
+   target_put_guest_image($ho, $gho);
 
 	my $postimage_hook= $xopts{PostImageHook};
 	$postimage_hook->() if $postimage_hook;
diff --git a/ts-freebsd-install b/ts-freebsd-install
index 61d2f83..4449fd1 100755
--- a/ts-freebsd-install
+++ b/ts-freebsd-install
@@ -36,18 +36,6 @@ our $gho;
 
 our $mnt= '/root/freebsd_root';
 
-our $freebsd_version= "10.0-BETA3";
-
-# Folder where the FreeBSD VM images are stored inside of the host
-#
-# The naming convention of the stored images is:
-# FreeBSD-$freebsd_version-$arch.qcow2.xz
-# ie: FreeBSD-10.0-BETA3-amd64.qcow2.xz
-#
-# Used only if the runvar guest_image is not set.
-#
-our $freebsd_vm_repo= '/var/images';
-
 sub prep () {
 my $authkeys= authorized_keys();
 
@@ -59,13 +47,8 @@ sub prep () {
 
 more_prepareguest_hvm($ho, $gho, $ram_mb, $disk_mb, NoCdromImage = 1);
 
-    target_put_guest_image($ho, $gho,
-                           "$freebsd_vm_repo/FreeBSD-$freebsd_version-".
-                           (defined($r{"$gho->{Guest}_arch"})
-                            # Use amd64 as default arch
-                            ? $r{"$gho->{Guest}_arch"} : 'amd64').
-                           ".qcow2.xz");
-
+    target_put_guest_image($ho, $gho);
+
     my $rootpartition_dev = target_guest_lv_name($ho, $gho->{Name}) . "--disk3";
 
     target_cmd_root($ho, "umount $gho->{Lvdev} ||:");
-- 
1.9.5 (Apple Git-50.3)




Re: [Xen-devel] [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities

2015-04-09 Thread Meng Xu
Hi Dario,

2015-04-03 22:14 GMT-04:00 Dario Faggioli dario.faggi...@citrix.com:
 Hi Everyone,

 This RFC series is the outcome of an investigation I've been doing about
 whether we can take better advantage of features like Intel CMT (and of PSR
 features in general). By take better advantage of them I mean, for example,
 use the data obtained from monitoring within the scheduler and/or within
 libxl's automatic NUMA placement algorithm, or similar.

 I'm putting here in the cover letter a markdown document I wrote to better
 describe my findings and ideas (sorry if it's a bit long! :-D). You can also
 fetch it at the following links:

  * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.pdf
  * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.markdown

 See the document itself and the changelog of the various patches for details.

 The series includes one Chao's patch on top, as I found it convenient to build
 on top of it. The series itself is available here:

   git://xenbits.xen.org/people/dariof/xen.git  wip/sched/icachemon
   
 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/icachemon

 Thanks a lot to everyone that will read and reply! :-)

 Regards,
 Dario
 ---

 # Intel Cache Monitoring: Present and Future

 ## About this document

 This document represents the result of an investigation into whether it would be
 possible to more extensively exploit the Platform Shared Resource Monitoring
 (PSR) capabilities of recent Intel x86 server chips. Examples of such features
 are the Cache Monitoring Technology (CMT) and the Memory Bandwidth Monitoring
 (MBM).

 More specifically, it focuses on Cache Monitoring Technology, support for which
 has recently been introduced in Xen by Intel, trying to figure out whether it
 can be used for high level load balancing, such as libxl automatic domain
 placement, and/or within Xen vCPU scheduler(s).

 Note that, although the document only speaks about CMT, most of the
 considerations apply (or can easily be extended) to MBM as well.

 The fact that, currently, support is provided for monitoring L3 cache only,
 somewhat limits the benefits of more extensively exploiting such technology,
 which is exactly the purpose here. Nevertheless, some improvements are possible
 already, and if at some point support for monitoring other cache layers will be
 available, this can be the basic building block for taking advantage of that
 too.

I'm wondering, if you really want to know the cache usage at different
levels of cache, whether you could use the (4) general-purpose PMCs on each
logical core to monitor that. This could bypass the limitation of the
current HW, but the concern is that it may affect other mechanisms in Xen,
like perf, which also use the PMCs.
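
A rough sketch of what I mean (the event select/umask encodings below are
model-specific placeholders from the SDM, and this deliberately ignores the
contention with vpmu/perf, which is exactly the concern above):

/* Program general-purpose counter 0 on this core to count L2 misses. */
static void setup_l2_miss_counter(void)
{
    uint64_t evtsel = 0x24              /* event: L2_RQSTS (placeholder) */
                    | (0x3fULL << 8)    /* umask: all misses (placeholder) */
                    | (1ULL << 16)      /* count in user mode */
                    | (1ULL << 17)      /* count in OS mode */
                    | (1ULL << 22);     /* enable the counter */

    wrmsrl(MSR_P6_PERFCTR0, 0);
    wrmsrl(MSR_P6_EVNTSEL0, evtsel);
    /* ... later, on the same core: rdmsrl(MSR_P6_PERFCTR0, count); */
}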

Another thought on CMT is that Intel seems to have introduced it
along with CAT, so I assume they want CMT to be used together with CAT to
give some hint on how to allocate the LLC to different guests. For
example, if a crazy guest is thrashing the LLC, CAT can be applied
to constrain/calm down this crazy guest.


Best,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania



Re: [Xen-devel] [PATCH v4 04/12] x86: maintain COS to CBM mapping for each socket

2015-04-09 Thread Andrew Cooper
On 09/04/2015 10:18, Chao Peng wrote:
 For each socket, a COS to CBM mapping structure is maintained for each
 COS. The mapping is indexed by COS and the value is the corresponding
 CBM. Different VMs may use the same CBM, a reference count is used to
 indicate if the CBM is available.

 Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
 ---
  xen/arch/x86/psr.c | 14 ++
  1 file changed, 14 insertions(+)

 diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
 index 16c37dd..4aff5f6 100644
 --- a/xen/arch/x86/psr.c
 +++ b/xen/arch/x86/psr.c
 @@ -21,11 +21,17 @@
   #define PSR_CMT (1<<0)
   #define PSR_CAT (1<<1)
  
 +struct psr_cat_cbm {
 +unsigned int ref;
 +uint64_t cbm;
 +};
 +
  struct psr_cat_socket_info {
  bool_t initialized;
  bool_t enabled;
  unsigned int cbm_len;
  unsigned int cos_max;
 +struct psr_cat_cbm *cos_cbm_map;

cos_to_cbm would be more in keeping with Xen style, and IMO easier to
read in code.

  };
  
  struct psr_assoc {
 @@ -240,6 +246,14 @@ static void cat_cpu_init(unsigned int cpu)
       info->cbm_len = (eax & 0x1f) + 1;
       info->cos_max = (edx & 0xffff);

Apologies for missing this in the previous patch, but cos_max should
have a command line parameter like rmid_max if a lower limit wants to be
enforced.
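
E.g. something along these lines (a sketch only; in the real series this
would more naturally be a "psr=" sub-option parsed next to rmid_max, and
the parameter name here is just a suggestion):

static unsigned int __initdata opt_cos_max = UINT_MAX;
integer_param("psr-cat-cos-max", opt_cos_max);

/* e.g. applied in cat_cpu_init() when enumerating the socket: */
static unsigned int clamp_cos_max(unsigned int hw_cos_max)
{
    return min(opt_cos_max, hw_cos_max);
}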

Otherwise, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

  
  +    info->cos_cbm_map = xzalloc_array(struct psr_cat_cbm,
  +                                      info->cos_max + 1UL);
  +    if ( !info->cos_cbm_map )
  +        return;
  +
  +    /* cos=0 is reserved as default cbm (all ones). */
  +    info->cos_cbm_map[0].cbm = (1ull << info->cbm_len) - 1;
 +
       info->enabled = 1;
       printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n",
              socket, info->cos_max, info->cbm_len);




Re: [Xen-devel] [PATCH] VTd/dmar: Tweak how the DMAR table is clobbered

2015-04-09 Thread David Vrabel
On 08/04/15 20:44, Andrew Cooper wrote:
 Instead of clobbering DMAR -> XMAR and back, clobber to RMAD instead.  This
 means that changing the signature does not alter the checksum, which allows
 the clobbering/unclobbering to be performed atomically and idempotently, which
 is an advantage on the kexec path which can reenter acpi_dmar_reinstate().

Could RMAD be specified as a real table in the future?  Does the
clobbered name have to start with X to avoid future conflicts?

David



Re: [Xen-devel] [Patch V2 01/15] xen: sync with xen headers

2015-04-09 Thread David Vrabel
On 09/04/15 07:55, Juergen Gross wrote:
 Use the newest headers from the xen tree to get some new structure
 layouts.

Reviewed-by: David Vrabel david.vra...@citrix.com

David



Re: [Xen-devel] [PATCH RFC v2 3/3] xen: rework paging_log_dirty_op to work with hvm guests

2015-04-09 Thread Tim Deegan
At 12:09 +0200 on 07 Apr (1428408556), Roger Pau Monné wrote:
 Hello,
 
  On 03/04/15 at 16:12, Tim Deegan wrote:
  Hi,
  
  At 20:46 +0100 on 02 Apr (1428007593), Andrew Cooper wrote:
  On 02/04/15 11:26, Roger Pau Monne wrote:
  When the caller of paging_log_dirty_op is a hvm guest Xen would choke when
  trying to copy the dirty bitmap to the guest because the paging lock is
  already held.
 
  Are you sure? Presumably you get an mm lock ordering violation, because
   paging_log_dirty_op() should take the target domain's paging lock, rather
  than your own (which is prohibited by the current check at the top of
  paging_domctl()).
 
  Unfortunately, dropping the paging_lock() here is unsafe, as it will
  result in corruption of the logdirty bitmap from non-domain sources such
  as HVMOP_modified_memory.
 
  I will need to find some time with a large pot of coffee and a
  whiteboard, but I suspect it might actually be safe to alter the current
  mm_lock() enforcement to maintain independent levels for a source and
  destination domain.
  
  We discussed this in an earlier thread and agreed it would be better
  to try to do this work in batches rather than add more complexity to
  the mm locking rules.  (I'm AFK this week so I haven't had a chance to
  review the actual pacth yet.)
 
 I don't know about the locking rules or how much complexity would
 permitting this kind of accesses add to it, but IMHO this patch makes
 the code quite more complex and possibly error prone, so finding a
 simpler approach seems like a good option to me.

I'm happier with this (relatively contained) complexity than with
adding yet more logic to the mm_locks code.

I don't think there are any new races introduced here that weren't
already present in the -ERESTART case.
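
For reference, a heavily condensed sketch of the batching pattern under
discussion (struct sc_state, CHUNK, copy_dirty_chunk() and
copy_chunk_to_guest() are placeholders, not the real helpers):

static int log_dirty_op_chunked(struct domain *d, struct sc_state *sc)
{
    for ( ; sc->done < sc->pages; sc->done += CHUNK )
    {
        paging_lock(d);
        copy_dirty_chunk(d, sc);          /* bounded work under the lock */
        paging_unlock(d);

        /* Lock dropped: safe to copy to a (possibly HVM) caller ... */
        if ( copy_chunk_to_guest(sc) )
            return -EFAULT;

        /* ... and to preempt, resuming via an -ERESTART continuation. */
        if ( hypercall_preempt_check() )
            return -ERESTART;
    }

    return 0;
}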

Cheers,

Tim.



Re: [Xen-devel] [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy

2015-04-09 Thread Tim Deegan
At 10:35 +0800 on 27 Mar (1427452552), Kai Huang wrote:
 It's possible the domain still remains in log-dirty mode when it is about to
 be destroyed, in which case we should manually disable PML for it.
 
 Signed-off-by: Kai Huang kai.hu...@linux.intel.com
 ---
  xen/arch/x86/hvm/vmx/vmx.c | 9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
 index fce3aa2..75ac44b 100644
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -153,6 +153,15 @@ static int vmx_vcpu_initialise(struct vcpu *v)
  
  static void vmx_vcpu_destroy(struct vcpu *v)
  {
 +    /*
 +     * There are cases where the domain still remains in log-dirty mode when
 +     * it is about to be destroyed (e.g. the user types 'xl destroy <dom>'),
 +     * in which case we should disable PML manually here. Note that
 +     * vmx_vcpu_destroy is called prior to vmx_domain_destroy, so we need to
 +     * disable PML for each vcpu separately here.
 +     */
 +    if ( vmx_vcpu_pml_enabled(v) )
 +        vmx_vcpu_disable_pml(v);

Looking at this and other callers of these enable/disable functions, I
think it would be better to make those functions idempotent (i.e.
*_{en,dis}able_pml() should just return success if PML is already
enabled/disabled).  Then you don't need to check in every caller, and
there's no risk of a crash if one caller is missing the check.
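
I.e. roughly (a sketch; the real bodies are elided):

int vmx_vcpu_enable_pml(struct vcpu *v)
{
    if ( vmx_vcpu_pml_enabled(v) )
        return 0;                /* already enabled: report success */
    /* ... allocate PML buffer, set VMCS controls ... */
    return 0;
}

void vmx_vcpu_disable_pml(struct vcpu *v)
{
    if ( !vmx_vcpu_pml_enabled(v) )
        return;                  /* already disabled: nothing to do */
    /* ... flush buffer, clear VMCS controls, free buffer ... */
}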

Cheers,

Tim.



[Xen-devel] [Patch V2 08/15] xen: find unused contiguous memory area

2015-04-09 Thread Juergen Gross
To be able to relocate pre-allocated data areas like the initrd or the
p2m list, it is mandatory to find a contiguous memory area which is
not yet in use and doesn't conflict with the memory map we want to
be in effect.

In case such an area is found, reserve it at once, as this will be
required in any case.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/xen/setup.c   | 34 ++
 arch/x86/xen/xen-ops.h |  1 +
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 4666adf..606ac2b 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -597,6 +597,40 @@ bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size)
 }
 
 /*
+ * Find a free area in physical memory not yet reserved and compliant with
+ * E820 map.
+ * Used to relocate pre-allocated areas like initrd or p2m list which are in
+ * conflict with the to be used E820 map.
+ * In case no area is found, return 0. Otherwise return the physical address
+ * of the area which is already reserved for convenience.
+ */
+phys_addr_t __init xen_find_free_area(phys_addr_t size)
+{
+   unsigned mapcnt;
+   phys_addr_t addr, start;
+   struct e820entry *entry = xen_e820_map;
+
+	for (mapcnt = 0; mapcnt < xen_e820_map_entries; mapcnt++, entry++) {
+		if (entry->type != E820_RAM || entry->size < size)
+			continue;
+		start = entry->addr;
+		for (addr = start; addr < start + size; addr += PAGE_SIZE) {
+			if (!memblock_is_reserved(addr))
+				continue;
+			start = addr + PAGE_SIZE;
+			if (start + size > entry->addr + entry->size)
+				break;
+		}
+		if (addr >= start + size) {
+   memblock_reserve(start, size);
+   return start;
+   }
+   }
+
+   return 0;
+}
+
+/*
  * Reserve Xen mfn_list.
  * See comment above struct start_info in xen/interface/xen.h
  * We tried to make the memblock_reserve more selective so
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 56650bb..c7fa0a3 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -43,6 +43,7 @@ bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size);
 unsigned long __ref xen_chk_extra_mem(unsigned long pfn);
 void __init xen_inv_extra_mem(void);
 void __init xen_remap_memory(void);
+phys_addr_t __init xen_find_free_area(phys_addr_t size);
 char * __init xen_memory_setup(void);
 char * xen_auto_xlated_memory_setup(void);
 void __init xen_arch_setup(void);
-- 
2.1.4




[Xen-devel] [Patch V2 06/15] xen: split counting of extra memory pages from remapping

2015-04-09 Thread Juergen Gross
Memory pages in the initial memory setup done by the Xen hypervisor
conflicting with the target E820 map are remapped. In order to do this
those pages are counted and remapped in xen_set_identity_and_remap().

Split the counting from the remapping operation to be able to set up
the needed memory sizes in time, while doing the remap operation at a
later point. This enables us to simplify the interface to
xen_set_identity_and_remap(), as the number of remapped and released
pages is no longer needed here.

Finally move the remapping further down to prepare relocating
conflicting memory contents before the memory might be clobbered by
xen_set_identity_and_remap(). This requires to not destroy the Xen
E820 map when the one for the system is being constructed.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/xen/setup.c | 98 +++-
 1 file changed, 58 insertions(+), 40 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index ab6c36e..87251b4 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -223,7 +223,7 @@ static int __init xen_free_mfn(unsigned long mfn)
  * as a fallback if the remapping fails.
  */
 static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn,
-   unsigned long end_pfn, unsigned long nr_pages, unsigned long *released)
+   unsigned long end_pfn, unsigned long nr_pages)
 {
unsigned long pfn, end;
int ret;
@@ -243,7 +243,7 @@ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn,
 	WARN(ret != 1, "Failed to release pfn %lx err=%d\n", pfn, ret);
 
if (ret == 1) {
-   (*released)++;
+   xen_released_pages++;
if (!__set_phys_to_machine(pfn, INVALID_P2M_ENTRY))
break;
} else
@@ -359,8 +359,7 @@ static void __init xen_do_set_identity_and_remap_chunk(
  */
 static unsigned long __init xen_set_identity_and_remap_chunk(
unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
-   unsigned long remap_pfn, unsigned long *released,
-   unsigned long *remapped)
+   unsigned long remap_pfn)
 {
unsigned long pfn;
unsigned long i = 0;
@@ -385,7 +384,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk(
if (!remap_range_size) {
 			pr_warning("Unable to find available pfn range, not remapping identity pages\n");
xen_set_identity_and_release_chunk(cur_pfn,
-   cur_pfn + left, nr_pages, released);
+   cur_pfn + left, nr_pages);
break;
}
/* Adjust size to fit in current e820 RAM region */
@@ -397,7 +396,6 @@ static unsigned long __init xen_set_identity_and_remap_chunk(
/* Update variables to reflect new mappings. */
i += size;
remap_pfn += size;
-   *remapped += size;
}
 
/*
@@ -412,14 +410,11 @@ static unsigned long __init xen_set_identity_and_remap_chunk(
return remap_pfn;
 }
 
-static void __init xen_set_identity_and_remap(unsigned long nr_pages,
-   unsigned long *released, unsigned long *remapped)
+static void __init xen_set_identity_and_remap(unsigned long nr_pages)
 {
phys_addr_t start = 0;
unsigned long last_pfn = nr_pages;
const struct e820entry *entry = xen_e820_map;
-   unsigned long num_released = 0;
-   unsigned long num_remapped = 0;
int i;
 
/*
@@ -445,16 +440,12 @@ static void __init xen_set_identity_and_remap(unsigned long nr_pages,
if (start_pfn  end_pfn)
last_pfn = xen_set_identity_and_remap_chunk(
start_pfn, end_pfn, nr_pages,
-   last_pfn, num_released,
-   num_remapped);
+   last_pfn);
start = end;
}
}
 
-   *released = num_released;
-   *remapped = num_remapped;
-
-	pr_info("Released %ld page(s)\n", num_released);
+	pr_info("Released %ld page(s)\n", xen_released_pages);
 }
 
 /*
@@ -560,6 +551,28 @@ static void __init xen_ignore_unusable(void)
}
 }
 
+static unsigned long __init xen_count_remap_pages(unsigned long max_pfn)
+{
+   unsigned long extra = 0;
+   const struct e820entry *entry = xen_e820_map;
+   int i;
+
+	for (i = 0; i < xen_e820_map_entries; i++, entry++) {
+		unsigned long start_pfn = PFN_DOWN(entry->addr);
+		unsigned long end_pfn = PFN_UP(entry->addr + entry->size);
+
+		if (start_pfn >= max_pfn)
+   

[Xen-devel] [Patch V2 13/15] xen: move p2m list if conflicting with e820 map

2015-04-09 Thread Juergen Gross
Check whether the hypervisor supplied p2m list is placed at a location
which is conflicting with the target E820 map. If this is the case
relocate it to a new area unused up to now and compliant to the E820
map.

As the p2m list might be huge (up to several GB) and is required to be
mapped virtually, set up a temporary mapping for the copied list.

For pvh domains just delete the p2m related information from start
info instead of reserving the p2m memory, as we don't need it at all.

For 32 bit kernels adjust the memblock_reserve() parameters in order
to cover the page tables only. This requires memblock_reserve()ing the
start_info page on its own.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/xen/mmu.c | 232 ++---
 arch/x86/xen/setup.c   |  51 +--
 arch/x86/xen/xen-ops.h |   3 +
 3 files changed, 247 insertions(+), 39 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 41aeb1c..3689fb8 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1094,6 +1094,16 @@ static void xen_exit_mmap(struct mm_struct *mm)
 
 static void xen_post_allocator_init(void);
 
+static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
+{
+   struct mmuext_op op;
+
+   op.cmd = cmd;
+   op.arg1.mfn = pfn_to_mfn(pfn);
+	if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
+   BUG();
+}
+
 #ifdef CONFIG_X86_64
 static void __init xen_cleanhighmap(unsigned long vaddr,
unsigned long vaddr_end)
@@ -1129,10 +1139,12 @@ static void __init xen_free_ro_pages(unsigned long paddr, unsigned long size)
memblock_free(paddr, size);
 }
 
-static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl)
+static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl, bool unpin)
 {
 	unsigned long pa = __pa(pgtbl) & PHYSICAL_PAGE_MASK;
 
+   if (unpin)
+   pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(pa));
ClearPagePinned(virt_to_page(__va(pa)));
xen_free_ro_pages(pa, PAGE_SIZE);
 }
@@ -1151,7 +1163,9 @@ static void __init xen_cleanmfnmap(unsigned long vaddr)
pmd_t *pmd;
pte_t *pte;
unsigned int i;
+   bool unpin;
 
+   unpin = (vaddr == 2 * PGDIR_SIZE);
set_pgd(pgd, __pgd(0));
do {
pud = pud_page + pud_index(va);
@@ -1168,22 +1182,24 @@ static void __init xen_cleanmfnmap(unsigned long vaddr)
xen_free_ro_pages(pa, PMD_SIZE);
} else if (!pmd_none(*pmd)) {
pte = pte_offset_kernel(pmd, va);
+   set_pmd(pmd, __pmd(0));
 			for (i = 0; i < PTRS_PER_PTE; ++i) {
 				if (pte_none(pte[i]))
 					break;
 				pa = pte_pfn(pte[i]) << PAGE_SHIFT;
xen_free_ro_pages(pa, PAGE_SIZE);
}
-   xen_cleanmfnmap_free_pgtbl(pte);
+   xen_cleanmfnmap_free_pgtbl(pte, unpin);
}
va += PMD_SIZE;
if (pmd_index(va))
continue;
-   xen_cleanmfnmap_free_pgtbl(pmd);
+   set_pud(pud, __pud(0));
+   xen_cleanmfnmap_free_pgtbl(pmd, unpin);
}
 
} while (pud_index(va) || pmd_index(va));
-   xen_cleanmfnmap_free_pgtbl(pud_page);
+   xen_cleanmfnmap_free_pgtbl(pud_page, unpin);
 }
 
 static void __init xen_pagetable_p2m_free(void)
@@ -1219,6 +1235,12 @@ static void __init xen_pagetable_p2m_free(void)
} else {
xen_cleanmfnmap(addr);
}
+}
+
+static void __init xen_pagetable_cleanhighmap(void)
+{
+   unsigned long size;
+   unsigned long addr;
 
/* At this stage, cleanup_highmap has already cleaned __ka space
 * from _brk_limit way up to the max_pfn_mapped (which is the end of
@@ -1251,6 +1273,8 @@ static void __init xen_pagetable_p2m_setup(void)
 
 #ifdef CONFIG_X86_64
xen_pagetable_p2m_free();
+
+   xen_pagetable_cleanhighmap();
 #endif
/* And revector! Bye bye old array */
xen_start_info-mfn_list = (unsigned long)xen_p2m_addr;
@@ -1586,15 +1610,6 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t 
pte)
native_set_pte(ptep, pte);
 }
 
-static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
-{
-   struct mmuext_op op;
-   op.cmd = cmd;
-   op.arg1.mfn = pfn_to_mfn(pfn);
-   if (HYPERVISOR_mmuext_op(op, 1, NULL, DOMID_SELF))
-   BUG();
-}
-
 /* Early in boot, while setting up the initial pagetable, assume
everything is pinned. */
 static void __init xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn)
@@ 

[Xen-devel] [Patch V2 01/15] xen: sync with xen headers

2015-04-09 Thread Juergen Gross
Use the newest headers from the xen tree to get some new structure
layouts.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/include/asm/xen/interface.h | 96 
 include/xen/interface/xen.h  | 10 ++--
 2 files changed, 93 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/xen/interface.h 
b/arch/x86/include/asm/xen/interface.h
index 3400dba..3b88eea 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -3,12 +3,38 @@
  *
  * Guest OS interface to x86 Xen.
  *
- * Copyright (c) 2004, K A Fraser
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2004-2006, K A Fraser
  */
 
 #ifndef _ASM_X86_XEN_INTERFACE_H
 #define _ASM_X86_XEN_INTERFACE_H
 
+/*
+ * XEN_GUEST_HANDLE represents a guest pointer, when passed as a field
+ * in a struct in memory.
+ * XEN_GUEST_HANDLE_PARAM represent a guest pointer, when passed as an
+ * hypercall argument.
+ * XEN_GUEST_HANDLE_PARAM and XEN_GUEST_HANDLE are the same on X86 but
+ * they might not be on other architectures.
+ */
 #ifdef __XEN__
 #define __DEFINE_GUEST_HANDLE(name, type) \
 typedef struct { type *p; } __guest_handle_ ## name
@@ -88,13 +114,16 @@ DEFINE_GUEST_HANDLE(xen_ulong_t);
  * start of the GDT because some stupid OSes export hard-coded selector values
  * in their ABI. These hard-coded values are always near the start of the GDT,
  * so Xen places itself out of the way, at the far end of the GDT.
+ *
+ * NB The LDT is set using the MMUEXT_SET_LDT op of HYPERVISOR_mmuext_op
  */
 #define FIRST_RESERVED_GDT_PAGE  14
 #define FIRST_RESERVED_GDT_BYTE  (FIRST_RESERVED_GDT_PAGE * 4096)
 #define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / 8)
 
 /*
- * Send an array of these to HYPERVISOR_set_trap_table()
+ * Send an array of these to HYPERVISOR_set_trap_table().
+ * Terminate the array with a sentinel entry, with traps[].address==0.
  * The privilege level specifies which modes may enter a trap via a software
  * interrupt. On x86/64, since rings 1 and 2 are unavailable, we allocate
  * privilege levels as follows:
@@ -118,10 +147,41 @@ struct trap_info {
 DEFINE_GUEST_HANDLE_STRUCT(trap_info);
 
 struct arch_shared_info {
-unsigned long max_pfn;  /* max pfn that appears in table */
-/* Frame containing list of mfns containing list of mfns containing p2m. */
-unsigned long pfn_to_mfn_frame_list_list;
-unsigned long nmi_reason;
+   /*
+* Number of valid entries in the p2m table(s) anchored at
+* pfn_to_mfn_frame_list_list and/or p2m_vaddr.
+*/
+   unsigned long max_pfn;
+   /*
+* Frame containing list of mfns containing list of mfns containing p2m.
+* A value of 0 indicates it has not yet been set up, ~0 indicates it
+* has been set to invalid e.g. due to the p2m being too large for the
+* 3-level p2m tree. In this case the linear mapper p2m list anchored
+* at p2m_vaddr is to be used.
+*/
+   xen_pfn_t pfn_to_mfn_frame_list_list;
+   unsigned long nmi_reason;
+   /*
+* Following three fields are valid if p2m_cr3 contains a value
+* different from 0.
+* p2m_cr3 is the root of the address space where p2m_vaddr is valid.
+* p2m_cr3 is in the same format as a cr3 value in the vcpu register
+* state and holds the folded machine frame number (via xen_pfn_to_cr3)
+* of a L3 or L4 page table.
+* p2m_vaddr holds the virtual address of the linear p2m list. All
+* entries in the range [0...max_pfn[ are accessible via this pointer.
+* p2m_generation will be incremented by the guest before and after each
+* change of the mappings of the p2m list. p2m_generation starts at 0
+* and a value with the least significant bit set indicates that a
+* mapping update is in progress. This allows guest external 

[Xen-devel] [Patch V2 11/15] xen: check for initrd conflicting with e820 map

2015-04-09 Thread Juergen Gross
Check whether the initrd is placed at a location which is conflicting
with the target E820 map. If this is the case relocate it to a new
area unused up to now and compliant to the E820 map.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/xen/setup.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 5d0f4e2..6985730 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -632,6 +632,36 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
 }
 
 /*
+ * Like memcpy, but with physical addresses for dest and src.
+ */
+static void __init xen_phys_memcpy(phys_addr_t dest, phys_addr_t src,
+  phys_addr_t n)
+{
+   phys_addr_t dest_off, src_off, dest_len, src_len, len;
+   void *from, *to;
+
+   while (n) {
+   dest_off = dest & ~PAGE_MASK;
+   src_off = src & ~PAGE_MASK;
+   dest_len = n;
+   if (dest_len > (NR_FIX_BTMAPS << PAGE_SHIFT) - dest_off)
+   dest_len = (NR_FIX_BTMAPS << PAGE_SHIFT) - dest_off;
+   src_len = n;
+   if (src_len > (NR_FIX_BTMAPS << PAGE_SHIFT) - src_off)
+   src_len = (NR_FIX_BTMAPS << PAGE_SHIFT) - src_off;
+   len = min(dest_len, src_len);
+   to = early_memremap(dest - dest_off, dest_len + dest_off);
+   from = early_memremap(src - src_off, src_len + src_off);
+   memcpy(to, from, len);
+   early_memunmap(to, dest_len + dest_off);
+   early_memunmap(from, src_len + src_off);
+   n -= len;
+   dest += len;
+   src += len;
+   }
+}
+
+/*
  * Reserve Xen mfn_list.
  * See comment above struct start_info in xen/interface/xen.h
 * We tried to make the memblock_reserve more selective so
@@ -808,6 +838,27 @@ char * __init xen_memory_setup(void)
 */
xen_pt_check_e820();
 
+   /* Check for a conflict of the initrd with the target E820 map. */
+   if (xen_chk_e820_reserved(boot_params.hdr.ramdisk_image,
+ boot_params.hdr.ramdisk_size)) {
+   phys_addr_t new_area, start, size;
+
+   new_area = xen_find_free_area(boot_params.hdr.ramdisk_size);
+   if (!new_area) {
+   xen_raw_console_write("Can't find new memory area for initrd needed due to E820 map conflict\n");
+   BUG();
+   }
+
+   start = boot_params.hdr.ramdisk_image;
+   size = boot_params.hdr.ramdisk_size;
+   xen_phys_memcpy(new_area, start, size);
+   pr_info("initrd moved from [mem %#010llx-%#010llx] to [mem %#010llx-%#010llx]\n",
+   start, start + size, new_area, new_area + size);
+   memblock_free(start, size);
+   boot_params.hdr.ramdisk_image = new_area;
+   boot_params.ext_ramdisk_image = new_area >> 32;
+   }
+
xen_reserve_xen_mfnlist();
 
/*
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [Patch V2 04/15] xen: eliminate scalability issues from initial mapping setup

2015-04-09 Thread Juergen Gross
Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.

As the initial P->M table is copied rather early during boot to
domain private memory and its initial virtual mapping is dropped,
the easiest way to avoid virtual address conflicts with other
addresses in the kernel is to use a user address area for the
virtual address of the initial P->M table. This allows us to just
throw away the page tables of the initial mapping after the copy
without having to care about address invalidation.

It should be noted that this patch won't enable a pv-domain to USE
more than 512 GB of RAM. It just enables it to be started with a
P->M table covering more memory. This is especially important for
being able to boot a Dom0 on a system with more than 512 GB memory.

Signed-off-by: Juergen Gross jgr...@suse.com
Based-on-patch-by: Jan Beulich jbeul...@suse.com
---
 arch/x86/xen/mmu.c  | 126 
 arch/x86/xen/setup.c|  67 ++---
 arch/x86/xen/xen-head.S |   2 +
 3 files changed, 156 insertions(+), 39 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index adca9e2..1ca5197 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1114,6 +1114,77 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
xen_mc_flush();
 }
 
+/*
+ * Make a page range writeable and free it.
+ */
+static void __init xen_free_ro_pages(unsigned long paddr, unsigned long size)
+{
+   void *vaddr = __va(paddr);
+   void *vaddr_end = vaddr + size;
+
+   for (; vaddr < vaddr_end; vaddr += PAGE_SIZE)
+   make_lowmem_page_readwrite(vaddr);
+
+   memblock_free(paddr, size);
+}
+
+static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl)
+{
+   unsigned long pa = __pa(pgtbl) & PHYSICAL_PAGE_MASK;
+
+   ClearPagePinned(virt_to_page(__va(pa)));
+   xen_free_ro_pages(pa, PAGE_SIZE);
+}
+
+/*
+ * Since it is well isolated we can (and since it is perhaps large we should)
+ * also free the page tables mapping the initial P-M table.
+ */
+static void __init xen_cleanmfnmap(unsigned long vaddr)
+{
+   unsigned long va = vaddr & PMD_MASK;
+   unsigned long pa;
+   pgd_t *pgd = pgd_offset_k(va);
+   pud_t *pud_page = pud_offset(pgd, 0);
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   unsigned int i;
+
+   set_pgd(pgd, __pgd(0));
+   do {
+   pud = pud_page + pud_index(va);
+   if (pud_none(*pud)) {
+   va += PUD_SIZE;
+   } else if (pud_large(*pud)) {
+   pa = pud_val(*pud) & PHYSICAL_PAGE_MASK;
+   xen_free_ro_pages(pa, PUD_SIZE);
+   va += PUD_SIZE;
+   } else {
+   pmd = pmd_offset(pud, va);
+   if (pmd_large(*pmd)) {
+   pa = pmd_val(*pmd) & PHYSICAL_PAGE_MASK;
+   xen_free_ro_pages(pa, PMD_SIZE);
+   } else if (!pmd_none(*pmd)) {
+   pte = pte_offset_kernel(pmd, va);
+   for (i = 0; i < PTRS_PER_PTE; ++i) {
+   if (pte_none(pte[i]))
+   break;
+   pa = pte_pfn(pte[i]) << PAGE_SHIFT;
+   xen_free_ro_pages(pa, PAGE_SIZE);
+   }
+   xen_cleanmfnmap_free_pgtbl(pte);
+   }
+   va += PMD_SIZE;
+   if (pmd_index(va))
+   continue;
+   xen_cleanmfnmap_free_pgtbl(pmd);
+   }
+
+   } while (pud_index(va) || pmd_index(va));
+   xen_cleanmfnmap_free_pgtbl(pud_page);
+}
+
 static void __init xen_pagetable_p2m_free(void)
 {
unsigned long size;
@@ -1128,18 +1199,25 @@ static void __init xen_pagetable_p2m_free(void)
/* using __ka address and sticking INVALID_P2M_ENTRY! */
	memset((void *)xen_start_info->mfn_list, 0xff, size);
 
-   /* We should be in __ka space. */
-   BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
	addr = xen_start_info->mfn_list;
-   /* We roundup to the PMD, which means that if anybody at this stage is
-    * using the __ka address of xen_start_info or xen_start_info->shared_info
-    * they are going to crash. Fortunately we have already revectored
-    * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
+   /*
+* We could be in __ka space.
+* We roundup to the PMD, which means that if anybody at this stage is
+* using the __ka address of xen_start_info or
+* 

[Xen-devel] [Patch V2 15/15] xen: remove no longer needed p2m.h

2015-04-09 Thread Juergen Gross
Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more.

Most definitions in this file are used in p2m.c only. Move those into
p2m.c.

set_phys_range_identity() is already declared in
arch/x86/include/asm/xen/page.h, add __init annotation there.

MAX_REMAP_RANGES isn't used at all, just delete it.

The only define left is P2M_PER_PAGE which is moved to page.h as well.

Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/include/asm/xen/page.h |  6 --
 arch/x86/xen/p2m.c  |  6 +-
 arch/x86/xen/p2m.h  | 15 ---
 arch/x86/xen/setup.c|  1 -
 4 files changed, 9 insertions(+), 19 deletions(-)
 delete mode 100644 arch/x86/xen/p2m.h

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 18a11f2..b858592 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -35,6 +35,8 @@ typedef struct xpaddr {
 #define FOREIGN_FRAME(m)   ((m) | FOREIGN_FRAME_BIT)
 #define IDENTITY_FRAME(m)  ((m) | IDENTITY_FRAME_BIT)
 
+#define P2M_PER_PAGE   (PAGE_SIZE / sizeof(unsigned long))
+
 extern unsigned long *machine_to_phys_mapping;
 extern unsigned long  machine_to_phys_nr;
 extern unsigned long *xen_p2m_addr;
@@ -44,8 +46,8 @@ extern unsigned long  xen_max_p2m_pfn;
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
-extern unsigned long set_phys_range_identity(unsigned long pfn_s,
-unsigned long pfn_e);
+extern unsigned long __init set_phys_range_identity(unsigned long pfn_s,
+   unsigned long pfn_e);
 
 extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
   struct gnttab_map_grant_ref *kmap_ops,
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 365a64a..1f63ad2 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -78,10 +78,14 @@
 #include <xen/balloon.h>
 #include <xen/grant_table.h>
 
-#include "p2m.h"
 #include "multicalls.h"
 #include "xen-ops.h"
 
+#define P2M_MID_PER_PAGE   (PAGE_SIZE / sizeof(unsigned long *))
+#define P2M_TOP_PER_PAGE   (PAGE_SIZE / sizeof(unsigned long **))
+
+#define MAX_P2M_PFN(P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
+
 #define PMDS_PER_MID_PAGE  (P2M_MID_PER_PAGE / PTRS_PER_PTE)
 
 unsigned long *xen_p2m_addr __read_mostly;
diff --git a/arch/x86/xen/p2m.h b/arch/x86/xen/p2m.h
deleted file mode 100644
index ad8aee2..000
--- a/arch/x86/xen/p2m.h
+++ /dev/null
@@ -1,15 +0,0 @@
-#ifndef _XEN_P2M_H
-#define _XEN_P2M_H
-
-#define P2M_PER_PAGE(PAGE_SIZE / sizeof(unsigned long))
-#define P2M_MID_PER_PAGE(PAGE_SIZE / sizeof(unsigned long *))
-#define P2M_TOP_PER_PAGE(PAGE_SIZE / sizeof(unsigned long **))
-
-#define MAX_P2M_PFN (P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
-
-#define MAX_REMAP_RANGES10
-
-extern unsigned long __init set_phys_range_identity(unsigned long pfn_s,
-  unsigned long pfn_e);
-
-#endif  /* _XEN_P2M_H */
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 13394b1..5561608 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -30,7 +30,6 @@
 #include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "vdso.h"
-#include "p2m.h"
 #include "mmu.h"
 
 #define GB(x) ((uint64_t)(x) * 1024 * 1024 * 1024)
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [Patch V2 07/15] xen: check memory area against e820 map

2015-04-09 Thread Juergen Gross
Provide a service routine to check a physical memory area against the
E820 map. The routine will return false if the complete area is RAM
according to the E820 map and true otherwise.

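For illustration only (not part of this patch), a caller would typically
react to a 'true' result by relocating the conflicting data, e.g.:

	/* Hypothetical caller: move a blob that overlaps non-RAM. */
	if (xen_chk_e820_reserved(start, size)) {
		phys_addr_t new_area = xen_find_free_area(size);

		/* ... copy the blob to new_area and update its users ... */
	}
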
Signed-off-by: Juergen Gross jgr...@suse.com
---
 arch/x86/xen/setup.c   | 23 +++
 arch/x86/xen/xen-ops.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 87251b4..4666adf 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -573,6 +573,29 @@ static unsigned long __init xen_count_remap_pages(unsigned long max_pfn)
return extra;
 }
 
+bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size)
+{
+   struct e820entry *entry;
+   unsigned mapcnt;
+   phys_addr_t end;
+
+   if (!size)
+   return false;
+
+   end = start + size;
+   entry = xen_e820_map;
+
+   for (mapcnt = 0; mapcnt < xen_e820_map_entries; mapcnt++) {
+   if (entry->type == E820_RAM && entry->addr <= start &&
+   (entry->addr + entry->size) >= end)
+   return false;
+
+   entry++;
+   }
+
+   return true;
+}
+
 /*
  * Reserve Xen mfn_list.
  * See comment above struct start_info in xen/interface/xen.h
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9e195c6..56650bb 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -39,6 +39,7 @@ void xen_reserve_top(void);
 void xen_mm_pin_all(void);
 void xen_mm_unpin_all(void);
 
+bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size);
 unsigned long __ref xen_chk_extra_mem(unsigned long pfn);
 void __init xen_inv_extra_mem(void);
 void __init xen_remap_memory(void);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 02/12] x86: improve psr scheduling code

2015-04-09 Thread Andrew Cooper
On 09/04/2015 10:18, Chao Peng wrote:
 Switching the RMID from the previous vcpu to the next vcpu only needs one
 write of MSR_IA32_PSR_ASSOC. Writing it with the value of the next vcpu is
 enough; there is no need to write '0' first. The idle domain has its RMID
 set to 0 and, because the MSR is updated lazily, it is switched just like
 any other domain.

 Also move the initialization of per-CPU variable which used for lazy
 update from context switch to CPU starting.

 Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
 ---
 Changes in v4:
 * Move psr_assoc_reg_read/psr_assoc_reg_write into psr_ctxt_switch_to.
 * Use 0 instead of smp_processor_id() for boot cpu.
 * add cpu parameter to psr_assoc_init.
 Changes in v2:
 * Move initialization for psr_assoc from context switch to CPU_STARTING.
 ---
  xen/arch/x86/domain.c |  7 ++---
  xen/arch/x86/psr.c| 75 
 ++-
  xen/include/asm-x86/psr.h |  3 +-
  3 files changed, 59 insertions(+), 26 deletions(-)

 diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
 index 04c1898..695a2eb 100644
 --- a/xen/arch/x86/domain.c
 +++ b/xen/arch/x86/domain.c
 @@ -1444,8 +1444,6 @@ static void __context_switch(void)
  {
  memcpy(p-arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
  vcpu_save_fpu(p);
 -if ( psr_cmt_enabled() )
 -psr_assoc_rmid(0);
  p-arch.ctxt_switch_from(p);
  }
  
 @@ -1470,11 +1468,10 @@ static void __context_switch(void)
  }
  vcpu_restore_fpu_eager(n);
  n-arch.ctxt_switch_to(n);
 -
 -if ( psr_cmt_enabled() && n->domain->arch.psr_rmid > 0 )
 -psr_assoc_rmid(n->domain->arch.psr_rmid);
  }
  
 +psr_ctxt_switch_to(n->domain);
 +
  gdt = !is_pv_32on64_vcpu(n) ? per_cpu(gdt_table, cpu) :
per_cpu(compat_gdt_table, cpu);
  if ( need_full_gdt(n) )
 diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
 index 344de3c..6119c6e 100644
 --- a/xen/arch/x86/psr.c
 +++ b/xen/arch/x86/psr.c
 @@ -22,7 +22,6 @@
  
  struct psr_assoc {
  uint64_t val;
 -bool_t initialized;
  };
  
  struct psr_cmt *__read_mostly psr_cmt;
 @@ -122,14 +121,6 @@ static void __init init_psr_cmt(unsigned int rmid_max)
  printk(XENLOG_INFO Cache Monitoring Technology enabled\n);
  }
  
 -static int __init init_psr(void)
 -{
 -if ( (opt_psr & PSR_CMT) && opt_rmid_max )
 -init_psr_cmt(opt_rmid_max);
 -return 0;
 -}
 -__initcall(init_psr);
 -
  /* Called with domain lock held, no psr specific lock needed */
  int psr_alloc_rmid(struct domain *d)
  {
 @@ -175,26 +166,70 @@ void psr_free_rmid(struct domain *d)
  d->arch.psr_rmid = 0;
  }
  
 -void psr_assoc_rmid(unsigned int rmid)
 +static inline void psr_assoc_init(unsigned int cpu)
 +{
 +struct psr_assoc *psra = &per_cpu(psr_assoc, cpu);
 +
 +if ( psr_cmt_enabled() )
 +rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
 +}

On further consideration, this would probably be better as a void
function which used this_cpu() rather than per_cpu().

Absolutely nothing good can come of calling it with cpu !=
smp_processor_id(), so we should avoid that situation arising in the
first place.
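
Something like this, perhaps (an untested sketch, purely to illustrate):

    static inline void psr_assoc_init(void)
    {
        struct psr_assoc *psra = &this_cpu(psr_assoc);

        if ( psr_cmt_enabled() )
            rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
    }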

 +
 +static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid)
 +{
 +*reg = (*reg & ~rmid_mask) | (rmid & rmid_mask);
 +}
 +
 +void psr_ctxt_switch_to(struct domain *d)
  {
 -uint64_t val;
 -uint64_t new_val;
   struct psr_assoc *psra = &this_cpu(psr_assoc);
  +uint64_t reg = psra->val;
  +
  +if ( psr_cmt_enabled() )
  +psr_assoc_rmid(&reg, d->arch.psr_rmid);
   
  -if ( !psra->initialized )
  +if ( reg != psra->val )
   {
  -rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
  -psra->initialized = 1;
  +wrmsrl(MSR_IA32_PSR_ASSOC, reg);
  +psra->val = reg;
   }
  -val = psra->val;
  +}
  
 -new_val = (val & ~rmid_mask) | (rmid & rmid_mask);
 -if ( val != new_val )
 +static void psr_cpu_init(unsigned int cpu)
 +{
 +psr_assoc_init(cpu);
 +}

This can also turn into a void helper.

Otherwise, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

~Andrew

 +
 +static int cpu_callback(
 +struct notifier_block *nfb, unsigned long action, void *hcpu)
 +{
 +unsigned int cpu = (unsigned long)hcpu;
 +
 +switch ( action )
  {
 -wrmsrl(MSR_IA32_PSR_ASSOC, new_val);
 -psra->val = new_val;
 +case CPU_STARTING:
 +psr_cpu_init(cpu);
 +break;
  }
 +
 +return NOTIFY_DONE;
 +}
 +
 +static struct notifier_block cpu_nfb = {
 +.notifier_call = cpu_callback
 +};
 +
 +static int __init psr_presmp_init(void)
 +{
 +if ( (opt_psr & PSR_CMT) && opt_rmid_max )
 +init_psr_cmt(opt_rmid_max);
 +
 +psr_cpu_init(0);
 +if ( psr_cmt_enabled() )
 +register_cpu_notifier(cpu_nfb);
 +
 +return 0;
  }
 +presmp_initcall(psr_presmp_init);
  
  /*
   * Local variables:
 diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
 index c6076e9..585350c 100644
 --- 

Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-09 Thread Euan Harris
On Tue, Apr 07, 2015 at 06:19:52PM +0100, Ian Jackson wrote:
 On the contrary, I think many long-running operations, such as suspend
 and migrations, involve multiple iterations of the libxl event loop.
 Actual suspend/migrate is done in a helper process; the main process
 is responsible for progress report handling, coordination, etc.

Yes, that would work, but an open loop approach like that can lead to
frustratingly unreliable tests.   I think it would be best to make
the test aware of the state of the helper - or even in control of it.
That would allow us to wait for the helper to reach a particular state
before killing it.

Thanks,
Euan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-09 Thread Ian Jackson
Prashant Sreedharan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 
more messages]):
 On Thu, 2015-04-09 at 12:11 +0100, Ian Jackson wrote:
  No.  I can try to repro the problem without the bridge, if it would
  help.
 yes please do

I will do so.

FYI, when I came back to this test box just now (after leaving it
since yesterday), I found it completely broken:

[89210.340696] DMA: Out of SW-IOMMU space for 1600 bytes at device :03:00.0
[89210.449936] tg3 :03:00.0: swiotlb buffer is full (sz: 1600 bytes)

The root fs block device is also unuseable and gives lots of EIO.

This is with 3.14.21, baremetal, with `iommu=soft swiotlb=force'.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 01/12] x86: clean up psr boot parameter parsing

2015-04-09 Thread Chao Peng
Change type of opt_psr from bool to int so more psr features can fit.

Introduce a new routine to parse bool parameter so that both cmt and
future psr features like cat can use it.

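For example (illustrative only; the accepted syntax is whatever the parser
below implements, assuming ':' as the name/value separator):

    psr=cmt,rmid_max:255    (enable CMT, allow RMIDs up to 255)
    psr=cmt:0               (explicitly disable CMT)
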
Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
Changes in v4:
* change 'int bit' to 'unsigned int mask'.
* Remove printk that will never be called.
Changes in v3:
* Set off value explicity if requested.
---
 xen/arch/x86/psr.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 2ef83df..344de3c 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -26,11 +26,30 @@ struct psr_assoc {
 };
 
 struct psr_cmt *__read_mostly psr_cmt;
-static bool_t __initdata opt_psr;
+static unsigned int __initdata opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
+static void __init parse_psr_bool(char *s, char *value, char *feature,
+  unsigned int mask)
+{
+if ( !strcmp(s, feature) )
+{
+if ( !value )
+opt_psr |= mask;
+else
+{
+int val_int = parse_bool(value);
+
+if ( val_int == 0 )
 +opt_psr &= ~mask;
+else if ( val_int == 1 )
+opt_psr |= mask;
+}
+}
+}
+
 static void __init parse_psr_param(char *s)
 {
 char *ss, *val_str;
@@ -44,21 +63,9 @@ static void __init parse_psr_param(char *s)
 if ( val_str )
 *val_str++ = '\0';
 
 -if ( !strcmp(s, "cmt") )
 -{
 -if ( !val_str )
 -opt_psr |= PSR_CMT;
 -else
 -{
 -int val_int = parse_bool(val_str);
 -if ( val_int == 1 )
 -opt_psr |= PSR_CMT;
 -else if ( val_int != 0 )
 -printk("PSR: unknown cmt value: %s - CMT disabled!\n",
 -val_str);
 -}
 -}
 -else if ( val_str && !strcmp(s, "rmid_max") )
 +parse_psr_bool(s, val_str, "cmt", PSR_CMT);
 +
 +if ( val_str && !strcmp(s, "rmid_max") )
 opt_rmid_max = simple_strtoul(val_str, NULL, 0);
 
 s = ss + 1;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] osstest: update FreeBSD guests to 10.1

2015-04-09 Thread Ian Jackson
Roger Pau Monne writes ([PATCH] osstest: update FreeBSD guests to 10.1):
 Update FreeBSD guests in OSSTest to FreeBSD 10.1. The following images
 should be placed in the osstest images folder:
 
 ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.qcow2.xz
 ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.qcow2.xz

Sadly,

iwj@OSSTEST:~$ wget 
ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.qcow2.xz
--2015-04-09 14:36:34--  
ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.qcow2.xz
   => `FreeBSD-10.1-RELEASE-amd64.qcow2.xz'
Resolving ftp.freebsd.org (ftp.freebsd.org)... 96.47.72.72, 
2610:1c1:1:606c::15:0
Connecting to ftp.freebsd.org (ftp.freebsd.org)|96.47.72.72|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.  ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) 
/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest ... done.
==> SIZE FreeBSD-10.1-RELEASE-amd64.qcow2.xz ... done.
==> PASV ... done.  ==> RETR FreeBSD-10.1-RELEASE-amd64.qcow2.xz ... 
No such file `FreeBSD-10.1-RELEASE-amd64.qcow2.xz'.

iwj@OSSTEST:~$ wget 
ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.qcow2.xz
--2015-04-09 14:36:40--  
ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.qcow2.xz
   => `FreeBSD-10.1-RELEASE-i386.qcow2.xz'
Resolving ftp.freebsd.org (ftp.freebsd.org)... 96.47.72.72, 
2610:1c1:1:606c::15:0
Connecting to ftp.freebsd.org (ftp.freebsd.org)|96.47.72.72|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.  ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) 
/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest ... done.
==> SIZE FreeBSD-10.1-RELEASE-i386.qcow2.xz ... done.
==> PASV ... done.  ==> RETR FreeBSD-10.1-RELEASE-i386.qcow2.xz ... 
No such file `FreeBSD-10.1-RELEASE-i386.qcow2.xz'.

iwj@OSSTEST:~$

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time

2015-04-09 Thread Peter Zijlstra
On Mon, Apr 06, 2015 at 10:55:48PM -0400, Waiman Long wrote:

 @@ -219,24 +236,30 @@ static void pv_wait_node(struct mcs_spinlock *node)
  }
  
  /*
 + * Called after setting next->locked = 1 & lock acquired.
 + * Check if the CPU has been halted. If so, set the _Q_SLOW_VAL flag
 + * and put an entry into the lock hash table to be woken up at unlock time.
   */
 -static void pv_kick_node(struct mcs_spinlock *node)
 +static void pv_scan_next(struct qspinlock *lock, struct mcs_spinlock *node)

I'm not too sure about that name change..

  {
   struct pv_node *pn = (struct pv_node *)node;
 + struct __qspinlock *l = (void *)lock;
  
   /*
 +  * Transition CPU state: halted => hashed
 +  * Quit if the transition failed.
*/
 + if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
 + return;
 +
 + /*
 +  * Put the lock into the hash table & set the _Q_SLOW_VAL in the lock.
 +  * As this is the same CPU that will check the _Q_SLOW_VAL value and
 +  * the hash table later on at unlock time, no atomic instruction is
 +  * needed.
 +  */
 + WRITE_ONCE(l->locked, _Q_SLOW_VAL);
 + (void)pv_hash(lock, pn);
  }

This is broken. The unlock path relies on:

  pv_hash()
   MB
  l->locked = SLOW

such that when it observes SLOW, it must then also observe a consistent
bucket.

The above can have us do pv_hash_find() _before_ we actually hash the
lock, which will result in us triggering that BUG_ON() in there.
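
I.e. (an illustrative sketch, not a tested fix) the publish order needs to
be:

	/* Insert the hash bucket first, then flag the lock as SLOW. */
	(void)pv_hash(lock, pn);
	smp_wmb();	/* or rely on the barrier at the end of pv_hash() */
	WRITE_ONCE(l->locked, _Q_SLOW_VAL);

so that an unlocker observing _Q_SLOW_VAL is guaranteed to also observe a
consistent hash bucket.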

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 05/10] VMX: add help functions to support PML

2015-04-09 Thread Tim Deegan
Hi,

At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
 +void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
 +{
 +uint64_t *pml_buf;
 +unsigned long pml_idx;
 +
 +ASSERT(vmx_vcpu_pml_enabled(v));
 +
 +vmx_vmcs_enter(v);
 +
 +__vmread(GUEST_PML_INDEX, pml_idx);
 +
 +/* Do nothing if PML buffer is empty */
 +if ( pml_idx == (PML_ENTITY_NUM - 1) )
 +goto out;
 +
 +pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
 +
 +/*
 + * PML index can be either 2^16-1 (buffer is full), or 0~511 (buffer is 
 not
 + * full), and in latter case PML index always points to next available
 + * entity.
 + */
 +if (pml_idx >= PML_ENTITY_NUM)
 +pml_idx = 0;
 +else
 +pml_idx++;
 +
 +for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
 +{
 +struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
 +unsigned long gfn;
 +mfn_t mfn;
 +p2m_type_t t;
 +p2m_access_t a;
 +
 +gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
 +mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);

Please don't call p2m->get_entry() directly -- that interface should
only be used inside the p2m code.  As it happens, I don't think this
lookup is correct anyway: the logging only sees races (which are not
interesting) or buggy hardware (which is not worth the extra lookup to
detect).

So you only need this to get 'mfn' to pass to paging_mark_dirty().
That's also buggy, because there's no locking here to make sure
gfn->mfn->gfn ends up in the right place. :(

I think the right thing to do is:

 - split paging_mark_dirty() into paging_mark_gfn_dirty() (the bulk of
   the current function) and a paging_mark_dirty() wrapper that does
   get_gpfn_from_mfn(mfn_x(gmfn)) and calls paging_mark_gfn_dirty(),
   as sketched below.

 - call paging_mark_gfn_dirty() from vmx_vcpu_flush_pml_buffer().

That will avoid _two_ p2m lookups in this function. :)
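
In rough outline (a sketch only -- names from the suggestion above, the
actual bodies elided):

    /* Bulk of the current function, keyed on the guest pfn. */
    void paging_mark_gfn_dirty(struct domain *d, unsigned long pfn)
    {
        /* ... existing paging_mark_dirty() body ... */
    }

    /* Thin wrapper preserving the existing mfn-based interface. */
    void paging_mark_dirty(struct domain *d, mfn_t gmfn)
    {
        paging_mark_gfn_dirty(d, get_gpfn_from_mfn(mfn_x(gmfn)));
    }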

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST Nested PATCH v7 4/6] Add new script to custmize nested test configuration

2015-04-09 Thread Pang, LongtaoX


 -Original Message-
 From: Ian Campbell [mailto:ian.campb...@citrix.com]
 Sent: Wednesday, April 01, 2015 4:59 PM
 To: Pang, LongtaoX; ian.jack...@eu.citrix.com
 Cc: xen-devel@lists.xen.org; wei.l...@citrix.com; Hu, Robert
 Subject: Re: [OSSTEST Nested PATCH v7 4/6] Add new script to custmize nested
 test configuration
 
 On Wed, 2015-04-01 at 08:45 +, Pang, LongtaoX wrote:
   As it happens I was rebasing that series this morning but due to
   other issues I've not managed to run it yet. Once I've managed to at
   least smoke test I'll CC you on the repost.
  
  OK. Also, the code below is used to run the 'osstest-confirm-booted'
  script, which confirms whether L1 has fully booted after rebooting it.
  I think it's necessary here.
 
  +target_cmd_root($l1, <<END);
   +wget -O overlay.tar $url
   +tar -xf overlay.tar -C /
   +rm overlay.tar -f
   +update-rc.d osstest-confirm-booted start 99 2 .
  +END
 
 In my distro series I also have some patches refactoring the overlay stuff, 
 which
 would mean you could reuse that.
 http://article.gmane.org/gmane.comp.emulators.xen.devel/224433
 I'll CC you on that one too.
 
 I don't think there would be any harm in adding those overlays for all guests
 and enabling the initscript, but Ian may disagree or know something which I
 don't.
 
I have modified and updated the v7 patches according to your reply. It
seems that your patches [v4 04,05,06] have not been pushed to the OSSTest
master tree yet. Should I wait until those patches are pushed, or release
my v8 nested patches to you first? We have been preparing this for a long
time.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 01/12] x86: clean up psr boot parameter parsing

2015-04-09 Thread Andrew Cooper
On 09/04/2015 10:18, Chao Peng wrote:
 Change type of opt_psr from bool to int so more psr features can fit.

 Introduce a new routine to parse bool parameter so that both cmt and
 future psr features like cat can use it.

 Signed-off-by: Chao Peng chao.p.p...@linux.intel.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 p2 12/19] xen/passthrough: Extend XEN_DOMCTL_*assign_device to support DT device

2015-04-09 Thread Julien Grall
From: Julien Grall julien.gr...@linaro.org

A device node is described by a path. It will be used to retrieved the
node in the device tree and assign the related device to the domain.

Only non-PCI devices protected by an IOMMU can be assigned to a guest.

Also document the behavior of XEN_DOMCTL_deassign_device in the public
headers, which differs between non-PCI and PCI.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Jan Beulich jbeul...@suse.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com

---
Changes in v5:
- Fix comment in public/domctl.h
- Remove unnecessary comment in drivers/passthrough/device_tree.c
- Check d-is_dying before assigning a device (consistency with
  PCI code)
- Invert the if in iommu.c in order to avoid extra return
- Add Jan's ack for non-ARM part

Changes in v4:
- Add XSM bits
- Return -ENODEV rather than -ENOSYS
- Move the if (...) into the ifdef (see iommu.c)
- Document the behavior of XEN_DOMCTL_deassign_device
- Use PCI_BUS and PCI_DEVFN2 when it's possible
- iommu_dt_device_is_assigned now returns 0 when the device is
not protected

Changes in v2:
- Use a different number for XEN_DOMCTL_assign_dt_device
---
 tools/libxc/include/xenctrl.h |  10 
 tools/libxc/xc_domain.c   |  95 --
 xen/drivers/passthrough/device_tree.c | 108 +-
 xen/drivers/passthrough/iommu.c   |   7 ++-
 xen/drivers/passthrough/pci.c |  47 ++-
 xen/include/public/domctl.h   |  24 +++-
 xen/include/xen/iommu.h   |   3 +
 7 files changed, 269 insertions(+), 25 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index cc78ed6..a26d222 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2057,6 +2057,16 @@ int xc_deassign_device(xc_interface *xch,
  uint32_t domid,
  uint32_t machine_sbdf);
 
+int xc_assign_dt_device(xc_interface *xch,
+uint32_t domid,
+char *path);
+int xc_test_assign_dt_device(xc_interface *xch,
+ uint32_t domid,
+ char *path);
+int xc_deassign_dt_device(xc_interface *xch,
+  uint32_t domid,
+  char *path);
+
 int xc_domain_memory_mapping(xc_interface *xch,
  uint32_t domid,
  unsigned long first_gfn,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 676ec50..a6fcf14 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1650,7 +1650,8 @@ int xc_assign_device(
 
 domctl.cmd = XEN_DOMCTL_assign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
 
 return do_domctl(xch, domctl);
 }
@@ -1699,7 +1700,8 @@ int xc_test_assign_device(
 
 domctl.cmd = XEN_DOMCTL_test_assign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
 
 return do_domctl(xch, domctl);
 }
@@ -1713,11 +1715,96 @@ int xc_deassign_device(
 
 domctl.cmd = XEN_DOMCTL_deassign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
- 
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
 return do_domctl(xch, domctl);
 }
 
+int xc_assign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_assign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
+
+rc = do_domctl(xch, domctl);
+
+xc_hypercall_bounce_post(xch, path);
+
+return rc;
+}
+
+int xc_test_assign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_test_assign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+

Re: [Xen-devel] [PATCH 2/6] x86/numa: Correct the extern of cpu_to_node

2015-04-09 Thread Tim Deegan
At 16:05 +0100 on 09 Apr (1428595536), Andrew Cooper wrote:
 On 09/04/15 16:00, Tim Deegan wrote:
  At 18:26 +0100 on 07 Apr (1428431176), Andrew Cooper wrote:
  --- a/xen/include/asm-x86/numa.h
  +++ b/xen/include/asm-x86/numa.h
  @@ -9,7 +9,7 @@
   
   extern int srat_rev;
   
  -extern unsigned char cpu_to_node[];
  +extern nodeid_t  cpu_to_node[NR_CPUS];
  Does the compiler do anything useful with the array size here?
 
 Specifying the size allows ARRAY_SIZE(cpu_to_node) to work in other
 translation units.  It also allows static analysers to perform bounds
 checks, should they wish.
 
  In particular does it check that it matches the size at the definition?
 
 It will complain if they are mismatched.

Excellent.  In that case,  Reviewed-by: Tim Deegan t...@xen.org

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities

2015-04-09 Thread Meng Xu
2015-04-07 9:10 GMT-04:00 Dario Faggioli dario.faggi...@citrix.com:
 On Tue, 2015-04-07 at 11:27 +0100, Andrew Cooper wrote:
 On 04/04/2015 03:14, Dario Faggioli wrote:

  I'm putting here in the cover letter a markdown document I wrote to better
  describe my findings and ideas (sorry if it's a bit long! :-D). You can 
  also
  fetch it at the following links:
 
   * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.pdf
   * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.markdown
 
  See the document itself and the changelog of the various patches for 
  details.


 There seem to be several areas of confusion indicated in your document.

 I see. Sorry for that then.

 I am unsure whether this is a side effect of the way you have written
 it, but here are (hopefully) some words of clarification.

 And thanks for this. :-)

 PSR CMT works by tagging cache lines with the currently-active RMID.
 The cache utilisation is a count of the number of lines which are tagged
 with a specific RMID.  MBM on the other hand counts the number of cache
 line fills and cache line evictions tagged with a specific RMID.

 Ok.

 By this nature, the information will never reveal the exact state of
 play.  e.g. a core with RMID A which gets a cache line hit against a
 line currently tagged with RMID B will not alter any accounting.

 So, you're saying that the information we get is an approximation of
 reality, not its 100% accurate representation. That is no news, IMO.
 When, inside Credit2, we try to track the average load on each runqueue,
 that is an approximation. When, in Credit1, we consider a vcpu cache
 hot if it ran recently, that is an approximation. Etc. These
 approximations happen fully in software, because that is possible in
 those cases.

 PSR provides data and insights on something that, without hardware
 support, we couldn't possibly hope to know anything about. Whether we
 should think about using such data or not, it depends on whether they
 represent a (base for a) reasonable enough approximation, or are just a
 bunch of pseudo-random numbers.

 It seems to me that you are suggesting the latter to be more likely than
 the former, i.e., PSR does not provide a good enough approximation for
 being used from inside Xen and toolstack, is my understanding correct?

 Furthermore, as alterations of the RMID only occur in
 __context_switch(), Xen actions such as handling an interrupt will be
 accounted against the currently active domain (or other future
 granularity of RMID).

 Yes, I thought about this. However, this is certainly important for
 per-domain, or for an (unlikely) future per-vcpu, monitoring, but if you
 attach an RMID to a pCPU (or groups of pCPU) then that is not really a
 problem.

 Actually, it's the correct behavior: running Xen and serving interrupts
 in a certain core, in that case, *do* need to be accounted! So,
 considering that both the document and the RFC series are mostly focused
 on introducing per-pcpu/core/socket monitoring, rather than on
 per-domain monitoring, and given that the document was becoming quite
 long, I decided not to add a section about this.

 max_rmid is a per-socket property.  There is no requirement for it to
 be the same for each socket in a system, although it is likely, given a
 homogeneous system.

 I know. Again this was not mentioned for document length reasons, but I
 planned to ask about this (as I've done that already this morning, as
 you can see. :-D).

 In this case, though, it probably was something worth being mentioned,
 so I will if there will ever be a v2 of the document. :-)

 Mostly, I was curious to learn why that is not reflected in the current
 implementation, i.e., whether there are any reasons why we should not
 take advantage of per-socketness of RMIDs, as reported by SDM, as that
 can greatly help mitigating RMID shortage in the per-CPU/core/socket
 configuration (in general, actually, but it's per-cpu that I'm
 interested in).

 The limit on RMID is based on the size of the
 accounting table.

 Did not know in details, but it makes sense. Getting feedback on what
 should be expected as number of available RMIDs in current and future
 hardware, from Intel people and from everyone who knows (like you :-D ),
 was the main purpose of sending this out, so thanks.

 As far as MSRs themselves go, an extra MSR write in the context switch
 path is likely to pale into the noise.  However, querying the data is an
 indirect MSR read (write to the event select MSR, read from  the data
 MSR).  Furthermore there is no way to atomically read all data at once
 which means that activity on other cores can interleave with
 back-to-back reads in the scheduler.

 All true. And in fact, how and how frequently the data should be gathered
 remains to be decided (as said in the document). I was thinking more of
 some periodic sampling, rather than throwing handfuls of rdmsr/wrmsr
 at the code that makes scheduling decisions! :-D
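
For reference, the indirect read being discussed looks roughly like this
(a sketch only; the MSR numbers and the L3-occupancy event ID are taken
from the SDM, not from this series):

    #define MSR_IA32_QM_EVTSEL  0x00000c8d
    #define MSR_IA32_QM_CTR     0x00000c8e
    #define QM_EVTID_L3_OCCUP   0x1

    static uint64_t cmt_read_l3_occupancy(unsigned int rmid)
    {
        uint64_t val;

        /* Select <RMID, event>, then read the counter: two MSR accesses. */
        wrmsrl(MSR_IA32_QM_EVTSEL,
               ((uint64_t)rmid << 32) | QM_EVTID_L3_OCCUP);
        rdmsrl(MSR_IA32_QM_CTR, val);

        /* Bit 63 = Error, bit 62 = Unavailable; data is in bits 61:0. */
        return val;
    }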



Actually, 

Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-09 Thread Peter Zijlstra
On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote:
 On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
  +#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))

  +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
  +{
  +   unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
  +   struct pv_hash_bucket *hb, *end;
  +
  +   if (!hash)
  +   hash = 1;
  +
  +   init_hash = hash;
  +   hb = &pv_lock_hash[hash_align(hash)];
  +   for (;;) {
  +   for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
  +   if (!cmpxchg(&hb->lock, NULL, lock)) {
  +   WRITE_ONCE(hb->node, node);
  +   /*
  +* We haven't set the _Q_SLOW_VAL yet. So
  +* the order of writing doesn't matter.
  +*/
  +   smp_wmb(); /* matches rmb from pv_hash_find */
  +   goto done;
  +   }
  +   }
  +
  +   hash = lfsr(hash, pv_lock_hash_bits, 0);
 
 Since pv_lock_hash_bits is a variable, you end up running through that
 massive if() forest to find the corresponding tap every single time. It
 cannot compile-time optimize it.
 
 Hence:
   hash = lfsr(hash, pv_taps);
 
 (I don't get the bits argument to the lfsr).
 
 In any case, like I said before, I think we should try a linear probe
 sequence first, the lfsr was over engineering from my side.
 
  +   hb = &pv_lock_hash[hash_align(hash)];

So one thing this does -- and one of the reasons I figured I should
ditch the LFSR instead of fixing it -- is that you end up scanning each
bucket HB_PER_LINE times.

The 'fix' would be to LFSR on cachelines instead of HBs but then you're
stuck with the 0-th cacheline.

  +   BUG_ON(hash == init_hash);
  +   }
  +
  +done:
  +   return &hb->lock;
  +}

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 p2 09/19] xen/passthrough: iommu_deassign_device_dt: By default reassign device to nobody

2015-04-09 Thread Julien Grall
From: Julien Grall julien.gr...@linaro.org

Currently, when the device is deassigned from a domain, we directly reassign
it to DOM0.

As the device may not have been correctly reset, this may lead to corruption or
expose some part of DOM0 memory. Also, we may have no way to reset some
platform devices.

If Xen reassigns the device to nobody, it may receive some global/context
fault because the transaction has failed (indeed the context has been
marked invalid). Unfortunately there is no simple way to quiesce buggy
hardware. I think we could live with that for a first version of platform
device passthrough.

DOM0 will have to issue a hypercall to assign the device to itself if it
wants to use it.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Stefano Stabellini stefano.stabell...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com

---
Note: This behavior is documented in a following patch which extend
DOMCT_*assign_device to support non-PCI passthrough.

Changes in v5:
- Add Ian's ack

Changes in v4:
- Add Stefano's ack

Changes in v3:
- Use the coding style of the new SMMU drivers

Changes in v2:
- Fix typoes in the commit message
- Update commit message
---
 xen/drivers/passthrough/arm/smmu.c| 8 +++-
 xen/drivers/passthrough/device_tree.c | 9 +++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 8a9b58b..65de50b 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2692,7 +2692,7 @@ static int arm_smmu_reassign_dev(struct domain *s, struct 
domain *t,
int ret = 0;
 
/* Don't allow remapping on other domain than hwdom */
-   if (t != hardware_domain)
+   if (t && t != hardware_domain)
return -EPERM;
 
if (t == s)
@@ -2702,6 +2702,12 @@ static int arm_smmu_reassign_dev(struct domain *s, 
struct domain *t,
if (ret)
return ret;
 
+   if (t) {
+   ret = arm_smmu_assign_dev(t, devfn, dev);
+   if (ret)
+   return ret;
+   }
+
return 0;
 }
 
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 05ab274..0ec4103 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -80,15 +80,12 @@ int iommu_deassign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 
 spin_lock(&dtdevs_lock);
 
-rc = hd->platform_ops->reassign_device(d, hardware_domain,
-   0, dt_to_dev(dev));
+rc = hd->platform_ops->reassign_device(d, NULL, 0, dt_to_dev(dev));
 if ( rc )
 goto fail;
 
-list_del(&dev->domain_list);
-
-dt_device_set_used_by(dev, hardware_domain->domain_id);
-list_add(&dev->domain_list, &domain_hvm_iommu(hardware_domain)->dt_devices);
+list_del_init(&dev->domain_list);
+dt_device_set_used_by(dev, DOMID_IO);
 
 fail:
 spin_unlock(&dtdevs_lock);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/pci: Try harder to get PXM information for Xen

2015-04-09 Thread Ross Lagerwall

On 08/04/15 18:17, Boris Ostrovsky wrote:

On 04/08/2015 12:44 PM, David Vrabel wrote:

On 08/04/15 15:01, Boris Ostrovsky wrote:

On 04/08/2015 09:39 AM, Ross Lagerwall wrote:

If the device being added to Xen is not contained in the ACPI table,
walk the PCI device tree to find a parent that is contained in the ACPI
table before finding the PXM information from this device.

Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com
---
   drivers/xen/pci.c | 15 +--
   1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 95ee430..6837181 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -19,6 +19,7 @@
 #include <linux/pci.h>
 #include <linux/acpi.h>
+#include <linux/pci-acpi.h>
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/xen.h>
@@ -67,8 +68,18 @@ static int xen_add_device(struct device *dev)
 #ifdef CONFIG_ACPI
   handle = ACPI_HANDLE(&pci_dev->dev);
-if (!handle && pci_dev->bus->bridge)
-handle = ACPI_HANDLE(pci_dev->bus->bridge);
+if (!handle) {
+/*
+ * This device was not listed in the ACPI name space at
+ * all. Try to get acpi handle of parent pci bus.
+ */
+struct pci_bus *pbus;
+for (pbus = pci_dev->bus; pbus; pbus = pbus->parent) {
+handle = acpi_pci_get_bridge_handle(pbus);
+if (handle)
+break;
+}
+}
   #ifdef CONFIG_PCI_IOV
   if (!handle && pci_dev->is_virtfn)
   handle = ACPI_HANDLE(physfn->bus->bridge);


Shouldn't we first look at physfn, before going up the tree?

That sounds sensible but should be a separate pre-requisite patch.


It's already there: the last two (unchanged) lines above. The added
chunk should just move to after those two.



OK, I can swap it around.

--
Ross Lagerwall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V8 00/12] xen: Clean-up of mem_event subsystem

2015-04-09 Thread Tim Deegan
Hi, 

Sorry for the delay - I have been away.

At 22:06 +0100 on 26 Mar (1427407612), Tamas K Lengyel wrote:
 Tamas K Lengyel (12):
   xen/mem_event: Cleanup of mem_event structures
   xen/mem_event: Cleanup mem_event names in rings, functions and domctls
   xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
   xen: Rename mem_event to vm_event
   tools/tests: Clean-up tools/tests/xen-access
   x86/hvm: factor out and rename vm_event related functions

I have applied these six patches.

   xen: Introduce monitor_op domctl

This one no longer applies cleanly - looks like a conflict with a7511905
(xen: Extend DOMCTL createdomain to support arch configuration)

Can you rebase the second half of the series please?

Cheers,

Tim.

   xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag
   xen/vm_event: Decouple vm_event and mem_access.
   xen/vm_event: Relocate memop checks
   xen/xsm: Split vm_event_op into three separate labels
   xen/vm_event: Add RESUME option to vm_event_op domctl
 
  MAINTAINERS|   6 +-
  docs/misc/xsm-flask.txt|   2 +-
  tools/libxc/Makefile   |   3 +-
  tools/libxc/include/xenctrl.h  |  59 ++-
  tools/libxc/xc_domain.c|  28 +-
  tools/libxc/xc_domain_restore.c|  14 +-
  tools/libxc/xc_domain_save.c   |   4 +-
  tools/libxc/xc_hvm_build_x86.c |   2 +-
  tools/libxc/xc_mem_access.c|  56 ++-
  tools/libxc/xc_mem_paging.c|  80 ++--
  tools/libxc/xc_memshr.c|  29 +-
  tools/libxc/xc_monitor.c   | 137 +++
  tools/libxc/xc_private.h   |  15 +-
 tools/libxc/{xc_mem_event.c => xc_vm_event.c}  |  59 +--
  tools/libxc/xg_save_restore.h  |   2 +-
  tools/tests/xen-access/xen-access.c| 264 +
  tools/xenpaging/pagein.c   |   2 +-
  tools/xenpaging/xenpaging.c| 155 
  tools/xenpaging/xenpaging.h|   8 +-
  xen/arch/x86/Makefile  |   1 +
  xen/arch/x86/domain.c  |   2 +-
  xen/arch/x86/domctl.c  |   4 +-
  xen/arch/x86/hvm/Makefile  |   3 +-
  xen/arch/x86/hvm/emulate.c |   8 +-
  xen/arch/x86/hvm/event.c   | 196 ++
  xen/arch/x86/hvm/hvm.c | 189 +
  xen/arch/x86/hvm/vmx/vmcs.c|  11 +-
  xen/arch/x86/hvm/vmx/vmx.c |   9 +-
  xen/arch/x86/mm/hap/nested_ept.c   |   4 +-
  xen/arch/x86/mm/hap/nested_hap.c   |   4 +-
  xen/arch/x86/mm/mem_paging.c   |  61 +--
  xen/arch/x86/mm/mem_sharing.c  | 180 -
  xen/arch/x86/mm/p2m-pod.c  |   4 +-
  xen/arch/x86/mm/p2m-pt.c   |   4 +-
  xen/arch/x86/mm/p2m.c  | 271 +++--
  xen/arch/x86/monitor.c | 195 ++
  xen/arch/x86/x86_64/compat/mm.c|  24 +-
  xen/arch/x86/x86_64/mm.c   |  24 +-
  xen/common/Makefile|  18 +-
  xen/common/domain.c|  12 +-
  xen/common/domctl.c|  17 +-
  xen/common/mem_access.c|  55 +--
 xen/common/{mem_event.c => vm_event.c} | 505 
 +
  xen/drivers/passthrough/pci.c  |   2 +-
  xen/include/asm-arm/monitor.h  |  35 ++
  xen/include/asm-arm/p2m.h  |  22 +-
  xen/include/asm-x86/domain.h   |  26 +-
  xen/include/asm-x86/hvm/domain.h   |   1 -
  xen/include/asm-x86/hvm/emulate.h  |   2 +-
  xen/include/asm-x86/hvm/event.h|  40 ++
  xen/include/asm-x86/hvm/hvm.h  |  11 -
  xen/include/asm-x86/mem_paging.h   |   5 +-
  xen/include/asm-x86/mem_sharing.h  |   4 +-
  xen/include/asm-x86/monitor.h  |  31 ++
  xen/include/asm-x86/p2m.h  |  41 +-
  xen/include/public/domctl.h| 113 --
  xen/include/public/hvm/params.h|  11 +-
  xen/include/public/memory.h|  27 +-
 xen/include/public/{mem_event.h => vm_event.h} | 183 ++---
  xen/include/xen/mem_access.h   |  18 +-
  xen/include/xen/p2m-common.h   |   4 +-
  xen/include/xen/sched.h|  28 +-
 xen/include/xen/{mem_event.h => vm_event.h}| 103 ++---
  xen/include/xsm/dummy.h|  22 +-
  xen/include/xsm/xsm.h  |  35 +-
  xen/xsm/dummy.c|  13 +-
  xen/xsm/flask/hooks.c  |  66 +++-

[Xen-devel] [PATCH 1/3] xen/x86: Infrastructure to create BUG_FRAMES in asm code

2015-04-09 Thread Andrew Cooper
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
---
 xen/include/asm-x86/bug.h |   48 -
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index cd862e3..365c6b8 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -5,6 +5,13 @@
 #define BUG_LINE_LO_WIDTH (31 - BUG_DISP_WIDTH)
 #define BUG_LINE_HI_WIDTH (31 - BUG_DISP_WIDTH)
 
+#define BUGFRAME_run_fn 0
+#define BUGFRAME_warn   1
+#define BUGFRAME_bug2
+#define BUGFRAME_assert 3
+
+#ifndef __ASSEMBLY__
+
 struct bug_frame {
 signed int loc_disp:BUG_DISP_WIDTH;
 unsigned int line_hi:BUG_LINE_HI_WIDTH;
@@ -22,11 +29,6 @@ struct bug_frame {
   ((1 << BUG_LINE_LO_WIDTH) - 1)))
 #define bug_msg(b) ((const char *)(b) + (b)->msg_disp[1])
 
-#define BUGFRAME_run_fn 0
-#define BUGFRAME_warn   1
-#define BUGFRAME_bug2
-#define BUGFRAME_assert 3
-
 #define BUG_FRAME(type, line, ptr, second_frame, msg) do {   \
 BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH)); \
 asm volatile ( ".Lbug%=: ud2\n"  \
@@ -66,4 +68,40 @@ struct bug_frame {
   __stop_bug_frames_2[],
   __stop_bug_frames_3[];
 
+#else  /* !__ASSEMBLY__ */
+
+/*
+ * Construct a bugframe, suitable for using in assembly code.  Should always
+ * match the C version above.  One complication is having to stash the strings
+ * in .rodata (TODO - figure out how to get GAS to elide duplicate file_str's)
+ */
+.macro BUG_FRAME type, line, file_str, second_frame, msg
+92: ud2a
+
+.pushsection .rodata
+94: .asciz "\file_str"
+.popsection
+
+.pushsection .bug_frames.\type, "a", @progbits
+93:
+.long (92b - 93b) + ((\line >> BUG_LINE_LO_WIDTH) << BUG_DISP_WIDTH)
+.long (94b - 93b) + ((\line & ((1 << BUG_LINE_LO_WIDTH) - 1)) << BUG_DISP_WIDTH)
+
+.if \second_frame
+ .pushsection .rodata
+ 95: .asciz "\msg"
+ .popsection
+.long 0, (95b - 93b)
+.endif
+.popsection
+.endm
+
+#define WARN() BUG_FRAME BUGFRAME_warn, __LINE__, __FILE__, 0, 0
+#define BUG()  BUG_FRAME BUGFRAME_bug,  __LINE__, __FILE__, 0, 0
+
+#define ASSERT_FAILED(msg)  \
+ BUG_FRAME BUGFRAME_assert, __LINE__, __FILE__, 1, msg
+
+#endif /* !__ASSEMBLY__ */
+
 #endif /* __X86_BUG_H__ */
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/3] xen/x86: Use real assert frames for ASSERT_INTERRUPTS_{EN, DIS}ABLED

2015-04-09 Thread Andrew Cooper
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
---
 xen/include/asm-x86/asm_defns.h |   25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index 1674c7c..e8a678e 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -6,6 +6,7 @@
 /* NB. Auto-generated from arch/.../asm-offsets.c */
 #include <asm/asm-offsets.h>
 #endif
+#include <asm/bug.h>
 #include <asm/processor.h>
 #include <asm/percpu.h>
 #include <xen/stringify.h>
@@ -26,18 +27,24 @@
 #endif
 
 #ifndef NDEBUG
-#define ASSERT_INTERRUPT_STATUS(x)  \
-pushf;  \
-testb $X86_EFLAGS_IF>>8,1(%rsp);\
-j##x  1f;   \
-ud2a;   \
-1:  addq  $8,%rsp;
+#define ASSERT_INTERRUPTS_ENABLED   \
+pushf;  \
+testb $X86_EFLAGS_IF>>8,1(%rsp);\
+jnz   1f;   \
+ASSERT_FAILED("INTERRUPTS ENABLED");\
+1:  addq  $8,%rsp;
+
+#define ASSERT_INTERRUPTS_DISABLED  \
+pushf;  \
+testb $X86_EFLAGS_IF>>8,1(%rsp);\
+jz1f;   \
+ASSERT_FAILED("INTERRUPTS DISABLED");   \
+1:  addq  $8,%rsp;
 #else
-#define ASSERT_INTERRUPT_STATUS(x)
+#define ASSERT_INTERRUPTS_ENABLED
+#define ASSERT_INTERRUPTS_DISABLED
 #endif
 
-#define ASSERT_INTERRUPTS_ENABLED  ASSERT_INTERRUPT_STATUS(nz)
-#define ASSERT_INTERRUPTS_DISABLED ASSERT_INTERRUPT_STATUS(z)
 
 /*
  * This flag is set in an exception frame when registers R12-R15 did not get
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-09 Thread Stefano Stabellini
On Thu, 9 Apr 2015, Eric Dumazet wrote:
 On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote:
  Hi all,
  
  I found a performance regression when running netperf -t TCP_MAERTS from
  an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the
  virtual machine are 30% slower than v3.18.
  
  Through bisection I found that the perf regression is caused by the
  prensence of the following commit in the guest kernel:
  
  
  commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89
  Author: Eric Dumazet eduma...@google.com
  Date:   Sun Dec 7 12:22:18 2014 -0800
  
  tcp: refine TSO autosizing
  
  
  A simple revert would fix the issue.
  
  Does anybody have any ideas on what could be the cause of the problem?
  Suggestions on what to do to fix it?
 
 You sent this to lkml while networking discussions are on netdev.
 
 This topic had been discussed on netdev multiple times.

Sorry, and many thanks for the quick reply!


 This commit restored original TCP Small Queue behavior, which is the
 first step to fight bufferbloat.
 
 Some network drivers are known to be problematic because of a delayed TX
 completion.
 
 So far this commit did not impact max single flow throughput on 40Gb
 mlx4 NIC. (ie : line rate is possible)
 
 Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it
 makes a difference ?

A very big difference:

echo 262144 > /proc/sys/net/ipv4/tcp_limit_output_bytes
brings us much closer to the original performance, the slowdown is just
8%

echo 1048576 > /proc/sys/net/ipv4/tcp_limit_output_bytes
fills the gap entirely, same performance as before refine TSO
autosizing


What would be the next step here?  Should I just document this as an
important performance tweaking step for Xen, or is there something else
we can do?
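
If it does end up being documented, the tweak could be made persistent
along these lines (illustrative only; the right value may well differ per
workload):

    # In the guest, e.g. in /etc/sysctl.conf or a file under /etc/sysctl.d/
    net.ipv4.tcp_limit_output_bytes = 1048576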

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/2] osstest: update FreeBSD guests to 10.1

2015-04-09 Thread Roger Pau Monne
Update FreeBSD guests in OSSTest to FreeBSD 10.1. The following images
should be placed in the osstest images folder:

ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.raw.xz
ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.raw.xz

Since new images are in raw format rather than qcow2 remove the runes to
convert from qcow2 to raw.

Signed-off-by: Roger Pau Monné roger@citrix.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
---
Changes since v1:
 - Remove the runes to convert the image from qcow2 to raw.
---
 make-flight| 2 +-
 ts-freebsd-install | 7 ++-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/make-flight b/make-flight
index 8ac3a87..b340d78 100755
--- a/make-flight
+++ b/make-flight
@@ -150,7 +150,7 @@ do_freebsd_tests () {
  job_create_test 
test-$xenarch$kern-$dom0arch$qemuu_suffix-freebsd10-$freebsdarch \
 test-freebsd xl $xenarch $dom0arch \
 freebsd_arch=$freebsdarch \
- 
freebsd_image=${FREEBSD_IMAGE_PREFIX-FreeBSD-10.0-RELEASE-}$freebsdarch${FREEBSD_IMAGE_SUFFIX--20140116-r260789.qcow2.xz}
 \
+ 
freebsd_image=${FREEBSD_IMAGE_PREFIX-FreeBSD-10.1-RELEASE-}$freebsdarch${FREEBSD_IMAGE_SUFFIX-.raw.xz}
 \
 all_hostflags=$most_hostflags
 
   done
diff --git a/ts-freebsd-install b/ts-freebsd-install
index 6c6abbe..61d2f83 100755
--- a/ts-freebsd-install
+++ b/ts-freebsd-install
@@ -51,8 +51,7 @@ our $freebsd_vm_repo= '/var/images';
 sub prep () {
 my $authkeys= authorized_keys();
 
-target_install_packages_norec($ho, qw(rsync lvm2 qemu-utils
-  xz-utils kpartx));
+target_install_packages_norec($ho, qw(rsync lvm2 xz-utils kpartx));
 
 $gho= prepareguest($ho, $gn, $guesthost, 22,
$disk_mb + 1,
@@ -76,9 +75,7 @@ sub prep () {
 
 target_cmd_root($ho, <<END, 900);
 set -ex
-xz -dkc $rimage > $rimagebase.qcow2
-qemu-img convert -f qcow2 $rimagebase.qcow2 -O raw $rimagebase.raw
-rm $rimagebase.qcow2
+xz -dkc $rimage > $rimagebase.raw
 dd if=$rimagebase.raw of=$gho->{Lvdev} bs=1M
 rm $rimagebase.raw
 
-- 
1.9.5 (Apple Git-50.3)




Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle

2015-04-09 Thread Ian Jackson
Julien Grall writes (Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use 
an higher value for the GIC phandle):
 On 09/04/15 17:17, Ian Jackson wrote:
  I have to say I have no idea what a phandle is...
 
 A phandle is a way to reference another node in the device tree.
 Any node that can be referenced defines a phandle property with a
 unique unsigned 32-bit value.

Thanks for the explanation.
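
As an illustration of how such a reference is resolved, here is a
minimal sketch using libfdt, assuming fdt points at a valid flattened
device tree blob (error handling abbreviated):

/* Sketch: resolve a phandle to its node with libfdt. */
#include <libfdt.h>
#include <stdint.h>
#include <stdio.h>

void show_node_for_phandle(const void *fdt, uint32_t phandle)
{
    int offset = fdt_node_offset_by_phandle(fdt, phandle);

    if (offset < 0) {
        fprintf(stderr, "phandle 0x%x not found: %s\n",
                phandle, fdt_strerror(offset));
        return;
    }
    printf("phandle 0x%x -> node %s\n",
           phandle, fdt_get_name(fdt, offset, NULL));
}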

  Reserve the ID 65000 for the GIC phandle. I think we can safely assume
  that the partial device tree will never contain such an ID.
  
  Do we control the DT compiler?  What if it should change its
  phandle allocation algorithm?
 
 We don't control the DT compiler. But the phandle allocation algorithm
 is unlikely to change. FWIW, the compiler is very tiny; it's not GCC.

Right.

 I only expect people to use the partial device tree in very specific
 use cases. A generic use case is not even possible with the current
 status of non-PCI (i.e. device tree) passthrough, so people control
 their environment.
 
 As I said later in the patch, supporting dynamic allocation will require
 some rework in the device tree creation for the guest.
 
 So I was suggesting this solution as a temporary one in order not to
 block the DT passthrough.

What would happen if our assumption about the DT compiler were
violated?

Ian.



Re: [Xen-devel] remove entry in shadow table

2015-04-09 Thread Tim Deegan
Hi,

At 18:15 +0200 on 06 Apr (1428344115), HANNAS YAYA Issa wrote:
 I want to remove the entry for a given page from the shadow page table
 so that the next time the guest accesses the page there is a page fault.
 Here is what I try to do:
 
 1. I have a timer which wakes up every 30 seconds and removes the entry
 from the shadow by calling
 sh_remove_all_mappings(d->vcpu[0], _mfn(page_to_mfn(page)))
 here d is the domain and page is the page that I want to remove 
 from the shadow page table.
 2. In the function sh_page_fault() I get the gmfn and compare it with 
 the mfn of the page that I removed earlier from the shadow page table.
 
 Is this method correct?

Yes, though it may be extremely slow if you're doing it for large
numbers of mfns, since sh_remove_all_mappings() may have to do a
brute-force search of all PTEs for each one.

You should probably put your check for the gmfn in _sh_propagate(),
rather than sh_page_fault().  That way it will also see things like
prefetched mappings.

 I also get this error: sh error: sh_remove_all_mappings(): can't find 
 all mappings of mfn

That usually means that there's a mapping of that frame from another
domain.

Tim.



Re: [Xen-devel] [PATCH V8 00/12] xen: Clean-up of mem_event subsystem

2015-04-09 Thread Tamas Lengyel
On Thu, Apr 9, 2015 at 1:07 PM, Tamas Lengyel tamas.leng...@zentific.com
wrote:



 On Thu, Apr 9, 2015 at 1:03 PM, Tim Deegan t...@xen.org wrote:

 Hi,

 Sorry for the delay - I have been away.

 At 22:06 +0100 on 26 Mar (1427407612), Tamas K Lengyel wrote:
  Tamas K Lengyel (12):
xen/mem_event: Cleanup of mem_event structures
xen/mem_event: Cleanup mem_event names in rings, functions and domctls
xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
xen: Rename mem_event to vm_event
tools/tests: Clean-up tools/tests/xen-access
x86/hvm: factor out and rename vm_event related functions

 I have applied these six patches.

xen: Introduce monitor_op domctl

 This one no longer applies cleanly - looks like a conflict with a7511905
 (xen: Extend DOMCTL createdomain to support arch configuration)

 Can you rebase the second half of the series please?


 Absolutely. Will be sending it shortly, thanks.

 Tamas



 Cheers,

 Tim.


What's the policy on reusing DOMCTL numbers? I see
XEN_DOMCTL_arm_configure_domain
has been retired in the conflicting patch. Should I just reuse its number
for monitor_op? For the most part domctl numbers seem to be contiguous but
there are holes (30-32) so I'm not sure.

Tamas


Re: [Xen-devel] [PATCH v4 30/33] tools/libxl: arm: Use an higher value for the GIC phandle

2015-04-09 Thread Julien Grall
Hi Ian,

On 31/03/15 12:43, Ian Campbell wrote:
 On Thu, 2015-03-19 at 19:29 +, Julien Grall wrote:
 The partial device tree may contain phandles. The Device Tree Compiler
 tends to allocate phandles from 1.

 Reserve the ID 65000 for the GIC phandle. I think we can safely assume
 that the partial device tree will never contain such an ID.

 Signed-off-by: Julien Grall julien.gr...@linaro.org
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Wei Liu wei.l...@citrix.com

 ---
 It's not easily possible to track the maximum phandle in the partial
 device tree.

 We would need to parse it twice: once to look up the maximum
 phandle, and once to copy the nodes. This is because we have to
 know the phandle of the GIC when we create the properties of the
 root.
 
 Or you could fill it in post-hoc like we do with e.g. the initramfs
 location?

That would work. I will look at doing a follow-up to this patch series.

 Anyway, this'll do for now:
 Acked-by: Ian Campbell ian.ampb...@citrix.com
 

 As the phandle is encoded as an unsigned 32-bit value, I could use a
 higher value. Though, having 65000 phandles is already a lot...

 TODO: If it's necessary, I can check if the value has been used by
 another phandle in the device tree.
 
 If that's easy enough to add then yes please, but if it is complex then
 don't bother.

I would prefer to postpone this and replace it with a follow-up that
allocates the phandle dynamically.

Regards,

-- 
Julien Grall



Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-09 Thread Prashant Sreedharan
On Thu, 2015-04-09 at 18:25 +0100, Ian Jackson wrote:

 root@bedbug:~# ethtool -S eth0 | grep -v ': 0$'
 NIC statistics:
  rx_octets: 8196868
  rx_ucast_packets: 633
  rx_mcast_packets: 1
  rx_bcast_packets: 123789
  tx_octets: 42854
  tx_ucast_packets: 9
  tx_mcast_packets: 8
  tx_bcast_packets: 603
 root@bedbug:~# ifconfig eth0
 eth0  Link encap:Ethernet  HWaddr 00:13:72:14:c0:51  
   inet addr:10.80.249.102  Bcast:10.80.251.255
   Mask:255.255.252.0
   inet6 addr: fe80::213:72ff:fe14:c051/64 Scope:Link
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:124774 errors:0 dropped:88921 overruns:0 frame:0
   TX packets:620 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000 
   RX bytes:8222158 (7.8 MiB)  TX bytes:42854 (41.8 KiB)
   Interrupt:17 
 
 root@bedbug:~#
 
 It appears therefore that packets are being corrupted on the receive
 path, and the kernel then drops them (as misaddressed).
 
Thanks for the repro. The RX drop counter is updated at a few places in
the driver. Please use the attached debug patch and provide the logs.
From 777363eb77bddd52b9983c0025fed8b4ec151417 Mon Sep 17 00:00:00 2001
From: Prashant Sreedharan prash...@broadcom.com
Date: Thu, 9 Apr 2015 10:52:17 -0700
Subject: [stable: 3.14.37]tg3: debug_patch

---
 drivers/net/ethernet/broadcom/tg3.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 8206113..5e2c9d6 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6871,8 +6871,11 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 
 			skb_size = tg3_alloc_rx_data(tp, tpr, opaque_key,
 						     *post_ptr, &frag_size);
-			if (skb_size < 0)
+			if (skb_size < 0) {
+				netdev_err(tp->dev, "alloc_rx failure %x %x %x\n",
+					   skb_size, opaque_key, frag_size);
 				goto drop_it;
+			}
 
 			pci_unmap_single(tp->pdev, dma_addr, skb_size,
 					 PCI_DMA_FROMDEVICE);
@@ -6886,6 +6889,8 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 
 			skb = build_skb(data, frag_size);
 			if (!skb) {
+				netdev_err(tp->dev, "build_skb failure %d\n",
+					   frag_size);
 				tg3_frag_free(frag_size != 0, data);
 				goto drop_it_no_recycle;
 			}
@@ -6896,8 +6901,10 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 
 			skb = netdev_alloc_skb(tp->dev,
 					       len + TG3_RAW_IP_ALIGN);
-			if (skb == NULL)
+			if (skb == NULL) {
+				netdev_err(tp->dev, "alloc_skb fail %d\n", len);
 				goto drop_it_no_recycle;
+			}
 
 			skb_reserve(skb, TG3_RAW_IP_ALIGN);
 			pci_dma_sync_single_for_cpu(tp->pdev, dma_addr, len, PCI_DMA_FROMDEVICE);
@@ -6925,6 +6932,8 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 		if (len > (tp->dev->mtu + ETH_HLEN) &&
 		    skb->protocol != htons(ETH_P_8021Q) &&
 		    skb->protocol != htons(ETH_P_8021AD)) {
+			netdev_err(tp->dev, "Proto %x %x\n",
+				   skb->protocol, len);
 			dev_kfree_skb(skb);
 			goto drop_it_no_recycle;
 		}
-- 
1.7.1



Re: [Xen-devel] machine address

2015-04-09 Thread Tim Deegan
At 10:26 +0100 on 31 Mar (1427797566), George Dunlap wrote:
 On Mon, Mar 30, 2015 at 3:00 PM, HANNAS YAYA Issa
 issa.hannasy...@enseeiht.fr wrote:
  Hi
  When there is a page fault, the handler of the page fault in the hypervisor
  is do_page_fault in xen/arch/x86/traps.c, right?
 
 That's for PV guests.  For HVM guests, the page fault causes a VMEXIT,
 which will be handled in
 xen/arch/x86/hvm/vmx/vmx.c:vmx_vmexit_handler()  (on Intel).
 
  In this function I found a method read_cr2() which returns the virtual
  address of the page that generated the page fault.
  My question is: is it possible to get the machine address of the page table
  entry for this virtual address?
 
 In general the way you have to do that is to use the virtual address
 to walk the guest's pagetables (exactly the same way the hardware
 would do on a TLB miss).
 
 For HVM guests (or PV guests in shadow mode) there's already code to
 do the walk for you in xen/arch/x86/mm/guest_walk.c:guest_walk().  You can
 see how it's called from the HAP code and the shadow code if you want.
 
 I don't immediately see a walker for PV guests.

There's one in __page_fault_type().  You can also use the linear
pagetable mappings, if you know what you're doing -- see,
e.g., guest_map_l1e() which does something very like this.
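
For readers who want the shape of such a walk, the following is a purely
illustrative sketch of a 4-level x86-64 walk in the style of a software
TLB miss.  map_guest_frame() and the constants are hypothetical
stand-ins, not Xen's guest_walk() internals, and mapping lifetimes and
permission bits are ignored:

/* Illustrative 4-level x86-64 pagetable walk. */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_PRESENT   0x1ULL
#define PTE_PSE       0x80ULL                 /* superpage bit (L3/L2 entries) */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

extern uint64_t *map_guest_frame(uint64_t mfn); /* hypothetical helper */

static unsigned int idx(uint64_t va, int level)
{
    return (va >> (PAGE_SHIFT + 9 * level)) & 0x1ff; /* 9 index bits per level */
}

/* Translate guest-virtual va via the table rooted at cr3; returns false
 * on the first not-present entry, i.e. where the hardware would fault. */
bool walk(uint64_t cr3, uint64_t va, uint64_t *pa)
{
    uint64_t entry = cr3 & PTE_ADDR_MASK;

    for (int level = 3; level >= 0; level--) {
        uint64_t *table = map_guest_frame(entry >> PAGE_SHIFT);

        entry = table[idx(va, level)];
        if (!(entry & PTE_PRESENT))
            return false;
        if ((level == 2 || level == 1) && (entry & PTE_PSE)) {
            uint64_t size = 1ULL << (PAGE_SHIFT + 9 * level); /* 1G or 2M */

            *pa = (entry & PTE_ADDR_MASK & ~(size - 1)) | (va & (size - 1));
            return true;
        }
        entry &= PTE_ADDR_MASK;
    }
    *pa = entry | (va & ((1ULL << PAGE_SHIFT) - 1));
    return true;
}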

Cheers,

Tim.



Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle

2015-04-09 Thread Julien Grall
On 09/04/15 17:52, Ian Jackson wrote:
 Julien Grall writes (Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: 
 Use an higher value for the GIC phandle):
 On 09/04/15 17:17, Ian Jackson wrote:
 I only expect people to use the partial device tree in very specific
 use cases. A generic use case is not even possible with the current
 status of non-PCI (i.e. device tree) passthrough, so people control
 their environment.

 As I said later in the patch, supporting dynamic allocation will require
 some rework in the device tree creation for the guest.

 So I was suggesting this solution as a temporary one in order not to
 block the DT passthrough.
 
 What would happen if our assumption about the DT compiler were
 violated?

The phandle would be present in 2 different nodes of the DT. FYI, that
may also happen if a user uses the same phandle twice in the partial DT.

The guest may retrieve the wrong node and warn/crash depending on the
implementation.

It won't impact either Xen or the toolstack, though.
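
A sanity check of that kind is cheap to sketch with libfdt: walk every
node in the partial flattened tree and compare its phandle against the
reserved value.  This is only a sketch, assuming the partial DT is
available as a flattened blob; libxl integration is left out:

/* Sketch: detect whether a reserved phandle value collides with any
 * phandle already present in a flattened partial device tree. */
#include <libfdt.h>
#include <stdbool.h>
#include <stdint.h>

bool phandle_in_use(const void *fdt, uint32_t reserved)
{
    int depth = 0;
    int offset;

    for (offset = fdt_next_node(fdt, -1, &depth);
         offset >= 0;
         offset = fdt_next_node(fdt, offset, &depth)) {
        if (fdt_get_phandle(fdt, offset) == reserved)
            return true; /* collision: the reserved value is taken */
    }
    return false;
}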

Regards,

-- 
Julien Grall



Re: [Xen-devel] [Patch V2 07/15] xen: check memory area against e820 map

2015-04-09 Thread David Vrabel
On 09/04/15 07:55, Juergen Gross wrote:
 Provide a service routine to check a physical memory area against the
 E820 map. The routine will return false if the complete area is RAM
 according to the E820 map and true otherwise.
 
 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
  arch/x86/xen/setup.c   | 23 +++
  arch/x86/xen/xen-ops.h |  1 +
  2 files changed, 24 insertions(+)
 
 diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
 index 87251b4..4666adf 100644
 --- a/arch/x86/xen/setup.c
 +++ b/arch/x86/xen/setup.c
 @@ -573,6 +573,29 @@ static unsigned long __init xen_count_remap_pages(unsigned long max_pfn)
   return extra;
  }
  
 +bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size)

Can you rename this to xen_is_e820_reserved()?

Otherwise,

Reviewed-by: David Vrabel david.vra...@citrix.com

David
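
For illustration, the semantics described in the commit message can be
sketched standalone as follows.  The e820 types here are local
stand-ins rather than the kernel's, and the map is assumed to be sorted
by address and non-overlapping:

/* Report "reserved" (true) unless [start, start+size) is entirely
 * covered by E820 RAM entries. */
#include <stdbool.h>
#include <stdint.h>

#define E820_RAM 1

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

bool is_e820_reserved(const struct e820_entry *map, unsigned int nr,
                      uint64_t start, uint64_t size)
{
    uint64_t end = start + size;

    for (unsigned int i = 0; i < nr && start < end; i++) {
        if (map[i].type != E820_RAM)
            continue;
        if (map[i].addr <= start && start < map[i].addr + map[i].size)
            start = map[i].addr + map[i].size; /* RAM covers up to here */
    }
    return start < end; /* any remainder is not plain RAM */
}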



Re: [Xen-devel] [PATCH v2] xen/pci: Try harder to get PXM information for Xen

2015-04-09 Thread David Vrabel
On 09/04/15 08:05, Ross Lagerwall wrote:
 If the device being added to Xen is not contained in the ACPI table,
 walk the PCI device tree to find a parent that is contained in the ACPI
 table before finding the PXM information from this device.
 
 Previously, it would try to get a handle for the device, then the
 device's bridge, then the physfn.  This changes the order so that it
 tries to get a handle for the device, then the physfn, then walks up the
 PCI device tree.

Applied to devel/for-linus-4.1, thanks.

David
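
A simplified sketch of the described lookup order (not the actual
drivers/xen code; pci_physfn() collapses to the device itself for
non-virtual functions, so the second step is a no-op in that case):

/* Try the device's own ACPI handle, then the physfn's, then walk up
 * the PCI bus hierarchy until a handle is found. */
#include <linux/acpi.h>
#include <linux/pci.h>

static acpi_handle find_pxm_handle(struct pci_dev *pdev)
{
    acpi_handle h = ACPI_HANDLE(&pdev->dev);

    if (!h)
        h = ACPI_HANDLE(&pci_physfn(pdev)->dev);

    while (!h && pdev->bus && pdev->bus->self) {
        pdev = pdev->bus->self; /* parent bridge */
        h = ACPI_HANDLE(&pdev->dev);
    }
    return h;
}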



Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-09 Thread Eric Dumazet
On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote:
 Hi all,
 
 I found a performance regression when running netperf -t TCP_MAERTS from
 an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the
 virtual machine are 30% slower than v3.18.
 
 Through bisection I found that the perf regression is caused by the
 presence of the following commit in the guest kernel:
 
 
 commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89
 Author: Eric Dumazet eduma...@google.com
 Date:   Sun Dec 7 12:22:18 2014 -0800
 
 tcp: refine TSO autosizing
 
 
 A simple revert would fix the issue.
 
 Does anybody have any ideas on what could be the cause of the problem?
 Suggestions on what to do to fix it?

You sent this to lkml while networking discussions are on netdev.

This topic has been discussed on netdev multiple times.

This commit restored the original TCP Small Queues behavior, which is
the first step to fight bufferbloat.

Some network drivers are known to be problematic because of a delayed TX
completion.

So far this commit did not impact max single flow throughput on 40Gb
mlx4 NIC (i.e. line rate is possible).

Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it
makes a difference?
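
To make the knob's effect concrete, here is a paraphrased sketch of the
kind of clamping involved (NOT the verbatim kernel code): the
per-socket limit scales with the pacing rate but is capped by
tcp_limit_output_bytes, which is why raising the sysctl helps drivers
with delayed TX completion:

/* Allow roughly 1ms of data at the current pacing rate, never less
 * than two packets' worth of memory, capped by the sysctl. */
#include <stdint.h>

static uint32_t u32_min(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t u32_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

uint32_t tsq_limit(uint32_t skb_truesize, uint64_t pacing_rate_bytes_per_sec,
                   uint32_t tcp_limit_output_bytes)
{
    uint32_t limit = u32_max(2 * skb_truesize,
                             (uint32_t)(pacing_rate_bytes_per_sec >> 10));

    return u32_min(limit, tcp_limit_output_bytes);
}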





