Re: [Xen-devel] Request a freeze exception for COLO in v4.6

2015-07-20 Thread Yang Hongyang



On 07/17/2015 06:18 PM, Ian Jackson wrote:

Wei Liu writes (Re: [Xen-devel] Request a freeze exception for COLO in v4.6):

Ian and Ian, anything to add? What are your opinions?


I think COLO is an exciting and important feature.

Unfortunately I agree with Wei that at this stage taking both series
as they currently stand into 4.6 is not tenable.  (Perhaps if I had
been able to help earlier in the release cycle this would have been
different; so for that I'm sorry.)


I would like to see at least the early parts of `COLOPre' in staging
as soon as possible.  There is much preparatory work there which would
be annoying to rebase around, and which has collateral benefits.


Thanks, if you are going to merge some of the patches, there is the
latest version of `COLOPre' patchset:
https://github.com/macrosheep/xen/tree/colo-v9
The latest 26 commits.

I think all patches are ready to be merged except:
7  libxl/remus: init checkpoint_callback in Remus checkpoint callback
11 tools/libxc: support to resume uncooperative HVM guests
20 tools/libx{l,c}: add back channel to libxc



If I can take one example, 11/25 tools/libxc: support to resume
uncooperative HVM guests.  Based on my current understanding this is
even a bugfix.  Sadly it is not quite ready (or at least, wasn't last
night).  But with a few more days we can probably get much of this
cleanup and preparatory work in.


Assuming nothing goes horribly wrong, I am very optimistic about COLO
making it upstream soon, just not in 4.6.

Thanks,
Ian.
.



--
Thanks,
Yang.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v10][PATCH 00/16] Fix RMRR

2015-07-20 Thread Tiejun Chen
v10:

* Patch #6: hvmloader/pci: Try to avoid placing BARs in RMRRs
  This is from George' draft patch which implements an acceptable
  solution in current cycle. Here I just implemented check_overlap_all() and
  some cleanups.

* Patch #7: hvmloader/e820: construct guest e820 table
  Instead of correcting e820, I'd like to correct memory_map.map[]
  and then copy them into e820 directly. I think this can make sure
  hvm_info, memory_map.map[] and e820 are on the same page.

v9:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

* Patch #5: hvmloader: get guest memory map into memory_map[]
  Correct the patch head description:
  [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
- [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END);
  Merge two if{} as one if{};

* Patch #6: hvmloader/pci: disable all pci devices conflicting with rdm
  A little improvement to code implementation but again, its still argued
  about this solution. Myself prefer to take a look at v7 if possible.

* Patch #7: hvmloader/e820: construct guest e820 table
  Refine that chunk of codes to check/modify highmem

* Patch #15: xen/vtd: prevent from assign the device with shared rmrr
  Correct one indentation issue

v8:

* Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy
  Force to pass 0(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

* Patch #5: hvmloader: get guest memory map into memory_map[]
  Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

* Patch #6: hvmloader/pci: disable all pci devices conflicting
  We have a big change to this patch:

  Based on current discussion its hard to reshape the original mmio
  allocation mechanism but we haven't a good and simple way to in short term.
  So instead, we don't bring more complicated to intervene that process but
  still check any conflicts to disable all associated devices.

  I know this is still argumented but I'd like to discuss this based on this
  revision and thanks for your time.

* Patch #7: hvmloader/e820: construct guest e820 table
  define low_mem_end as uint32_t;
  Correct those two wrong loops, memory_map.nr_map - nr
  when we're trying to revise low/high memory e820 entries;
  Improve code comments and the patch head description;
  Add one check if highmem is just populated by hvmloader itself

* Patch #11: tools/libxl: detect and avoid conflicts with RDM
  Introduce pfn_to_paddr(x) - ((uint64_t)x  XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

* Patch #13: libxl: construct e820 map with RDM information for HVM guest
  make that core construction function as arch-specific to make sure
  we don't break ARM at this point.

* Patch #15:  xen/vtd: prevent from assign the device with shared rmrr
  Merge two if{} as one if{};
  Add to print RMRR range info when stop assign a group device

* Some minimal code style changes

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

* Just sync with the fallout of renaming parameters above.

Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian
Campbell. ( If I'm wrong just let me know at this point. ) And
as we discussed I'd further improve something as next step after
this round of review.

v6:

* Inside patch #01, add a comments to the nr_entries field inside
  xen_reserved_device_memory_map. Note this is from Jan.

* Inside patch #10,  we need rename something to make our policy reasonable
  type - strategy
  none - ignore
  and based on our discussion, we won't expose ignore in xl level and just
  keep that as a default, and then sync docs and the patch head description

* Inside patch #10, we fix some code stypes and especially we refine
  libxl__xc_device_get_rdm()

* Inside patch #16, we need to sync those renames introduced by patch #10.

v5:

* Fold our original patch #2 and #3 as this new, and here
  introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our policy flag, so
  now 0 means strict and 1 means relaxed, and also make DT device
  ignore the flag field simply. And then correct all associated code
  comments.

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.

* Improve some descriptions in doc.

* Make all rdm variables specific to .hvm

* Inside patch #6, we're trying to rename 

Re: [Xen-devel] [PATCH v5 05/15] x86/altp2m: basic data structures and support routines.

2015-07-20 Thread Jan Beulich
 On 18.07.15 at 00:36, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, July 16, 2015 2:08 AM

 On 16.07.15 at 10:57, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Tuesday, July 14, 2015 6:13 AM
 On 14.07.15 at 02:14, edmund.h.wh...@intel.com wrote:
 @@ -722,6 +731,27 @@ void nestedp2m_write_p2m_entry(struct
p2m_domain *p2m, unsigned long gfn,
  l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);

  /*
 + * Alternate p2m: shadow p2m tables used for alternate memory views
 + */
 +
 +/* get current alternate p2m table */ static inline struct
 +p2m_domain *p2m_get_altp2m(struct vcpu *v) {
 +struct domain *d = v-domain;
 +uint16_t index = vcpu_altp2m(v).p2midx;
 +
 +ASSERT(index  MAX_ALTP2M);
 +
 +return (index == INVALID_ALTP2M) ? NULL :
 +d-arch.altp2m_p2m[index]; }

Looking at this again, I'm afraid I'd still prefer index  MAX_ALTP2M
in the return statement (and the ASSERT() dropped): The ASSERT() does
nothing in a debug=n build, and hence wouldn't shield us from possibly
having to issue an XSA if somehow an access outside the array's bounds
turned out possible.


 the assert was breaking v5 anyway. BUG_ON (with the right check) is
 probably the right thing to do, as we do in the exit handling that
 checks for a VMFUNC having changed the index.
 So will make that change.

But why use a BUG_ON() when you can deal with this more gracefully? Please
try to avoid crashing the hypervisor when there are other ways to recover.

 
 So in this case there isnt a graceful fallback; this case can happen only if 
 there is a bug in the hypervisor - which should be reported via the BUG_ON.

Generally (as an example), if a hypervisor bug can be confined to a
guest, killing just the guest instead of the hypervisor would still be
preferred (allowing the admin to attempt to gracefully shut down
other guests before updating/restarting).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 06/15] VMX/altp2m: add code to support EPTP switching and #VE.

2015-07-20 Thread Jan Beulich
 On 17.07.15 at 23:08, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, July 16, 2015 2:39 AM

 On 16.07.15 at 11:20, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Tuesday, July 14, 2015 6:57 AM
 On 14.07.15 at 02:14, edmund.h.wh...@intel.com wrote:
 +static bool_t vmx_vcpu_emulate_ve(struct vcpu *v) {
 +bool_t rc = 0;
 +ve_info_t *veinfo = gfn_x(vcpu_altp2m(v).veinfo_gfn) !=
 +INVALID_GFN
?
 +hvm_map_guest_frame_rw(gfn_x(vcpu_altp2m(v).veinfo_gfn), 0) :
 +NULL;
 +
 +if ( !veinfo )
 +return 0;
 +
 +if ( veinfo-semaphore != 0 )
 +goto out;
 +
 +rc = 1;
 +
 +veinfo-exit_reason = EXIT_REASON_EPT_VIOLATION;
 +veinfo-semaphore = ~0l;

Isn't semaphore a 32-bit quantity?

 Yes.

I.e. the l suffix can and should be dropped.

 
 Ok.
 
 +{
 +unsigned long idx;
 +
 +if ( v-arch.hvm_vmx.secondary_exec_control 
 +SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
 +__vmread(EPTP_INDEX, idx);
 +else
 +{
 +unsigned long eptp;
 +
 +__vmread(EPT_POINTER, eptp);
 +
 +if ( (idx = p2m_find_altp2m_by_eptp(v-domain, eptp)) ==
 + INVALID_ALTP2M )
 +{
 +gdprintk(XENLOG_ERR, EPTP not found in alternate p2m
list\n);
 +domain_crash(v-domain);
 +}
 +}
 +
 +if ( (uint16_t)idx != vcpu_altp2m(v).p2midx )

Is this cast really necessary?

 Yes - The index is 16-bits, this reflects how the field is specified
 in the vmcs also.

While yes answers the question, the explanation you give suggests that the
answer may be wrong: Can idx indeed have bits set beyond bit 15? Because if
it can't, the cast is pointless.

 
 We were just trying to ensure we matched the hardware behavior (I think 
 there was a message George had posted earlier for SVE that asked for that).
 Since hardware considers only a 16 bit field we were doing the same.
 
 +{
 +BUG_ON(idx = MAX_ALTP2M);
 +atomic_dec(p2m_get_altp2m(v)-active_vcpus);
 +vcpu_altp2m(v).p2midx = (uint16_t)idx;

This one surely isn't (or else the field type is wrong).

 Again required. idx can't be uint16_t because __vmread() requires
 unsigned long*, but the index is 16 bits.

But it's a 16-bit VMCS field that you read it from, and hence the upper 48 
 bits
are necessarily zero.

Just to re-iterate: Casts are necessary in certain places, yes, but I see them
used pointlessly or even wrongly more often than not.
 
 Same approach as above - emulating hardware exactly.
 Should we add a comment?

No, drop the casts.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] design philosophy of blktap

2015-07-20 Thread Xuehan Xu
Hi, everyone.

I don't quite follow the design philosophy of blktap. Since every virtual
disk is backed by a tapdisk process, when there are hundreds of domU
running and doing I/O operation simultaneously, which means that hundreds
of tapdisk process are doing I/O read/write simulataneously in dom0, won't
the I/O performance of domU be hurt badly?
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Andrew Cooper
On 20/07/2015 02:28, Tian, Kevin wrote:
 From: Ting-Wei Lan [mailto:lant...@gmail.com]
 Sent: Saturday, July 18, 2015 3:06 AM

 When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake
 devices, It is possible to encounter graphics issues that make screen
 unreadable or crash the system. It was reported in freedesktop bugzilla:

 https://bugs.freedesktop.org/show_bug.cgi?id=90037

 As we still cannot find a proper fix for this problem, this patch adds
 iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for
 users to manually workaround the problem.

 Signed-off-by: Ting-Wei Lan lant...@gmail.com
 Since igfx works before, I'd think a more proper fix should be on the
 bisected Linux commit or i915 to have two working correctly together.
 Otherwise this patch is just hiding problem.

The linux commit is the one which actually fixes PAT support for Linux
under Xen.

It will cause the i915 driver to actually get WC mappings when it asks
for them.

 There is one possible usage to do selective IOMMU disable other than
 global iommu=off switch. Then making this option general would
 be better than igfx_off, e.g. based on BDF. But I'm not sure how it
 is useful in reality.

It is curious that just disabling the IOMMU appears to fix the problem. 
Are there any errata you are aware of on this class of system?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-20 Thread Tiejun Chen
Previously we always fix that predefined boundary as 2G to handle
conflict between memory and rdm, but now this predefined boundar
can be changes with the parameter rdm_mem_boundary in .cfg file.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8 ~ v10:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item Brdm_mem_boundary=MBYTES
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RDM entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+strict policy leads to fail libxl. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+relaxed policy issue a warning message and also mask this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info-u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
 b_info-u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+if (b_info-u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info-u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = d_config-b_info;
-/*
- * Currently we fix this as 2G to guarantee how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x8000;
 
 memset(args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 rc = libxl__domain_device_construct_rdm(gc, d_config,
-rdm_mem_boundary,
+
info-u.hvm.rdm_mem_boundary_memkb*1024,
 args);
 if (rc) {
 LOG(ERROR, checking reserved device memory failed);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ 

[Xen-devel] [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-20 Thread Tiejun Chen
Now we get this map layout by call XENMEM_memory_map then
save them into one global variable memory_map[]. It should
include lowmem range, rdm range and highmem range. Note
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used to allocate memory in hvmloder level, and
we would lead hvmloader failed in case of conflict since its
another rare possibility in real world.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v10:

* Nothing is changed.

v9:

* Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
- [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in
  the patch head description;
  Merge two if{} as one if{};

v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 32 
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 79 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..7a414ab 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,38 @@
 #include config.h
 #include util.h
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf(Get guest memory maps[%d] failed. (%d)\n, nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i  nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED 
+ check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr, memory_map.map[i].size) )
+{
+printf(Fail to setup memory map due to conflict);
+printf( on dynamic reserved memory range.\n);
+BUG();
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include xen/memory.h
 #include xen/sched.h
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size  reserved_start) 
+(start  reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = 

[Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-20 Thread Tiejun Chen
While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR probably scatter the whole RAM space.
Especially multiple RMRR entries would worsen this to lead a complicated
memory layout. And then its hard to extend hvm_info_table{} to work
hvmloader out. So here we're trying to figure out a simple solution to
avoid breaking existing layout. So when a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below reserved region to solve conflict;

#2. Below a predefined boundary (2G)
- Check strict/relaxed policy.
strict policy leads to fail libxl. Note when both policies
are specified on a given region, 'strict' is always preferred.
relaxed policy issue a warning message and also mask this entry 
INVALID
to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v9 ~ v10:

* Nothing is changed.

v8:

* Introduce pfn_to_paddr(x) - ((uint64_t)x  XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* fix some code stypes
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always override the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out 
parameter.

v4:

* Consistent to use term RDM.
* Unconditionally set *nr_entries to 0
* Grab to all sutffs to provide a parameter to set our predefined boundary
  dynamically to as a separated patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 273 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r = 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+*xrdm = libxl__malloc(gc,
+  *nr_entries * sizeof(xen_reserved_device_memory_t));
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, Could not get reserved device memory maps.\n);
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else 

[Xen-devel] [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-20 Thread Tiejun Chen
This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
rdm = strategy=host,policy=strict/relaxed
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Global RDM parameter, strategy, allows user to specify reserved regions
explicitly, Currently, using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set so we don't
check all rdms. Instead, we just check rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure any conflict does exist.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9 ~ v10:

* Nothing is changed.

v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  type - strategy
  none - ignore
* Don't expose ignore in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item Brdm=RDM_RESERVATION_STRING
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+BRDM_RESERVE_STRING has the form C[KEY=VALUE,KEY=VALUE,... where:
+
+=over 4
+
+=item BKEY=VALUE
+
+Possible BKEYs are:
+
+=over 4
+
+=item Bstrategy=STRING
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does 
exist.
+
+For example, you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If Bstrategy is set to host, for example,
+
+rdm = strategy=host,policy=strict or rdm = strategy=host,policy=relaxed
+
+It means all conflicts will be handled according to the policy
+introduced by Bpolicy as described below.
+
+#2. If Bstrategy is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by Brdm_policy=STRING as described inside pci options.
+
+=item Bpolicy=STRING
+

[Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Tiejun Chen
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v10:

* Instead of correcting e820, I'd like to correct memory_map.map[]
  and then copy them into e820 directly. I think this can make sure
  hvm_info, memory_map.map[] and e820 are on the same page.

v9:

* Refine that chunk of codes to check/modify highmem

v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map - nr
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 108 +++-
 1 file changed, 95 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..ca794ad 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info-low_mem_pgend  PAGE_SHIFT;
+uint32_t add_high_mem = 0;
+uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT;
+uint64_t map_start, map_size, map_end;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info-low_mem_pgend  PAGE_SHIFT)  (2u  20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info-low_mem_pgend  PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));
 
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ *
+ * Note we just have one low memory entry and one high mmeory entry if
+ * exists.
+ *
+ * But we may have relocated RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range firstly.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+/* If we need to adjust lowmem. */
+if ( memory_map.map[i].type == E820_RAM 
+ low_mem_end  map_start  low_mem_end  map_end )
+{
+add_high_mem = map_end - low_mem_end;
+memory_map.map[i].size = low_mem_end - map_start;
+break;
+}
+}
+
+/* If we need to adjust highmem. */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+if ( memory_map.map[i].type == E820_RAM 
+ 

[Xen-devel] [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-20 Thread Tiejun Chen
This patch parses to enable user configurable parameters to specify
RDM resource and according policies which are defined previously,

Global RDM parameter:
rdm = strategy=host,policy=strict/relaxed
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9 ~ v10:

* Nothing is changed.

v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters
  since its make a lot sense and these stuffs are specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_STRATEGY  10
+#define STATE_RESERVE_POLICY11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev-permissive = atoi(tok);
 }else if ( !strcmp(optkey, seize) ) {
 pcidev-seize = atoi(tok);
-}else{
+} else if (!strcmp(optkey, rdm_policy)) {
+if (!strcmp(tok, strict)) {
+pcidev-rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, relaxed)) {
+pcidev-rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, %s is not an valid PCI RDM property
+   policy: 'strict' or 'relaxed'.,
+ tok);
+goto parse_error;
+}
+} else {
 XLU__PCI_ERR(cfg, Unknown PCI BDF option: %s, optkey);
 }
 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr  end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_STRATEGY;
+*ptr = '\0';
+if (strcmp(tok, strategy)) {
+XLU__PCI_ERR(cfg, Unknown RDM state option: %s, tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_STRATEGY:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_POLICY;
+*ptr = '\0';
+if (!strcmp(tok, host)) {
+rdm-strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+} else {
+XLU__PCI_ERR(cfg, Unknown RDM strategy option: %s, tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_POLICY:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, policy)) {
+XLU__PCI_ERR(cfg, Unknown RDM property value: %s, tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, strict)) {
+rdm-policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, 

[Xen-devel] [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-20 Thread Tiejun Chen
From: Jan Beulich jbeul...@suse.com

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich jbeul...@suse.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v7 ~ v10:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm-map.seg, grdm-map.bus, grdm-map.devfn);
+if ( (grdm-map.flag  PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm-used_entries  grdm-map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm-map.buffer,
+ grdm-used_entries,
+ rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm-used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd  MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  grdm);
+
+if ( !rc  grdm.map.nr_entries  grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm-map.seg, grdm-map.bus, grdm-map.devfn);
+if ( (grdm-map.flag  PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm-used_entries  grdm-map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm-map.buffer,
+grdm-used_entries,
+rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm-used_entries;
+return 

[Xen-devel] [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map

2015-07-20 Thread Tiejun Chen
We will introduce the hypercall xc_reserved_device_memory_map
approach to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Or we just expose that rdm entry specific to
a SBDF.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v4 ~ v10:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 
 tools/libxc/xc_domain.c   | 36 
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
 return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries)
+{
+int rc;
+struct xen_reserved_device_memory_map xrdmmap = {
+.flag = flag,
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.nr_entries = *max_entries
+};
+DECLARE_HYPERCALL_BOUNCE(entries,
+ sizeof(struct xen_reserved_device_memory) *
+ *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( xc_hypercall_bounce_pre(xch, entries) )
+return -1;
+
+set_xen_guest_handle(xrdmmap.buffer, entries);
+
+rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+  xrdmmap, sizeof(xrdmmap));
+
+xc_hypercall_bounce_post(xch, entries);
+
+*max_entries = xrdmmap.nr_entries;
+
+return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-20 Thread Tiejun Chen
This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v5 ~ v10:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v10][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-20 Thread Tiejun Chen
RMRR reserved regions must be setup in the pfn space with an identity
mapping to reported mfn. However existing code has problem to setup
correct mapping when VT-d shares EPT page table, so lead to problem
when assigning devices (e.g GPU) with RMRR reported. So instead, this
patch aims to setup identity mapping in p2m layer, regardless of
whether EPT is shared or not. And we still keep creating VT-d table.

And we also need to introduce a pair of helper to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define macro to wrapper guest_physmap_remove_page() with
a returning value as necessary.

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v10:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m-domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
  p2m-default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m-domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m-get_entry(p2m, gfn, p2mt, a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn  p2mt == p2m_mmio_direct  a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   Cannot setup identity map d%d:%lx,
+gfn already mapped to %lx.\n,
+   d-domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn  end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn  end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-   

[Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread Tiejun Chen
Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
  in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
  the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.

Signed-off-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v10:

* This is from George' draft patch which implements an acceptable solution in
  current cycle. Here I just implemented check_overlap_all() and some cleanups.

v9:

* A little improvement to code implementation but again, its still argued about
  this solution.

v8:

* Based on current discussion its hard to reshape the original mmio
  allocation mechanism but we haven't a good and simple way to in short term.
  So instead, we don't bring more complicated to intervene that process but
  still check any conflicts to disable all associated devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource-base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource-base to 
reserved_end
  just to reallocate this bar.

 tools/firmware/hvmloader/pci.c | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..f229a91 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* Check if any conflicts with all reserved device memory. */
+static bool check_overlap_all(uint64_t start, uint64_t size)
+{
+unsigned int i;
+
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED 
+ check_overlap(start, size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+return true;
+}
+
+return false;
+}
+
+/* Find the lowest RMRR higher than base. */
+static int find_next_rmrr(uint32_t base)
+{
+unsigned int i;
+int next_rmrr = -1;
+uint64_t min_base = (1ull  32);
+
+for ( i = 0; i  memory_map.nr_map ; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED 
+ memory_map.map[i].addr  base 
+ memory_map.map[i].addr  min_base )
+{
+next_rmrr = i;
+min_base = memory_map.map[i].addr;
+}
+}
+return next_rmrr;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -46,6 +83,7 @@ void pci_setup(void)
 uint32_t vga_devfn = 256;
 uint16_t class, 

[Xen-devel] [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-20 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be same as before if the
platform has no any RDM or we disable RDM (by default).

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8 ~ v10:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 tools/libxl/libxl_arch.h |  7 
 tools/libxl/libxl_arm.c  |  8 +
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 return xc_domain_bind_pt_spi_irq(CTX-xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__arch_domain_construct_memmap(gc, d_config, domid, args)) {
+LOG(ERROR, setting domain memory map failed);
+goto out;
+}
+
 ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port,
state-store_mfn, state-console_port,
state-console_mfn, state-store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x10
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args-highmem_end ? args-highmem_end - (1ull  32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i  d_config-num_rdms; i++)
+if (d_config-rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries = E820MAX) {
+LOG(ERROR, Ooops! Too many entries in the memory map!\n);
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low 

[Xen-devel] [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-20 Thread Tiejun Chen
This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is strict or relaxed when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v10:

* Nothing is changed.

v9:

* Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

v8:

* Force to pass 0(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  0 means strict and 1 means relaxed.

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describer why we fix to set a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user 
domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl-u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  - domctl-u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 
 xen/drivers/passthrough/vtd/iommu.c | 37 ++---
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag  XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
Cannot setup identity map d%d:%lx,
 gfn already mapped to %lx.\n,
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev-seg);
 int bdf = PCI_BDF2(pdev-bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't matter 

[Xen-devel] [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-20 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also bring some fallout to python usage of xc_assign_device().

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v10:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc  0  (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, xc_assign_device failed);
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret  0)

[Xen-devel] [v10][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-20 Thread Tiejun Chen
USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we
can handle this scenario cleanly with new policy mechanism so previous
hack code can be removed now.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v10:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) 
- !is_usb_device(pdev-seg, pdev-bus, pdev-devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev-seg;
 bus = pdev-bus;
-if ( is_usb_device(seg, bus, pdev-devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-20 Thread Tiejun Chen
Currently we're intending to cover this kind of devices
with shared RMRR simply since the case of shared RMRR is
a rare case according to our previous experiences. But
late we can group these devices which shared rmrr, and
then allow all devices within a group to be assigned to
same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v10:

* Noting is changed.

v9:

* Correct one indentation issue

v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..770e484 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev-seg;
+bus = pdev-bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr-segment == seg 
+ PCI_BUS(bdf) == bus 
+ PCI_DEVFN2(bdf) == devfn 
+ rmrr-scope.devices_cnt  1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+cannot assign %04x:%02x:%02x.%u
+with shared RMRR at %PRIx64 for Dom%d.\n,
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   rmrr-base_address, d-domain_id);
+return -EPERM;
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev-seg;
-bus = pdev-bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 10/15] x86/altp2m: add remaining support routines.

2015-07-20 Thread Jan Beulich
 On 18.07.15 at 00:32, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, July 16, 2015 2:34 AM

 On 16.07.15 at 11:16, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Tuesday, July 14, 2015 7:32 AM
 On 14.07.15 at 02:14, edmund.h.wh...@intel.com wrote:
 @@ -2965,9 +3003,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
unsigned long gla,
  if ( npfec.write_access )
  {
  paging_mark_dirty(currd, mfn_x(mfn));
 +/* If p2m is really an altp2m, unlock here to avoid
 + lock
 ordering
 + * violation when the change below is propagated from
 + host p2m
 */
 +if ( ap2m_active )
 +__put_gfn(p2m, gfn);
  p2m_change_type_one(currd, gfn, p2m_ram_logdirty,
 p2m_ram_rw);

And this won't result in any races?

 No

To be honest I expected a little more than just no here. Now I have to ask -
why?

 
 Yes, I should have described it more than that :-)  so this part of the code 
 is handling the log dirty transition of the page, and this page permission 
 transition happens always on the hostp2m. Given the way the locking order is 
 setup (hostp2m-altp2m-list-lock-altp2m and there was a separate writeup and 
 discussion with George on that), at this point in this sequence there is a 
 p2m lock (whether it's a hostp2m or altp2m lock depends on the mode of the 
 domain) - the reason we have to drop the lock here first is due to what 
 happens next; the permission changes in hostp2m will be serially propagated 
 to altp2ms and not dropping the lock here would cause a locking order 
 violation. Hope that clarifies. 

Sadly it doesn't at all: You re-explain why you need to drop the lock,
while you fail to say anything on why this won't cause a race.

 +long p2m_init_altp2m_by_id(struct domain *d, uint16_t idx) {
 +long rc = -EINVAL;

Why long (for both variable and function return type)? (More of these
in functions below.)

 Because the error variable in the code that calls these (in hvm.c) is
 a long, and you had given feedback earlier to propagate the returns
 from these functions through that calling code.

I don't see the connection. The function only returns zero or -E...
values, so why would its return type be long?

 
 do_hvm_op declares a rc that is of type long and hence this returns a 
 long

What type your caller(s) return is of no interest at all here: What
would you do if you had multiple callers with differing return types?
A function's return type should be chosen based on the range of
values it may return, and the result possibly widened to not yield
inefficient code (like in some of the uint16_t cases elsewhere in the
series would be necessary).

 +void p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn,
 + mfn_t mfn, unsigned int page_order,
 + p2m_type_t p2mt, p2m_access_t
 +p2ma) {
 +struct p2m_domain *p2m;
 +p2m_access_t a;
 +p2m_type_t t;
 +mfn_t m;
 +uint16_t i;
 +bool_t reset_p2m;
 +unsigned int reset_count = 0;
 +uint16_t last_reset_idx = ~0;
 +
 +if ( !altp2m_active(d) )
 +return;
 +
 +altp2m_list_lock(d);
 +
 +for ( i = 0; i  MAX_ALTP2M; i++ )
 +{
 +if ( d-arch.altp2m_eptp[i] == INVALID_MFN )
 +continue;
 +
 +p2m = d-arch.altp2m_p2m[i];
 +m = get_gfn_type_access(p2m, gfn_x(gfn), t, a, 0, NULL);
 +
 +reset_p2m = 0;
 +
 +/* Check for a dropped page that may impact this altp2m */
 +if ( mfn_x(mfn) == INVALID_MFN 
 + gfn_x(gfn) = p2m-min_remapped_gfn 
 + gfn_x(gfn) = p2m-max_remapped_gfn )
 +reset_p2m = 1;

Considering that this looks like an optimization, what's the downside
of possibly having min=0 and max=end-of-address-space? I.e.
can there a long latency operation result that's this way a guest can 
effect?


 ... A p2m is a gfn-mfn map, amongst other things. There is a reverse
 mfn-gfn map, but that is only valid for the host p2m. Unless the
 remap altp2m hypercall is used, the gfn-mfn map in every altp2m
 mirrors the gfn-mfn map in the host p2m (or a subset thereof, due to
 lazy-copy), so handling removal of an mfn from a guest is simple: do a
 reverse look up for the host p2m and mark the relevant gfn as invalid,
 then do the same for every altp2m where that gfn is currently valid.

 Remap changes things: it says take gfn1 and replace -mfn with the
 -mfn of gfn2. Here is where the optimization is used and the  invalidate
logic is:
 record the lowest and highest gfn2's that have been used in remap ops;
 if an mfn is dropped from the hostp2m, for the purposes of altp2m
 invalidation, see if the gfn derived from the host p2m reverse lookup
 falls within the range of used gfn2's. If it does, an invalidation is
 required. Which is why min and max are inited the way they are - hope
 the explanation clarifies this 

Re: [Xen-devel] [xen-unstable test] 59681: regressions - FAIL

2015-07-20 Thread Andrew Cooper
On 18/07/2015 13:07, osstest service owner wrote:
 flight 59681 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59681/

 Regressions :-(

 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
 fail REGR. vs. 58965
  test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 
 58965

Wei: Am I correct in remembering that you nominated these two tests for
being added to the allowable category?

~Andrew


 Regressions which are regarded as allowable (not blocking):
  test-armhf-armhf-xl-rtds  6 xen-boot  fail REGR. vs. 
 58965
  test-amd64-i386-rumpuserxen-i386 15 
 rumpuserxen-demo-xenstorels/xenstorels.repeat fail like 58948
  test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate 
 fail like 58965
  test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 
 58965
  test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 
 58965

 Tests which did not succeed, but are not blocking:
  test-amd64-i386-libvirt  12 migrate-support-checkfail   never 
 pass
  test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never 
 pass
  test-amd64-amd64-libvirt 12 migrate-support-checkfail   never 
 pass
  test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never 
 pass
  test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never 
 pass
  test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-xl  12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-libvirt 12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never 
 pass
  test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never 
 pass
  test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never 
 pass
  test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never 
 pass
  test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never 
 pass

 version targeted for testing:
  xen  21d9b079e53805b68047d60d28cde224d09bbb40
 baseline version:
  xen  c40317f11b3f05e7c06a2213560c8471081f2662

 Last test of basis58965  2015-06-29 02:08:30 Z   19 days
 Failing since 58974  2015-06-29 15:11:59 Z   18 days   23 attempts
 Testing same since59681  2015-07-18 00:10:50 Z0 days1 attempts

 
 People who touched revisions under test:
   Andrew Cooper andrew.coop...@citrix.com
   Anthony PERARD anthony.per...@citrix.com
   Ard Biesheuvel a...@linaro.org
   Ben Catterall ben.catter...@citrix.com
   Boris Ostrovsky boris.ostrov...@oracle.com
   Chao Peng chao.p.p...@linux.intel.com
   Chen Baozi baoz...@gmail.com
   Daniel De Graaf dgde...@tycho.nsa.gov
   Dario Faggioli dario.faggi...@citrix.com
   David Scott dave.sc...@citrix.com
   David Vrabel david.vra...@citrix.com
   Dietmar Hahn dietmar.h...@ts.fujitsu.com
   Elena Ufimtseva elena.ufimts...@oracle.com
   Eric Shelton eshel...@pobox.com
   Euan Harris euan.har...@citrix.com
   Fabio Fantoni fabio.fant...@m2r.biz
   Feng Wu feng...@intel.com
   George Dunlap george.dun...@eu.citrix.com
   Ian Campbell ian,campb...@citrix.com
   Ian Campbell ian.campb...@citrix.com
   Ian Jackson ian.jack...@eu.citrix.com
   Jan Beulich jbeul...@suse.com
   Jennifer Herbert jennifer.herb...@citrix.com
   Jim Fehlig jfeh...@suse.com
   Juergen Gross jgr...@suse.com
   Julien Grall julien.gr...@citrix.com
   Julien Grall julien.gr...@linaro.org
   Kevin Tian kevin.t...@intel.com
   Liang Li liang.z...@intel.com
   Paul Durrant paul.durr...@citrix.com
   Razvan Cojocaru rcojoc...@bitdefender.com
   Rob Hoes rob.h...@citrix.com
   Roger Pau MonnĂƒÂ© roger@citrix.com
   Ross Lagerwall ross.lagerw...@citrix.com
   Samuel Thibault samuel.thiba...@ens-lyon.org
   Sander Eikelenboom li...@eikelenboom.it
   Tamas K Lengyel tleng...@novetta.com
   Thomas Leonard tal...@gmail.com
   Tiejun Chen tiejun.c...@intel.com
   Tim Deegan t...@xen.org
   Vitaly Kuznetsov vkuzn...@redhat.com
   Wei Liu wei.l...@citrix.com
   Wen Congyang we...@cn.fujitsu.com
   Yang Hongyang yan...@cn.fujitsu.com
   Yang Zhang yang.z.zh...@intel.com

 jobs:
  build-amd64-xsm  pass
  build-armhf-xsm  pass
  build-i386-xsm   pass
  build-amd64  pass
  build-armhf  

Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Chen, Tiejun

Before you add memory_map.nr_map, you should be able to iterate
from 0 to (not inclusive) nr. At least as far as I recall the original
patch.



Sorry, I really don't understand what you want.

Before we add memory_map.nr_map, e820[0, nr) don't include low/high
memory, right?


Why? memory_map is representing the reserved areas only, isn't it?
If that's not the case, then of course everything is fine.


I'm pretty sure the memory map we get here is an extension of the
original PV-only get_e820 hypercall, which *does* include both the
lowmem and highmem regions.

In any case, it's pretty clear from the patched code that Tiejun is
removing the old code which created the lowmem and highmem regions and
is not replacing it.  Where do you think the highmem region he's
looking for was coming from?



On second thoughts, I prefer to check/sync memory_map.map[] before copy 
them into e820 since ultimately this can make sure hvm_info, 
memory_map.map[] and e820 are on the same page.


Anyway, I would send out v10 to address others so please further post 
your comments over there.


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 05/13] x86/altp2m: basic data structures and support routines.

2015-07-20 Thread Jan Beulich
 On 18.07.15 at 00:39, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, July 16, 2015 2:02 AM

 On 16.07.15 at 10:48, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Tuesday, July 14, 2015 1:53 AM
 On 14.07.15 at 02:01, ravi.sah...@intel.com wrote:
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Monday, July 13, 2015 1:01 AM
 On 10.07.15 at 23:48, ravi.sah...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, July 09, 2015 6:30 AM
 On 01.07.15 at 20:09, edmund.h.wh...@intel.com wrote:
 @@ -294,6 +298,12 @@ struct arch_domain
  struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
  mm_lock_t nested_p2m_lock;

 +/* altp2m: allow multiple copies of host p2m */
 +bool_t altp2m_active;
 +struct p2m_domain *altp2m_p2m[MAX_ALTP2M];
 +mm_lock_t altp2m_lock;
 +uint64_t *altp2m_eptp;

This is a non-insignificant increase of the structure size -
perhaps all of these should hang off of struct arch_domain via a
single, separately allocated pointer?

 Is this a nice-to-have - again we modelled after the nestedhvm
 extensions to the struct.
 This will affect a lot of our patch without really changing how
 much memory is allocated.

I understand that. To a certain point I can agree to limit changes
to what is there at this stage. But you wanting to avoid addressing
concerns basically everywhere it's not a bug overextends this. Just
because the series was submitted late doesn't mean you should now
expect us to give in on any controversy regarding aspects we would
normally expect to be changed. This would basically encourage others
to submit their stuff late too in the
 future,
hoping for relaxed review.


 Couple things - first, we have absorbed a lot of (good) feedback -
 thanks for that.
 Second, I don't think the series can be characterized as late
 (feedback from others welcome).
 V1 had almost the same structure and was submitted in January.

Still we're at v3 only here, not v10 or beyond.

 On this change - this will be a lot of effects on the code and we
 would like to avoid this one.

While this may be a lot of mechanical change, I don't this presenting
any
 major
risk of breaking the code.

 On this one specific advice on how and where to implement such a
 change would be great just so that we don't thrash on this change.

I don't follow - what to do here was said quite explicitly (still visible in 
 the
context above). I.e. I have no idea what additional advice you seek.
 
 Ok that's fine - sorry if this was unclear - I was seeking if you had some 
 specific feedback on how to allocate and manage the dynamic altp2m struct etc 
 (if you had an opinion would be good to hear that).

xmalloc() / xzalloc(). What other alternatives would you see?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Preface working plan for altp2m during freeze exception

2015-07-20 Thread Jan Beulich
 On 17.07.15 at 21:43, ravi.sah...@intel.com wrote:
 We are working on addressing review comments in this order (and you will see 
 this pattern in our review responses):
 
 * Category 1 - Address review comments that affect ABI - these are of course 
 required and will be addressed first.
 
 * Category 2 - Address review comments that do not affect ABI - we will try 
 to 
 address ones that we think we can realistically meet within the time bounds - 
 we ask you for some flexibility on these. If these cannot be addressed within 
 the allotted exception time-frame, hopefully these wont be blockers for 4.6 
 since they can be addressed by follow-on patches.

Not sure - we've had bad experience with allowing code to go in with
the promise for later adjustments (which then never happened)...

 * Category 3 - Address review comments that are really design questions - 
 These we will try to address by short descriptions in review replies that 
 attempt to give a gist of the design we followed, but of course design 
 changes obviously cannot be done at this late stage - hopefully that is 
 expected.

If you really just mean questions on the design (rather than questions
possibly resulting int the requirement to change the design), then
that'd be fine of course. I think you understand that we shouldn't be
deferring issues that require design adjustments. Otoh I don't even
recall what design questions there were.

 * Category 4 - Address trivial changes as we naturally update patches, 
 however if we run out of time, some may remain un-addressed (to be taken care 
 of post 4.6).

See above (point 2).

 Can we please get a yes - makes sense sort of acknowledgement of this plan 
 from the Maintainers? 

Considering the limitations above, this is only a maybe from me.

 Y N   [PATCH v3 05/15] x86/altp2m: basic data structures and 
 support routines
 Status if not acked:  Category 3: we will write a short description of some 
 design questions in review replies
   Category 2: moving altp2m struct to be dynamically 
 allocated - this has minor 
 benefit and big downside so will be lower priority, also some error handling 
 fixes

Big downside? You're not referring to the mechanical adjustments this
implies, are you?

 Y N   [PATCH v3 07/15] VMX: add VMFUNC leaf 0 (EPTP 
 switching) to emulator
 Status if not acked:  Category 2/3 - changes staged after Jan's feedback on 
 v5 - 
 ack with those addressed in v6?

Let's see what v6 looks like.

 Y N   [PATCH v3 08/15] x86/altp2m: add control of suppress_ve
 Status if not acked:  Now has r-b both Jan and George -  need maintainers ack 
 on 
 this one please

Who do you refer to by maintainer here? I think the trivial
adjustment to xen/arch/x86/mm/mem_sharing.c can in the worst
case go in without Andres' ack. And everything else is covered
by George's authorship and review.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 59721: regressions - trouble: broken/fail/pass

2015-07-20 Thread Ian Campbell
On Sun, 2015-07-19 at 10:19 +, osstest service owner wrote:
 flight 59721 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59721/
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
 fail REGR. vs. 58965

I think we have considered discounting i386 qemu-dm as a thing we care
about, in which case we could force push this change since this was the
only fail..


 
 Tests which are failing intermittently (not blocking):
  test-amd64-i386-freebsd10-i386  3 host-install(3) broken pass in 
 59699
  test-amd64-i386-qemuu-rhel6hvm-amd  3 host-install(3) broken pass in 
 59699
  test-amd64-i386-rumpuserxen-i386  3 host-install(3)   broken pass in 
 59699
  test-amd64-i386-qemut-rhel6hvm-intel  3 host-install(3)   broken pass in 
 59699
  test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 3 host-install(3) 
 broken pass in 59699
  test-amd64-i386-xl3 host-install(3)   broken pass in 
 59699
  test-amd64-i386-xl-qemut-debianhvm-amd64 3 host-install(3) broken pass in 
 59699
  test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 3 host-install(3) broken pass 
 in 59699
  test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 3 host-install(3) broken pass in 
 59699
  test-amd64-i386-xl-qemut-win7-amd64  3 host-install(3)broken pass in 
 59699
  test-amd64-i386-xl-qemuu-win7-amd64  3 host-install(3)broken pass in 
 59699
  test-amd64-i386-xl-qemut-winxpsp3  3 host-install(3)  broken pass in 
 59699
  test-amd64-i386-xl-qemuu-winxpsp3  3 host-install(3)  broken pass in 
 59699
  test-armhf-armhf-xl-multivcpu 14 guest-start.2 fail in 59699 pass in 
 59721
  test-amd64-amd64-xl   9 debian-install  fail pass in 
 59699

Some of these host-install fails are a bit worrying, but not a blocker
to a forced push IMHO.

 version targeted for testing:
  xen  21d9b079e53805b68047d60d28cde224d09bbb40
 baseline version:
  xen  c40317f11b3f05e7c06a2213560c8471081f2662
 
 Last test of basis58965  2015-06-29 02:08:30 Z   20 days
 Failing since 58974  2015-06-29 15:11:59 Z   19 days   25 attempts
 Testing same since59681  2015-07-18 00:10:50 Z1 days3 attempts

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] design philosophy of blktap

2015-07-20 Thread George Dunlap
On Mon, Jul 20, 2015 at 6:01 AM, Xuehan Xu xxhdx1985...@gmail.com wrote:
 Hi, everyone.

 I don't quite follow the design philosophy of blktap. Since every virtual
 disk is backed by a tapdisk process, when there are hundreds of domU running
 and doing I/O operation simultaneously, which means that hundreds of tapdisk
 process are doing I/O read/write simulataneously in dom0, won't the I/O
 performance of domU be hurt badly?

I think Felipe (cc'd) might be the best person to answer this sort of question.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/5] xl: Command line: Support -h everywhere

2015-07-20 Thread Wei Liu
On Fri, Jul 17, 2015 at 06:00:49PM +0100, Ian Jackson wrote:
 xl subcommands ought all to take -h.  def_getopt and hence
 SWITCH_FOREACH_OPT already handles 'h' by calling helpstr.  None of
 the call sites see the 'h'.
 
 In this patch:
 
  * Change SWITCH_FOREACH_OPT to always add a h to the short opts
string, using string concatenation.
 
  * Remove the now-redundant h's from some existing option strings.
 
 Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/4] Fix to libxl migration v2 issue blocking OSSTest

2015-07-20 Thread Wei Liu
On Fri, Jul 17, 2015 at 05:51:14PM +0100, Andrew Cooper wrote:
 And three improvements to debugging.
 
 Note that there is still a bug in libxl__toolstack_save() which
 valgrind identified, but I do not wish to block this bugfix on that
 
 Andrew Cooper (4):
   tools/libxc: Identify the path of the kernel image which cannot be
 found
   tools/libxl: Log the subject fd in datacopier messages
   tools/libxl: Identify copywhat in stream v2 datacopiers

I think all three patches should wait until next development window
opens unless we have nothing else in our queue (which doesn't seem to be
the case at the moment).

Wei.

   tools/libxl: Initialise the fd of the unused half of a datacopier
 
  tools/libxc/xg_private.c |2 +-
  tools/libxl/libxl_aoutils.c  |   24 
  tools/libxl/libxl_stream_read.c  |   19 +++
  tools/libxl/libxl_stream_write.c |5 -
  4 files changed, 28 insertions(+), 22 deletions(-)
 
 -- 
 1.7.10.4
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/5] xl: Command line: Remove maximum argument limit for network-attach

2015-07-20 Thread Wei Liu
On Fri, Jul 17, 2015 at 06:00:48PM +0100, Ian Jackson wrote:
 This limit of 11 has been in this function since it was written, but
 serves no purpose.  The extra arguments are fed one by one to
 parse_nic_config, and it is possible to have as many as you like.
 
 Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

 ---
  tools/libxl/xl_cmdimpl.c |5 -
  1 file changed, 5 deletions(-)
 
 diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
 index 55c041c..770b71c 100644
 --- a/tools/libxl/xl_cmdimpl.c
 +++ b/tools/libxl/xl_cmdimpl.c
 @@ -6373,11 +6373,6 @@ int main_networkattach(int argc, char **argv)
  /* No options */
  }
  
 -if (argc-optind  11) {
 -help(network-attach);
 -return 0;
 -}
 -
  domid = find_domain(argv[optind]);
  
  config= xlu_cfg_init(stderr, command line);
 -- 
 1.7.10.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/5] xl: Command line: Support xl vcpu-set --help

2015-07-20 Thread Wei Liu
On Fri, Jul 17, 2015 at 06:00:51PM +0100, Ian Jackson wrote:
 This ended with a literal sentinel.  Use COMMON_LONG_OPTIONS (which
 mentions --help) instead.
 
 Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/4] tools/libxl: Log the subject fd in datacopier messages

2015-07-20 Thread Wei Liu
On Fri, Jul 17, 2015 at 05:51:16PM +0100, Andrew Cooper wrote:
 This is a substantial aid to debugging
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Wei Liu wei.l...@citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/4] tools/libxl: Identify copywhat in stream v2 datacopiers

2015-07-20 Thread Wei Liu
On Fri, Jul 17, 2015 at 05:51:17PM +0100, Andrew Cooper wrote:
 This is an aid to debugging
 
 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Wei Liu wei.l...@citrix.com

Acked-by: Wei Liu wei.l...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5

2015-07-20 Thread Wei Liu
On Mon, Jul 20, 2015 at 11:16:34AM +0200, Olaf Hering wrote:
 On Mon, Jul 20, Jan Beulich wrote:
 
   On 19.07.15 at 11:33, o...@aepfle.de wrote:
   [  198s] block-log.c:549:23: error: array subscript is above array bounds 
   [-Werror=array-bounds]
   [  198s]  if (s-connections[i].id == id)
   [  198s]^
  
  So what makes the compiler right with that complaint? I.e. how does
  it know i  0 here? After all - afaict - s-connected can only be 0 or
 
 It has to assume that -connected can get any value because the input
 comes from outside the unit.
 
 To reduce the patch size  i  MAX_CONNECTIONS could be added.
 

A smaller patch would be preferable at this stage.

Wei.

 Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 59681: regressions - FAIL

2015-07-20 Thread Andrew Cooper
On 20/07/2015 09:24, Wei Liu wrote:
 On Mon, Jul 20, 2015 at 09:18:48AM +0100, Andrew Cooper wrote:
 On 18/07/2015 13:07, osstest service owner wrote:
 flight 59681 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59681/

 Regressions :-(

 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
 fail REGR. vs. 58965
  test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. 
 vs. 58965
 Wei: Am I correct in remembering that you nominated these two tests for 
 being added to the allowable category?

 Only the first one should be marked allowable.

 The second failure looks spurious. In the next xen-unstable run, it
 passed.

Ah yes - so it did.  Should we do a force push then?  It appears that
subsequent runs started hitting problems when cloning repos to build.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 59681: regressions - FAIL

2015-07-20 Thread Wei Liu
On Mon, Jul 20, 2015 at 09:18:48AM +0100, Andrew Cooper wrote:
 On 18/07/2015 13:07, osstest service owner wrote:
  flight 59681 xen-unstable real [real]
  http://logs.test-lab.xenproject.org/osstest/logs/59681/
 
  Regressions :-(
 
  Tests which did not succeed and are blocking,
  including tests which could not be run:
   test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
  fail REGR. vs. 58965
   test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. 
  vs. 58965
 
 Wei: Am I correct in remembering that you nominated these two tests for 
 being added to the allowable category?
 

Only the first one should be marked allowable.

The second failure looks spurious. In the next xen-unstable run, it
passed.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Andrew Cooper
On 19/07/2015 16:53, Julien Grall wrote:
 Hi,

 On 17/07/2015 20:05, Ting-Wei Lan wrote:
 When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake
 devices, It is possible to encounter graphics issues that make screen
 unreadable or crash the system. It was reported in freedesktop bugzilla:

 https://bugs.freedesktop.org/show_bug.cgi?id=90037

 As we still cannot find a proper fix for this problem, this patch adds
 iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for
 users to manually workaround the problem.

 I would prefer to see the introduction of intel_iommu parameter in the
 vtd code rather than extending again the iommu parameter with intel
 specific options in the common code which is both for ARM and x86.

Having the parsing for this option inside some x86 ifdefary, fine, but
lets not have multiple iommu= options.  It is hard enough for people to
drive one option (as people routinely get Xen and Linux command line
options mixed up).

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 59759: regressions - trouble: blocked/broken/fail/pass

2015-07-20 Thread osstest service owner
flight 59759 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59759/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-libvirt3 host-install(3) broken REGR. vs. 58965
 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 58965
 build-armhf-pvops 5 kernel-build  fail REGR. vs. 58965
 build-amd64-pvops 5 kernel-build  fail REGR. vs. 58965
 build-i386-pvops  5 kernel-build  fail REGR. vs. 58965

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1)blocked n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-intel  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-winxpsp3  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-winxpsp3  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3  1 build-check(1)   blocked  n/a

version targeted for testing:
 xen  21d9b079e53805b68047d60d28cde224d09bbb40
baseline version:
 xen  

Re: [Xen-devel] [xen-unstable test] 59721: regressions - trouble: broken/fail/pass

2015-07-20 Thread Jan Beulich
 On 20.07.15 at 10:48, ian.campb...@citrix.com wrote:
 On Sun, 2015-07-19 at 10:19 +, osstest service owner wrote:
 flight 59721 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59721/ 
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
 fail REGR. 
 vs. 58965
 
 I think we have considered discounting i386 qemu-dm as a thing we care
 about, in which case we could force push this change since this was the
 only fail..

Right, together with 59699 - as also just said be Wei - I agree:
please go ahead.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5

2015-07-20 Thread Olaf Hering
On Mon, Jul 20, Jan Beulich wrote:

  On 19.07.15 at 11:33, o...@aepfle.de wrote:
  [  198s] block-log.c:549:23: error: array subscript is above array bounds 
  [-Werror=array-bounds]
  [  198s]  if (s-connections[i].id == id)
  [  198s]^
 
 So what makes the compiler right with that complaint? I.e. how does
 it know i  0 here? After all - afaict - s-connected can only be 0 or

It has to assume that -connected can get any value because the input
comes from outside the unit.

To reduce the patch size  i  MAX_CONNECTIONS could be added.

Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Julien Grall
On 20/07/15 09:27, Andrew Cooper wrote:
 On 19/07/2015 16:53, Julien Grall wrote:
 Hi,

 On 17/07/2015 20:05, Ting-Wei Lan wrote:
 When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake
 devices, It is possible to encounter graphics issues that make screen
 unreadable or crash the system. It was reported in freedesktop bugzilla:

 https://bugs.freedesktop.org/show_bug.cgi?id=90037

 As we still cannot find a proper fix for this problem, this patch adds
 iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for
 users to manually workaround the problem.

 I would prefer to see the introduction of intel_iommu parameter in the
 vtd code rather than extending again the iommu parameter with intel
 specific options in the common code which is both for ARM and x86.
 
 Having the parsing for this option inside some x86 ifdefary, fine, but
 lets not have multiple iommu= options.  It is hard enough for people to
 drive one option (as people routinely get Xen and Linux command line
 options mixed up).

If you are concerned about mixing Xen and Linux command line options, it
would be more sensible to use intel_iommu=igfx_off on Xen to keep the
options exactly the same as Linux.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5

2015-07-20 Thread Jan Beulich
 On 19.07.15 at 11:33, o...@aepfle.de wrote:
 blktap2 fails to build with gcc5 because it fails to recognize that
 there can be just one active connection (enforced in ctl_accept).
 
 Rearrange the code to handle just a single connection.
 Adjust two strerror calls to use errno instead -1 as input.
 
 [  198s] block-log.c: In function 'ctl_close_sock':
 [  198s] block-log.c:363:23: error: array subscript is above array bounds 
 [-Werror=array-bounds]
 [  198s]  if (s-connections[i].fd == fd) {
 [  198s]^
 [  198s] block-log.c: In function 'ctl_request':
 [  198s] block-log.c:549:23: error: array subscript is above array bounds 
 [-Werror=array-bounds]
 [  198s]  if (s-connections[i].id == id)
 [  198s]^

So what makes the compiler right with that complaint? I.e. how does
it know i  0 here? After all - afaict - s-connected can only be 0 or
1, and hence either the loop bodies don't get entered at all or the
loops have only a single iteration. Smells like a compiler bug instead,
which I'm not sure we want to work around with a non-trivial patch
like this.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen/mmu: Copy and revector the P2M tree.

2015-07-20 Thread Dan Carpenter
Hello Konrad Rzeszutek Wilk,

The patch 7f9140626c75: xen/mmu: Copy and revector the P2M tree.
from Jul 26, 2012, leads to the following static checker warning:

arch/x86/xen/mmu.c:1105 xen_cleanhighmap()
warn: potential pointer math issue ('level2_kernel_pgt' is pointer to 
unsigned long)

arch/x86/xen/mmu.c
  1096  #ifdef CONFIG_X86_64
  1097  static void __init xen_cleanhighmap(unsigned long vaddr,
  1098  unsigned long vaddr_end)
  1099  {
  1100  unsigned long kernel_end = roundup((unsigned long)_brk_end, 
PMD_SIZE) - 1;
  1101  pmd_t *pmd = level2_kernel_pgt + pmd_index(vaddr);
  1102  
  1103  /* NOTE: The loop is more greedy than the cleanup_highmap 
variant.
  1104   * We include the PMD passed in on _both_ boundaries. */
  1105  for (; vaddr = vaddr_end  (pmd  (level2_kernel_pgt + 
PAGE_SIZE));
 
^
This pointer math is weird because we typically think of PAGE_SIZE as
a number of bytes but since level2_kernel_pgt is a pointer to unsigned
long, it looks like this loop can go through more iterations than
intended.

  1106  pmd++, vaddr += PMD_SIZE) {
  1107  if (pmd_none(*pmd))
  1108  continue;
  1109  if (vaddr  (unsigned long) _text || vaddr  kernel_end)
  1110  set_pmd(pmd, __pmd(0));
    }
  1112  /* In case we did something silly, we should crash in this 
function
  1113   * instead of somewhere later and be confusing. */
  1114  xen_mc_flush();
  1115  }

regards,
dan carpenter

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5

2015-07-20 Thread Jan Beulich
 On 20.07.15 at 11:16, o...@aepfle.de wrote:
 On Mon, Jul 20, Jan Beulich wrote:
 
  On 19.07.15 at 11:33, o...@aepfle.de wrote:
  [  198s] block-log.c:549:23: error: array subscript is above array bounds 
 [-Werror=array-bounds]
  [  198s]  if (s-connections[i].id == id)
  [  198s]^
 
 So what makes the compiler right with that complaint? I.e. how does
 it know i  0 here? After all - afaict - s-connected can only be 0 or
 
 It has to assume that -connected can get any value because the input
 comes from outside the unit.

But then the warning can't be is but only may be, and a similar
warning would appear for _all_ other array accesses with compile
time known upper bound (and that's clearly not the case). I'm
guessing that the compiler wrongly applies a heuristic that every
loop over an array has at least two iteration (or some such).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/4] Fix to libxl migration v2 issue blocking OSSTest

2015-07-20 Thread Andrew Cooper
On 20/07/15 10:56, Wei Liu wrote:
 On Fri, Jul 17, 2015 at 05:51:14PM +0100, Andrew Cooper wrote:
 And three improvements to debugging.

 Note that there is still a bug in libxl__toolstack_save() which
 valgrind identified, but I do not wish to block this bugfix on that

 Andrew Cooper (4):
   tools/libxc: Identify the path of the kernel image which cannot be
 found
   tools/libxl: Log the subject fd in datacopier messages
   tools/libxl: Identify copywhat in stream v2 datacopiers
 I think all three patches should wait until next development window
 opens unless we have nothing else in our queue (which doesn't seem to be
 the case at the moment).

You mean delay until 4.7? I disagree.  Without these fixes, debugging
issues is substantially harder than they need to be.

They literally are only adding extra information into existing error
messages.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Jan Beulich
 On 17.07.15 at 21:05, lant...@gmail.com wrote:
 @@ -87,6 +88,8 @@ static void __init parse_iommu_param(char *s)
  force_iommu = val;
  else if ( !strcmp(s, workaround_bios_bug) )
  iommu_workaround_bios_bug = val;
 +else if ( !strcmp(s, igfx_off) )
 +iommu_igfx_off = val;

Just FTR (moot with both Kevin's and Andrew's replies, which I agree
with) this should be igfx, with the intended use then being
iommu=no-igfx.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: release lock occasionally during ballooning

2015-07-20 Thread David Vrabel
On 10/07/15 15:42, Juergen Gross wrote:
 When dom0 is being ballooned balloon_process() will hold the balloon
 mutex until it is finished. This will block e.g. creation of new
 domains as the device backends for the new domain need some
 autoballooned pages for the ring buffers.
 
 Avoid this by releasing the balloon mutex from time to time during
 ballooning. Add a state variable to indicate one balloon_process()
 is active to avoid multiple balloon processes fighting for the mutex.

Is this state variable necessary? balloon_process() is a work item so
there should only be one instance of it running anyway, yes?

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 59699: regressions - FAIL

2015-07-20 Thread Wei Liu
On Sat, Jul 18, 2015 at 11:25:48PM +, osstest service owner wrote:
 flight 59699 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59699/
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
 fail REGR. vs. 58965
 

This one should be marked allowable, because as far as I can remember it
failed when I first added it to OSSTest. It somehow passed. And now it
is blocking the push.

There are two more flights (59721, 59736) that tested the same commit,
but they failed due to infrastructure problems. So those two reports
should be ignored.

In light of this test being allowable to fail, we don't have any
blocking test failure in this flight. I suggest we do a force push.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] xl: fix vcpus to vnode assignement in config file

2015-07-20 Thread Dario Faggioli
In fact, right now, if the vcpus= list (where the
user specifies what vcpus should be part of a vnode)
has multiple elements, things don't work.
E.g., the following examples all result in failure
to create the guest:

 [ pnode=0,size=512,vcpus=0,2,vdistances=10,20  ]
 [ pnode=0,size=512,vcpus=0-1,4,vdistances=10,20  ]

Reason is we need either a multidimentional array,
or a bitmap, to temporary store the vcpus of a
vnode, while parsing the vnuma config entry.

Let's use the latter, which happens to also make it
easier to copy the outcome of the parsing to its
final destination in b_info, if everything goes ok.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
---
Changes from v1:
 * fix coding style
---
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
---
 tools/libxl/xl_cmdimpl.c |   34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8cbf30e..b41874a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1076,9 +1076,7 @@ static void parse_vnuma_config(const XLU_Config *config,
 /* Temporary storage for parsed vcpus information to avoid
  * parsing config twice. This array has num_vnuma elements.
  */
-struct vcpu_range_parsed {
-unsigned long start, end;
-} *vcpu_range_parsed;
+libxl_bitmap *vcpu_parsed;
 
 libxl_physinfo_init(physinfo);
 if (libxl_get_physinfo(ctx, physinfo) != 0) {
@@ -1095,7 +1093,14 @@ static void parse_vnuma_config(const XLU_Config *config,
 
 b_info-num_vnuma_nodes = num_vnuma;
 b_info-vnuma_nodes = xcalloc(num_vnuma, sizeof(libxl_vnode_info));
-vcpu_range_parsed = xcalloc(num_vnuma, sizeof(*vcpu_range_parsed));
+vcpu_parsed = xcalloc(num_vnuma, sizeof(libxl_bitmap));
+for (i = 0; i  num_vnuma; i++) {
+libxl_bitmap_init(vcpu_parsed[i]);
+if (libxl_cpu_bitmap_alloc(ctx, vcpu_parsed[i], b_info-max_vcpus)) {
+fprintf(stderr, libxl_node_bitmap_alloc failed.\n);
+exit(1);
+}
+}
 
 for (i = 0; i  b_info-num_vnuma_nodes; i++) {
 libxl_vnode_info *p = b_info-vnuma_nodes[i];
@@ -1165,12 +1170,14 @@ static void parse_vnuma_config(const XLU_Config *config,
 split_string_into_string_list(value, ,, cpu_spec_list);
 len = libxl_string_list_length(cpu_spec_list);
 
-for (j = 0; j  len; j++)
+for (j = 0; j  len; j++) {
 parse_range(cpu_spec_list[j], s, e);
+for (; s = e; s++) {
+libxl_bitmap_set(vcpu_parsed[i], s);
+max_vcpus++;
+}
+}
 
-vcpu_range_parsed[i].start = s;
-vcpu_range_parsed[i].end   = e;
-max_vcpus += (e - s + 1);
 libxl_string_list_dispose(cpu_spec_list);
 } else if (!strcmp(vdistances, option)) {
 libxl_string_list vdist;
@@ -1209,17 +1216,12 @@ static void parse_vnuma_config(const XLU_Config *config,
 
 for (i = 0; i  b_info-num_vnuma_nodes; i++) {
 libxl_vnode_info *p = b_info-vnuma_nodes[i];
-int cpu;
 
-libxl_cpu_bitmap_alloc(ctx, p-vcpus, b_info-max_vcpus);
-libxl_bitmap_set_none(p-vcpus);
-for (cpu = vcpu_range_parsed[i].start;
- cpu = vcpu_range_parsed[i].end;
- cpu++)
-libxl_bitmap_set(p-vcpus, cpu);
+libxl_bitmap_copy_alloc(ctx, p-vcpus, vcpu_parsed[i]);
+libxl_bitmap_dispose(vcpu_parsed[i]);
 }
 
-free(vcpu_range_parsed);
+free(vcpu_parsed);
 }
 
 static void parse_config_data(const char *config_source,


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5

2015-07-20 Thread M A Young


On Mon, 20 Jul 2015, Jan Beulich wrote:

  On 19.07.15 at 11:33, o...@aepfle.de wrote:
  blktap2 fails to build with gcc5 because it fails to recognize that
  there can be just one active connection (enforced in ctl_accept).
  
  Rearrange the code to handle just a single connection.
  Adjust two strerror calls to use errno instead -1 as input.
  
  [  198s] block-log.c: In function 'ctl_close_sock':
  [  198s] block-log.c:363:23: error: array subscript is above array bounds 
  [-Werror=array-bounds]
  [  198s]  if (s-connections[i].fd == fd) {
  [  198s]^
  [  198s] block-log.c: In function 'ctl_request':
  [  198s] block-log.c:549:23: error: array subscript is above array bounds 
  [-Werror=array-bounds]
  [  198s]  if (s-connections[i].id == id)
  [  198s]^
 
 So what makes the compiler right with that complaint? I.e. how does
 it know i  0 here? After all - afaict - s-connected can only be 0 or
 1, and hence either the loop bodies don't get entered at all or the
 loops have only a single iteration. Smells like a compiler bug instead,
 which I'm not sure we want to work around with a non-trivial patch
 like this.

Does this depend on the gcc 5 version? I think this is something I thought 
I was going to have to fix on Fedora and then found I didn't.

Michael Young

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()

2015-07-20 Thread Juergen Gross

On 07/17/2015 03:35 PM, Dario Faggioli wrote:

The function is called both when we want to remove a cpu
from a cpupool, and during cpu teardown, for suspend or
shutdown. If, however, the boot cpu (cpu 0, most of the
times) is not present in the default cpupool, during
suspend or shutdown, Xen crashes like this:

   root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
   root@Zhaman:~# shutdown -h now
   (XEN) [ Xen-4.6-unstable  x86_64  debug=y  Tainted:C ]
   ...
   (XEN) Xen call trace:
   (XEN)[82d0801238de] _csched_cpu_pick+0x156/0x61f
   (XEN)[82d080123db5] csched_cpu_pick+0xe/0x10
   (XEN)[82d08012de3c] vcpu_migrate+0x18e/0x321
   (XEN)[82d08012e4f8] cpu_disable_scheduler+0x1cf/0x2ac
   (XEN)[82d08018bb8d] __cpu_disable+0x313/0x36e
   (XEN)[82d080101424] take_cpu_down+0x34/0x3b
   (XEN)[82d08013097a] stopmachine_action+0x70/0x99
   (XEN)[82d0801325f0] do_tasklet_work+0x78/0xab
   (XEN)[82d080132926] do_tasklet+0x5e/0x8a
   (XEN)[82d08016478c] idle_loop+0x56/0x6b
   (XEN)
   (XEN)
   (XEN) 
   (XEN) Panic on CPU 15:
   (XEN) Assertion 'cpu  nr_cpu_ids' failed at 
...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
   (XEN) 

There also are problems when we try to suspend or shutdown
with a cpupool configured with just one cpu (no matter, in
this case, whether that is the boot cpu or not):

   root@Zhaman:~# xl create /etc/xen/test.cfg
   root@Zhaman:~# xl cpupool-migrate test Pool-1
   root@Zhaman:~# xl cpupool-list -c
   Name   CPU list
   Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
   Pool-1 12
   root@Zhaman:~# shutdown -h now
   (XEN) [ Xen-4.6-unstable  x86_64  debug=y  Tainted:C ]
   (XEN) CPU:12
   ...
   (XEN) Xen call trace:
   (XEN)[82d08018bb91] __cpu_disable+0x317/0x36e
   (XEN)[82d080101424] take_cpu_down+0x34/0x3b
   (XEN)[82d08013097a] stopmachine_action+0x70/0x99
   (XEN)[82d0801325f0] do_tasklet_work+0x78/0xab
   (XEN)[82d080132926] do_tasklet+0x5e/0x8a
   (XEN)[82d08016478c] idle_loop+0x56/0x6b
   (XEN)
   (XEN)
   (XEN) 
   (XEN) Panic on CPU 12:
   (XEN) Xen BUG at smpboot.c:895
   (XEN) 

In both cases, the problem is the scheduler not being able
to:
  - move all the vcpus to the boot cpu (as the boot cpu is
not in the cpupool), in the former;
  - move the vcpus away from a cpu at all (as that is the
only one cpu in the cpupool), in the latter.

Solution is to distinguish, inside cpu_disable_scheduler(),
the two cases of cpupool manipulation and teardown. For
cpupool manipulation, it is correct to ask the scheduler to
take an action, as pathological situation (like there not
being any cpu in the pool where to send vcpus) are taken
care of (i.e., forbidden!) already. For suspend and shutdown,
we don't want the scheduler to be involved at all, as the
final goal is pretty simple: send all the vcpus to the
boot cpu ASAP, so we just go for it.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
---
Changes from v1:
  * BUG_ON() if, in the suspend/shutdown case, the mask of
online pCPUs will ever get empty, as suggested during
review;
  * reorganize and improve comments inside cpu_disable_scheduler()
as suggested during review;
  * make it more clear that vcpu_move_nosched() (name
changed, as suggested during review), should only be
called from quite contextes, such us, during suspend


s/quite/quiet/


or shutdown. Do that via both comments and asserts,
as requested during review;
  * reorganize cpu_disable_scheduler() and vcpu_move_nosched()
so that calling to sleep and wakeup functions are only
called when necessary (i.e., *not* in case we are
suspending/shutting down, as requested during review.
---
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Juergen Gross jgr...@suse.com
---
  xen/common/schedule.c |  102 +
  1 file changed, 86 insertions(+), 16 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index df8c1d0..ed0f356 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c


...


@@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu)
  cpumask_setall(v-cpu_hard_affinity);
  }

-if ( v-processor == cpu )
+if ( v-processor != cpu )
  {
-set_bit(_VPF_migrating, v-pause_flags);
+/* This vcpu is not on cpu, so we can move on. */
  vcpu_schedule_unlock_irqrestore(lock, flags, v);
-vcpu_sleep_nosync(v);
-vcpu_migrate(v);
+continue;
+}
+
+/* If it is on cpu, we must send it away. */
+if ( unlikely(system_state 

Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread George Dunlap
On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen tiejun.c...@intel.com wrote:
 Try to avoid placing PCI BARs over RMRRs:

 - If mmio_hole_size is not specified, and the existing MMIO range has
   RMRRs in it, and there is space to expand the hole in lowmem without
   moving more memory, then make the MMIO hole as large as possible.

 - When placing RMRRs, find the next RMRR higher than the current base
   in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
   the next one.

 This certainly won't work in all cases, but it should work in a
 significant number of cases.  Additionally, users should be able to
 work around problems by setting mmio_hole_size larger in the guest
 config.

 Signed-off-by: George Dunlap george.dun...@eu.citrix.com
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
 v10:

 * This is from George' draft patch which implements an acceptable solution in
   current cycle. Here I just implemented check_overlap_all() and some 
 cleanups.

Changes:

Reviewed-by: George Dunlap george.dun...@eu.citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] blkif.h: document physical-device node

2015-07-20 Thread Roger Pau Monné
El 07/07/15 a les 17.48, Wei Liu ha escrit:
 This node is used by toolstack (libxl, hotplug script) and blkback.
 
 Signed-off-by: Wei Liu wei.l...@citrix.com
 ---
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Roger Pau Monne roger@citrix.com
 Cc: David Vrabel david.vra...@citrix.com
 Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Cc: George Dunlap george.dun...@eu.citrix.com
 
 I notice this when I was looking at George's patch series to fix hotplug
 scripts. If anyone has better idea how we can document this feel free to
 provide alternative.
 ---
  xen/include/public/io/blkif.h | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
 index 6baf7fb..8f0f9a6 100644
 --- a/xen/include/public/io/blkif.h
 +++ b/xen/include/public/io/blkif.h
 @@ -92,6 +92,12 @@
   *  backend driver to open the backing device.  (e.g. the path to the
   *  file or block device representing the backing store.)
   *
 + * physical-device
 + *  Values: MAJOR:MINOR
 + *
 + *  MAJOR and MINOR are the major number and minor number of the
 + *  backing device respectively.
 + *

Acked-by: Roger Pau Monné roger@citrix.com

Although I plan to expand this field so it's also used by FreeBSD
blkback with a different format.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Request backport to xen qemu-upstream gits of a fix that solves occasional qemu crash using spice

2015-07-20 Thread Stefano Stabellini
On Wed, 15 Jul 2015, Fabio Fantoni wrote:
 On production system using xen 4.5 I had occasional qemu crash using spice,
 from qemu log:
  qemu-system-i386: spice-qemu-char.c:173: spice_chr_add_watch: Assertion
  `cond == G_IO_OUT' failed.
 It was solved upstream with this patch:
 http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f7a8beb5e6a13dc924895244777d9ef08b23b367
 I already applied it on 2 my systems with xen=4.5 without find regressions.
 It is a small patch that affect only users that use spice solving an
 occasional qemu using spice, so I think is good backport it to
 qemu-upstream-unstable.git and qemu-upstream-4.5-testing.git.
 On qemu-upstream-4.4-testing.git and older is not needed because they have
 older qemu without patch that introduced the regression (serial: poll the
 serial console with G_IO_HUP).

Done. Thanks for pointing it out!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [Patch V2] xen: release lock occasionally during ballooning

2015-07-20 Thread Juergen Gross
When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.

Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().

Instead of open coding it, just use cond_resched().

Signed-off-by: Juergen Gross jgr...@suse.com
---
 drivers/xen/balloon.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index fd93369..bf4a23c 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -472,7 +472,7 @@ static enum bp_state decrease_reservation(unsigned long 
nr_pages, gfp_t gfp)
 }
 
 /*
- * We avoid multiple worker processes conflicting via the balloon mutex.
+ * As this is a work item it is guaranteed to run as a single instance only.
  * We may of course race updates of the target counts (which are protected
  * by the balloon lock), or with changes to the Xen hard limit, but we will
  * recover from these in time.
@@ -482,9 +482,10 @@ static void balloon_process(struct work_struct *work)
enum bp_state state = BP_DONE;
long credit;
 
-   mutex_lock(balloon_mutex);
 
do {
+   mutex_lock(balloon_mutex);
+
credit = current_credit();
 
if (credit  0) {
@@ -499,17 +500,15 @@ static void balloon_process(struct work_struct *work)
 
state = update_schedule(state);
 
-#ifndef CONFIG_PREEMPT
-   if (need_resched())
-   schedule();
-#endif
+   mutex_unlock(balloon_mutex);
+
+   cond_resched();
+
} while (credit  state == BP_DONE);
 
/* Schedule more work if there is some still to be done. */
if (state == BP_EAGAIN)
schedule_delayed_work(balloon_worker, 
balloon_stats.schedule_delay * HZ);
-
-   mutex_unlock(balloon_mutex);
 }
 
 /* Resets the Xen limit, sets new target, and kicks off processing. */
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Jan Beulich
 On 20.07.15 at 08:16, tiejun.c...@intel.com wrote:
 +/* If there was no highmem region, just create one. */
 +if ( i == memory_map.nr_map )
 +{
 +memory_map.map[i].addr = ((uint64_t)1  32);
 +memory_map.map[i].size = add_high_mem;
 +memory_map.map[i].type = E820_RAM;

Don't you need to increment memory_map.nr_map here?

 +}
 +
 +/* A sanity check if high memory is broken. */
 +BUG_ON( high_mem_end !=
 +memory_map.map[i].addr + memory_map.map[i].size);
 +}
 +
 +/* Now fulfill e820. */

s/fulfill/fill/.

 +/* Finally we need to sort all e820 entries. */
 +for ( j = 0; j  nr-1; j++ )
 +{
 +for ( i = j+1; i  nr; i++ )

Blanks around binary operators please.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()

2015-07-20 Thread Dario Faggioli
On Mon, 2015-07-20 at 13:41 +0200, Juergen Gross wrote:
 On 07/17/2015 03:35 PM, Dario Faggioli wrote:

  @@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu)
cpumask_setall(v-cpu_hard_affinity);
}
 
  -if ( v-processor == cpu )
  +if ( v-processor != cpu )
{
  -set_bit(_VPF_migrating, v-pause_flags);
  +/* This vcpu is not on cpu, so we can move on. */
vcpu_schedule_unlock_irqrestore(lock, flags, v);
  -vcpu_sleep_nosync(v);
  -vcpu_migrate(v);
  +continue;
  +}
  +
  +/* If it is on cpu, we must send it away. */
  +if ( unlikely(system_state == SYS_STATE_suspend) )
  +{
  +/*
  + * If we are doing a shutdown/suspend, it is not necessary 
  to
  + * ask the scheduler to chime in. In fact:
  + *  * there is no reason for it: the end result we are 
  after
  + *is just 'all the vcpus on the boot pcpu, and no vcpu
  + *anywhere else', so let's just go for it;
  + *  * it's wrong, for cpupools with only non-boot pcpus, as
  + *the scheduler would always fail to send the vcpus 
  away
  + *from the last online (non boot) pcpu!
  + *
  + * Therefore, in the shutdown/suspend case, we just pick up
  + * one (still) online pcpu. Note that, at this stage, all
  + * domains (including dom0) have been paused already, so we
  + * do not expect any vcpu activity at all.
  + */
  +cpumask_andnot(online_affinity, cpu_online_map,
  +   cpumask_of(cpu));
  +BUG_ON(cpumask_empty(online_affinity));
  +/*
  + * As boot cpu is, usually, pcpu #0, using cpumask_first()
  + * will make us converge quicker.
  + */
  +new_cpu = cpumask_first(online_affinity);
  +vcpu_move_nosched(v, new_cpu);
 
 Shouldn't there be a vcpu_schedule_unlock_irqrestore() ?
 
I'm sure I put one there, as I was sure that it was there the last time
I inspected the patch before hitting send.

But I see that it's not there now, so I must have messed up when
formatting the patch, or something like that. :-(

It's really really weird, as I forgot it during development, and then
the system was hanging, and then I added it, and that's why I'm sure I
did have it in place... but perhaps I fat fingered some stgit command
which made it disappear.

In any case, sorry for this. I will re-test (just to be sure) and
re-send (and this time I'll triple check!!)

Thanks and Regards,
Dario

-- 
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/6] tools/libx{l, c}: Remove the toolstack_{save, restore} callbacks

2015-07-20 Thread Andrew Cooper
Update the libxc spec to indicate more sternly that TOOLSTACK records
should no longer be used.

Also, trim further toolstack infrastructure which should have gone in
c/s 39bf4e9 tools/libxl: Drop all knowledge of toolstack callbacks

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 docs/specs/libxc-migration-stream.pandoc |   14 ++
 tools/libxc/include/xenguest.h   |   12 
 tools/libxl/libxl_internal.h |1 -
 tools/libxl/libxl_save_callout.c |2 --
 4 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc 
b/docs/specs/libxc-migration-stream.pandoc
index 68fa513..9d8f17b 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -219,7 +219,7 @@ type 0x: END
 
  0x000A: HVM_PARAMS
 
- 0x000B: TOOLSTACK
+ 0x000B: TOOLSTACK (deprecated)
 
  0x000C: X86_PV_VCPU_MSRS
 
@@ -540,17 +540,15 @@ param value  Parameter value.
 
 \clearpage
 
-TOOLSTACK
--
+TOOLSTACK (deprecated)
+--
+
+ *This record was only present for transitionary purposes during
+  development.  It is should not be used.*
 
 An opaque blob provided by and supplied to the higher layers of the
 toolstack (e.g., libxl) during save and restore.
 
- This is only temporary -- the intention is the toolstack takes care
- of this itself.  This record is only present for early development
- purposes and will be removed before submissions, along with changes
- to libxl which cause libxl to handle this data itself.
-
  0 1 2 3 4 5 6 7 octet
 +++
 | data|
diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index e95af54..231ec6e 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -67,14 +67,6 @@ struct save_callbacks {
 /* Enable qemu-dm logging dirty pages to xen */
 int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* 
HVM only */
 
-/* Save toolstack specific data
- * @param buf the buffer with the data to be saved
- * @param len the length of the buffer
- * The callee allocates the buffer, the caller frees it (buffer must
- * be free'able).
- */
-int (*toolstack_save)(uint32_t domid, uint8_t **buf, uint32_t *len, void 
*data);
-
 /* to be provided as the last argument to each callback function */
 void* data;
 };
@@ -98,10 +90,6 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t 
dom, uint32_t max_ite
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
-/* callback to restore toolstack specific data */
-int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
-uint32_t size, void* data);
-
 /* A checkpoint record has been found in the stream.
  * returns: */
 #define XGR_CHECKPOINT_ERROR0 /* Terminate processing */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2b6b2a0..4ab564a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2744,7 +2744,6 @@ _hidden void libxl__datacopier_prefixdata(libxl__egc*, 
libxl__datacopier_state*,
 libxl__ev_fd readable;
 libxl__ev_child child;
 const char *stdin_what, *stdout_what;
-FILE *toolstack_data_file;
 
 libxl__egc *egc; /* valid only for duration of each event callback;
   * is here in this struct for the benefit of the
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index f2ce868..3af99af 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -70,7 +70,6 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 shs-completion_callback = libxl__xc_domain_restore_done;
 shs-caller_state = dcs;
 shs-need_results = 1;
-shs-toolstack_data_file = 0;
 
 run_helper(egc, shs, --restore-domain, restore_fd, 0, 0,
argnums, ARRAY_SIZE(argnums));
@@ -333,7 +332,6 @@ static void helper_done(libxl__egc *egc, 
libxl__save_helper_state *shs)
 libxl__carefd_close(shs-pipes[0]);  shs-pipes[0] = 0;
 libxl__carefd_close(shs-pipes[1]);  shs-pipes[1] = 0;
 assert(!libxl__save_helper_inuse(shs));
-if (shs-toolstack_data_file) fclose(shs-toolstack_data_file);
 
 shs-egc = egc;
 shs-completion_callback(egc, shs-caller_state,
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/6] tools/libx{l, c}: Drop '2' suffixes from xc_domain_{save, restore}2() functions

2015-07-20 Thread Andrew Cooper
As there is now only the one implementation.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/include/xenguest.h  |   14 --
 tools/libxc/xc_nomigrate.c  |   20 
 tools/libxc/xc_sr_restore.c |   14 +++---
 tools/libxc/xc_sr_save.c|6 +++---
 tools/libxl/libxl_save_helper.c |4 ++--
 5 files changed, 12 insertions(+), 46 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 9772576..f837bde 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -83,11 +83,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom, uint32_t max_iter
uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
struct save_callbacks* callbacks, int hvm);
 
-/* Domain Save v2 */
-int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-uint32_t max_factor, uint32_t flags,
-struct save_callbacks* callbacks, int hvm);
-
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
 /* A checkpoint record has been found in the stream.
@@ -127,15 +122,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
uint32_t dom,
   int checkpointed_stream,
   struct restore_callbacks *callbacks);
 
-/* Domain Restore v2 */
-int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
-   unsigned int store_evtchn, unsigned long *store_mfn,
-   domid_t store_domid, unsigned int console_evtchn,
-   unsigned long *console_mfn, domid_t console_domid,
-   unsigned int hvm, unsigned int pae, int superpages,
-   int checkpointed_stream,
-   struct restore_callbacks *callbacks);
-
 /**
  * This function will create a domain for a paravirtualized Linux
  * using file names pointing to kernel and ramdisk
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 426aee6..76978a0 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -29,14 +29,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom, uint32_t max_iter
 return -1;
 }
 
-int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-uint32_t max_factor, uint32_t flags,
-struct save_callbacks* callbacks, int hvm)
-{
-errno = ENOSYS;
-return -1;
-}
-
 int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
   unsigned int store_evtchn, unsigned long *store_mfn,
   domid_t store_domid, unsigned int console_evtchn,
@@ -49,18 +41,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t 
dom,
 return -1;
 }
 
-int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
-   unsigned int store_evtchn, unsigned long *store_mfn,
-   domid_t store_domid, unsigned int console_evtchn,
-   unsigned long *console_mfn, domid_t console_domid,
-   unsigned int hvm, unsigned int pae, int superpages,
-   int checkpointed_stream,
-   struct restore_callbacks *callbacks)
-{
-errno = ENOSYS;
-return -1;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index bf1ee15..ebc8f2f 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -714,13 +714,13 @@ static int restore(struct xc_sr_context *ctx)
 return rc;
 }
 
-int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
-   unsigned int store_evtchn, unsigned long *store_mfn,
-   domid_t store_domid, unsigned int console_evtchn,
-   unsigned long *console_gfn, domid_t console_domid,
-   unsigned int hvm, unsigned int pae, int superpages,
-   int checkpointed_stream,
-   struct restore_callbacks *callbacks)
+int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
+  unsigned int store_evtchn, unsigned long *store_mfn,
+  domid_t store_domid, unsigned int console_evtchn,
+  unsigned long *console_gfn, domid_t console_domid,
+  unsigned int hvm, unsigned int pae, int superpages,
+  int checkpointed_stream,
+  struct restore_callbacks *callbacks)
 {
 struct xc_sr_context ctx =
 {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index d63b783..7b38051 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ 

[Xen-devel] [PATCH 6/6] tools/libx{l, c}: Fix trivial Coverity defects in migration v2 code

2015-07-20 Thread Andrew Cooper
All of these are UNUSED_VALUE defects where a default vaule is
unconditionally overwritten.  They are not particularly interesting,
bug wise, but keeping these defects at bay helps prevent real bugs
going unnoticed in the volume.

No functional change.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/xc_sr_save.c|   10 +-
 tools/libxl/libxl_stream_read.c |2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 46700a1..79c389a 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -473,7 +473,7 @@ static int send_memory_live(struct xc_sr_context *ctx)
 xc_shadow_op_stats_t stats = { 0, ctx-save.p2m_size };
 char *progress_str = NULL;
 unsigned x;
-int rc = -1;
+int rc;
 
 rc = update_progress_string(ctx, progress_str, 0);
 if ( rc )
@@ -525,7 +525,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 xc_interface *xch = ctx-xch;
 xc_shadow_op_stats_t stats = { 0, ctx-save.p2m_size };
 char *progress_str = NULL;
-int rc = -1;
+int rc;
 DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
 ctx-save.dirty_bitmap_hbuf);
 
@@ -572,7 +572,7 @@ static int send_memory_verify(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx-xch;
 xc_shadow_op_stats_t stats = { 0, ctx-save.p2m_size };
-int rc = -1;
+int rc;
 struct xc_sr_record rec =
 {
 .type = REC_TYPE_VERIFY,
@@ -612,7 +612,7 @@ static int send_memory_verify(struct xc_sr_context *ctx)
  */
 static int send_domain_memory_live(struct xc_sr_context *ctx)
 {
-int rc = -1;
+int rc;
 
 rc = enable_logdirty(ctx);
 if ( rc )
@@ -652,7 +652,7 @@ static int send_domain_memory_checkpointed(struct 
xc_sr_context *ctx)
 static int send_domain_memory_nonlive(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx-xch;
-int rc = -1;
+int rc;
 
 rc = suspend_domain(ctx);
 if ( rc )
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 94247b7..213bdc2 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -583,7 +583,7 @@ static void write_emulator_blob(libxl__egc *egc,
 libxl__sr_emulator_hdr *emu_hdr;
 STATE_AO_GC(stream-ao);
 char path[256];
-int rc = 0, writefd = -1;
+int rc = 0, writefd;
 
 if (rec-hdr.length  sizeof(*emu_hdr)) {
 rc = ERROR_FAIL;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for 4.6 0/6] Prune legacy migration and move migration v2 out of daft status

2015-07-20 Thread Andrew Cooper
This series is some cleanup following the integration of migration v2
into the codebase.  It removes the legacy migration implementation
(compatability is provided with the python conversion script), and
adjusts the migration v2 docs/implementation to no longer be
experimental.

There are no major changes in this series, but are important changes
for the status of migration v2 in Xen 4.6

Andrew Cooper (6):
  tools/libxc: Remove legacy migration implementation
  tools/libx{l,c}: Remove the toolstack_{save,restore} callbacks
  tools/libx{l,c}: Remove XC_DEVICE_MODEL_RESTORE_FILE
  tools/libx{l,c}: Drop '2' suffixes from xc_domain_{save,restore}2() functions
  docs: Migration v2 is now no longer draft
  tools/libx{l,c}: Fix trivial Coverity defects in migration v2 code

 docs/specs/libxc-migration-stream.pandoc |   19 +-
 docs/specs/libxl-migration-stream.pandoc |   19 +-
 tools/libxc/Makefile |1 -
 tools/libxc/include/xenguest.h   |   34 -
 tools/libxc/xc_domain_restore.c  | 2411 --
 tools/libxc/xc_domain_save.c | 2198 ---
 tools/libxc/xc_nomigrate.c   |   20 -
 tools/libxc/xc_offline_page.c|   59 +
 tools/libxc/xc_sr_restore.c  |   15 +-
 tools/libxc/xc_sr_save.c |   17 +-
 tools/libxc/xg_save_restore.h|  247 ---
 tools/libxl/libxl.c  |2 +-
 tools/libxl/libxl_create.c   |2 +-
 tools/libxl/libxl_internal.h |2 +-
 tools/libxl/libxl_save_callout.c |2 -
 tools/libxl/libxl_save_helper.c  |4 +-
 tools/libxl/libxl_stream_read.c  |4 +-
 17 files changed, 107 insertions(+), 4949 deletions(-)
 delete mode 100644 tools/libxc/xc_domain_restore.c
 delete mode 100644 tools/libxc/xc_domain_save.c

-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 00/16] Fix RMRR

2015-07-20 Thread George Dunlap
On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen tiejun.c...@intel.com wrote:
 v10:

 * Patch #6: hvmloader/pci: Try to avoid placing BARs in RMRRs
   This is from George' draft patch which implements an acceptable
   solution in current cycle. Here I just implemented check_overlap_all() and
   some cleanups.

 * Patch #7: hvmloader/e820: construct guest e820 table
   Instead of correcting e820, I'd like to correct memory_map.map[]
   and then copy them into e820 directly. I think this can make sure
   hvm_info, memory_map.map[] and e820 are on the same page.

One brief request -- if you do end up sending this series again, can
you please rebase to staging?  This is the second time I've had to
manually fix up some patches so that they apply.

Thanks.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3/6] tools/libx{l, c}: Remove XC_DEVICE_MODEL_RESTORE_FILE

2015-07-20 Thread Andrew Cooper
All handling of device model files is now at the libxl level.  Remove
XC_DEVICE_MODEL_RESTORE_FILE and introduce LIBXL_DEVICE_MODEL_RESTORE_FILE in
its place.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/include/xenguest.h  |8 
 tools/libxl/libxl.c |2 +-
 tools/libxl/libxl_create.c  |2 +-
 tools/libxl/libxl_internal.h|1 +
 tools/libxl/libxl_stream_read.c |2 +-
 5 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 231ec6e..9772576 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -135,14 +135,6 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, 
uint32_t dom,
unsigned int hvm, unsigned int pae, int superpages,
int checkpointed_stream,
struct restore_callbacks *callbacks);
-/**
- * xc_domain_restore writes a file to disk that contains the device
- * model saved state.
- * The pathname of this file is XC_DEVICE_MODEL_RESTORE_FILE; The domid
- * of the new domain is automatically appended to the filename,
- * separated by a ..
- */
-#define XC_DEVICE_MODEL_RESTORE_FILE /var/lib/xen/qemu-resume
 
 /**
  * This function will create a domain for a paravirtualized Linux
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3fe1b99..c13aa8b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1695,7 +1695,7 @@ static void devices_destroy_cb(libxl__egc *egc,
 rc = libxl__remove_file(gc, libxl__device_model_savefile(gc, domid));
 if (rc  0) goto out;
 rc = libxl__remove_file(gc,
- GCSPRINTF(XC_DEVICE_MODEL_RESTORE_FILE.%u, domid));
+ GCSPRINTF(LIBXL_DEVICE_MODEL_RESTORE_FILE.%u, domid));
 if (rc  0) goto out;
 
 rc = libxl__ev_child_fork(gc, dis-destroyer, domain_destroy_domid_cb);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 5b4d333..86d1181 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1083,7 +1083,7 @@ static void domcreate_stream_done(libxl__egc *egc,
 
 if (info-type == LIBXL_DOMAIN_TYPE_HVM) {
 state-saved_state = GCSPRINTF(
-   XC_DEVICE_MODEL_RESTORE_FILE.%d, domid);
+   LIBXL_DEVICE_MODEL_RESTORE_FILE.%d, domid);
 }
 
 out:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4ab564a..20f4cde 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -90,6 +90,7 @@
 /* QEMU may be slow to load and start due to a bug in Linux where the I/O
  * subsystem sometime produce high latency under load. */
 #define LIBXL_DEVICE_MODEL_START_TIMEOUT 60
+#define LIBXL_DEVICE_MODEL_RESTORE_FILE /var/lib/xen/qemu-resume /* .$domid 
*/
 #define LIBXL_STUBDOM_START_TIMEOUT 30
 #define LIBXL_QEMU_BODGE_TIMEOUT 2
 #define LIBXL_XENCONSOLE_LIMIT 1048576
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 0535e7f..94247b7 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -592,7 +592,7 @@ static void write_emulator_blob(libxl__egc *egc,
 }
 emu_hdr = rec-body;
 
-sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE.%u, dcs-guest_domid);
+sprintf(path, LIBXL_DEVICE_MODEL_RESTORE_FILE.%u, dcs-guest_domid);
 
 assert(stream-emu_carefd == NULL);
 libxl__carefd_begin();
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 5/6] docs: Migration v2 is now no longer draft

2015-07-20 Thread Andrew Cooper
Add further instructions to the libxc Future Extensions section, and
provide such a section for libxl.

In addition, drop the In experimental __func__ IPRINTF()s from the
libxc implementations.

Finally, a correction to libxl's Not Yet Included section which
should have been amended in c/s 7eaec00 when libxl Remus support was
introduced into the protocol.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 docs/specs/libxc-migration-stream.pandoc |5 -
 docs/specs/libxl-migration-stream.pandoc |   19 ---
 tools/libxc/xc_sr_restore.c  |1 -
 tools/libxc/xc_sr_save.c |1 -
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/specs/libxc-migration-stream.pandoc 
b/docs/specs/libxc-migration-stream.pandoc
index 9d8f17b..8cd678f 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -1,7 +1,7 @@
 % LibXenCtrl Domain Image Format
 % David Vrabel david.vra...@citrix.com
   Andrew Cooper andrew.coop...@citrix.com
-% Draft G
+% Revision 1
 
 Introduction
 
@@ -679,6 +679,9 @@ introduced in Xen 3.0.
 Future Extensions
 =
 
+All changes to this specification should bump the revision number in
+the title block.
+
 All changes to the image or domain headers require the image version
 to be increased.
 
diff --git a/docs/specs/libxl-migration-stream.pandoc 
b/docs/specs/libxl-migration-stream.pandoc
index c24a434..cdec168 100644
--- a/docs/specs/libxl-migration-stream.pandoc
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -1,6 +1,6 @@
 % LibXenLight Domain Image Format
 % Andrew Cooper andrew.coop...@citrix.com
-% Draft B
+% Revision 1
 
 Introduction
 
@@ -41,8 +41,6 @@ Not Yet Included
 The following features are not yet fully specified and will be
 included in a future draft.
 
-* Remus
-
 * ARM
 
 
@@ -215,3 +213,18 @@ A checkpoint end record marks the end of a checkpoint in 
the image.
 +-+
 
 The end record contains no fields; its body_length is 0.
+
+
+Future Extensions
+=
+
+All changes to this specification should bump the revision number in
+the title block.
+
+All changes to the header require the header version to be increased.
+
+The format may be extended by adding additional record types.
+
+Extending an existing record type must be done by adding a new record
+type.  This allows old images with the old record to still be
+restored.
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index ebc8f2f..df885b6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -740,7 +740,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
uint32_t dom,
 if ( checkpointed_stream )
 assert(callbacks-checkpoint);
 
-IPRINTF(In experimental %s, __func__);
 DPRINTF(fd %d, dom %u, hvm %u, pae %u, superpages %d
 , checkpointed_stream %d, io_fd, dom, hvm, pae,
 superpages, checkpointed_stream);
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 7b38051..46700a1 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -850,7 +850,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom,
 if ( ctx.save.checkpointed )
 assert(callbacks-checkpoint  callbacks-postcopy);
 
-IPRINTF(In experimental %s, __func__);
 DPRINTF(fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d,
 io_fd, dom, max_iters, max_factor, flags, hvm);
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread Jan Beulich
 On 20.07.15 at 08:16, tiejun.c...@intel.com wrote:
 --- a/tools/firmware/hvmloader/pci.c
 +++ b/tools/firmware/hvmloader/pci.c
 @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
  enum virtual_vga virtual_vga = VGA_none;
  unsigned long igd_opregion_pgbase = 0;
  
 +/* Check if any conflicts with all reserved device memory. */

/* Check if the specified range conflicts with any reserved device memory. */

(and the any could perhaps be left out)

 +static bool check_overlap_all(uint64_t start, uint64_t size)
 +{
 +unsigned int i;
 +
 +for ( i = 0; i  memory_map.nr_map; i++ )
 +{
 +if ( memory_map.map[i].type == E820_RESERVED 
 + check_overlap(start, size,
 +   memory_map.map[i].addr,
 +   memory_map.map[i].size) )
 +return true;
 +}
 +
 +return false;
 +}
 +
 +/* Find the lowest RMRR higher than base. */
 +static int find_next_rmrr(uint32_t base)
 +{
 +unsigned int i;
 +int next_rmrr = -1;
 +uint64_t min_base = (1ull  32);
 +
 +for ( i = 0; i  memory_map.nr_map ; i++ )
 +{
 +if ( memory_map.map[i].type == E820_RESERVED 
 + memory_map.map[i].addr  base 
 + memory_map.map[i].addr  min_base )
 +{
 +next_rmrr = i;
 +min_base = memory_map.map[i].addr;
 +}
 +}
 +return next_rmrr;
 +}

Considering _both_ callers, I think the function should actually return
the lowest RMRR higher than or equal to base. Or wait - we actually
need to find the lowest RMRR the _end_ of which is higher than base.

Also - blank line before final return statement please.

 @@ -299,6 +337,15 @@ void pci_setup(void)
  || (((pci_mem_start  1)  PAGE_SHIFT)
  = hvm_info-low_mem_pgend)) )
  pci_mem_start = 1;
 +
 +/*
 + * Try to accomodate RMRRs in our MMIO region on a best-effort basis.
 + * If we have RMRRs in the range, then make pci_mem_start just after
 + * hvm_info-low_mem_pgend.
 + */
 +if ( pci_mem_start  (hvm_info-low_mem_pgend  PAGE_SHIFT) 
 + check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
 +pci_mem_start = ((hvm_info-low_mem_pgend + 1)  PAGE_SHIFT);

Since it looks like low_mem_pgend is exclusive, is the + 1 here
really needed (other than to put an arbitrary one page gap
between RAM and MMIO)?

 @@ -407,6 +456,19 @@ void pci_setup(void)
  }
  
  base = (resource-base  + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
 +
 +/* If we're using mem_resource, check for RMRR conflicts. */
 +while ( resource == mem_resource 
 +next_rmrr  0 

= I think.

 +check_overlap(base, bar_sz,
 +  memory_map.map[next_rmrr].addr,
 +  memory_map.map[next_rmrr].size) )
 +{
 +base = memory_map.map[next_rmrr].addr + 
 memory_map.map[next_rmrr].size;
 +base = (base + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);

Pointless cast.

And as George said elsewhere - please make sure the whole series
applies cleanly to staging; generally committers don't do fixups while
applying.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [Patch V2] xen: release lock occasionally during ballooning

2015-07-20 Thread Juergen Gross
When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.

Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().

Instead of open coding it, just use cond_resched().

Signed-off-by: Juergen Gross jgr...@suse.com
---
 drivers/xen/balloon.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index fd93369..e6d9eee 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -481,9 +481,16 @@ static void balloon_process(struct work_struct *work)
 {
enum bp_state state = BP_DONE;
long credit;
+   static bool active;
 
mutex_lock(balloon_mutex);
 
+   if (active) {
+   mutex_unlock(balloon_mutex);
+   return;
+   }
+   active = true;
+
do {
credit = current_credit();
 
@@ -499,12 +506,16 @@ static void balloon_process(struct work_struct *work)
 
state = update_schedule(state);
 
-#ifndef CONFIG_PREEMPT
-   if (need_resched())
-   schedule();
-#endif
+   mutex_unlock(balloon_mutex);
+
+   cond_resched();
+
+   mutex_lock(balloon_mutex);
+
} while (credit  state == BP_DONE);
 
+   active = false;
+
/* Schedule more work if there is some still to be done. */
if (state == BP_EAGAIN)
schedule_delayed_work(balloon_worker, 
balloon_stats.schedule_delay * HZ);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Patch V2] xen: release lock occasionally during ballooning

2015-07-20 Thread Juergen Gross

Please ignore, forgot stg refresh...

Juergen

On 07/20/2015 01:46 PM, Juergen Gross wrote:

When dom0 is being ballooned balloon_process() will hold the balloon
mutex until it is finished. This will block e.g. creation of new
domains as the device backends for the new domain need some
autoballooned pages for the ring buffers.

Avoid this by releasing the balloon mutex from time to time during
ballooning. Adjust the comment above balloon_process() regarding
multiple instances of balloon_process().

Instead of open coding it, just use cond_resched().

Signed-off-by: Juergen Gross jgr...@suse.com
---
  drivers/xen/balloon.c | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index fd93369..e6d9eee 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -481,9 +481,16 @@ static void balloon_process(struct work_struct *work)
  {
enum bp_state state = BP_DONE;
long credit;
+   static bool active;

mutex_lock(balloon_mutex);

+   if (active) {
+   mutex_unlock(balloon_mutex);
+   return;
+   }
+   active = true;
+
do {
credit = current_credit();

@@ -499,12 +506,16 @@ static void balloon_process(struct work_struct *work)

state = update_schedule(state);

-#ifndef CONFIG_PREEMPT
-   if (need_resched())
-   schedule();
-#endif
+   mutex_unlock(balloon_mutex);
+
+   cond_resched();
+
+   mutex_lock(balloon_mutex);
+
} while (credit  state == BP_DONE);

+   active = false;
+
/* Schedule more work if there is some still to be done. */
if (state == BP_EAGAIN)
schedule_delayed_work(balloon_worker, 
balloon_stats.schedule_delay * HZ);




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-20 Thread Ian Campbell
On Sat, 2015-07-18 at 11:13 +0100, Julien Grall wrote:
 Hi Ian,
 
 On 15/07/2015 10:32, Ian Campbell wrote:
  and save 2 byte if not more with the alignment per irq_desc.
 
  If this is a concern then I would say we would either want a separate
  array of per-pLPI information which we do not want in irq_desc because
  it is irq specific, or do add a pointer to its_desc which points to an
  array of per-event information.
 
 I noticed that we have a field msi_desc in the structure irq_desc. On 
 ARM, the structure msi_desc is not defined at all but still waste a 8 
 byte for the pointer in the irq_desc.
 
 Given that LPI is an MSI, I'm wondering if we could move all LPI related 
 data in this msi_desc. This would avoid to introduce new field in the 
 irq_desc structure.

Yes, I think I suggested something similar at some point, although I
seemed to think msi_desc was more similar to its_device.

Looking at the content of the struct, alloc_msi_entry and
msi_capability_init makes me less sure, it looks like msi_entry on x86
is allocated as an array (sized for events) stored per device, which
doesn't directly map onto how ARM is structured.

So what you propose may make more sense:

 
 The msi_desc would look like:
 
 struct msi_desc
 {
   its_device *dev;
   unsigned int eventID;
 };
 
 Regards,
 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()

2015-07-20 Thread Juergen Gross

On 07/20/2015 01:59 PM, Dario Faggioli wrote:

On Mon, 2015-07-20 at 13:41 +0200, Juergen Gross wrote:

On 07/17/2015 03:35 PM, Dario Faggioli wrote:



@@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu)
   cpumask_setall(v-cpu_hard_affinity);
   }

-if ( v-processor == cpu )
+if ( v-processor != cpu )
   {
-set_bit(_VPF_migrating, v-pause_flags);
+/* This vcpu is not on cpu, so we can move on. */
   vcpu_schedule_unlock_irqrestore(lock, flags, v);
-vcpu_sleep_nosync(v);
-vcpu_migrate(v);
+continue;
+}
+
+/* If it is on cpu, we must send it away. */
+if ( unlikely(system_state == SYS_STATE_suspend) )
+{
+/*
+ * If we are doing a shutdown/suspend, it is not necessary to
+ * ask the scheduler to chime in. In fact:
+ *  * there is no reason for it: the end result we are after
+ *is just 'all the vcpus on the boot pcpu, and no vcpu
+ *anywhere else', so let's just go for it;
+ *  * it's wrong, for cpupools with only non-boot pcpus, as
+ *the scheduler would always fail to send the vcpus away
+ *from the last online (non boot) pcpu!
+ *
+ * Therefore, in the shutdown/suspend case, we just pick up
+ * one (still) online pcpu. Note that, at this stage, all
+ * domains (including dom0) have been paused already, so we
+ * do not expect any vcpu activity at all.
+ */
+cpumask_andnot(online_affinity, cpu_online_map,
+   cpumask_of(cpu));
+BUG_ON(cpumask_empty(online_affinity));
+/*
+ * As boot cpu is, usually, pcpu #0, using cpumask_first()
+ * will make us converge quicker.
+ */
+new_cpu = cpumask_first(online_affinity);
+vcpu_move_nosched(v, new_cpu);


Shouldn't there be a vcpu_schedule_unlock_irqrestore() ?


I'm sure I put one there, as I was sure that it was there the last time
I inspected the patch before hitting send.

But I see that it's not there now, so I must have messed up when
formatting the patch, or something like that. :-(

It's really really weird, as I forgot it during development, and then
the system was hanging, and then I added it, and that's why I'm sure I
did have it in place... but perhaps I fat fingered some stgit command
which made it disappear.


Or you forgot stg refresh? I just managed to do so. :-(


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Andrew Cooper
On 17/07/15 20:05, Ting-Wei Lan wrote:
 When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake
 devices, It is possible to encounter graphics issues that make screen
 unreadable or crash the system. It was reported in freedesktop bugzilla:

 https://bugs.freedesktop.org/show_bug.cgi?id=90037

 As we still cannot find a proper fix for this problem, this patch adds
 iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for
 users to manually workaround the problem.

 Signed-off-by: Ting-Wei Lan lant...@gmail.com

Having looked into this issue, the i915 driver has several workarounds
in it for systems when the IOMMU is in use.  In some cases there are
plain errata, while in other cases there are specific hardware features
which don't function if the IOMMU is enabled.

In all cases this is gated on Linux's idea of whether the IOMMU is
enabled.  When used under Xen, Linux has no clue that the IOMMU exists,
or that Xen has turned it on.

Longterm, we should probably see about not clobbering the DMAR table and
making dom0 aware that there is an IOMMU.  (I still don't actually know
why we have this asymmetry between Intel and AMD).

However in the short term, having a command line workaround like this
seems appropriate.  It at least allows the user to tell Xen not to play
with the IOMMU behind an unsuspecting dom0.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Patch V6 00/16] xen: support pv-domains larger than 512GB

2015-07-20 Thread David Vrabel
On 17/07/15 05:51, Juergen Gross wrote:
 Support 64 bit pv-domains with more than 512GB of memory.
 
 Following test have been done:
 - 64 bit dom0 on 8GB machine
 - 64 bit dom0 on 1TB machine (resolving p2m/E820-map conflict)
 - 32 bit dom0 on 8GB machine
 - 64 bit dom0 on 8GB machine with faked kernel/E820-map conflict
 - 64 bit dom0 on 8GB machine with faked pgtable/E820-map conflict
 - 64 bit dom0 on 8GB machine with faked initrd/E820-map conflict
 - 64 bit dom0 on 8GB machine with faked p2m/E820-map conflict
 - 64 bit domU (sizes up to 900GB)
 - 32 bit domU

Thanks.  #12 needs an MM maintainer ack and it's good to go in I think.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Jan Beulich
 On 20.07.15 at 14:12, andrew.coop...@citrix.com wrote:
 On 17/07/15 20:05, Ting-Wei Lan wrote:
 When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake
 devices, It is possible to encounter graphics issues that make screen
 unreadable or crash the system. It was reported in freedesktop bugzilla:

 https://bugs.freedesktop.org/show_bug.cgi?id=90037 

 As we still cannot find a proper fix for this problem, this patch adds
 iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for
 users to manually workaround the problem.

 Signed-off-by: Ting-Wei Lan lant...@gmail.com
 
 Having looked into this issue, the i915 driver has several workarounds
 in it for systems when the IOMMU is in use.  In some cases there are
 plain errata, while in other cases there are specific hardware features
 which don't function if the IOMMU is enabled.
 
 In all cases this is gated on Linux's idea of whether the IOMMU is
 enabled.  When used under Xen, Linux has no clue that the IOMMU exists,
 or that Xen has turned it on.

Perhaps it should just assume an IOMMU is in use when running under
Xen. Having inspected all those code places quite some time ago, I
came to the conclusion that making this assumption is better than
the current one of there not being an enabled IOMMU (and I adjusted
our kernels accordingly).

 Longterm, we should probably see about not clobbering the DMAR table and
 making dom0 aware that there is an IOMMU.  (I still don't actually know
 why we have this asymmetry between Intel and AMD).

The presence of a DMAR table is not a 1:1 representation of whether
Xen has the IOMMU enabled.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 59721: regressions - trouble: broken/fail/pass

2015-07-20 Thread Ian Campbell
On Mon, 2015-07-20 at 09:48 +0100, Ian Campbell wrote:
  version targeted for testing:
   xen  21d9b079e53805b68047d60d28cde224d09bbb40
  baseline version:
   xen  c40317f11b3f05e7c06a2213560c8471081f2662

Based on feedback from Wei and Jan I've now gone ahead and done this
force push.

osstest@osstest:~/branches/for-xen-unstable.git$ 
OSSTEST_CONFIG=production-config ./ap-push xen-unstable 
21d9b079e53805b68047d60d28cde224d09bbb40
+ branch=xen-unstable
+ revision=21d9b079e53805b68047d60d28cde224d09bbb40
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable 
21d9b079e53805b68047d60d28cde224d09bbb40
+ branch=xen-unstable
+ revision=21d9b079e53805b68047d60d28cde224d09bbb40
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{Repos} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case $branch in
+ tree=xen
+ xenbranch=xen-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ : tested/2.6.39.x
+ . ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{OsstestUpstream} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://libvirt.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : https://github.com/rumpkernel/rumprun-xen
++ : git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git
+++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src 
'[fetch=try]'
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local 'options=[fetch=try]'
 getconfig GitCacheProxy
 perl -e '
use Osstest;
readglobalconfig();
print $c{GitCacheProxy} or die $!;
'
+++ local cache=git://cache:9419/
+++ '[' xgit://cache:9419/ '!=' x ']'
+++ echo 
'git://cache:9419/https://github.com/rumpkernel/rumpkernel-netbsd-src%20[fetch=try]'
++ : 
'git://cache:9419/https://github.com/rumpkernel/rumpkernel-netbsd-src%20[fetch=try]'
++ : git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/xen/git/linux-pvops.git
++ : git://xenbits.xen.org/linux-pvops.git
++ : tested/linux-3.14
++ : tested/linux-arm-xen
++ '[' x = x ']'
++ : git://xenbits.xen.org/linux-pvops.git
++ : tested/linux-3.14
++ '[' x = x ']'
++ : git://xenbits.xen.org/linux-pvops.git
++ : tested/linux-arm-xen
++ : git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
++ : tested/2.6.39.x
++ : daily-cron.xen-unstable
++ : daily-cron.xen-unstable
++ : daily-cron.xen-unstable
++ : daily-cron.xen-unstable
++ : daily-cron.xen-unstable
++ : daily-cron.xen-unstable
++ : http://hg.uk.xensource.com/carbon/trunk/linux-2.6.27
++ : git://xenbits.xen.org/staging/qemu-upstream-unstable.git
++ : daily-cron.xen-unstable
++ : git://git.qemu.org/qemu.git
++ : git://xenbits.xen.org/osstest/qemu.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/qemu.git
++ : daily-cron.xen-unstable
+ TREE_LINUX=osst...@xenbits.xen.org:/home/xen/git/linux-pvops.git
+ 

Re: [Xen-devel] [v10][PATCH 00/16] Fix RMRR

2015-07-20 Thread Chen, Tiejun

One brief request -- if you do end up sending this series again, can
you please rebase to staging?  This is the second time I've had to
manually fix up some patches so that they apply.



Sorry for this inconvenience.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 1/2] OSSTEST: introduce a raisin build test

2015-07-20 Thread Ian Campbell
Ian -- various questions for you here.

On Mon, 2015-07-20 at 13:56 +0100, Stefano Stabellini wrote:
 On Thu, 9 Jul 2015, Ian Campbell wrote:
  On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote:
   Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  [...]
 
 I didn't do a good job with the commit message. I'll try to track down
 the old discussion and write some more details.

Thanks!

   Changes in v3:
   - use $raisindir throughout ts-raisin-build
   - do not specify ENABLED_COMPONENTS so that empty REVISION variables can
   be used to disable building a raisin component
  
  I see we do again in the code below, which I suspect was deliberate and
  based on some discussion, but it's not mentioned in the Changes in v4+
  and since the changelog doesn't explain anything I can't even begin to
  guess why or how we arrived at this point.
 
  [...]
   +sub build () {
   +target_cmd_root($ho, END);
   +cd $raisindir
   +./raise -y install-builddep
  
  If two of these happen to run in parallel (build machines can be shared)
  then one or the other risks timing out on the underlying dpkg lock
  waiting for the other, since the wait might be arbitrarily long
  depending on what is going on.
  
  It also risks other builds happening in an environment which differs
  from one build to the next (if this happened to run in the middle) or
  even things changing while a build is happening.
  
  This is why all build dependencies are installed from ts-xen-build-prep,
  that step is run once for each build host as it is installed. This does
  unfortunately mean that I think we can't take advantage of raisin's
  tracking of necessary build dependencies, but at least it will be
  checking things for us.
 
 It is fine for OSSTest not use install-builddep. I think that the best
 way for OSSTest to use Raisin would be to call raise in
 ts-xen-build-prep to list the dependencies and install them from there.

Unfortunately ts-xen-build-prep will not have access to the repo, since
it won't be cloned. Also when reusing a host (which happens for builds)
I think this step is skipped (Ian J: can you confirm or deny that?)

 But here we are testing Raisin itself so I think that in this context it
 is actually important to run install-builddep. I don't think it would
 cause any troubles to other builds happening in parallel because the
 build dependencies of those builds would be installed by
 ts-xen-build-prep beforehand, that I was told is run before any build
 jobs?  If I am wrong, then we have a problem.

I think there are two problems, one is that install-builddep will take
the apt/dpkg lock, which will be shared by any other invocations --
making the timeout hard to decide upon (since it will pipeline things).

Second if install-builddep did install something extra due to some
change in raisin.git, for example, then some jobs would see new build
deps arrive half way through, or subsequent tests of older branches
might see something installed which wouldn't have been before.

Is there a check-builddep? That's the thing I would suggest we run here,
so we get an explicit failure.

 Otherwise is there a way to guarantee that only one raisin build happen
 simultaneously?

Ian?

I suspect we don't want this, 
 
 Given that the series is already at v7, I would prefer to remove the
 call to install-builddep, even though I think it is important, rather
 than make substantial modifications. However, if I do remove the call to
 install-builddep from ts-raisin-build, I would have to install any
 missing build dependencies somewhere else. Where would that be? In
 ts-xen-build-prep? Is that script even called for non-xen build jobs?

In ts-xen-build-prep, I think.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Chen, Tiejun

v10:

* Instead of correcting e820, I'd like to correct memory_map.map[]
   and then copy them into e820 directly. I think this can make sure
   hvm_info, memory_map.map[] and e820 are on the same page.


Actually, now that you mention it -- this should probably happen
instead when we update hvm_info-{low,high}_mem_pgend.


I also considered this point previously but I thought just right now we 
only update hvm_info-low/high_mem_pgend inside pci_setup(). But you 
can't guarantee this would be a sole place in the future. Instead, 
memory_map.map[] would always be copied into e820 when we build e820 table.


So I think we'd better do this update once at the last minute.

Thanks
Tiejun



I'm happy to leave this where it is for now, so with Jan's comments
addressed (in particular, incrementing nr_map):

Reviewed-by: George Dunlap george.dun...@eu.citrix.com

But if we have time it might be better to pull the loop in pci_setup()
which moves the memory out into a function, and have that function
modify the lowmem and highmem map[] entries at the same time we modify
low_mem_pgend and high_mem_pgend.  Alternately, you could put that on
your list of clean-ups to hvmloader for 4.7.

Thanks,
  -George



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-20 Thread Ian Campbell
On Sat, 2015-07-18 at 11:13 +0100, Julien Grall wrote:
 Hi Ian,
 
 On 15/07/2015 10:32, Ian Campbell wrote:
  and save 2 byte if not more with the alignment per irq_desc.
 
  If this is a concern then I would say we would either want a separate
  array of per-pLPI information which we do not want in irq_desc because
  it is irq specific, or do add a pointer to its_desc which points to an
  array of per-event information.
 
 I noticed that we have a field msi_desc in the structure irq_desc. On 
 ARM, the structure msi_desc is not defined at all but still waste a 8 
 byte for the pointer in the irq_desc.
 
 Given that LPI is an MSI, I'm wondering if we could move all LPI related 
 data in this msi_desc. This would avoid to introduce new field in the 
 irq_desc structure.

Yes, I think I suggested something similar at some point, although I
seemed to think msi_desc was more similar to its_device.

Looking at the content of the struct, alloc_msi_entry and
msi_capability_init makes me less sure, it looks like msi_entry on x86
is allocated as an array (sized for events) stored per device, which
doesn't directly map onto how ARM is structured.

So I think you suggestion may make more sense.

 
 The msi_desc would look like:
 
 struct msi_desc
 {
   its_device *dev;
   unsigned int eventID;
 };
 
 Regards,
 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues

2015-07-20 Thread Andrew Cooper
On 20/07/15 13:24, Jan Beulich wrote:
 On 20.07.15 at 14:12, andrew.coop...@citrix.com wrote:
 On 17/07/15 20:05, Ting-Wei Lan wrote:
 When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake
 devices, It is possible to encounter graphics issues that make screen
 unreadable or crash the system. It was reported in freedesktop bugzilla:

 https://bugs.freedesktop.org/show_bug.cgi?id=90037 

 As we still cannot find a proper fix for this problem, this patch adds
 iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for
 users to manually workaround the problem.

 Signed-off-by: Ting-Wei Lan lant...@gmail.com
 Having looked into this issue, the i915 driver has several workarounds
 in it for systems when the IOMMU is in use.  In some cases there are
 plain errata, while in other cases there are specific hardware features
 which don't function if the IOMMU is enabled.

 In all cases this is gated on Linux's idea of whether the IOMMU is
 enabled.  When used under Xen, Linux has no clue that the IOMMU exists,
 or that Xen has turned it on.
 Perhaps it should just assume an IOMMU is in use when running under
 Xen. Having inspected all those code places quite some time ago, I
 came to the conclusion that making this assumption is better than
 the current one of there not being an enabled IOMMU (and I adjusted
 our kernels accordingly).

In at least one case, an errata workaround involves issuing extra IOMMU
commands.  We cannot safely let even dom0 perform this.


 Longterm, we should probably see about not clobbering the DMAR table and
 making dom0 aware that there is an IOMMU.  (I still don't actually know
 why we have this asymmetry between Intel and AMD).
 The presence of a DMAR table is not a 1:1 representation of whether
 Xen has the IOMMU enabled.

It would (in theory) allow dom0 to get RO mappings of the IOMMU control
pages and observe that the IOMMU is enabled.  Whether this is useful in
practice is a different matter.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: arm: Avoid reading beyond the last module

2015-07-20 Thread Julien Grall
Hi Chris,

On 17/07/15 21:48, Chris (Christopher) Brand wrote:
 nr_mods is set in add_boot_module() to the number of module
 array elements used. This function also ensures that nr_mods
 never exceeds MAX_MODULES (the size of the array). When looping
 through the array, the correct maximum index is nr_mods-1,
 not nr_mods. If the array is full, using the latter will in
 fact access beyond the end of the array.
 This was done correctly in boot_module_find_by_kind() and
 consider_modules() but incorrectly in discard_initial_modules()
 and next_module().
 
 Signed-off-by: Chris Brand chris.br...@broadcom.com

Reviewed-by: Julien Grall julien.gr...@citrix.com

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread George Dunlap
On 07/20/2015 12:30 PM, Jan Beulich wrote:
 On 20.07.15 at 08:16, tiejun.c...@intel.com wrote:
 --- a/tools/firmware/hvmloader/pci.c
 +++ b/tools/firmware/hvmloader/pci.c
 @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
  enum virtual_vga virtual_vga = VGA_none;
  unsigned long igd_opregion_pgbase = 0;
  
 +/* Check if any conflicts with all reserved device memory. */
 
 /* Check if the specified range conflicts with any reserved device memory. */
 
 (and the any could perhaps be left out)
 
 +static bool check_overlap_all(uint64_t start, uint64_t size)
 +{
 +unsigned int i;
 +
 +for ( i = 0; i  memory_map.nr_map; i++ )
 +{
 +if ( memory_map.map[i].type == E820_RESERVED 
 + check_overlap(start, size,
 +   memory_map.map[i].addr,
 +   memory_map.map[i].size) )
 +return true;
 +}
 +
 +return false;
 +}
 +
 +/* Find the lowest RMRR higher than base. */
 +static int find_next_rmrr(uint32_t base)
 +{
 +unsigned int i;
 +int next_rmrr = -1;
 +uint64_t min_base = (1ull  32);
 +
 +for ( i = 0; i  memory_map.nr_map ; i++ )
 +{
 +if ( memory_map.map[i].type == E820_RESERVED 
 + memory_map.map[i].addr  base 
 + memory_map.map[i].addr  min_base )
 +{
 +next_rmrr = i;
 +min_base = memory_map.map[i].addr;
 +}
 +}
 +return next_rmrr;
 +}
 
 Considering _both_ callers, I think the function should actually return
 the lowest RMRR higher than or equal to base. 

You mean instead of strictly greater than the base.

 Or wait - we actually
 need to find the lowest RMRR the _end_ of which is higher than base.

Yes, you're right: there's always a risk that pci_mem_start will *start*
in the middle of a range.  Looking for the next *end* is more robust.

 @@ -407,6 +456,19 @@ void pci_setup(void)
  }
  
  base = (resource-base  + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
 +
 +/* If we're using mem_resource, check for RMRR conflicts. */
 +while ( resource == mem_resource 
 +next_rmrr  0 
 
 = I think.

Yes,  is a typo; I certainly meant to type =.

 
 +check_overlap(base, bar_sz,
 +  memory_map.map[next_rmrr].addr,
 +  memory_map.map[next_rmrr].size) )
 +{
 +base = memory_map.map[next_rmrr].addr + 
 memory_map.map[next_rmrr].size;
 +base = (base + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
 
 Pointless cast.

This was copy-and-pasted from the line above.  I had assumed that bar_sz
was uint32_t (perhaps it was at some point), but I guess now it's
pointless there too. :-)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread George Dunlap
On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen tiejun.c...@intel.com wrote:
 Now use the hypervisor-supplied memory map to build our final e820 table:
 * Add regions for BIOS ranges and other special mappings not in the
   hypervisor map
 * Add in the hypervisor supplied regions
 * Adjust the lowmem and highmem regions if we've had to relocate
   memory (adding a highmem region if necessary)
 * Sort all the ranges so that they appear in memory order.

 CC: Keir Fraser k...@xen.org
 CC: Jan Beulich jbeul...@suse.com
 CC: Andrew Cooper andrew.coop...@citrix.com
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Wei Liu wei.l...@citrix.com
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
 v10:

 * Instead of correcting e820, I'd like to correct memory_map.map[]
   and then copy them into e820 directly. I think this can make sure
   hvm_info, memory_map.map[] and e820 are on the same page.

Actually, now that you mention it -- this should probably happen
instead when we update hvm_info-{low,high}_mem_pgend.

I'm happy to leave this where it is for now, so with Jan's comments
addressed (in particular, incrementing nr_map):

Reviewed-by: George Dunlap george.dun...@eu.citrix.com

But if we have time it might be better to pull the loop in pci_setup()
which moves the memory out into a function, and have that function
modify the lowmem and highmem map[] entries at the same time we modify
low_mem_pgend and high_mem_pgend.  Alternately, you could put that on
your list of clean-ups to hvmloader for 4.7.

Thanks,
 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 12/17] xen/arm: ITS: Initialize LPI irq descriptors and route

2015-07-20 Thread Vijay Kilari
Hi Ian,

On Fri, Jul 10, 2015 at 9:00 PM, Ian Campbell ian.campb...@citrix.com wrote:
 On Fri, 2015-07-10 at 13:12 +0530, vijay.kil...@gmail.com wrote:
 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 3ebadcf..92d2be9 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -68,11 +68,18 @@ enum gic_version gic_hw_version(void)
 return gic_hw_ops-info-hw_version;
  }

 +/* Only validates PPIs/SGIs/SPIs supported */
  unsigned int gic_number_lines(void)
  {
  return gic_hw_ops-info-nr_lines;
  }

 +/* Validates PPIs/SGIs/SPIs/LPIs supported */
 +bool_t gic_is_valid_irq(unsigned int irq)
 +{
 +return ((irq  gic_hw_ops-info-nr_lines)  is_lpi(irq));
 +}
 +
  unsigned int gic_nr_id_bits(void)
  {
  return gic_hw_ops-info-nr_id_bits;
 @@ -148,7 +155,7 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const 
 cpumask_t *cpu_mask,
  {
  ASSERT(priority = 0xff); /* Only 8 bits of priority */
  /* Can't route interrupts that don't exist */
 -ASSERT(desc-irq  gic_number_lines() || is_lpi(desc-irq));
 +ASSERT(gic_is_valid_irq(desc-irq));


 The fact that I commented on this earlier is another artefact in the way
 this series presents functionality in an essentially arbitrary order.

 I notice that you appear to have reintroduced the logic but when you
 moved this code from here into gic_is_valid_irq. AFAICT that function
 can never return true, but then I'm rather confused about how this
 series can have been tested.

  ASSERT(test_bit(_IRQ_DISABLED, desc-status));
  ASSERT(spin_is_locked(desc-lock));

 @@ -160,6 +167,17 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const 
 cpumask_t *cpu_mask,
  /* Program the GIC to route an interrupt to a guest
   *   - desc.lock must be held
   */
 +int gic_route_lpi_to_guest(struct domain *d, unsigned int virq,
 +   struct irq_desc *desc, unsigned int priority)
 +{
 +ASSERT(spin_is_locked(desc-lock));

 ASSERT(virq = SOME_DEFINE_FOR_MIN_LPI);

 diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
 index 3806d98..c8ea627 100644
 --- a/xen/arch/arm/irq.c
 +++ b/xen/arch/arm/irq.c
 @@ -62,12 +62,21 @@ hw_irq_controller no_irq_type = {
  };

  static irq_desc_t irq_desc[NR_IRQS];
 +#ifdef CONFIG_ARM_64
 +static irq_desc_t irq_desc_lpi[NR_GIC_LPI];

 http://xenbits.xen.org/people/ianc/vits/draftG.html#irq-descriptors
 contains: Therefore a second dynamically allocated array will be added
 to cover the range 8192..nr_lpis

 IOW this should be dynamically allocated and therefore NR_GIC_LPI (which
 I think I already commented on earlier) should go away, since the limit
 should come from the h/w.
 @@ -267,9 +285,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int 
 irq, int is_fiq)
  set_bit(_IRQ_INPROGRESS, desc-status);
  desc-arch.eoi_cpu = smp_processor_id();

 +#ifdef CONFIG_ARM_64
 +if ( is_lpi(irq) )
 +vgic_vcpu_inject_lpi(info-d, irq);
 +else
 +#endif
  /* the irq cannot be a PPI, we only support delivery of SPIs to
   * guests */
 -vgic_vcpu_inject_spi(info-d, info-virq);
 +vgic_vcpu_inject_spi(info-d, info-virq);

 Comment should be reindented too.

 +if ( !is_lpi(irq) )
 +{
 +rank = vgic_rank_irq(v, irq);
 +
 +ASSERT(spin_is_locked(rank-lock));
 +priority = vgic_byte_read(rank-ipriority[REG_RANK_INDEX(8,
irq, DABT_WORD)], 0, irq  
 0x3);
 +}
 +if ( is_lpi(irq)  gic_lpi_supported() )

 Should be else if ( gic_lpi_supported() )

 +vdev = vits_find_device(d-arch.vits-dev_root, devid);
 +if ( !vdev )
 +{
 +dprintk(XENLOG_WARNING,
 +LPI %d received for dev 0x%x not assigned..dropping\n,
 +irq, devid);
 +return;
 +}

 This isn't used again and the whole R-B tree should go away.

 +p = irq_to_pending(d-vcpu[0], vitt_entry.vlpi);

 Perhaps given that we know this is a vlpi, we could skipping going via
 vcpu[0] and just go right to the entry in the d-pending_lpi array (via
 a suitable accessor of course)

 +p-desc = desc;

 This should have happened during routing, not now.

 Oh. I see what is going on, your vgic_vcpu_inject_lpi appears to be
 taking an IRQ, and not a vpli, or at least to be confused about what it
 should do with the number which it calls irq.

 Therefore you find yourself needing to lookup the irqdesc, and look in
 the vitt etc, none of which belongs here.

 For interrupts coming from a plpi all of that lookup should happen based
 on the its_device which is contained in the irq_guest and other info
 from there, before calling this function. You can get the irq_guest
 which you can get because you have a PLPI in your hand and can look up
 the irq_guest via the irq_desc which you get using the plpi number.


AUI, you are refering to

http://xenbits.xen.org/people/ianc/vits/draftG.html#virtual-lpi-injection

If we want to extract, 

Re: [Xen-devel] [PATCH v4 12/17] xen/arm: ITS: Initialize LPI irq descriptors and route

2015-07-20 Thread Julien Grall
On 20/07/15 14:07, Vijay Kilari wrote:
 AUI, you are refering to
 
 http://xenbits.xen.org/people/ianc/vits/draftG.html#virtual-lpi-injection
 
 If we want to extract, its_device, and VITT information before calling
 vgic_vcpu_inject_lpi(),
  it cannot be done in do_IRQ() as irq.c cannot have vITS related code.

IHMO it can't be done in vgic.c too...

 
 So one option is to call vits_foo() function which will validated plpi and get
 its_device / VITT info and call vgic_vcpu_inject_lpi(). So vITS will
 call back vgic function.
 OR
 Introduce new vgic function for doing this and call vgic_vcpu_inject_lpi().

I don't understand why you need to validate the plpi. It's already done
when we receive the IRQ.

Furthermore, you will need to retrieve again the VITT entry in
vgic_vcpu_inject_lpi in order to get the collection associated to the vLPI.

Given that the irq_desc as the vDevID and the eventID and the INT
command will have the same parameters in hand it looks like to me that
the prototype of vgic_vcpu_inject_lpi should be

vgic_vcpu_inject_lpi(struct domain *d, uint32_t vdevid, uint32_t eventid)

And the vgic_vcpu_inject_lpi will take care of retrieving the vLPI and
the collection associated.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 1/2] OSSTEST: introduce a raisin build test

2015-07-20 Thread Stefano Stabellini
On Thu, 9 Jul 2015, Ian Campbell wrote:
 On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote:
  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 
 In general I'm very suspicious of a 200 line patch with _no_ commit
 message at all.
 
 What about the details of the stuff around overriding the versions iff
 tree and revision are set etc and why that matters and how the two
 projects therefore interact wrt selecting what to build? I remember
 discussing this at length either IRL or in older versions of the thread
 and we both know it wasn't trivial to arrive at how this should be done,
 but I _already_ can't remember how we reached this point or why and that
 is information which really needs to be recorded so it can be referred
 to later when we wonder why it does this.
 
 The scope is also missing, i.e. this is currently just a straight raisin
 build test, and the results are not consumed by anything. This is
 important because of all the stuff about splitting up the dist dir and
 incremental builds which we discussed means that as it stands this test
 step is not producing build artefacts suitable as an input for actual
 tests.
 
 I expect there's also stuff in the intra-version changelog which
 actually belongs in the main changelog, such as why you are refactoring
 certain things into BuildSupport.pm, which deserves some sort of brief
 mention IMHO.
 
 [...]

I didn't do a good job with the commit message. I'll try to track down
the old discussion and write some more details.


  Changes in v3:
  - use $raisindir throughout ts-raisin-build
  - do not specify ENABLED_COMPONENTS so that empty REVISION variables can
  be used to disable building a raisin component
 
 I see we do again in the code below, which I suspect was deliberate and
 based on some discussion, but it's not mentioned in the Changes in v4+
 and since the changelog doesn't explain anything I can't even begin to
 guess why or how we arrived at this point.

 [...]
  +sub build () {
  +target_cmd_root($ho, END);
  +cd $raisindir
  +./raise -y install-builddep
 
 If two of these happen to run in parallel (build machines can be shared)
 then one or the other risks timing out on the underlying dpkg lock
 waiting for the other, since the wait might be arbitrarily long
 depending on what is going on.
 
 It also risks other builds happening in an environment which differs
 from one build to the next (if this happened to run in the middle) or
 even things changing while a build is happening.
 
 This is why all build dependencies are installed from ts-xen-build-prep,
 that step is run once for each build host as it is installed. This does
 unfortunately mean that I think we can't take advantage of raisin's
 tracking of necessary build dependencies, but at least it will be
 checking things for us.

It is fine for OSSTest not use install-builddep. I think that the best
way for OSSTest to use Raisin would be to call raise in
ts-xen-build-prep to list the dependencies and install them from there.

But here we are testing Raisin itself so I think that in this context it
is actually important to run install-builddep. I don't think it would
cause any troubles to other builds happening in parallel because the
build dependencies of those builds would be installed by
ts-xen-build-prep beforehand, that I was told is run before any build
jobs?  If I am wrong, then we have a problem.

Otherwise is there a way to guarantee that only one raisin build happen
simultaneously?

Given that the series is already at v7, I would prefer to remove the
call to install-builddep, even though I think it is important, rather
than make substantial modifications. However, if I do remove the call to
install-builddep from ts-raisin-build, I would have to install any
missing build dependencies somewhere else. Where would that be? In
ts-xen-build-prep? Is that script even called for non-xen build jobs?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Patch V2] xen: release lock occasionally during ballooning

2015-07-20 Thread David Vrabel
On 20/07/15 12:49, Juergen Gross wrote:
 When dom0 is being ballooned balloon_process() will hold the balloon
 mutex until it is finished. This will block e.g. creation of new
 domains as the device backends for the new domain need some
 autoballooned pages for the ring buffers.
 
 Avoid this by releasing the balloon mutex from time to time during
 ballooning. Adjust the comment above balloon_process() regarding
 multiple instances of balloon_process().
 
 Instead of open coding it, just use cond_resched().

Applied to for-linus-4.2, thanks.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: fix non-ANSI function declaration of function xen_has_pv_devices

2015-07-20 Thread David Vrabel
On 16/07/15 20:34, Colin King wrote:
 From: Colin Ian King colin.k...@canonical.com
 
 xen_has_pv_devices has no parameters, so use the normal void
 parameter convention to make it match the prototype in the
 header file include/xen/platform_pci.h

Applied to for-linus-4.3, thanks.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 1/7] x86/PCI: add config space abstract write intercept logic

2015-07-20 Thread Jan Beulich
This is to be used by MSI code, and later to also be hooked up to
MMCFG accesses by Dom0.

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1108,6 +1108,12 @@ void pci_cleanup_msi(struct pci_dev *pde
 msi_free_irqs(pdev);
 }
 
+int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg,
+ unsigned int size, uint32_t *data)
+{
+return 0;
+}
+
 int pci_restore_msi_state(struct pci_dev *pdev)
 {
 unsigned long flags;
--- a/xen/arch/x86/pci.c
+++ b/xen/arch/x86/pci.c
@@ -67,3 +67,28 @@ void pci_conf_write(uint32_t cf8, uint8_
 
 spin_unlock_irqrestore(pci_config_lock, flags);
 }
+
+int pci_conf_write_intercept(unsigned int seg, unsigned int bdf,
+ unsigned int reg, unsigned int size,
+ uint32_t *data)
+{
+struct pci_dev *pdev;
+int rc = 0;
+
+/*
+ * Avoid expensive operations when no hook is going to do anything
+ * for the access anyway.
+ */
+if ( reg  64 || reg = 256 )
+return 0;
+
+spin_lock(pcidevs_lock);
+
+pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
+if ( pdev )
+rc = pci_msi_conf_write_intercept(pdev, reg, size, data);
+
+spin_unlock(pcidevs_lock);
+
+return rc;
+}
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1777,8 +1777,8 @@ static bool_t admin_io_okay(unsigned int
 return ioports_access_permitted(d, port, port + bytes - 1);
 }
 
-static bool_t pci_cfg_ok(struct domain *currd, bool_t write,
- unsigned int start, unsigned int size)
+static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
+ unsigned int size, uint32_t *write)
 {
 uint32_t machine_bdf;
 
@@ -1810,8 +1810,12 @@ static bool_t pci_cfg_ok(struct domain *
 start |= CF8_ADDR_HI(currd-arch.pci_cf8);
 }
 
-return !xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
-  start, start + size - 1, write);
+if ( xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
+   start, start + size - 1, !!write) != 0 )
+ return 0;
+
+return !write ||
+   pci_conf_write_intercept(0, machine_bdf, start, size, write) = 0;
 }
 
 uint32_t guest_io_read(unsigned int port, unsigned int bytes,
@@ -1863,7 +1867,7 @@ uint32_t guest_io_read(unsigned int port
 size = min(bytes, 4 - (port  3));
 if ( size == 3 )
 size = 2;
-if ( pci_cfg_ok(currd, 0, port  3, size) )
+if ( pci_cfg_ok(currd, port  3, size, NULL) )
 sub_data = pci_conf_read(currd-arch.pci_cf8, port  3, size);
 }
 
@@ -1934,7 +1938,7 @@ void guest_io_write(unsigned int port, u
 size = min(bytes, 4 - (port  3));
 if ( size == 3 )
 size = 2;
-if ( pci_cfg_ok(currd, 1, port  3, size) )
+if ( pci_cfg_ok(currd, port  3, size, data) )
 pci_conf_write(currd-arch.pci_cf8, port  3, size, data);
 }
 
--- a/xen/include/asm-x86/pci.h
+++ b/xen/include/asm-x86/pci.h
@@ -15,4 +15,10 @@ struct arch_pci_dev {
 vmask_t used_vectors;
 };
 
+int pci_conf_write_intercept(unsigned int seg, unsigned int bdf,
+ unsigned int reg, unsigned int size,
+ uint32_t *data);
+int pci_msi_conf_write_intercept(struct pci_dev *, unsigned int reg,
+ unsigned int size, uint32_t *data);
+
 #endif /* __X86_PCI_H__ */



x86/PCI: add config space abstract write intercept logic

This is to be used by MSI code, and later to also be hooked up to
MMCFG accesses by Dom0.

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1108,6 +1108,12 @@ void pci_cleanup_msi(struct pci_dev *pde
 msi_free_irqs(pdev);
 }
 
+int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg,
+ unsigned int size, uint32_t *data)
+{
+return 0;
+}
+
 int pci_restore_msi_state(struct pci_dev *pdev)
 {
 unsigned long flags;
--- a/xen/arch/x86/pci.c
+++ b/xen/arch/x86/pci.c
@@ -67,3 +67,28 @@ void pci_conf_write(uint32_t cf8, uint8_
 
 spin_unlock_irqrestore(pci_config_lock, flags);
 }
+
+int pci_conf_write_intercept(unsigned int seg, unsigned int bdf,
+ unsigned int reg, unsigned int size,
+ uint32_t *data)
+{
+struct pci_dev *pdev;
+int rc = 0;
+
+/*
+ * Avoid expensive operations when no hook is going to do anything
+ * for the access anyway.
+ */
+if ( reg  64 || reg = 256 )
+return 0;
+
+spin_lock(pcidevs_lock);
+
+pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));

Re: [Xen-devel] [PATCH v3] xen/blkfront: convert to blk-mq APIs

2015-07-20 Thread David Vrabel
On 13/07/15 10:55, Bob Liu wrote:
 Note: This patch is based on original work of Arianna's internship for
 GNOME's Outreach Program for Women.
 
 Only one hardware queue is used now, so there is no performance change.
 
 The legacy non-mq code is deleted completely which is the same as other
 drivers like virtio, mtip, and nvme.
 
 Also dropped one unnecessary holding of info-io_lock when calling
 blk_mq_stop_hw_queues().

Applied to for-linus-4.3, thanks.

 Changes in v2:
  - Reorganized blk_mq_queue_rq()
  - Restored most io_locks in place
 
 Change in v3:
  - Rename blk_mq_queue_rq to blkif_queue_rq

Next time, please put changes after a --- marker so they're
automatically dropped when applied.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 2/7] x86/MSI-X: track host and guest mask-all requests separately

2015-07-20 Thread Jan Beulich
Host uses of the bits will be added subsequently, and must not be
overridden by guests (including Dom0, namely when acting on behalf of
a guest).

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -843,6 +843,12 @@ static int msix_capability_init(struct p
 
 if ( !msix-used_entries )
 {
+msix-host_maskall = 0;
+if ( !msix-guest_maskall )
+control = ~PCI_MSIX_FLAGS_MASKALL;
+else
+control |= PCI_MSIX_FLAGS_MASKALL;
+
 if ( rangeset_add_range(mmio_ro_ranges, msix-table.first,
 msix-table.last) )
 WARN();
@@ -,6 +1117,34 @@ void pci_cleanup_msi(struct pci_dev *pde
 int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg,
  unsigned int size, uint32_t *data)
 {
+u16 seg = pdev-seg;
+u8 bus = pdev-bus;
+u8 slot = PCI_SLOT(pdev-devfn);
+u8 func = PCI_FUNC(pdev-devfn);
+struct msi_desc *entry;
+unsigned int pos;
+
+if ( pdev-msix )
+{
+entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX);
+pos = entry ? entry-msi_attrib.pos
+: pci_find_cap_offset(seg, bus, slot, func,
+  PCI_CAP_ID_MSIX);
+ASSERT(pos);
+
+if ( reg  pos || reg = msix_pba_offset_reg(pos) + 4 )
+return 0;
+
+if ( reg != msix_control_reg(pos) || size != 2 )
+return -EACCES;
+
+pdev-msix-guest_maskall = !!(*data  PCI_MSIX_FLAGS_MASKALL);
+if ( pdev-msix-host_maskall )
+*data |= PCI_MSIX_FLAGS_MASKALL;
+
+return 1;
+}
+
 return 0;
 }
 
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -233,6 +233,7 @@ struct arch_msix {
 int table_refcnt[MAX_MSIX_TABLE_PAGES];
 int table_idx[MAX_MSIX_TABLE_PAGES];
 spinlock_t table_lock;
+bool_t host_maskall, guest_maskall;
 domid_t warned;
 };
 



x86/MSI-X: track host and guest mask-all requests separately

Host uses of the bits will be added subsequently, and must not be
overridden by guests (including Dom0, namely when acting on behalf of
a guest).

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -843,6 +843,12 @@ static int msix_capability_init(struct p
 
 if ( !msix-used_entries )
 {
+msix-host_maskall = 0;
+if ( !msix-guest_maskall )
+control = ~PCI_MSIX_FLAGS_MASKALL;
+else
+control |= PCI_MSIX_FLAGS_MASKALL;
+
 if ( rangeset_add_range(mmio_ro_ranges, msix-table.first,
 msix-table.last) )
 WARN();
@@ -,6 +1117,34 @@ void pci_cleanup_msi(struct pci_dev *pde
 int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg,
  unsigned int size, uint32_t *data)
 {
+u16 seg = pdev-seg;
+u8 bus = pdev-bus;
+u8 slot = PCI_SLOT(pdev-devfn);
+u8 func = PCI_FUNC(pdev-devfn);
+struct msi_desc *entry;
+unsigned int pos;
+
+if ( pdev-msix )
+{
+entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX);
+pos = entry ? entry-msi_attrib.pos
+: pci_find_cap_offset(seg, bus, slot, func,
+  PCI_CAP_ID_MSIX);
+ASSERT(pos);
+
+if ( reg  pos || reg = msix_pba_offset_reg(pos) + 4 )
+return 0;
+
+if ( reg != msix_control_reg(pos) || size != 2 )
+return -EACCES;
+
+pdev-msix-guest_maskall = !!(*data  PCI_MSIX_FLAGS_MASKALL);
+if ( pdev-msix-host_maskall )
+*data |= PCI_MSIX_FLAGS_MASKALL;
+
+return 1;
+}
+
 return 0;
 }
 
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -233,6 +233,7 @@ struct arch_msix {
 int table_refcnt[MAX_MSIX_TABLE_PAGES];
 int table_idx[MAX_MSIX_TABLE_PAGES];
 spinlock_t table_lock;
+bool_t host_maskall, guest_maskall;
 domid_t warned;
 };
 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 3/7] x86/MSI-X: be more careful during teardown

2015-07-20 Thread Jan Beulich
When a device gets detached from a guest, pciback will clear its
command register, thus disabling both memory and I/O decoding. The
disabled memory decoding, however, has an effect on the MSI-X table
accesses the hypervisor does: These won't have the intended effect
anymore. Even worse, for PCIe devices (but not SR-IOV virtual
functions) such accesses may (will?) be treated as Unsupported
Requests, causing respective errors to be surfaced, potentially in the
form of NMIs that may be fatal to the hypervisor or Dom0 is different
ways. Hence rather than carrying out these accesses, we should avoid
them where we can, and use alternative (e.g. PCI config space based)
mechanisms to achieve at least the same effect.

At this time it continues to be unclear whether this is fixing an
actual bug or is rather just working around bogus (but apparently
common) system behavior.

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---
Backporting note (largely to myself):
   Depends on (not yet backported to 4.4 and earlier) commit 061eebe0e
   x86/MSI: drop workaround for insecure Dom0 kernels (due to re-use
   of struct arch_msix's warned field).

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -217,9 +217,9 @@ void destroy_irq(unsigned int irq)
 }
 
 spin_lock_irqsave(desc-lock, flags);
-desc-status  |= IRQ_DISABLED;
 desc-status  = ~IRQ_GUEST;
 desc-handler-shutdown(desc);
+desc-status |= IRQ_DISABLED;
 action = desc-action;
 desc-action  = NULL;
 desc-msi_desc = NULL;
@@ -995,8 +995,8 @@ void __init release_irq(unsigned int irq
 spin_lock_irqsave(desc-lock,flags);
 action = desc-action;
 desc-action  = NULL;
-desc-status |= IRQ_DISABLED;
 desc-handler-shutdown(desc);
+desc-status |= IRQ_DISABLED;
 spin_unlock_irqrestore(desc-lock,flags);
 
 /* Wait to make sure it's not being used on another CPU */
@@ -1732,8 +1732,8 @@ static irq_guest_action_t *__pirq_guest_
 BUG_ON(action-in_flight != 0);
 
 /* Disabling IRQ before releasing the desc_lock avoids an IRQ storm. */
-desc-status |= IRQ_DISABLED;
 desc-handler-disable(desc);
+desc-status |= IRQ_DISABLED;
 
 /*
  * Mark any remaining pending EOIs as ready to flush.
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -123,6 +123,27 @@ static void msix_put_fixmap(struct arch_
 spin_unlock(msix-table_lock);
 }
 
+static bool_t memory_decoded(const struct pci_dev *dev)
+{
+u8 bus, slot, func;
+
+if ( !dev-info.is_virtfn )
+{
+bus = dev-bus;
+slot = PCI_SLOT(dev-devfn);
+func = PCI_FUNC(dev-devfn);
+}
+else
+{
+bus = dev-info.physfn.bus;
+slot = PCI_SLOT(dev-info.physfn.devfn);
+func = PCI_FUNC(dev-info.physfn.devfn);
+}
+
+return !!(pci_conf_read16(dev-seg, bus, slot, func, PCI_COMMAND) 
+  PCI_COMMAND_MEMORY);
+}
+
 /*
  * MSI message composition
  */
@@ -166,7 +187,7 @@ void msi_compose_msg(unsigned vector, co
 }
 }
 
-static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
+static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 switch ( entry-msi_attrib.type )
 {
@@ -201,6 +222,8 @@ static void read_msi_msg(struct msi_desc
 {
 void __iomem *base = entry-mask_base;
 
+if ( unlikely(!memory_decoded(entry-dev)) )
+return 0;
 msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
 msg-data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
@@ -212,6 +235,8 @@ static void read_msi_msg(struct msi_desc
 
 if ( iommu_intremap )
 iommu_read_msi_from_ire(entry, msg);
+
+return 1;
 }
 
 static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
@@ -262,6 +287,8 @@ static int write_msi_msg(struct msi_desc
 {
 void __iomem *base = entry-mask_base;
 
+if ( unlikely(!memory_decoded(entry-dev)) )
+return -ENXIO;
 writel(msg-address_lo,
base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 writel(msg-address_hi,
@@ -289,7 +316,8 @@ void set_msi_affinity(struct irq_desc *d
 ASSERT(spin_is_locked(desc-lock));
 
 memset(msg, 0, sizeof(msg));
-read_msi_msg(msi_desc, msg);
+if ( !read_msi_msg(msi_desc, msg) )
+return;
 
 msg.data = ~MSI_DATA_VECTOR_MASK;
 msg.data |= MSI_DATA_VECTOR(desc-arch.vector);
@@ -349,23 +377,27 @@ int msi_maskable_irq(const struct msi_de
|| entry-msi_attrib.maskbit;
 }
 
-static void msi_set_mask_bit(struct irq_desc *desc, bool_t host, bool_t guest)
+static bool_t msi_set_mask_bit(struct irq_desc *desc, bool_t host, bool_t 
guest)
 {
 struct msi_desc *entry = desc-msi_desc;
+struct pci_dev *pdev;
+u16 seg;
+u8 bus, slot, func;
 bool_t flag = host || guest;
 
 ASSERT(spin_is_locked(desc-lock));
 BUG_ON(!entry 

[Xen-devel] [PATCH v5 4/7] x86/MSI-X: access MSI-X table only after having enabled MSI-X

2015-07-20 Thread Jan Beulich
As done in Linux by f598282f51 (PCI: Fix the NIU MSI-X problem in a
better way) and its broken predecessor, make sure we don't access the
MSI-X table without having enabled MSI-X first, using the mask-all flag
instead to prevent interrupts from occurring.

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -144,6 +144,17 @@ static bool_t memory_decoded(const struc
   PCI_COMMAND_MEMORY);
 }
 
+static bool_t msix_memory_decoded(const struct pci_dev *dev, unsigned int pos)
+{
+u16 control = pci_conf_read16(dev-seg, dev-bus, PCI_SLOT(dev-devfn),
+  PCI_FUNC(dev-devfn), msix_control_reg(pos));
+
+if ( !(control  PCI_MSIX_FLAGS_ENABLE) )
+return 0;
+
+return memory_decoded(dev);
+}
+
 /*
  * MSI message composition
  */
@@ -222,7 +233,8 @@ static bool_t read_msi_msg(struct msi_de
 {
 void __iomem *base = entry-mask_base;
 
-if ( unlikely(!memory_decoded(entry-dev)) )
+if ( unlikely(!msix_memory_decoded(entry-dev,
+   entry-msi_attrib.pos)) )
 return 0;
 msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
@@ -287,7 +299,8 @@ static int write_msi_msg(struct msi_desc
 {
 void __iomem *base = entry-mask_base;
 
-if ( unlikely(!memory_decoded(entry-dev)) )
+if ( unlikely(!msix_memory_decoded(entry-dev,
+   entry-msi_attrib.pos)) )
 return -ENXIO;
 writel(msg-address_lo,
base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
@@ -381,9 +394,9 @@ static bool_t msi_set_mask_bit(struct ir
 {
 struct msi_desc *entry = desc-msi_desc;
 struct pci_dev *pdev;
-u16 seg;
+u16 seg, control;
 u8 bus, slot, func;
-bool_t flag = host || guest;
+bool_t flag = host || guest, maskall;
 
 ASSERT(spin_is_locked(desc-lock));
 BUG_ON(!entry || !entry-dev);
@@ -406,36 +419,45 @@ static bool_t msi_set_mask_bit(struct ir
 }
 break;
 case PCI_CAP_ID_MSIX:
+maskall = pdev-msix-host_maskall;
+control = pci_conf_read16(seg, bus, slot, func,
+  msix_control_reg(entry-msi_attrib.pos));
+if ( unlikely(!(control  PCI_MSIX_FLAGS_ENABLE)) )
+{
+pdev-msix-host_maskall = 1;
+pci_conf_write16(seg, bus, slot, func,
+ msix_control_reg(entry-msi_attrib.pos),
+ control | (PCI_MSIX_FLAGS_ENABLE |
+PCI_MSIX_FLAGS_MASKALL));
+}
 if ( likely(memory_decoded(pdev)) )
 {
 writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
-break;
+if ( likely(control  PCI_MSIX_FLAGS_ENABLE) )
+break;
+flag = 1;
 }
-if ( flag )
+else if ( flag  !(control  PCI_MSIX_FLAGS_MASKALL) )
 {
-u16 control;
 domid_t domid = pdev-domain-domain_id;
 
-pdev-msix-host_maskall = 1;
-control = pci_conf_read16(seg, bus, slot, func,
-  msix_control_reg(entry-msi_attrib.pos));
-if ( control  PCI_MSIX_FLAGS_MASKALL )
-break;
-pci_conf_write16(seg, bus, slot, func,
- msix_control_reg(entry-msi_attrib.pos),
- control | PCI_MSIX_FLAGS_MASKALL);
+maskall = 1;
 if ( pdev-msix-warned != domid )
 {
 pdev-msix-warned = domid;
 printk(XENLOG_G_WARNING
-   cannot mask IRQ %d: masked MSI-X on Dom%d's 
%04x:%02x:%02x.%u\n,
+   cannot mask IRQ %d: masking MSI-X on Dom%d's 
%04x:%02x:%02x.%u\n,
desc-irq, domid, pdev-seg, pdev-bus,
PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn));
 }
-break;
 }
-/* fall through */
+pdev-msix-host_maskall = maskall;
+if ( maskall || pdev-msix-guest_maskall )
+control |= PCI_MSIX_FLAGS_MASKALL;
+pci_conf_write16(seg, bus, slot, func,
+ msix_control_reg(entry-msi_attrib.pos), control);
+return flag;
 default:
 return 0;
 }
@@ -461,7 +483,8 @@ static int msi_get_mask_bit(const struct
 entry-msi.mpos) 
 entry-msi_attrib.entry_nr)  1;
 case PCI_CAP_ID_MSIX:
-if ( unlikely(!memory_decoded(entry-dev)) )
+if ( unlikely(!msix_memory_decoded(entry-dev,
+   entry-msi_attrib.pos)) 

[Xen-devel] [PATCH v5 6/7] x86/MSI: properly track guest masking requests

2015-07-20 Thread Jan Beulich
... by monitoring writes to the mask register.

This allows reverting the main effect of the XSA-129 patches in qemu.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1303,6 +1303,37 @@ int pci_msi_conf_write_intercept(struct 
 return 1;
 }
 
+entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI);
+if ( entry  entry-msi_attrib.maskbit )
+{
+uint16_t cntl;
+uint32_t unused;
+
+pos = entry-msi_attrib.pos;
+if ( reg  pos || reg = entry-msi.mpos + 8 )
+return 0;
+
+if ( reg == msi_control_reg(pos) )
+return size == 2 ? 1 : -EACCES;
+if ( reg  entry-msi.mpos || reg = entry-msi.mpos + 4 || size != 4 )
+return -EACCES;
+
+cntl = pci_conf_read16(seg, bus, slot, func, msi_control_reg(pos));
+unused = ~(uint32_t)0  (32 - multi_msi_capable(cntl));
+for ( pos = 0; pos  entry-msi.nvec; ++pos, ++entry )
+{
+entry-msi_attrib.guest_masked =
+*data  entry-msi_attrib.entry_nr;
+if ( entry-msi_attrib.host_masked )
+*data |= 1  pos;
+unused = ~(1  pos);
+}
+
+*data |= unused;
+
+return 1;
+}
+
 return 0;
 }
 



x86/MSI: properly track guest masking requests

... by monitoring writes to the mask register.

This allows reverting the main effect of the XSA-129 patches in qemu.

Signed-off-by: Jan Beulich jbeul...@suse.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1303,6 +1303,37 @@ int pci_msi_conf_write_intercept(struct 
 return 1;
 }
 
+entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI);
+if ( entry  entry-msi_attrib.maskbit )
+{
+uint16_t cntl;
+uint32_t unused;
+
+pos = entry-msi_attrib.pos;
+if ( reg  pos || reg = entry-msi.mpos + 8 )
+return 0;
+
+if ( reg == msi_control_reg(pos) )
+return size == 2 ? 1 : -EACCES;
+if ( reg  entry-msi.mpos || reg = entry-msi.mpos + 4 || size != 4 )
+return -EACCES;
+
+cntl = pci_conf_read16(seg, bus, slot, func, msi_control_reg(pos));
+unused = ~(uint32_t)0  (32 - multi_msi_capable(cntl));
+for ( pos = 0; pos  entry-msi.nvec; ++pos, ++entry )
+{
+entry-msi_attrib.guest_masked =
+*data  entry-msi_attrib.entry_nr;
+if ( entry-msi_attrib.host_masked )
+*data |= 1  pos;
+unused = ~(1  pos);
+}
+
+*data |= unused;
+
+return 1;
+}
+
 return 0;
 }
 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 5/7] x86/MSI-X: reduce fiddling with control register during restore

2015-07-20 Thread Jan Beulich
Rather than disabling and enabling MSI-X once per vector, do it just
once per device.

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1314,6 +1314,9 @@ int pci_restore_msi_state(struct pci_dev
 struct msi_desc *entry, *tmp;
 struct irq_desc *desc;
 struct msi_msg msg;
+u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn);
+unsigned int type = 0, pos = 0;
+u16 control = 0;
 
 ASSERT(spin_is_locked(pcidevs_lock));
 
@@ -1332,8 +1335,6 @@ int pci_restore_msi_state(struct pci_dev
 list_for_each_entry_safe( entry, tmp, pdev-msi_list, list )
 {
 unsigned int i = 0, nr = 1;
-u16 control = 0;
-u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn);
 
 irq = entry-irq;
 desc = irq_desc[irq];
@@ -1350,31 +1351,38 @@ int pci_restore_msi_state(struct pci_dev
 pdev-seg, pdev-bus, PCI_SLOT(pdev-devfn),
 PCI_FUNC(pdev-devfn), i);
 spin_unlock_irqrestore(desc-lock, flags);
+if ( type == PCI_CAP_ID_MSIX )
+pci_conf_write16(pdev-seg, pdev-bus, slot, func,
+ msix_control_reg(pos),
+ control  ~PCI_MSIX_FLAGS_ENABLE);
 return -EINVAL;
 }
 
+ASSERT(!type || type == entry-msi_attrib.type);
+pos = entry-msi_attrib.pos;
 if ( entry-msi_attrib.type == PCI_CAP_ID_MSI )
 {
 msi_set_enable(pdev, 0);
 nr = entry-msi.nvec;
 }
-else if ( entry-msi_attrib.type == PCI_CAP_ID_MSIX )
+else if ( !type  entry-msi_attrib.type == PCI_CAP_ID_MSIX )
 {
 control = pci_conf_read16(pdev-seg, pdev-bus, slot, func,
-  msix_control_reg(entry-msi_attrib.pos));
+  msix_control_reg(pos));
 pci_conf_write16(pdev-seg, pdev-bus, slot, func,
- msix_control_reg(entry-msi_attrib.pos),
+ msix_control_reg(pos),
  control | (PCI_MSIX_FLAGS_ENABLE |
 PCI_MSIX_FLAGS_MASKALL));
 if ( unlikely(!memory_decoded(pdev)) )
 {
 spin_unlock_irqrestore(desc-lock, flags);
 pci_conf_write16(pdev-seg, pdev-bus, slot, func,
- msix_control_reg(entry-msi_attrib.pos),
+ msix_control_reg(pos),
  control  ~PCI_MSIX_FLAGS_ENABLE);
 return -ENXIO;
 }
 }
+type = entry-msi_attrib.type;
 
 msg = entry-msg;
 write_msi_msg(entry, msg);
@@ -1398,9 +1406,9 @@ int pci_restore_msi_state(struct pci_dev
 
 spin_unlock_irqrestore(desc-lock, flags);
 
-if ( entry-msi_attrib.type == PCI_CAP_ID_MSI )
+if ( type == PCI_CAP_ID_MSI )
 {
-unsigned int cpos = msi_control_reg(entry-msi_attrib.pos);
+unsigned int cpos = msi_control_reg(pos);
 
 control = pci_conf_read16(pdev-seg, pdev-bus, slot, func, cpos) 
   ~PCI_MSI_FLAGS_QSIZE;
@@ -1410,12 +1418,13 @@ int pci_restore_msi_state(struct pci_dev
 
 msi_set_enable(pdev, 1);
 }
-else if ( entry-msi_attrib.type == PCI_CAP_ID_MSIX )
-pci_conf_write16(pdev-seg, pdev-bus, slot, func,
- msix_control_reg(entry-msi_attrib.pos),
- control | PCI_MSIX_FLAGS_ENABLE);
 }
 
+if ( type == PCI_CAP_ID_MSIX )
+pci_conf_write16(pdev-seg, pdev-bus, slot, func,
+ msix_control_reg(pos),
+ control | PCI_MSIX_FLAGS_ENABLE);
+
 return 0;
 }
 


x86/MSI-X: reduce fiddling with control register during restore

Rather than disabling and enabling MSI-X once per vector, do it just
once per device.

Signed-off-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1314,6 +1314,9 @@ int pci_restore_msi_state(struct pci_dev
 struct msi_desc *entry, *tmp;
 struct irq_desc *desc;
 struct msi_msg msg;
+u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn);
+unsigned int type = 0, pos = 0;
+u16 control = 0;
 
 ASSERT(spin_is_locked(pcidevs_lock));
 
@@ -1332,8 +1335,6 @@ int pci_restore_msi_state(struct pci_dev
 list_for_each_entry_safe( entry, tmp, pdev-msi_list, list )
 {
 unsigned int i = 0, nr = 1;
-u16 control = 0;
-u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn);
 
 irq = entry-irq;
 desc = irq_desc[irq];
@@ -1350,31 +1351,38 @@ int pci_restore_msi_state(struct pci_dev
 pdev-seg, 

Re: [Xen-devel] [PATCH for-4.6 v2] tools/hotplug: Add an initscript to start xl devd in a driver domain

2015-07-20 Thread Ian Campbell
On Mon, 2015-07-20 at 16:16 +0200, Roger Pau Monné wrote:
 El 16/07/15 a les 18.58, Ian Campbell ha escrit:
  The removal of the udev rules highlighted that although it has been
  replaced by xl devd there isn't an initscript to replace it.
  
  To enable this add a --pidfile option to xl devd.
  
  Tested on Linux by running the script in dom0 and checking the daemon
  was started/stopped, but not in an actual driver domain environment
  since I don't have one conveniently available. I also checked that
  running without the --pidfile option still works.
  
  Scripts mainly cribbed from the xencommons for each platform.
  
  Signed-off-by: Ian Campbell ian.campb...@citrix.com
  Cc: Roger Pau Monné roger@citrix.com
  Cc: Konrad Rzeszutek Wilk kon...@darnok.org
  Cc: George Dunlap george.dun...@eu.citrix.com
  ---
  For 4.6: I think having removed the udev rules we ought to include
  this, it's basically zero risk to normal operation.
  
  Nothing seems to start xl devd for domain 0 in *BSD, nor is xenbackend
  started. Is that correct?
  
  v2: Check in Linux script that we don't start on dom0. I don't know
  the equivalent for *BSD.
 
 At least on FreeBSD there's no easy way to know if we are running on
 Dom0 from userspace (unless we start using hypercalls with privcmd).
 
 I could add a dummy device (let's say /dev/xen/control) that's only
 present if we are running on Dom0, does that sound reasonable?

FWIW on Linux there is a capabilities device (actually procfs thing,
close enough) which contains control_d in dom0. Not sure if that is
helpful to follow?

 Another option would be to add a small C program that can use hypercalls
 and privcmd to detect if we are running on Dom0 or not, but that seems
 over engineering it.

Agreed.

 [...]
 
  diff --git a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in 
  b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
  new file mode 100644
  index 000..25e3edd
  --- /dev/null
  +++ b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
  @@ -0,0 +1,48 @@
  +#!/bin/sh
  +#
  +# PROVIDE: xendriverdomain
  +# REQUIRE: DAEMON
  +#
  +# Should be run in a driver domain, but not in domain 0.
  +
  +. /etc/rc.subr
  +
  +. /etc/xen/scripts/hotplugpath.sh
 
 This should have been:
 
 . @XEN_SCRIPT_DIR@/hotplugpath.sh
 
 I will send a patch ASAP to fix it.

Sorry, and please do, thanks.

 [...]

  +if ( close(fd)  0 ) {
 
 pedantic
 The line above doesn't follow libxl coding style.
 /pedantic

Damn, sorry.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 01/23] x86/boot: remove unneeded instruction

2015-07-20 Thread Daniel Kiper
Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 xen/arch/x86/boot/head.S |1 -
 1 file changed, 1 deletion(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index cfd59dc..f63b349 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -169,7 +169,6 @@ __start:
 /* Apply relocations to bootstrap trampoline. */
 mov sym_phys(trampoline_phys),%edx
 mov $sym_phys(__trampoline_rel_start),%edi
-mov %edx,sym_phys(trampoline_phys)
 1:
 mov (%edi),%eax
 add %edx,(%edi,%eax)
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 00/23] x86: multiboot2 protocol support

2015-07-20 Thread Daniel Kiper
Hi,

I am sending, long awaited, second version of multiboot2 protocol
support for legacy BIOS and EFI platforms. It fixes all major issues
discovered until now. There are still some minor problems which should
be fixed in one way or another. I will address them in next releases.

This series, in general, is not targeted to Xen 4.6. However, there are
some fixes at the beginning of it which are worth considering, I think.

The final goal is xen.efi binary file which could be loaded by EFI
loader, multiboot (v1) protocol (only on legacy BIOS platforms) and
multiboot2 protocol. This way we will have:
  - smaller Xen code base,
  - one code base for xen.gz and xen.efi,
  - one build method for xen.gz and xen.efi;
xen.efi will be extracted from xen file
using objcopy; PE header will be contained
in ELF file and will precede Xen code,
  - xen.efi build will not so strongly depend
on a given GCC and binutils version.

ARM guys should check at least patches #9 - #18 and #20. In general
earlier mentioned patches touches common EFI code but they are not
change functionality significantly.

GRUB2 patch series will follow this patch series.

GRUB2 guys should check patches #20 and #23 but I am sending to you
all Xen related patches just in case.

If you are not interested in this patch series at all please drop me
a line and I will remove you from distribution list.

Daniel

 .gitignore|5 +-
 xen/arch/x86/Makefile |   21 ++--
 xen/arch/x86/Rules.mk |4 +
 xen/arch/x86/boot/Makefile|   10 +-
 xen/arch/x86/boot/build32.mk  |4 +-
 xen/arch/x86/boot/cmdline.S   |  367 
---
 xen/arch/x86/boot/cmdline.c   |  396 

 xen/arch/x86/boot/edd.S   |3 -
 xen/arch/x86/boot/head.S  |  474 
--
 xen/arch/x86/boot/reloc.c |  242 
+++---
 xen/arch/x86/boot/trampoline.S|   25 -
 xen/arch/x86/boot/video.S |6 --
 xen/arch/x86/boot/wakeup.S|6 +-
 xen/arch/x86/boot/x86_64.S|   34 +++---
 xen/arch/x86/dmi_scan.c   |4 +-
 xen/arch/x86/domain_page.c|2 +-
 xen/arch/x86/efi/Makefile |   16 +--
 xen/arch/x86/efi/efi-boot.h   |   68 ++--
 xen/arch/x86/efi/stub.c   |   16 ++-
 xen/arch/x86/mm.c |3 +-
 xen/arch/x86/mpparse.c|4 +-
 xen/arch/x86/setup.c  |   50 -
 xen/arch/x86/shutdown.c   |2 +-
 xen/arch/x86/time.c   |2 +-
 xen/arch/x86/x86_64/asm-offsets.c |   10 ++
 xen/arch/x86/x86_64/mm.c  |2 +-
 xen/arch/x86/xen.lds.S|6 +-
 xen/common/efi/boot.c |  461 
++-
 xen/common/efi/runtime.c  |   23 ++--
 xen/drivers/acpi/osl.c|2 +-
 xen/include/asm-x86/config.h  |3 +
 xen/include/asm-x86/page.h|2 +-
 xen/include/xen/efi.h |   17 ++-
 xen/include/xen/multiboot2.h  |  182 
 34 files changed, 1709 insertions(+), 763 deletions(-)

Daniel Kiper (23):
  x86/boot: remove unneeded instruction
  x86/boot: copy only text section from *.lnk file to *.bin file
  x86: zero BSS using stosl instead of stosb
  x86/boot: call reloc() using cdecl calling convention
  x86/boot/reloc: create generic alloc and copy functions
  x86/boot: use %ecx instead of %eax
  x86/boot/reloc: Rename some variables and rearrange code a bit
  x86: add multiboot2 protocol support
  efi: create efi_enabled()
  efi: build xen.gz with EFI code
  efi: split out efi_init()
  efi: split out efi_console_set_mode()
  efi: split out efi_get_gop()
  efi: split out efi_find_gop_mode()
  efi: split out efi_tables()
  efi: split out efi_variables()
  efi: split out efi_set_gop_mode()
  efi: split out efi_exit_boot()
  x86/efi: create new early memory allocator
  x86: add multiboot2 protocol support for EFI platforms
  x86/boot: implement early command line parser in C
  x86: make Xen early boot code relocatable
  x86: add multiboot2 protocol support for relocatable images


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 04/23] x86/boot: call reloc() using cdecl calling convention

2015-07-20 Thread Daniel Kiper
Suggested-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 xen/arch/x86/boot/head.S  |4 +++-
 xen/arch/x86/boot/reloc.c |   20 
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index ed42782..3cbb2e6 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -119,8 +119,10 @@ __start:
 
 /* Save the Multiboot info struct (after relocation) for later use. */
 mov $sym_phys(cpu0_stack)+1024,%esp
-push%ebx
+push%ebx/* Multiboot information address. */
+push%eax/* Boot trampoline address. */
 callreloc
+add $8,%esp /* Remove reloc() args from stack. */
 mov %eax,sym_phys(multiboot_ptr)
 
 /* Initialize BSS (no nasty surprises!). */
diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
index 63045c0..708898f 100644
--- a/xen/arch/x86/boot/reloc.c
+++ b/xen/arch/x86/boot/reloc.c
@@ -10,15 +10,27 @@
  *Keir Fraser k...@xen.org
  */
 
-/* entered with %eax = BOOT_TRAMPOLINE */
+/*
+ * This entry point is entered from xen/arch/x86/boot/head.S with:
+ *   - 0x4(%esp) = BOOT_TRAMPOLINE_ADDRESS,
+ *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS.
+ */
 asm (
 .text \n
 .globl _start \n
 _start:   \n
+push %ebp \n
+mov  %esp,%ebp\n
 call 1f   \n
-1:  pop  %ebx \n
-mov  %eax,alloc-1b(%ebx)  \n
-jmp  reloc\n
+1:  pop  %ecx \n
+mov  0x8(%ebp),%eax   \n
+mov  %eax,alloc-1b(%ecx)  \n
+mov  0xc(%ebp),%eax   \n
+push %eax \n
+call reloc\n
+add  $4,%esp  \n
+pop  %ebp \n
+ret   \n
 );
 
 /*
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb

2015-07-20 Thread Daniel Kiper
Additionally, align relevant comment to coding style.

Suggested-by: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 xen/arch/x86/boot/head.S |5 +++--
 xen/arch/x86/xen.lds.S   |2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index f63b349..ed42782 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -123,12 +123,13 @@ __start:
 callreloc
 mov %eax,sym_phys(multiboot_ptr)
 
-/* Initialize BSS (no nasty surprises!) */
+/* Initialize BSS (no nasty surprises!). */
 mov $sym_phys(__bss_start),%edi
 mov $sym_phys(__bss_end),%ecx
 sub %edi,%ecx
+shr $2,%ecx
 xor %eax,%eax
-rep stosb
+rep stosl
 
 /* Interrogate CPU extended features via CPUID. */
 mov $0x8000,%eax
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 6553cff..3e1f2af 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -162,6 +162,7 @@ SECTIONS
   . = ALIGN(STACK_SIZE);
   __init_end = .;
 
+  . = ALIGN(4);
   .bss : { /* BSS */
__bss_start = .;
*(.bss.stack_aligned)
@@ -175,6 +176,7 @@ SECTIONS
*(.bss.percpu.read_mostly)
. = ALIGN(SMP_CACHE_BYTES);
__per_cpu_data_end = .;
+   . = ALIGN(4);
__bss_end = .;
   } :text
   _end = . ;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   3   >