Re: [Xen-devel] Request a freeze exception for COLO in v4.6
On 07/17/2015 06:18 PM, Ian Jackson wrote: Wei Liu writes (Re: [Xen-devel] Request a freeze exception for COLO in v4.6): Ian and Ian, anything to add? What are your opinions? I think COLO is an exciting and important feature. Unfortunately I agree with Wei that at this stage taking both series as they currently stand into 4.6 is not tenable. (Perhaps if I had been able to help earlier in the release cycle this would have been different; so for that I'm sorry.) I would like to see at least the early parts of `COLOPre' in staging as soon as possible. There is much preparatory work there which would be annoying to rebase around, and which has collateral benefits. Thanks, if you are going to merge some of the patches, there is the latest version of `COLOPre' patchset: https://github.com/macrosheep/xen/tree/colo-v9 The latest 26 commits. I think all patches are ready to be merged except: 7 libxl/remus: init checkpoint_callback in Remus checkpoint callback 11 tools/libxc: support to resume uncooperative HVM guests 20 tools/libx{l,c}: add back channel to libxc If I can take one example, 11/25 tools/libxc: support to resume uncooperative HVM guests. Based on my current understanding this is even a bugfix. Sadly it is not quite ready (or at least, wasn't last night). But with a few more days we can probably get much of this cleanup and preparatory work in. Assuming nothing goes horribly wrong, I am very optimistic about COLO making it upstream soon, just not in 4.6. Thanks, Ian. . -- Thanks, Yang. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v10][PATCH 00/16] Fix RMRR
v10: * Patch #6: hvmloader/pci: Try to avoid placing BARs in RMRRs This is from George' draft patch which implements an acceptable solution in current cycle. Here I just implemented check_overlap_all() and some cleanups. * Patch #7: hvmloader/e820: construct guest e820 table Instead of correcting e820, I'd like to correct memory_map.map[] and then copy them into e820 directly. I think this can make sure hvm_info, memory_map.map[] and e820 are on the same page. v9: * Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED * Patch #5: hvmloader: get guest memory map into memory_map[] Correct the patch head description: [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] - [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END); Merge two if{} as one if{}; * Patch #6: hvmloader/pci: disable all pci devices conflicting with rdm A little improvement to code implementation but again, its still argued about this solution. Myself prefer to take a look at v7 if possible. * Patch #7: hvmloader/e820: construct guest e820 table Refine that chunk of codes to check/modify highmem * Patch #15: xen/vtd: prevent from assign the device with shared rmrr Correct one indentation issue v8: * Patch #3: xen/passthrough: extend hypercall to support rdm reservation policy Force to pass 0(strict) when add or move a device in hardware domain, and improve some associated code comments. * Patch #5: hvmloader: get guest memory map into memory_map[] Actually we should check this range started from RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1. So correct this and sync the patch head description. * Patch #6: hvmloader/pci: disable all pci devices conflicting We have a big change to this patch: Based on current discussion its hard to reshape the original mmio allocation mechanism but we haven't a good and simple way to in short term. So instead, we don't bring more complicated to intervene that process but still check any conflicts to disable all associated devices. I know this is still argumented but I'd like to discuss this based on this revision and thanks for your time. * Patch #7: hvmloader/e820: construct guest e820 table define low_mem_end as uint32_t; Correct those two wrong loops, memory_map.nr_map - nr when we're trying to revise low/high memory e820 entries; Improve code comments and the patch head description; Add one check if highmem is just populated by hvmloader itself * Patch #11: tools/libxl: detect and avoid conflicts with RDM Introduce pfn_to_paddr(x) - ((uint64_t)x XC_PAGE_SHIFT) and set_rdm_entries() to factor out current codes. * Patch #13: libxl: construct e820 map with RDM information for HVM guest make that core construction function as arch-specific to make sure we don't break ARM at this point. * Patch #15: xen/vtd: prevent from assign the device with shared rmrr Merge two if{} as one if{}; Add to print RMRR range info when stop assign a group device * Some minimal code style changes v7: * Need to rename some parameters: In the xl rdm config parsing, `reserve=' should be `policy='. In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='. The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'. The field name `reserve' in `libxl_rdm_reserve' should be `policy'. * Just sync with the fallout of renaming parameters above. Note I also mask patch #10 Acked by Wei Liu, Ian Jackson and Ian Campbell. ( If I'm wrong just let me know at this point. ) And as we discussed I'd further improve something as next step after this round of review. v6: * Inside patch #01, add a comments to the nr_entries field inside xen_reserved_device_memory_map. Note this is from Jan. * Inside patch #10, we need rename something to make our policy reasonable type - strategy none - ignore and based on our discussion, we won't expose ignore in xl level and just keep that as a default, and then sync docs and the patch head description * Inside patch #10, we fix some code stypes and especially we refine libxl__xc_device_get_rdm() * Inside patch #16, we need to sync those renames introduced by patch #10. v5: * Fold our original patch #2 and #3 as this new, and here introduce a new, clear_identity_p2m_entry, which can wrapper guest_physmap_remove_page(). And we use this to clean our identity mapping. * Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our policy flag, so now 0 means strict and 1 means relaxed, and also make DT device ignore the flag field simply. And then correct all associated code comments. * Just make sure the per-device plicy always override the global policy, and so cleanup some associated comments and the patch head description. * Improve some descriptions in doc. * Make all rdm variables specific to .hvm * Inside patch #6, we're trying to rename
Re: [Xen-devel] [PATCH v5 05/15] x86/altp2m: basic data structures and support routines.
On 18.07.15 at 00:36, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 16, 2015 2:08 AM On 16.07.15 at 10:57, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, July 14, 2015 6:13 AM On 14.07.15 at 02:14, edmund.h.wh...@intel.com wrote: @@ -722,6 +731,27 @@ void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, l1_pgentry_t *p, l1_pgentry_t new, unsigned int level); /* + * Alternate p2m: shadow p2m tables used for alternate memory views + */ + +/* get current alternate p2m table */ static inline struct +p2m_domain *p2m_get_altp2m(struct vcpu *v) { +struct domain *d = v-domain; +uint16_t index = vcpu_altp2m(v).p2midx; + +ASSERT(index MAX_ALTP2M); + +return (index == INVALID_ALTP2M) ? NULL : +d-arch.altp2m_p2m[index]; } Looking at this again, I'm afraid I'd still prefer index MAX_ALTP2M in the return statement (and the ASSERT() dropped): The ASSERT() does nothing in a debug=n build, and hence wouldn't shield us from possibly having to issue an XSA if somehow an access outside the array's bounds turned out possible. the assert was breaking v5 anyway. BUG_ON (with the right check) is probably the right thing to do, as we do in the exit handling that checks for a VMFUNC having changed the index. So will make that change. But why use a BUG_ON() when you can deal with this more gracefully? Please try to avoid crashing the hypervisor when there are other ways to recover. So in this case there isnt a graceful fallback; this case can happen only if there is a bug in the hypervisor - which should be reported via the BUG_ON. Generally (as an example), if a hypervisor bug can be confined to a guest, killing just the guest instead of the hypervisor would still be preferred (allowing the admin to attempt to gracefully shut down other guests before updating/restarting). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 06/15] VMX/altp2m: add code to support EPTP switching and #VE.
On 17.07.15 at 23:08, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 16, 2015 2:39 AM On 16.07.15 at 11:20, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, July 14, 2015 6:57 AM On 14.07.15 at 02:14, edmund.h.wh...@intel.com wrote: +static bool_t vmx_vcpu_emulate_ve(struct vcpu *v) { +bool_t rc = 0; +ve_info_t *veinfo = gfn_x(vcpu_altp2m(v).veinfo_gfn) != +INVALID_GFN ? +hvm_map_guest_frame_rw(gfn_x(vcpu_altp2m(v).veinfo_gfn), 0) : +NULL; + +if ( !veinfo ) +return 0; + +if ( veinfo-semaphore != 0 ) +goto out; + +rc = 1; + +veinfo-exit_reason = EXIT_REASON_EPT_VIOLATION; +veinfo-semaphore = ~0l; Isn't semaphore a 32-bit quantity? Yes. I.e. the l suffix can and should be dropped. Ok. +{ +unsigned long idx; + +if ( v-arch.hvm_vmx.secondary_exec_control +SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS ) +__vmread(EPTP_INDEX, idx); +else +{ +unsigned long eptp; + +__vmread(EPT_POINTER, eptp); + +if ( (idx = p2m_find_altp2m_by_eptp(v-domain, eptp)) == + INVALID_ALTP2M ) +{ +gdprintk(XENLOG_ERR, EPTP not found in alternate p2m list\n); +domain_crash(v-domain); +} +} + +if ( (uint16_t)idx != vcpu_altp2m(v).p2midx ) Is this cast really necessary? Yes - The index is 16-bits, this reflects how the field is specified in the vmcs also. While yes answers the question, the explanation you give suggests that the answer may be wrong: Can idx indeed have bits set beyond bit 15? Because if it can't, the cast is pointless. We were just trying to ensure we matched the hardware behavior (I think there was a message George had posted earlier for SVE that asked for that). Since hardware considers only a 16 bit field we were doing the same. +{ +BUG_ON(idx = MAX_ALTP2M); +atomic_dec(p2m_get_altp2m(v)-active_vcpus); +vcpu_altp2m(v).p2midx = (uint16_t)idx; This one surely isn't (or else the field type is wrong). Again required. idx can't be uint16_t because __vmread() requires unsigned long*, but the index is 16 bits. But it's a 16-bit VMCS field that you read it from, and hence the upper 48 bits are necessarily zero. Just to re-iterate: Casts are necessary in certain places, yes, but I see them used pointlessly or even wrongly more often than not. Same approach as above - emulating hardware exactly. Should we add a comment? No, drop the casts. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] design philosophy of blktap
Hi, everyone. I don't quite follow the design philosophy of blktap. Since every virtual disk is backed by a tapdisk process, when there are hundreds of domU running and doing I/O operation simultaneously, which means that hundreds of tapdisk process are doing I/O read/write simulataneously in dom0, won't the I/O performance of domU be hurt badly? ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 20/07/2015 02:28, Tian, Kevin wrote: From: Ting-Wei Lan [mailto:lant...@gmail.com] Sent: Saturday, July 18, 2015 3:06 AM When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake devices, It is possible to encounter graphics issues that make screen unreadable or crash the system. It was reported in freedesktop bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90037 As we still cannot find a proper fix for this problem, this patch adds iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for users to manually workaround the problem. Signed-off-by: Ting-Wei Lan lant...@gmail.com Since igfx works before, I'd think a more proper fix should be on the bisected Linux commit or i915 to have two working correctly together. Otherwise this patch is just hiding problem. The linux commit is the one which actually fixes PAT support for Linux under Xen. It will cause the i915 driver to actually get WC mappings when it asks for them. There is one possible usage to do selective IOMMU disable other than global iommu=off switch. Then making this option general would be better than igfx_off, e.g. based on BDF. But I'm not sure how it is useful in reality. It is curious that just disabling the IOMMU appears to fix the problem. Are there any errata you are aware of on this class of system? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
Previously we always fix that predefined boundary as 2G to handle conflict between memory and rdm, but now this predefined boundar can be changes with the parameter rdm_mem_boundary in .cfg file. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v8 ~ v10: * Nothing is changed. v7: * Just sync with the fallout of renaming parameters from patch #10. v6: * Nothing is changed. v5: * Make this variable rdm_mem_boundary_memkb specific to .hvm v4: * Separated from the previous patch to provide a parameter to set that predefined boundary dynamically. docs/man/xl.cfg.pod.5 | 22 ++ tools/libxl/libxl.h | 6 ++ tools/libxl/libxl_create.c | 4 tools/libxl/libxl_dom.c | 8 +--- tools/libxl/libxl_types.idl | 1 + tools/libxl/xl_cmdimpl.c| 3 +++ 6 files changed, 37 insertions(+), 7 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 6c55a8b..23068ec 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is available on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough wiki page. +=item Brdm_mem_boundary=MBYTES + +Number of megabytes to set a boundary for checking rdm conflict. + +When RDM conflicts with RAM, RDM probably scatter the whole RAM space. +Especially multiple RDM entries would worsen this to lead a complicated +memory layout. So here we're trying to figure out a simple solution to +avoid breaking existing layout. So when a conflict occurs, + +#1. Above a predefined boundary +- move lowmem_end below reserved region to solve conflict; + +#2. Below a predefined boundary +- Check strict/relaxed policy. +strict policy leads to fail libxl. Note when both policies +are specified on a given region, 'strict' is always preferred. +relaxed policy issue a warning message and also mask this +entry INVALID to indicate we shouldn't expose this entry to +hvmloader. + +Here the default is 2G. + =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ] Specifies the host device tree nodes to passthrough to this guest. Each diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index a1c5d15..6f157c9 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b); #define LIBXL_TIMER_MODE_DEFAULT -1 #define LIBXL_MEMKB_DEFAULT ~0ULL +/* + * We'd like to set a memory boundary to determine if we need to check + * any overlap with reserved device memory. + */ +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024) + #define LIBXL_MS_VM_GENID_LEN 16 typedef struct { uint8_t bytes[LIBXL_MS_VM_GENID_LEN]; diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index c8a32d5..3de86a6 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info) { if (b_info-u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID) b_info-u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED; + +if (b_info-u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT) +b_info-u.hvm.rdm_mem_boundary_memkb = +LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT; } int libxl__domain_build_info_setdefault(libxl__gc *gc, diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 80fa17d..e41d54a 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, int ret, rc = ERROR_FAIL; uint64_t mmio_start, lowmem_end, highmem_end; libxl_domain_build_info *const info = d_config-b_info; -/* - * Currently we fix this as 2G to guarantee how to handle - * our rdm policy. But we'll provide a parameter to set - * this dynamically. - */ -uint64_t rdm_mem_boundary = 0x8000; memset(args, 0, sizeof(struct xc_hvm_build_args)); /* The params from the configuration file are in Mb, which are then @@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, args.mmio_start = mmio_start; rc = libxl__domain_device_construct_rdm(gc, d_config, -rdm_mem_boundary, + info-u.hvm.rdm_mem_boundary_memkb*1024, args); if (rc) { LOG(ERROR, checking reserved device memory failed); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index a3ad8d1..4eb4f8a 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@
[Xen-devel] [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
Now we get this map layout by call XENMEM_memory_map then save them into one global variable memory_map[]. It should include lowmem range, rdm range and highmem range. Note rdm range and highmem range may not exist in some cases. And here we need to check if any reserved memory conflicts with [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END). This range is used to allocate memory in hvmloder level, and we would lead hvmloader failed in case of conflict since its another rare possibility in real world. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Reviewed-by: George Dunlap george.dun...@eu.citrix.com Acked-by: Jan Beulich jbeul...@suse.com --- v10: * Nothing is changed. v9: * Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] - [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in the patch head description; Merge two if{} as one if{}; v8: * Actually we should check this range started from RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1. So correct this and sync the patch head description. v5 ~ v7: * Nothing is changed. v4: * Move some codes related to e820 to that specific file, e820.c. * Consolidate printf()+BUG() and BUG_ON() * Avoid another fixed width type for the parameter of get_mem_mapping_layout() tools/firmware/hvmloader/e820.c | 32 tools/firmware/hvmloader/e820.h | 7 +++ tools/firmware/hvmloader/hvmloader.c | 2 ++ tools/firmware/hvmloader/util.c | 26 ++ tools/firmware/hvmloader/util.h | 12 5 files changed, 79 insertions(+) diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 2e05e93..7a414ab 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -23,6 +23,38 @@ #include config.h #include util.h +struct e820map memory_map; + +void memory_map_setup(void) +{ +unsigned int nr_entries = E820MAX, i; +int rc; +uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START; +uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr; + +rc = get_mem_mapping_layout(memory_map.map, nr_entries); + +if ( rc || !nr_entries ) +{ +printf(Get guest memory maps[%d] failed. (%d)\n, nr_entries, rc); +BUG(); +} + +memory_map.nr_map = nr_entries; + +for ( i = 0; i nr_entries; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + check_overlap(alloc_addr, alloc_size, + memory_map.map[i].addr, memory_map.map[i].size) ) +{ +printf(Fail to setup memory map due to conflict); +printf( on dynamic reserved memory range.\n); +BUG(); +} +} +} + void dump_e820_table(struct e820entry *e820, unsigned int nr) { uint64_t last_end = 0, start, end; diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h index b2ead7f..8b5a9e0 100644 --- a/tools/firmware/hvmloader/e820.h +++ b/tools/firmware/hvmloader/e820.h @@ -15,6 +15,13 @@ struct e820entry { uint32_t type; } __attribute__((packed)); +#define E820MAX128 + +struct e820map { +unsigned int nr_map; +struct e820entry map[E820MAX]; +}; + #endif /* __HVMLOADER_E820_H__ */ /* diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c index 25b7f08..84c588c 100644 --- a/tools/firmware/hvmloader/hvmloader.c +++ b/tools/firmware/hvmloader/hvmloader.c @@ -262,6 +262,8 @@ int main(void) init_hypercalls(); +memory_map_setup(); + xenbus_setup(); bios = detect_bios(); diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index 80d822f..122e3fa 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -27,6 +27,17 @@ #include xen/memory.h #include xen/sched.h +/* + * Check whether there exists overlap in the specified memory range. + * Returns true if exists, else returns false. + */ +bool check_overlap(uint64_t start, uint64_t size, + uint64_t reserved_start, uint64_t reserved_size) +{ +return (start + size reserved_start) +(start reserved_start + reserved_size); +} + void wrmsr(uint32_t idx, uint64_t v) { asm volatile ( @@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid) *p = '\0'; } +int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries) +{ +int rc; +struct xen_memory_map memmap = { +.nr_entries = *max_entries +}; + +set_xen_guest_handle(memmap.buffer, entries); + +rc =
[Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
While building a VM, HVM domain builder provides struct hvm_info_table{} to help hvmloader. Currently it includes two fields to construct guest e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should check them to fix any conflict with RDM. RMRR can reside in address space beyond 4G theoretically, but we never see this in real world. So in order to avoid breaking highmem layout we don't solve highmem conflict. Note this means highmem rmrr could still be supported if no conflict. But in the case of lowmem, RMRR probably scatter the whole RAM space. Especially multiple RMRR entries would worsen this to lead a complicated memory layout. And then its hard to extend hvm_info_table{} to work hvmloader out. So here we're trying to figure out a simple solution to avoid breaking existing layout. So when a conflict occurs, #1. Above a predefined boundary (2G) - move lowmem_end below reserved region to solve conflict; #2. Below a predefined boundary (2G) - Check strict/relaxed policy. strict policy leads to fail libxl. Note when both policies are specified on a given region, 'strict' is always preferred. relaxed policy issue a warning message and also mask this entry INVALID to indicate we shouldn't expose this entry to hvmloader. Note later we need to provide a parameter to set that predefined boundary dynamically. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com --- v9 ~ v10: * Nothing is changed. v8: * Introduce pfn_to_paddr(x) - ((uint64_t)x XC_PAGE_SHIFT) and set_rdm_entries() to factor out current codes. v7: * Just sync with the fallout of renaming parameters from patch #10. v6: * fix some code stypes * Refine libxl__xc_device_get_rdm() v5: * A little change to make sure the per-device policy always override the global policy and correct its associated code comments. * Fix one typo in the patch head description * Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace malloc() with libxl__malloc(), and finally cleanup this fallout. * libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL. Then instead, the allocated RDM entries would be returned with an out parameter. v4: * Consistent to use term RDM. * Unconditionally set *nr_entries to 0 * Grab to all sutffs to provide a parameter to set our predefined boundary dynamically to as a separated patch later tools/libxl/libxl_create.c | 2 +- tools/libxl/libxl_dm.c | 273 +++ tools/libxl/libxl_dom.c | 17 ++- tools/libxl/libxl_internal.h | 11 +- tools/libxl/libxl_types.idl | 7 ++ 5 files changed, 307 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index f75d4f1..c8a32d5 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc, switch (info-type) { case LIBXL_DOMAIN_TYPE_HVM: -ret = libxl__build_hvm(gc, domid, info, state); +ret = libxl__build_hvm(gc, domid, d_config, state); if (ret) goto out; diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 317a8eb..692258b 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc, return dm; } +static int +libxl__xc_device_get_rdm(libxl__gc *gc, + uint32_t flag, + uint16_t seg, + uint8_t bus, + uint8_t devfn, + unsigned int *nr_entries, + struct xen_reserved_device_memory **xrdm) +{ +int rc = 0, r; + +/* + * We really can't presume how many entries we can get in advance. + */ +*nr_entries = 0; +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + NULL, nr_entries); +assert(r = 0); +/* 0 means we have no any rdm entry. */ +if (!r) goto out; + +if (errno != ENOBUFS) { +rc = ERROR_FAIL; +goto out; +} + +*xrdm = libxl__malloc(gc, + *nr_entries * sizeof(xen_reserved_device_memory_t)); +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + *xrdm, nr_entries); +if (r) +rc = ERROR_FAIL; + + out: +if (rc) { +*nr_entries = 0; +*xrdm = NULL; +LOG(ERROR, Could not get reserved device memory maps.\n); +} +return rc; +} + +/* + * Check whether there exists rdm hole in the specified memory range. + * Returns true if exists, else
[Xen-devel] [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy
This patch introduces user configurable parameters to specify RDM resource and according policies, Global RDM parameter: rdm = strategy=host,policy=strict/relaxed Per-device RDM parameter: pci = [ 'sbdf, rdm_policy=strict/relaxed' ] Global RDM parameter, strategy, allows user to specify reserved regions explicitly, Currently, using 'host' to include all reserved regions reported on this platform which is good to handle hotplug scenario. In the future this parameter may be further extended to allow specifying random regions, e.g. even those belonging to another platform as a preparation for live migration with passthrough devices. By default this isn't set so we don't check all rdms. Instead, we just check rdm specific to a given device if you're assigning this kind of device. Note this option is not recommended unless you can make sure any conflict does exist. 'strict/relaxed' policy decides how to handle conflict when reserving RDM regions in pfn space. If conflict exists, 'strict' means an immediate error so VM can't keep running, while 'relaxed' allows moving forward with a warning message thrown out. Default per-device RDM policy is same as default global RDM policy as being 'relaxed'. And the per-device policy would override the global policy like others. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v9 ~ v10: * Nothing is changed. v8: * One minimal code style change v7: * Need to rename some parameters: In the xl rdm config parsing, `reserve=' should be `policy='. In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='. The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'. The field name `reserve' in `libxl_rdm_reserve' should be `policy'. v6: * Some rename to make our policy reasonable type - strategy none - ignore * Don't expose ignore in xl level and just keep that as a default. And then sync docs and the patch head description v5: * Just make sure the per-device plicy always override the global policy, and so cleanup some associated comments and the patch head description. * A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED. * Improve all descriptions in doc. * Make all rdm variables specific to .hvm v4: * No need to define init_val for libxl_rdm_reserve_type since its just zero * Grab those changes to xl/libxlu to as a final patch docs/man/xl.cfg.pod.5| 81 docs/misc/vtd.txt| 24 + tools/libxl/libxl_create.c | 7 tools/libxl/libxl_internal.h | 2 ++ tools/libxl/libxl_pci.c | 9 + tools/libxl/libxl_types.idl | 18 ++ 6 files changed, 141 insertions(+) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index a3e0e2e..6c55a8b 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -655,6 +655,79 @@ assigned slave device. =back +=item Brdm=RDM_RESERVATION_STRING + +(HVM/x86 only) Specifies information about Reserved Device Memory (RDM), +which is necessary to enable robust device passthrough. One example of RDM +is reported through ACPI Reserved Memory Region Reporting (RMRR) structure +on x86 platform. + +BRDM_RESERVE_STRING has the form C[KEY=VALUE,KEY=VALUE,... where: + +=over 4 + +=item BKEY=VALUE + +Possible BKEYs are: + +=over 4 + +=item Bstrategy=STRING + +Currently there is only one valid type: + +host means all reserved device memory on this platform should be checked to +reserve regions in this VM's guest address space. This global rdm parameter +allows user to specify reserved regions explicitly, and using host includes +all reserved regions reported on this platform, which is useful when doing +hotplug. + +By default this isn't set so we don't check all rdms. Instead, we just check +rdm specific to a given device if you're assigning this kind of device. Note +this option is not recommended unless you can make sure any conflict does exist. + +For example, you're trying to set memory = 2800 to allocate memory to one +given VM but the platform owns two RDM regions like, + +Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff +Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff + +In this conflict case, + +#1. If Bstrategy is set to host, for example, + +rdm = strategy=host,policy=strict or rdm = strategy=host,policy=relaxed + +It means all conflicts will be handled according to the policy +introduced by Bpolicy as described below. + +#2. If Bstrategy is not set at all, but + +pci = [ 'sbdf_A, rdm_policy=x' ] + +It means only one conflict of region_A will be handled according to the policy +introduced by Brdm_policy=STRING as described inside pci options. + +=item Bpolicy=STRING +
[Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
Now use the hypervisor-supplied memory map to build our final e820 table: * Add regions for BIOS ranges and other special mappings not in the hypervisor map * Add in the hypervisor supplied regions * Adjust the lowmem and highmem regions if we've had to relocate memory (adding a highmem region if necessary) * Sort all the ranges so that they appear in memory order. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v10: * Instead of correcting e820, I'd like to correct memory_map.map[] and then copy them into e820 directly. I think this can make sure hvm_info, memory_map.map[] and e820 are on the same page. v9: * Refine that chunk of codes to check/modify highmem v8: * define low_mem_end as uint32_t * Correct those two wrong loops, memory_map.nr_map - nr when we're trying to revise low/high memory e820 entries. * Improve code comments and the patch head description * Add one check if highmem is just populated by hvmloader itself v5 ~ v7: * Nothing is changed. v4: * Rename local variable, low_mem_pgend, to low_mem_end. * Improve some code comments * Adjust highmem after lowmem is changed. tools/firmware/hvmloader/e820.c | 108 +++- 1 file changed, 95 insertions(+), 13 deletions(-) diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 7a414ab..ca794ad 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint32_t low_mem_end = hvm_info-low_mem_pgend PAGE_SHIFT; +uint32_t add_high_mem = 0; +uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT; +uint64_t map_start, map_size, map_end; if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820, nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); -if ( hvm_info-high_mem_pgend ) +/* + * Construct E820 table according to recorded memory map. + * + * The memory map created by toolstack may include, + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists + * + * Note we just have one low memory entry and one high mmeory entry if + * exists. + * + * But we may have relocated RAM to allocate sufficient MMIO previously + * so low_mem_pgend would be changed over there. And here memory_map[] + * records the original low/high memory, so if low_mem_end is less than + * the original we need to revise low/high memory range firstly. + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +map_start = memory_map.map[i].addr; +map_size = memory_map.map[i].size; +map_end = map_start + map_size; + +/* If we need to adjust lowmem. */ +if ( memory_map.map[i].type == E820_RAM + low_mem_end map_start low_mem_end map_end ) +{ +add_high_mem = map_end - low_mem_end; +memory_map.map[i].size = low_mem_end - map_start; +break; +} +} + +/* If we need to adjust highmem. */ +if ( add_high_mem ) +{ +/* Modify the existing highmem region if it exists. */ +for ( i = 0; i memory_map.nr_map; i++ ) +{ +map_start = memory_map.map[i].addr; +map_size = memory_map.map[i].size; +map_end = map_start + map_size; + +if ( memory_map.map[i].type == E820_RAM +
[Xen-devel] [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters
This patch parses to enable user configurable parameters to specify RDM resource and according policies which are defined previously, Global RDM parameter: rdm = strategy=host,policy=strict/relaxed Per-device RDM parameter: pci = [ 'sbdf, rdm_policy=strict/relaxed' ] Default per-device RDM policy is same as default global RDM policy as being 'relaxed'. And the per-device policy would override the global policy like others. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v9 ~ v10: * Nothing is changed. v8: * Clean some codes style issues. v7: * Just sync with the fallout of renaming parameters from patch #10. v6: * Just sync those renames introduced by patch #10. v5: * Need a rebase after we make all rdm variables specific to .hvm. * Like other pci option, the per-device policy always follows the global policy by default. v4: * Separated from current patch #11 to parse/enable our rdm policy parameters since its make a lot sense and these stuffs are specific to xl/libxlu. tools/libxl/libxlu_pci.c | 92 +++- tools/libxl/libxlutil.h | 4 +++ tools/libxl/xl_cmdimpl.c | 13 +++ 3 files changed, 108 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c index 26fb143..026413b 100644 --- a/tools/libxl/libxlu_pci.c +++ b/tools/libxl/libxlu_pci.c @@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain, #define STATE_OPTIONS_K 6 #define STATE_OPTIONS_V 7 #define STATE_TERMINAL 8 +#define STATE_TYPE 9 +#define STATE_RDM_STRATEGY 10 +#define STATE_RESERVE_POLICY11 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str) { unsigned state = STATE_DOMAIN; @@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str pcidev-permissive = atoi(tok); }else if ( !strcmp(optkey, seize) ) { pcidev-seize = atoi(tok); -}else{ +} else if (!strcmp(optkey, rdm_policy)) { +if (!strcmp(tok, strict)) { +pcidev-rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT; +} else if (!strcmp(tok, relaxed)) { +pcidev-rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED; +} else { +XLU__PCI_ERR(cfg, %s is not an valid PCI RDM property + policy: 'strict' or 'relaxed'., + tok); +goto parse_error; +} +} else { XLU__PCI_ERR(cfg, Unknown PCI BDF option: %s, optkey); } tok = ptr + 1; @@ -167,6 +181,82 @@ parse_error: return ERROR_INVAL; } +int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str) +{ +unsigned state = STATE_TYPE; +char *buf2, *tok, *ptr, *end; + +if (NULL == (buf2 = ptr = strdup(str))) +return ERROR_NOMEM; + +for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr end; ptr++) { +switch(state) { +case STATE_TYPE: +if (*ptr == '=') { +state = STATE_RDM_STRATEGY; +*ptr = '\0'; +if (strcmp(tok, strategy)) { +XLU__PCI_ERR(cfg, Unknown RDM state option: %s, tok); +goto parse_error; +} +tok = ptr + 1; +} +break; +case STATE_RDM_STRATEGY: +if (*ptr == '\0' || *ptr == ',') { +state = STATE_RESERVE_POLICY; +*ptr = '\0'; +if (!strcmp(tok, host)) { +rdm-strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST; +} else { +XLU__PCI_ERR(cfg, Unknown RDM strategy option: %s, tok); +goto parse_error; +} +tok = ptr + 1; +} +break; +case STATE_RESERVE_POLICY: +if (*ptr == '=') { +state = STATE_OPTIONS_V; +*ptr = '\0'; +if (strcmp(tok, policy)) { +XLU__PCI_ERR(cfg, Unknown RDM property value: %s, tok); +goto parse_error; +} +tok = ptr + 1; +} +break; +case STATE_OPTIONS_V: +if (*ptr == ',' || *ptr == '\0') { +state = STATE_TERMINAL; +*ptr = '\0'; +if (!strcmp(tok, strict)) { +rdm-policy = LIBXL_RDM_RESERVE_POLICY_STRICT; +} else if (!strcmp(tok,
[Xen-devel] [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
From: Jan Beulich jbeul...@suse.com This is a prerequisite for punching holes into HVM and PVH guests' P2M to allow passing through devices that are associated with (on VT-d) RMRRs. CC: Jan Beulich jbeul...@suse.com CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Jan Beulich jbeul...@suse.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v7 ~ v10: * Nothing is changed. v6: * Add a comments to the nr_entries field inside xen_reserved_device_memory_map v5 ~ v4: * Nothing is changed. xen/common/compat/memory.c | 66 xen/common/memory.c | 64 ++ xen/drivers/passthrough/iommu.c | 10 ++ xen/drivers/passthrough/vtd/dmar.c | 32 + xen/drivers/passthrough/vtd/extern.h | 1 + xen/drivers/passthrough/vtd/iommu.c | 1 + xen/include/public/memory.h | 37 +++- xen/include/xen/iommu.h | 10 ++ xen/include/xen/pci.h| 2 ++ xen/include/xlat.lst | 3 +- 10 files changed, 224 insertions(+), 2 deletions(-) diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c index b258138..b608496 100644 --- a/xen/common/compat/memory.c +++ b/xen/common/compat/memory.c @@ -17,6 +17,45 @@ CHECK_TYPE(domid); CHECK_mem_access_op; CHECK_vmemrange; +#ifdef HAS_PASSTHROUGH +struct get_reserved_device_memory { +struct compat_reserved_device_memory_map map; +unsigned int used_entries; +}; + +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr, + u32 id, void *ctxt) +{ +struct get_reserved_device_memory *grdm = ctxt; +u32 sbdf; +struct compat_reserved_device_memory rdm = { +.start_pfn = start, .nr_pages = nr +}; + +sbdf = PCI_SBDF2(grdm-map.seg, grdm-map.bus, grdm-map.devfn); +if ( (grdm-map.flag PCI_DEV_RDM_ALL) || (sbdf == id) ) +{ +if ( grdm-used_entries grdm-map.nr_entries ) +{ +if ( rdm.start_pfn != start || rdm.nr_pages != nr ) +return -ERANGE; + +if ( __copy_to_compat_offset(grdm-map.buffer, + grdm-used_entries, + rdm, + 1) ) +{ +return -EFAULT; +} +} +++grdm-used_entries; +return 1; +} + +return 0; +} +#endif + int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat) { int split, op = cmd MEMOP_CMD_MASK; @@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat) break; } +#ifdef HAS_PASSTHROUGH +case XENMEM_reserved_device_memory_map: +{ +struct get_reserved_device_memory grdm; + +if ( copy_from_guest(grdm.map, compat, 1) || + !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) ) +return -EFAULT; + +grdm.used_entries = 0; +rc = iommu_get_reserved_device_memory(get_reserved_device_memory, + grdm); + +if ( !rc grdm.map.nr_entries grdm.used_entries ) +rc = -ENOBUFS; + +grdm.map.nr_entries = grdm.used_entries; +if ( grdm.map.nr_entries ) +{ +if ( __copy_to_guest(compat, grdm.map, 1) ) +rc = -EFAULT; +} + +return rc; +} +#endif + default: return compat_arch_memory_op(cmd, compat); } diff --git a/xen/common/memory.c b/xen/common/memory.c index c84fcdd..7b6281b 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -748,6 +748,43 @@ static int construct_memop_from_reservation( return 0; } +#ifdef HAS_PASSTHROUGH +struct get_reserved_device_memory { +struct xen_reserved_device_memory_map map; +unsigned int used_entries; +}; + +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr, + u32 id, void *ctxt) +{ +struct get_reserved_device_memory *grdm = ctxt; +u32 sbdf; + +sbdf = PCI_SBDF2(grdm-map.seg, grdm-map.bus, grdm-map.devfn); +if ( (grdm-map.flag PCI_DEV_RDM_ALL) || (sbdf == id) ) +{ +if ( grdm-used_entries grdm-map.nr_entries ) +{ +struct xen_reserved_device_memory rdm = { +.start_pfn = start, .nr_pages = nr +}; + +if ( __copy_to_guest_offset(grdm-map.buffer, +grdm-used_entries, +rdm, +1) ) +{ +return -EFAULT; +} +} +++grdm-used_entries; +return
[Xen-devel] [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
We will introduce the hypercall xc_reserved_device_memory_map approach to libxc. This helps us get rdm entry info according to different parameters. If flag == PCI_DEV_RDM_ALL, all entries should be exposed. Or we just expose that rdm entry specific to a SBDF. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Reviewed-by: Kevin Tian kevin.t...@intel.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v4 ~ v10: * Nothing is changed. tools/libxc/include/xenctrl.h | 8 tools/libxc/xc_domain.c | 36 2 files changed, 44 insertions(+) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index d1d2ab3..9160623 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch, int xc_get_machine_memory_map(xc_interface *xch, struct e820entry entries[], uint32_t max_entries); + +int xc_reserved_device_memory_map(xc_interface *xch, + uint32_t flag, + uint16_t seg, + uint8_t bus, + uint8_t devfn, + struct xen_reserved_device_memory entries[], + uint32_t *max_entries); #endif int xc_domain_set_time_offset(xc_interface *xch, uint32_t domid, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index ce51e69..0951291 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch, return rc; } + +int xc_reserved_device_memory_map(xc_interface *xch, + uint32_t flag, + uint16_t seg, + uint8_t bus, + uint8_t devfn, + struct xen_reserved_device_memory entries[], + uint32_t *max_entries) +{ +int rc; +struct xen_reserved_device_memory_map xrdmmap = { +.flag = flag, +.seg = seg, +.bus = bus, +.devfn = devfn, +.nr_entries = *max_entries +}; +DECLARE_HYPERCALL_BOUNCE(entries, + sizeof(struct xen_reserved_device_memory) * + *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT); + +if ( xc_hypercall_bounce_pre(xch, entries) ) +return -1; + +set_xen_guest_handle(xrdmmap.buffer, entries); + +rc = do_memory_op(xch, XENMEM_reserved_device_memory_map, + xrdmmap, sizeof(xrdmmap)); + +xc_hypercall_bounce_post(xch, entries); + +*max_entries = xrdmmap.nr_entries; + +return rc; +} + int xc_get_machine_memory_map(xc_interface *xch, struct e820entry entries[], uint32_t max_entries) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm
This patch enables XENMEM_memory_map in hvm. So hvmloader can use it to setup the e820 mappings. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Tim Deegan t...@xen.org Reviewed-by: Kevin Tian kevin.t...@intel.com Acked-by: Jan Beulich jbeul...@suse.com Acked-by: George Dunlap george.dun...@eu.citrix.com --- v5 ~ v10: * Nothing is changed. v4: * Just refine the patch head description as Jan commented. xen/arch/x86/hvm/hvm.c | 2 -- xen/arch/x86/mm.c | 6 -- 2 files changed, 8 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 535d622..638daee 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) switch ( cmd MEMOP_CMD_MASK ) { -case XENMEM_memory_map: case XENMEM_machine_memory_map: case XENMEM_machphys_mapping: return -ENOSYS; @@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) switch ( cmd MEMOP_CMD_MASK ) { -case XENMEM_memory_map: case XENMEM_machine_memory_map: case XENMEM_machphys_mapping: return -ENOSYS; diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index fd151c6..92eccd0 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } -if ( is_hvm_domain(d) ) -{ -rcu_unlock_domain(d); -return -EPERM; -} - e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries); if ( e820 == NULL ) { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v10][PATCH 02/16] xen/vtd: create RMRR mapping
RMRR reserved regions must be setup in the pfn space with an identity mapping to reported mfn. However existing code has problem to setup correct mapping when VT-d shares EPT page table, so lead to problem when assigning devices (e.g GPU) with RMRR reported. So instead, this patch aims to setup identity mapping in p2m layer, regardless of whether EPT is shared or not. And we still keep creating VT-d table. And we also need to introduce a pair of helper to create/clear this sort of identity mapping as follows: set_identity_p2m_entry(): If the gfn space is unoccupied, we just set the mapping. If space is already occupied by desired identity mapping, do nothing. Otherwise, failure is returned. clear_identity_p2m_entry(): We just define macro to wrapper guest_physmap_remove_page() with a returning value as necessary. CC: Tim Deegan t...@xen.org CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Reviewed-by: Tim Deegan t...@xen.org Acked-by: George Dunlap george.dun...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6 ~ v10: * Nothing is changed. v5: * Fold our original patch #2 and #3 as this new * Introduce a new, clear_identity_p2m_entry, which can wrapper guest_physmap_remove_page(). And we use this to clean our identity mapping. v4: * Change that orginal condition, if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) to make sure we catch those invalid mfn mapping as we expected. * To have if ( !paging_mode_translate(p2m-domain) ) return 0; at the start, instead of indenting the whole body of the function in an inner scope. * extend guest_physmap_remove_page() to return a value as a proper unmapping helper * Instead of intel_iommu_unmap_page(), we should use guest_physmap_remove_page() to unmap rmrr mapping correctly. * Drop iommu_map_page() since actually ept_set_entry() can do this internally. xen/arch/x86/mm/p2m.c | 40 +++-- xen/drivers/passthrough/vtd/iommu.c | 5 ++--- xen/include/asm-x86/p2m.h | 13 +--- 3 files changed, 50 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 6b39733..99a26ca 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn, p2m-default_access); } -void +int guest_physmap_remove_page(struct domain *d, unsigned long gfn, unsigned long mfn, unsigned int page_order) { struct p2m_domain *p2m = p2m_get_hostp2m(d); +int rc; gfn_lock(p2m, gfn, page_order); -p2m_remove_page(p2m, gfn, mfn, page_order); +rc = p2m_remove_page(p2m, gfn, mfn, page_order); gfn_unlock(p2m, gfn, page_order); +return rc; } int @@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access); } +int set_identity_p2m_entry(struct domain *d, unsigned long gfn, + p2m_access_t p2ma) +{ +p2m_type_t p2mt; +p2m_access_t a; +mfn_t mfn; +struct p2m_domain *p2m = p2m_get_hostp2m(d); +int ret; + +if ( !paging_mode_translate(p2m-domain) ) +return 0; + +gfn_lock(p2m, gfn, 0); + +mfn = p2m-get_entry(p2m, gfn, p2mt, a, 0, NULL); + +if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) +ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K, +p2m_mmio_direct, p2ma); +else if ( mfn_x(mfn) == gfn p2mt == p2m_mmio_direct a == p2ma ) +ret = 0; +else +{ +ret = -EBUSY; +printk(XENLOG_G_WARNING + Cannot setup identity map d%d:%lx, +gfn already mapped to %lx.\n, + d-domain_id, gfn, mfn_x(mfn)); +} + +gfn_unlock(p2m, gfn, 0); +return ret; +} + /* Returns: 0 for success, -errno for failure */ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn) { diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 44ed23d..8415958 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -if ( intel_iommu_unmap_page(d, base_pfn) ) +if ( clear_identity_p2m_entry(d, base_pfn, 0) ) ret = -ENXIO; base_pfn++; } @@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -int err = intel_iommu_map_page(d, base_pfn, base_pfn, -
[Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
Try to avoid placing PCI BARs over RMRRs: - If mmio_hole_size is not specified, and the existing MMIO range has RMRRs in it, and there is space to expand the hole in lowmem without moving more memory, then make the MMIO hole as large as possible. - When placing RMRRs, find the next RMRR higher than the current base in the lowmem mmio hole. If it overlaps, skip ahead of it and find the next one. This certainly won't work in all cases, but it should work in a significant number of cases. Additionally, users should be able to work around problems by setting mmio_hole_size larger in the guest config. Signed-off-by: George Dunlap george.dun...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v10: * This is from George' draft patch which implements an acceptable solution in current cycle. Here I just implemented check_overlap_all() and some cleanups. v9: * A little improvement to code implementation but again, its still argued about this solution. v8: * Based on current discussion its hard to reshape the original mmio allocation mechanism but we haven't a good and simple way to in short term. So instead, we don't bring more complicated to intervene that process but still check any conflicts to disable all associated devices. v6 ~ v7: * Nothing is changed. v5: * Rename that field, is_64bar, inside struct bars with flag, and then extend to also indicate if this bar is already allocated. v4: * We have to re-design this as follows: #1. Goal MMIO region should exclude all reserved device memory #2. Requirements #2.1 Still need to make sure MMIO region is fit all pci devices as before #2.2 Accommodate the not aligned reserved memory regions If I'm missing something let me know. #3. How to #3.1 Address #2.1 We need to either of populating more RAM, or of expanding more highmem. But we should know just 64bit-bar can work with highmem, and as you mentioned we also should avoid expanding highmem as possible. So my implementation is to allocate 32bit-bar and 64bit-bar orderly. 1. The first allocation round just to 32bit-bar If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar with all remaining resources including low pci memory. If not, we need to calculate how much RAM should be populated to allocate the remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go to the second allocation round 2. 2. The second allocation round to the remaining 32bit-bar We should can finish allocating all 32bit-bar in theory, then go to the third allocation round 3. 3. The third allocation round to 64bit-bar We'll try to first allocate from the remaining low memory resource. If that isn't enough, we try to expand highmem to allocate for 64bit-bar. This process should be same as the original. #3.2 Address #2.2 I'm trying to accommodate the not aligned reserved memory regions: We should skip all reserved device memory, but we also need to check if other smaller bars can be allocated if a mmio hole exists between resource-base and reserved device memory. If a hole exists between base and reserved device memory, lets go out simply to try allocate for next bar since all bars are in descending order of size. If not, we need to move resource-base to reserved_end just to reallocate this bar. tools/firmware/hvmloader/pci.c | 62 ++ 1 file changed, 62 insertions(+) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index 5ff87a7..f229a91 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0; enum virtual_vga virtual_vga = VGA_none; unsigned long igd_opregion_pgbase = 0; +/* Check if any conflicts with all reserved device memory. */ +static bool check_overlap_all(uint64_t start, uint64_t size) +{ +unsigned int i; + +for ( i = 0; i memory_map.nr_map; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + check_overlap(start, size, + memory_map.map[i].addr, + memory_map.map[i].size) ) +return true; +} + +return false; +} + +/* Find the lowest RMRR higher than base. */ +static int find_next_rmrr(uint32_t base) +{ +unsigned int i; +int next_rmrr = -1; +uint64_t min_base = (1ull 32); + +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + memory_map.map[i].addr base + memory_map.map[i].addr min_base ) +{ +next_rmrr = i; +min_base = memory_map.map[i].addr; +} +} +return next_rmrr; +} + void pci_setup(void) { uint8_t is_64bar, using_64bar, bar64_relocate = 0; @@ -46,6 +83,7 @@ void pci_setup(void) uint32_t vga_devfn = 256; uint16_t class,
[Xen-devel] [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
Here we'll construct a basic guest e820 table via XENMEM_set_memory_map. This table includes lowmem, highmem and RDMs if they exist, and hvmloader would need this info later. Note this guest e820 table would be same as before if the platform has no any RDM or we disable RDM (by default). CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v8 ~ v10: * Nothing is changed. v7: * Just sync with the fallout of renaming parameters from patch #10. v6: * Nothing is changed. v5: * Make this variable rdm_mem_boundary_memkb specific to .hvm v4: * Separated from the previous patch to provide a parameter to set that predefined boundary dynamically. tools/libxl/libxl_arch.h | 7 tools/libxl/libxl_arm.c | 8 + tools/libxl/libxl_dom.c | 5 +++ tools/libxl/libxl_x86.c | 83 4 files changed, 103 insertions(+) diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h index d04871c..939178a 100644 --- a/tools/libxl/libxl_arch.h +++ b/tools/libxl/libxl_arch.h @@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc, _hidden int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq); +/* arch specific to construct memory mapping function */ +_hidden +int libxl__arch_domain_construct_memmap(libxl__gc *gc, +libxl_domain_config *d_config, +uint32_t domid, +struct xc_hvm_build_args *args); + #endif diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c index f09c860..1526467 100644 --- a/tools/libxl/libxl_arm.c +++ b/tools/libxl/libxl_arm.c @@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq) return xc_domain_bind_pt_spi_irq(CTX-xch, domid, irq, irq); } +int libxl__arch_domain_construct_memmap(libxl__gc *gc, +libxl_domain_config *d_config, +uint32_t domid, +struct xc_hvm_build_args *args) +{ +return 0; +} + /* * Local variables: * mode: C diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index e41d54a..a8c6aa9 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, goto out; } +if (libxl__arch_domain_construct_memmap(gc, d_config, domid, args)) { +LOG(ERROR, setting domain memory map failed); +goto out; +} + ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port, state-store_mfn, state-console_port, state-console_mfn, state-store_domid, diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index ed2bd38..66b3d7f 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq) } /* + * Here we're just trying to set these kinds of e820 mappings: + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * Note: Those stuffs below 1M are still constructed with multiple + * e820 entries by hvmloader. At this point we don't change anything. + * + * #2. RDM region if it exists + * + * #3. High memory region if it exists + * + * Note: these regions are not overlapping since we already check + * to adjust them. Please refer to libxl__domain_device_construct_rdm(). + */ +#define GUEST_LOW_MEM_START_DEFAULT 0x10 +int libxl__arch_domain_construct_memmap(libxl__gc *gc, +libxl_domain_config *d_config, +uint32_t domid, +struct xc_hvm_build_args *args) +{ +int rc = 0; +unsigned int nr = 0, i; +/* We always own at least one lowmem entry. */ +unsigned int e820_entries = 1; +struct e820entry *e820 = NULL; +uint64_t highmem_size = +args-highmem_end ? args-highmem_end - (1ull 32) : 0; + +/* Add all rdm entries. */ +for (i = 0; i d_config-num_rdms; i++) +if (d_config-rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID) +e820_entries++; + + +/* If we should have a highmem range. */ +if (highmem_size) +e820_entries++; + +if (e820_entries = E820MAX) { +LOG(ERROR, Ooops! Too many entries in the memory map!\n); +rc = ERROR_INVAL; +goto out; +} + +e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries); + +/* Low
[Xen-devel] [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
This patch extends the existing hypercall to support rdm reservation policy. We return error or just throw out a warning message depending on whether the policy is strict or relaxed when reserving RDM regions in pfn space. Note in some special cases, e.g. add a device to hwdomain, and remove a device from user domain, 'relaxed' is fine enough since this is always safe to hwdomain. CC: Tim Deegan t...@xen.org CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: George Dunlap george.dun...@eu.citrix.com Acked-by: Jan Beulich jbeul...@suse.com --- v10: * Nothing is changed. v9: * Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED v8: * Force to pass 0(strict) when add or move a device in hardware domain, and improve some associated code comments. v6 ~ v7: * Nothing is changed. v5: * Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so 0 means strict and 1 means relaxed. * So make DT device ignore the flag field * Improve the code comments v4: * Add code comments to describer why we fix to set a policy flag in some cases like adding a device to hwdomain, and removing a device from user domain. * Avoid using fixed width types for the parameter of set_identity_p2m_entry() * Fix one judging condition domctl-u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM - domctl-u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM * Add to range check the flag passed to make future extensions possible (and to avoid ambiguity on what out of range values would mean). xen/arch/x86/mm/p2m.c | 7 -- xen/drivers/passthrough/amd/pci_amd_iommu.c | 3 ++- xen/drivers/passthrough/arm/smmu.c | 2 +- xen/drivers/passthrough/device_tree.c | 3 ++- xen/drivers/passthrough/pci.c | 15 xen/drivers/passthrough/vtd/iommu.c | 37 ++--- xen/include/asm-x86/p2m.h | 2 +- xen/include/public/domctl.h | 3 +++ xen/include/xen/iommu.h | 2 +- 9 files changed, 55 insertions(+), 19 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 99a26ca..47785dc 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, } int set_identity_p2m_entry(struct domain *d, unsigned long gfn, - p2m_access_t p2ma) + p2m_access_t p2ma, unsigned int flag) { p2m_type_t p2mt; p2m_access_t a; @@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn, ret = 0; else { -ret = -EBUSY; +if ( flag XEN_DOMCTL_DEV_RDM_RELAXED ) +ret = 0; +else +ret = -EBUSY; printk(XENLOG_G_WARNING Cannot setup identity map d%d:%lx, gfn already mapped to %lx.\n, diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c index e83bb35..920b35a 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target, } static int amd_iommu_assign_device(struct domain *d, u8 devfn, - struct pci_dev *pdev) + struct pci_dev *pdev, + u32 flag) { struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev-seg); int bdf = PCI_BDF2(pdev-bus, devfn); diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 6cc4394..9a667e9 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain) } static int arm_smmu_assign_dev(struct domain *d, u8 devfn, - struct device *dev) + struct device *dev, u32 flag) { struct iommu_domain *domain; struct arm_smmu_xen_domain *xen_domain; diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c index 5d3842a..7ff79f8 100644 --- a/xen/drivers/passthrough/device_tree.c +++ b/xen/drivers/passthrough/device_tree.c @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev) goto fail; } -rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev)); +/* The flag field doesn't matter
[Xen-devel] [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy
This patch passes rdm reservation policy to xc_assign_device() so the policy is checked when assigning devices to a VM. Note this also bring some fallout to python usage of xc_assign_device(). CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com CC: David Scott dave.sc...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6 ~ v10: * Nothing is changed. v5: * Fix the flag field as 0 to DT device v4: * In the patch head description, I add to explain why we need to sync the xc.c file tools/libxc/include/xenctrl.h | 3 ++- tools/libxc/xc_domain.c | 9 - tools/libxl/libxl_pci.c | 3 ++- tools/ocaml/libs/xc/xenctrl_stubs.c | 16 tools/python/xen/lowlevel/xc/xc.c | 30 -- 5 files changed, 44 insertions(+), 17 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 9160623..89cbc5a 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch, /* HVM guest pass-through */ int xc_assign_device(xc_interface *xch, uint32_t domid, - uint32_t machine_sbdf); + uint32_t machine_sbdf, + uint32_t flag); int xc_get_device_group(xc_interface *xch, uint32_t domid, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 0951291..ef41228 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch, int xc_assign_device( xc_interface *xch, uint32_t domid, -uint32_t machine_sbdf) +uint32_t machine_sbdf, +uint32_t flag) { DECLARE_DOMCTL; @@ -1705,6 +1706,7 @@ int xc_assign_device( domctl.domain = domid; domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI; domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf; +domctl.u.assign_device.flag = flag; return do_domctl(xch, domctl); } @@ -1792,6 +1794,11 @@ int xc_assign_dt_device( domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT; domctl.u.assign_device.u.dt.size = size; +/* + * DT doesn't own any RDM so actually DT has nothing to do + * for any flag and here just fix that as 0. + */ +domctl.u.assign_device.flag = 0; set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path); rc = do_domctl(xch, domctl); diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index e0743f8..632c15e 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i FILE *f; unsigned long long start, end, flags, size; int irq, i, rc, hvm = 0; +uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; if (type == LIBXL_DOMAIN_TYPE_INVALID) return ERROR_FAIL; @@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i out: if (!libxl_is_stubdom(ctx, domid, NULL)) { -rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev)); +rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev), flag); if (rc 0 (hvm || errno != ENOSYS)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, xc_assign_device failed); return ERROR_FAIL; diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index 64f1137..b7de615 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d CAMLreturn(Val_bool(ret == 0)); } -CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc) +static int domain_assign_device_rdm_flag_table[] = { +XEN_DOMCTL_DEV_RDM_RELAXED, +}; + +CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc, +value rflag) { - CAMLparam3(xch, domid, desc); + CAMLparam4(xch, domid, desc, rflag); int ret; int domain, bus, dev, func; - uint32_t sbdf; + uint32_t sbdf, flag; domain = Int_val(Field(desc, 0)); bus = Int_val(Field(desc, 1)); @@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc) func = Int_val(Field(desc, 3)); sbdf = encode_sbdf(domain, bus, dev, func); - ret = xc_assign_device(_H(xch), _D(domid), sbdf); + ret = Int_val(Field(rflag, 0)); + flag = domain_assign_device_rdm_flag_table[ret]; + + ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag); if (ret 0)
[Xen-devel] [v10][PATCH 14/16] xen/vtd: enable USB device assignment
USB RMRR may conflict with guest BIOS region. In such case, identity mapping setup is simply skipped in previous implementation. Now we can handle this scenario cleanly with new policy mechanism so previous hack code can be removed now. CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v5 ~ v10: * Nothing is changed. v4: * Refine the patch head description xen/drivers/passthrough/vtd/dmar.h | 1 - xen/drivers/passthrough/vtd/iommu.c | 11 ++- xen/drivers/passthrough/vtd/utils.c | 7 --- 3 files changed, 2 insertions(+), 17 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h index af1feef..af205f5 100644 --- a/xen/drivers/passthrough/vtd/dmar.h +++ b/xen/drivers/passthrough/vtd/dmar.h @@ -129,7 +129,6 @@ do {\ int vtd_hw_check(void); void disable_pmr(struct iommu *iommu); -int is_usb_device(u16 seg, u8 bus, u8 devfn); int is_igd_drhd(struct acpi_drhd_unit *drhd); #endif /* _DMAR_H_ */ diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index b5d658e..c8b0455 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2243,11 +2243,9 @@ static int reassign_device_ownership( /* * If the device belongs to the hardware domain, and it has RMRR, don't * remove it from the hardware domain, because BIOS may use RMRR at - * booting time. Also account for the special casing of USB below (in - * intel_iommu_assign_device()). + * booting time. */ -if ( !is_hardware_domain(source) - !is_usb_device(pdev-seg, pdev-bus, pdev-devfn) ) +if ( !is_hardware_domain(source) ) { const struct acpi_rmrr_unit *rmrr; u16 bdf; @@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device( if ( ret ) return ret; -/* FIXME: Because USB RMRR conflicts with guest bios region, - * ignore USB RMRR temporarily. - */ seg = pdev-seg; bus = pdev-bus; -if ( is_usb_device(seg, bus, pdev-devfn) ) -return 0; /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c index bd14c02..b8a077f 100644 --- a/xen/drivers/passthrough/vtd/utils.c +++ b/xen/drivers/passthrough/vtd/utils.c @@ -29,13 +29,6 @@ #include extern.h #include asm/io_apic.h -int is_usb_device(u16 seg, u8 bus, u8 devfn) -{ -u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), -PCI_CLASS_DEVICE); -return (class == 0xc03); -} - /* Disable vt-d protected memory registers. */ void disable_pmr(struct iommu *iommu) { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
Currently we're intending to cover this kind of devices with shared RMRR simply since the case of shared RMRR is a rare case according to our previous experiences. But late we can group these devices which shared rmrr, and then allow all devices within a group to be assigned to same domain. CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com --- v10: * Noting is changed. v9: * Correct one indentation issue v8: * Merge two if{} as one if{} * Add to print RMRR range info when stop assign a group device v5 ~ v7: * Nothing is changed. v4: * Refine one code comment. xen/drivers/passthrough/vtd/iommu.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index c8b0455..770e484 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device( if ( list_empty(acpi_drhd_units) ) return -ENODEV; +seg = pdev-seg; +bus = pdev-bus; +/* + * In rare cases one given rmrr is shared by multiple devices but + * obviously this would put the security of a system at risk. So + * we should prevent from this sort of device assignment. + * + * TODO: in the future we can introduce group device assignment + * interface to make sure devices sharing RMRR are assigned to the + * same domain together. + */ +for_each_rmrr_device( rmrr, bdf, i ) +{ +if ( rmrr-segment == seg + PCI_BUS(bdf) == bus + PCI_DEVFN2(bdf) == devfn + rmrr-scope.devices_cnt 1 ) +{ +printk(XENLOG_G_ERR VTDPREFIX +cannot assign %04x:%02x:%02x.%u +with shared RMRR at %PRIx64 for Dom%d.\n, + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), + rmrr-base_address, d-domain_id); +return -EPERM; +} +} + ret = reassign_device_ownership(hardware_domain, d, devfn, pdev); if ( ret ) return ret; -seg = pdev-seg; -bus = pdev-bus; - /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 10/15] x86/altp2m: add remaining support routines.
On 18.07.15 at 00:32, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 16, 2015 2:34 AM On 16.07.15 at 11:16, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, July 14, 2015 7:32 AM On 14.07.15 at 02:14, edmund.h.wh...@intel.com wrote: @@ -2965,9 +3003,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, if ( npfec.write_access ) { paging_mark_dirty(currd, mfn_x(mfn)); +/* If p2m is really an altp2m, unlock here to avoid + lock ordering + * violation when the change below is propagated from + host p2m */ +if ( ap2m_active ) +__put_gfn(p2m, gfn); p2m_change_type_one(currd, gfn, p2m_ram_logdirty, p2m_ram_rw); And this won't result in any races? No To be honest I expected a little more than just no here. Now I have to ask - why? Yes, I should have described it more than that :-) so this part of the code is handling the log dirty transition of the page, and this page permission transition happens always on the hostp2m. Given the way the locking order is setup (hostp2m-altp2m-list-lock-altp2m and there was a separate writeup and discussion with George on that), at this point in this sequence there is a p2m lock (whether it's a hostp2m or altp2m lock depends on the mode of the domain) - the reason we have to drop the lock here first is due to what happens next; the permission changes in hostp2m will be serially propagated to altp2ms and not dropping the lock here would cause a locking order violation. Hope that clarifies. Sadly it doesn't at all: You re-explain why you need to drop the lock, while you fail to say anything on why this won't cause a race. +long p2m_init_altp2m_by_id(struct domain *d, uint16_t idx) { +long rc = -EINVAL; Why long (for both variable and function return type)? (More of these in functions below.) Because the error variable in the code that calls these (in hvm.c) is a long, and you had given feedback earlier to propagate the returns from these functions through that calling code. I don't see the connection. The function only returns zero or -E... values, so why would its return type be long? do_hvm_op declares a rc that is of type long and hence this returns a long What type your caller(s) return is of no interest at all here: What would you do if you had multiple callers with differing return types? A function's return type should be chosen based on the range of values it may return, and the result possibly widened to not yield inefficient code (like in some of the uint16_t cases elsewhere in the series would be necessary). +void p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn, + mfn_t mfn, unsigned int page_order, + p2m_type_t p2mt, p2m_access_t +p2ma) { +struct p2m_domain *p2m; +p2m_access_t a; +p2m_type_t t; +mfn_t m; +uint16_t i; +bool_t reset_p2m; +unsigned int reset_count = 0; +uint16_t last_reset_idx = ~0; + +if ( !altp2m_active(d) ) +return; + +altp2m_list_lock(d); + +for ( i = 0; i MAX_ALTP2M; i++ ) +{ +if ( d-arch.altp2m_eptp[i] == INVALID_MFN ) +continue; + +p2m = d-arch.altp2m_p2m[i]; +m = get_gfn_type_access(p2m, gfn_x(gfn), t, a, 0, NULL); + +reset_p2m = 0; + +/* Check for a dropped page that may impact this altp2m */ +if ( mfn_x(mfn) == INVALID_MFN + gfn_x(gfn) = p2m-min_remapped_gfn + gfn_x(gfn) = p2m-max_remapped_gfn ) +reset_p2m = 1; Considering that this looks like an optimization, what's the downside of possibly having min=0 and max=end-of-address-space? I.e. can there a long latency operation result that's this way a guest can effect? ... A p2m is a gfn-mfn map, amongst other things. There is a reverse mfn-gfn map, but that is only valid for the host p2m. Unless the remap altp2m hypercall is used, the gfn-mfn map in every altp2m mirrors the gfn-mfn map in the host p2m (or a subset thereof, due to lazy-copy), so handling removal of an mfn from a guest is simple: do a reverse look up for the host p2m and mark the relevant gfn as invalid, then do the same for every altp2m where that gfn is currently valid. Remap changes things: it says take gfn1 and replace -mfn with the -mfn of gfn2. Here is where the optimization is used and the invalidate logic is: record the lowest and highest gfn2's that have been used in remap ops; if an mfn is dropped from the hostp2m, for the purposes of altp2m invalidation, see if the gfn derived from the host p2m reverse lookup falls within the range of used gfn2's. If it does, an invalidation is required. Which is why min and max are inited the way they are - hope the explanation clarifies this
Re: [Xen-devel] [xen-unstable test] 59681: regressions - FAIL
On 18/07/2015 13:07, osstest service owner wrote: flight 59681 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59681/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 58965 Wei: Am I correct in remembering that you nominated these two tests for being added to the allowable category? ~Andrew Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-rumpuserxen-i386 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail like 58948 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate fail like 58965 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58965 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58965 Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-i386-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass version targeted for testing: xen 21d9b079e53805b68047d60d28cde224d09bbb40 baseline version: xen c40317f11b3f05e7c06a2213560c8471081f2662 Last test of basis58965 2015-06-29 02:08:30 Z 19 days Failing since 58974 2015-06-29 15:11:59 Z 18 days 23 attempts Testing same since59681 2015-07-18 00:10:50 Z0 days1 attempts People who touched revisions under test: Andrew Cooper andrew.coop...@citrix.com Anthony PERARD anthony.per...@citrix.com Ard Biesheuvel a...@linaro.org Ben Catterall ben.catter...@citrix.com Boris Ostrovsky boris.ostrov...@oracle.com Chao Peng chao.p.p...@linux.intel.com Chen Baozi baoz...@gmail.com Daniel De Graaf dgde...@tycho.nsa.gov Dario Faggioli dario.faggi...@citrix.com David Scott dave.sc...@citrix.com David Vrabel david.vra...@citrix.com Dietmar Hahn dietmar.h...@ts.fujitsu.com Elena Ufimtseva elena.ufimts...@oracle.com Eric Shelton eshel...@pobox.com Euan Harris euan.har...@citrix.com Fabio Fantoni fabio.fant...@m2r.biz Feng Wu feng...@intel.com George Dunlap george.dun...@eu.citrix.com Ian Campbell ian,campb...@citrix.com Ian Campbell ian.campb...@citrix.com Ian Jackson ian.jack...@eu.citrix.com Jan Beulich jbeul...@suse.com Jennifer Herbert jennifer.herb...@citrix.com Jim Fehlig jfeh...@suse.com Juergen Gross jgr...@suse.com Julien Grall julien.gr...@citrix.com Julien Grall julien.gr...@linaro.org Kevin Tian kevin.t...@intel.com Liang Li liang.z...@intel.com Paul Durrant paul.durr...@citrix.com Razvan Cojocaru rcojoc...@bitdefender.com Rob Hoes rob.h...@citrix.com Roger Pau MonnĂƒÂ© roger@citrix.com Ross Lagerwall ross.lagerw...@citrix.com Samuel Thibault samuel.thiba...@ens-lyon.org Sander Eikelenboom li...@eikelenboom.it Tamas K Lengyel tleng...@novetta.com Thomas Leonard tal...@gmail.com Tiejun Chen tiejun.c...@intel.com Tim Deegan t...@xen.org Vitaly Kuznetsov vkuzn...@redhat.com Wei Liu wei.l...@citrix.com Wen Congyang we...@cn.fujitsu.com Yang Hongyang yan...@cn.fujitsu.com Yang Zhang yang.z.zh...@intel.com jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
Before you add memory_map.nr_map, you should be able to iterate from 0 to (not inclusive) nr. At least as far as I recall the original patch. Sorry, I really don't understand what you want. Before we add memory_map.nr_map, e820[0, nr) don't include low/high memory, right? Why? memory_map is representing the reserved areas only, isn't it? If that's not the case, then of course everything is fine. I'm pretty sure the memory map we get here is an extension of the original PV-only get_e820 hypercall, which *does* include both the lowmem and highmem regions. In any case, it's pretty clear from the patched code that Tiejun is removing the old code which created the lowmem and highmem regions and is not replacing it. Where do you think the highmem region he's looking for was coming from? On second thoughts, I prefer to check/sync memory_map.map[] before copy them into e820 since ultimately this can make sure hvm_info, memory_map.map[] and e820 are on the same page. Anyway, I would send out v10 to address others so please further post your comments over there. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 05/13] x86/altp2m: basic data structures and support routines.
On 18.07.15 at 00:39, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 16, 2015 2:02 AM On 16.07.15 at 10:48, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, July 14, 2015 1:53 AM On 14.07.15 at 02:01, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Monday, July 13, 2015 1:01 AM On 10.07.15 at 23:48, ravi.sah...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, July 09, 2015 6:30 AM On 01.07.15 at 20:09, edmund.h.wh...@intel.com wrote: @@ -294,6 +298,12 @@ struct arch_domain struct p2m_domain *nested_p2m[MAX_NESTEDP2M]; mm_lock_t nested_p2m_lock; +/* altp2m: allow multiple copies of host p2m */ +bool_t altp2m_active; +struct p2m_domain *altp2m_p2m[MAX_ALTP2M]; +mm_lock_t altp2m_lock; +uint64_t *altp2m_eptp; This is a non-insignificant increase of the structure size - perhaps all of these should hang off of struct arch_domain via a single, separately allocated pointer? Is this a nice-to-have - again we modelled after the nestedhvm extensions to the struct. This will affect a lot of our patch without really changing how much memory is allocated. I understand that. To a certain point I can agree to limit changes to what is there at this stage. But you wanting to avoid addressing concerns basically everywhere it's not a bug overextends this. Just because the series was submitted late doesn't mean you should now expect us to give in on any controversy regarding aspects we would normally expect to be changed. This would basically encourage others to submit their stuff late too in the future, hoping for relaxed review. Couple things - first, we have absorbed a lot of (good) feedback - thanks for that. Second, I don't think the series can be characterized as late (feedback from others welcome). V1 had almost the same structure and was submitted in January. Still we're at v3 only here, not v10 or beyond. On this change - this will be a lot of effects on the code and we would like to avoid this one. While this may be a lot of mechanical change, I don't this presenting any major risk of breaking the code. On this one specific advice on how and where to implement such a change would be great just so that we don't thrash on this change. I don't follow - what to do here was said quite explicitly (still visible in the context above). I.e. I have no idea what additional advice you seek. Ok that's fine - sorry if this was unclear - I was seeking if you had some specific feedback on how to allocate and manage the dynamic altp2m struct etc (if you had an opinion would be good to hear that). xmalloc() / xzalloc(). What other alternatives would you see? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Preface working plan for altp2m during freeze exception
On 17.07.15 at 21:43, ravi.sah...@intel.com wrote: We are working on addressing review comments in this order (and you will see this pattern in our review responses): * Category 1 - Address review comments that affect ABI - these are of course required and will be addressed first. * Category 2 - Address review comments that do not affect ABI - we will try to address ones that we think we can realistically meet within the time bounds - we ask you for some flexibility on these. If these cannot be addressed within the allotted exception time-frame, hopefully these wont be blockers for 4.6 since they can be addressed by follow-on patches. Not sure - we've had bad experience with allowing code to go in with the promise for later adjustments (which then never happened)... * Category 3 - Address review comments that are really design questions - These we will try to address by short descriptions in review replies that attempt to give a gist of the design we followed, but of course design changes obviously cannot be done at this late stage - hopefully that is expected. If you really just mean questions on the design (rather than questions possibly resulting int the requirement to change the design), then that'd be fine of course. I think you understand that we shouldn't be deferring issues that require design adjustments. Otoh I don't even recall what design questions there were. * Category 4 - Address trivial changes as we naturally update patches, however if we run out of time, some may remain un-addressed (to be taken care of post 4.6). See above (point 2). Can we please get a yes - makes sense sort of acknowledgement of this plan from the Maintainers? Considering the limitations above, this is only a maybe from me. Y N [PATCH v3 05/15] x86/altp2m: basic data structures and support routines Status if not acked: Category 3: we will write a short description of some design questions in review replies Category 2: moving altp2m struct to be dynamically allocated - this has minor benefit and big downside so will be lower priority, also some error handling fixes Big downside? You're not referring to the mechanical adjustments this implies, are you? Y N [PATCH v3 07/15] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator Status if not acked: Category 2/3 - changes staged after Jan's feedback on v5 - ack with those addressed in v6? Let's see what v6 looks like. Y N [PATCH v3 08/15] x86/altp2m: add control of suppress_ve Status if not acked: Now has r-b both Jan and George - need maintainers ack on this one please Who do you refer to by maintainer here? I think the trivial adjustment to xen/arch/x86/mm/mem_sharing.c can in the worst case go in without Andres' ack. And everything else is covered by George's authorship and review. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 59721: regressions - trouble: broken/fail/pass
On Sun, 2015-07-19 at 10:19 +, osstest service owner wrote: flight 59721 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59721/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 I think we have considered discounting i386 qemu-dm as a thing we care about, in which case we could force push this change since this was the only fail.. Tests which are failing intermittently (not blocking): test-amd64-i386-freebsd10-i386 3 host-install(3) broken pass in 59699 test-amd64-i386-qemuu-rhel6hvm-amd 3 host-install(3) broken pass in 59699 test-amd64-i386-rumpuserxen-i386 3 host-install(3) broken pass in 59699 test-amd64-i386-qemut-rhel6hvm-intel 3 host-install(3) broken pass in 59699 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 3 host-install(3) broken pass in 59699 test-amd64-i386-xl3 host-install(3) broken pass in 59699 test-amd64-i386-xl-qemut-debianhvm-amd64 3 host-install(3) broken pass in 59699 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 3 host-install(3) broken pass in 59699 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 3 host-install(3) broken pass in 59699 test-amd64-i386-xl-qemut-win7-amd64 3 host-install(3)broken pass in 59699 test-amd64-i386-xl-qemuu-win7-amd64 3 host-install(3)broken pass in 59699 test-amd64-i386-xl-qemut-winxpsp3 3 host-install(3) broken pass in 59699 test-amd64-i386-xl-qemuu-winxpsp3 3 host-install(3) broken pass in 59699 test-armhf-armhf-xl-multivcpu 14 guest-start.2 fail in 59699 pass in 59721 test-amd64-amd64-xl 9 debian-install fail pass in 59699 Some of these host-install fails are a bit worrying, but not a blocker to a forced push IMHO. version targeted for testing: xen 21d9b079e53805b68047d60d28cde224d09bbb40 baseline version: xen c40317f11b3f05e7c06a2213560c8471081f2662 Last test of basis58965 2015-06-29 02:08:30 Z 20 days Failing since 58974 2015-06-29 15:11:59 Z 19 days 25 attempts Testing same since59681 2015-07-18 00:10:50 Z1 days3 attempts Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] design philosophy of blktap
On Mon, Jul 20, 2015 at 6:01 AM, Xuehan Xu xxhdx1985...@gmail.com wrote: Hi, everyone. I don't quite follow the design philosophy of blktap. Since every virtual disk is backed by a tapdisk process, when there are hundreds of domU running and doing I/O operation simultaneously, which means that hundreds of tapdisk process are doing I/O read/write simulataneously in dom0, won't the I/O performance of domU be hurt badly? I think Felipe (cc'd) might be the best person to answer this sort of question. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/5] xl: Command line: Support -h everywhere
On Fri, Jul 17, 2015 at 06:00:49PM +0100, Ian Jackson wrote: xl subcommands ought all to take -h. def_getopt and hence SWITCH_FOREACH_OPT already handles 'h' by calling helpstr. None of the call sites see the 'h'. In this patch: * Change SWITCH_FOREACH_OPT to always add a h to the short opts string, using string concatenation. * Remove the now-redundant h's from some existing option strings. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/4] Fix to libxl migration v2 issue blocking OSSTest
On Fri, Jul 17, 2015 at 05:51:14PM +0100, Andrew Cooper wrote: And three improvements to debugging. Note that there is still a bug in libxl__toolstack_save() which valgrind identified, but I do not wish to block this bugfix on that Andrew Cooper (4): tools/libxc: Identify the path of the kernel image which cannot be found tools/libxl: Log the subject fd in datacopier messages tools/libxl: Identify copywhat in stream v2 datacopiers I think all three patches should wait until next development window opens unless we have nothing else in our queue (which doesn't seem to be the case at the moment). Wei. tools/libxl: Initialise the fd of the unused half of a datacopier tools/libxc/xg_private.c |2 +- tools/libxl/libxl_aoutils.c | 24 tools/libxl/libxl_stream_read.c | 19 +++ tools/libxl/libxl_stream_write.c |5 - 4 files changed, 28 insertions(+), 22 deletions(-) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/5] xl: Command line: Remove maximum argument limit for network-attach
On Fri, Jul 17, 2015 at 06:00:48PM +0100, Ian Jackson wrote: This limit of 11 has been in this function since it was written, but serves no purpose. The extra arguments are fed one by one to parse_nic_config, and it is possible to have as many as you like. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com --- tools/libxl/xl_cmdimpl.c |5 - 1 file changed, 5 deletions(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 55c041c..770b71c 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -6373,11 +6373,6 @@ int main_networkattach(int argc, char **argv) /* No options */ } -if (argc-optind 11) { -help(network-attach); -return 0; -} - domid = find_domain(argv[optind]); config= xlu_cfg_init(stderr, command line); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 5/5] xl: Command line: Support xl vcpu-set --help
On Fri, Jul 17, 2015 at 06:00:51PM +0100, Ian Jackson wrote: This ended with a literal sentinel. Use COMMON_LONG_OPTIONS (which mentions --help) instead. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/4] tools/libxl: Log the subject fd in datacopier messages
On Fri, Jul 17, 2015 at 05:51:16PM +0100, Andrew Cooper wrote: This is a substantial aid to debugging Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/4] tools/libxl: Identify copywhat in stream v2 datacopiers
On Fri, Jul 17, 2015 at 05:51:17PM +0100, Andrew Cooper wrote: This is an aid to debugging Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5
On Mon, Jul 20, 2015 at 11:16:34AM +0200, Olaf Hering wrote: On Mon, Jul 20, Jan Beulich wrote: On 19.07.15 at 11:33, o...@aepfle.de wrote: [ 198s] block-log.c:549:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].id == id) [ 198s]^ So what makes the compiler right with that complaint? I.e. how does it know i 0 here? After all - afaict - s-connected can only be 0 or It has to assume that -connected can get any value because the input comes from outside the unit. To reduce the patch size i MAX_CONNECTIONS could be added. A smaller patch would be preferable at this stage. Wei. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 59681: regressions - FAIL
On 20/07/2015 09:24, Wei Liu wrote: On Mon, Jul 20, 2015 at 09:18:48AM +0100, Andrew Cooper wrote: On 18/07/2015 13:07, osstest service owner wrote: flight 59681 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59681/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 58965 Wei: Am I correct in remembering that you nominated these two tests for being added to the allowable category? Only the first one should be marked allowable. The second failure looks spurious. In the next xen-unstable run, it passed. Ah yes - so it did. Should we do a force push then? It appears that subsequent runs started hitting problems when cloning repos to build. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 59681: regressions - FAIL
On Mon, Jul 20, 2015 at 09:18:48AM +0100, Andrew Cooper wrote: On 18/07/2015 13:07, osstest service owner wrote: flight 59681 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59681/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 58965 Wei: Am I correct in remembering that you nominated these two tests for being added to the allowable category? Only the first one should be marked allowable. The second failure looks spurious. In the next xen-unstable run, it passed. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 19/07/2015 16:53, Julien Grall wrote: Hi, On 17/07/2015 20:05, Ting-Wei Lan wrote: When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake devices, It is possible to encounter graphics issues that make screen unreadable or crash the system. It was reported in freedesktop bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90037 As we still cannot find a proper fix for this problem, this patch adds iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for users to manually workaround the problem. I would prefer to see the introduction of intel_iommu parameter in the vtd code rather than extending again the iommu parameter with intel specific options in the common code which is both for ARM and x86. Having the parsing for this option inside some x86 ifdefary, fine, but lets not have multiple iommu= options. It is hard enough for people to drive one option (as people routinely get Xen and Linux command line options mixed up). ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-unstable test] 59759: regressions - trouble: blocked/broken/fail/pass
flight 59759 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59759/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-i386-libvirt3 host-install(3) broken REGR. vs. 58965 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 58965 build-armhf-pvops 5 kernel-build fail REGR. vs. 58965 build-amd64-pvops 5 kernel-build fail REGR. vs. 58965 build-i386-pvops 5 kernel-build fail REGR. vs. 58965 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-multivcpu 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-qemut-rhel6hvm-amd 1 build-check(1) blocked n/a test-amd64-amd64-xl-pvh-amd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-xl-credit2 1 build-check(1) blocked n/a test-amd64-amd64-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-qemuu-rhel6hvm-intel 1 build-check(1) blocked n/a test-amd64-amd64-xl-credit2 1 build-check(1) blocked n/a test-amd64-amd64-xl 1 build-check(1) blocked n/a test-armhf-armhf-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-rtds 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1)blocked n/a test-armhf-armhf-xl-rtds 1 build-check(1) blocked n/a test-armhf-armhf-xl-arndale 1 build-check(1) blocked n/a test-amd64-amd64-xl-pvh-intel 1 build-check(1) blocked n/a test-armhf-armhf-xl 1 build-check(1) blocked n/a test-amd64-i386-freebsd10-i386 1 build-check(1) blocked n/a test-amd64-i386-qemuu-rhel6hvm-amd 1 build-check(1) blocked n/a test-amd64-i386-freebsd10-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-xsm1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-debianhvm-amd64 1 build-check(1) blocked n/a test-amd64-i386-qemut-rhel6hvm-intel 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 1 build-check(1)blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-debianhvm-amd64 1 build-check(1)blocked n/a test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked n/a test-armhf-armhf-xl-multivcpu 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-debianhvm-amd64 1 build-check(1)blocked n/a test-armhf-armhf-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-xl-cubietruck 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-amd64-pair 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-winxpsp3 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-win7-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-winxpsp3-vcpus1 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-win7-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-winxpsp3 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-pair 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-winxpsp3 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-winxpsp3 1 build-check(1) blocked n/a version targeted for testing: xen 21d9b079e53805b68047d60d28cde224d09bbb40 baseline version: xen
Re: [Xen-devel] [xen-unstable test] 59721: regressions - trouble: broken/fail/pass
On 20.07.15 at 10:48, ian.campb...@citrix.com wrote: On Sun, 2015-07-19 at 10:19 +, osstest service owner wrote: flight 59721 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59721/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 I think we have considered discounting i386 qemu-dm as a thing we care about, in which case we could force push this change since this was the only fail.. Right, together with 59699 - as also just said be Wei - I agree: please go ahead. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5
On Mon, Jul 20, Jan Beulich wrote: On 19.07.15 at 11:33, o...@aepfle.de wrote: [ 198s] block-log.c:549:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].id == id) [ 198s]^ So what makes the compiler right with that complaint? I.e. how does it know i 0 here? After all - afaict - s-connected can only be 0 or It has to assume that -connected can get any value because the input comes from outside the unit. To reduce the patch size i MAX_CONNECTIONS could be added. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 20/07/15 09:27, Andrew Cooper wrote: On 19/07/2015 16:53, Julien Grall wrote: Hi, On 17/07/2015 20:05, Ting-Wei Lan wrote: When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake devices, It is possible to encounter graphics issues that make screen unreadable or crash the system. It was reported in freedesktop bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90037 As we still cannot find a proper fix for this problem, this patch adds iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for users to manually workaround the problem. I would prefer to see the introduction of intel_iommu parameter in the vtd code rather than extending again the iommu parameter with intel specific options in the common code which is both for ARM and x86. Having the parsing for this option inside some x86 ifdefary, fine, but lets not have multiple iommu= options. It is hard enough for people to drive one option (as people routinely get Xen and Linux command line options mixed up). If you are concerned about mixing Xen and Linux command line options, it would be more sensible to use intel_iommu=igfx_off on Xen to keep the options exactly the same as Linux. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5
On 19.07.15 at 11:33, o...@aepfle.de wrote: blktap2 fails to build with gcc5 because it fails to recognize that there can be just one active connection (enforced in ctl_accept). Rearrange the code to handle just a single connection. Adjust two strerror calls to use errno instead -1 as input. [ 198s] block-log.c: In function 'ctl_close_sock': [ 198s] block-log.c:363:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].fd == fd) { [ 198s]^ [ 198s] block-log.c: In function 'ctl_request': [ 198s] block-log.c:549:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].id == id) [ 198s]^ So what makes the compiler right with that complaint? I.e. how does it know i 0 here? After all - afaict - s-connected can only be 0 or 1, and hence either the loop bodies don't get entered at all or the loops have only a single iteration. Smells like a compiler bug instead, which I'm not sure we want to work around with a non-trivial patch like this. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] xen/mmu: Copy and revector the P2M tree.
Hello Konrad Rzeszutek Wilk, The patch 7f9140626c75: xen/mmu: Copy and revector the P2M tree. from Jul 26, 2012, leads to the following static checker warning: arch/x86/xen/mmu.c:1105 xen_cleanhighmap() warn: potential pointer math issue ('level2_kernel_pgt' is pointer to unsigned long) arch/x86/xen/mmu.c 1096 #ifdef CONFIG_X86_64 1097 static void __init xen_cleanhighmap(unsigned long vaddr, 1098 unsigned long vaddr_end) 1099 { 1100 unsigned long kernel_end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1; 1101 pmd_t *pmd = level2_kernel_pgt + pmd_index(vaddr); 1102 1103 /* NOTE: The loop is more greedy than the cleanup_highmap variant. 1104 * We include the PMD passed in on _both_ boundaries. */ 1105 for (; vaddr = vaddr_end (pmd (level2_kernel_pgt + PAGE_SIZE)); ^ This pointer math is weird because we typically think of PAGE_SIZE as a number of bytes but since level2_kernel_pgt is a pointer to unsigned long, it looks like this loop can go through more iterations than intended. 1106 pmd++, vaddr += PMD_SIZE) { 1107 if (pmd_none(*pmd)) 1108 continue; 1109 if (vaddr (unsigned long) _text || vaddr kernel_end) 1110 set_pmd(pmd, __pmd(0)); } 1112 /* In case we did something silly, we should crash in this function 1113 * instead of somewhere later and be confusing. */ 1114 xen_mc_flush(); 1115 } regards, dan carpenter ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5
On 20.07.15 at 11:16, o...@aepfle.de wrote: On Mon, Jul 20, Jan Beulich wrote: On 19.07.15 at 11:33, o...@aepfle.de wrote: [ 198s] block-log.c:549:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].id == id) [ 198s]^ So what makes the compiler right with that complaint? I.e. how does it know i 0 here? After all - afaict - s-connected can only be 0 or It has to assume that -connected can get any value because the input comes from outside the unit. But then the warning can't be is but only may be, and a similar warning would appear for _all_ other array accesses with compile time known upper bound (and that's clearly not the case). I'm guessing that the compiler wrongly applies a heuristic that every loop over an array has at least two iteration (or some such). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/4] Fix to libxl migration v2 issue blocking OSSTest
On 20/07/15 10:56, Wei Liu wrote: On Fri, Jul 17, 2015 at 05:51:14PM +0100, Andrew Cooper wrote: And three improvements to debugging. Note that there is still a bug in libxl__toolstack_save() which valgrind identified, but I do not wish to block this bugfix on that Andrew Cooper (4): tools/libxc: Identify the path of the kernel image which cannot be found tools/libxl: Log the subject fd in datacopier messages tools/libxl: Identify copywhat in stream v2 datacopiers I think all three patches should wait until next development window opens unless we have nothing else in our queue (which doesn't seem to be the case at the moment). You mean delay until 4.7? I disagree. Without these fixes, debugging issues is substantially harder than they need to be. They literally are only adding extra information into existing error messages. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 17.07.15 at 21:05, lant...@gmail.com wrote: @@ -87,6 +88,8 @@ static void __init parse_iommu_param(char *s) force_iommu = val; else if ( !strcmp(s, workaround_bios_bug) ) iommu_workaround_bios_bug = val; +else if ( !strcmp(s, igfx_off) ) +iommu_igfx_off = val; Just FTR (moot with both Kevin's and Andrew's replies, which I agree with) this should be igfx, with the intended use then being iommu=no-igfx. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen: release lock occasionally during ballooning
On 10/07/15 15:42, Juergen Gross wrote: When dom0 is being ballooned balloon_process() will hold the balloon mutex until it is finished. This will block e.g. creation of new domains as the device backends for the new domain need some autoballooned pages for the ring buffers. Avoid this by releasing the balloon mutex from time to time during ballooning. Add a state variable to indicate one balloon_process() is active to avoid multiple balloon processes fighting for the mutex. Is this state variable necessary? balloon_process() is a work item so there should only be one instance of it running anyway, yes? David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 59699: regressions - FAIL
On Sat, Jul 18, 2015 at 11:25:48PM +, osstest service owner wrote: flight 59699 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59699/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 This one should be marked allowable, because as far as I can remember it failed when I first added it to OSSTest. It somehow passed. And now it is blocking the push. There are two more flights (59721, 59736) that tested the same commit, but they failed due to infrastructure problems. So those two reports should be ignored. In light of this test being allowable to fail, we don't have any blocking test failure in this flight. I suggest we do a force push. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] xl: fix vcpus to vnode assignement in config file
In fact, right now, if the vcpus= list (where the user specifies what vcpus should be part of a vnode) has multiple elements, things don't work. E.g., the following examples all result in failure to create the guest: [ pnode=0,size=512,vcpus=0,2,vdistances=10,20 ] [ pnode=0,size=512,vcpus=0-1,4,vdistances=10,20 ] Reason is we need either a multidimentional array, or a bitmap, to temporary store the vcpus of a vnode, while parsing the vnuma config entry. Let's use the latter, which happens to also make it easier to copy the outcome of the parsing to its final destination in b_info, if everything goes ok. Signed-off-by: Dario Faggioli dario.faggi...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com --- Changes from v1: * fix coding style --- Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Ian Campbell ian.campb...@citrix.com --- tools/libxl/xl_cmdimpl.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 8cbf30e..b41874a 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -1076,9 +1076,7 @@ static void parse_vnuma_config(const XLU_Config *config, /* Temporary storage for parsed vcpus information to avoid * parsing config twice. This array has num_vnuma elements. */ -struct vcpu_range_parsed { -unsigned long start, end; -} *vcpu_range_parsed; +libxl_bitmap *vcpu_parsed; libxl_physinfo_init(physinfo); if (libxl_get_physinfo(ctx, physinfo) != 0) { @@ -1095,7 +1093,14 @@ static void parse_vnuma_config(const XLU_Config *config, b_info-num_vnuma_nodes = num_vnuma; b_info-vnuma_nodes = xcalloc(num_vnuma, sizeof(libxl_vnode_info)); -vcpu_range_parsed = xcalloc(num_vnuma, sizeof(*vcpu_range_parsed)); +vcpu_parsed = xcalloc(num_vnuma, sizeof(libxl_bitmap)); +for (i = 0; i num_vnuma; i++) { +libxl_bitmap_init(vcpu_parsed[i]); +if (libxl_cpu_bitmap_alloc(ctx, vcpu_parsed[i], b_info-max_vcpus)) { +fprintf(stderr, libxl_node_bitmap_alloc failed.\n); +exit(1); +} +} for (i = 0; i b_info-num_vnuma_nodes; i++) { libxl_vnode_info *p = b_info-vnuma_nodes[i]; @@ -1165,12 +1170,14 @@ static void parse_vnuma_config(const XLU_Config *config, split_string_into_string_list(value, ,, cpu_spec_list); len = libxl_string_list_length(cpu_spec_list); -for (j = 0; j len; j++) +for (j = 0; j len; j++) { parse_range(cpu_spec_list[j], s, e); +for (; s = e; s++) { +libxl_bitmap_set(vcpu_parsed[i], s); +max_vcpus++; +} +} -vcpu_range_parsed[i].start = s; -vcpu_range_parsed[i].end = e; -max_vcpus += (e - s + 1); libxl_string_list_dispose(cpu_spec_list); } else if (!strcmp(vdistances, option)) { libxl_string_list vdist; @@ -1209,17 +1216,12 @@ static void parse_vnuma_config(const XLU_Config *config, for (i = 0; i b_info-num_vnuma_nodes; i++) { libxl_vnode_info *p = b_info-vnuma_nodes[i]; -int cpu; -libxl_cpu_bitmap_alloc(ctx, p-vcpus, b_info-max_vcpus); -libxl_bitmap_set_none(p-vcpus); -for (cpu = vcpu_range_parsed[i].start; - cpu = vcpu_range_parsed[i].end; - cpu++) -libxl_bitmap_set(p-vcpus, cpu); +libxl_bitmap_copy_alloc(ctx, p-vcpus, vcpu_parsed[i]); +libxl_bitmap_dispose(vcpu_parsed[i]); } -free(vcpu_range_parsed); +free(vcpu_parsed); } static void parse_config_data(const char *config_source, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] blktap2: update connection handling to fix build with gcc5
On Mon, 20 Jul 2015, Jan Beulich wrote: On 19.07.15 at 11:33, o...@aepfle.de wrote: blktap2 fails to build with gcc5 because it fails to recognize that there can be just one active connection (enforced in ctl_accept). Rearrange the code to handle just a single connection. Adjust two strerror calls to use errno instead -1 as input. [ 198s] block-log.c: In function 'ctl_close_sock': [ 198s] block-log.c:363:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].fd == fd) { [ 198s]^ [ 198s] block-log.c: In function 'ctl_request': [ 198s] block-log.c:549:23: error: array subscript is above array bounds [-Werror=array-bounds] [ 198s] if (s-connections[i].id == id) [ 198s]^ So what makes the compiler right with that complaint? I.e. how does it know i 0 here? After all - afaict - s-connected can only be 0 or 1, and hence either the loop bodies don't get entered at all or the loops have only a single iteration. Smells like a compiler bug instead, which I'm not sure we want to work around with a non-trivial patch like this. Does this depend on the gcc 5 version? I think this is something I thought I was going to have to fix on Fedora and then found I didn't. Michael Young ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()
On 07/17/2015 03:35 PM, Dario Faggioli wrote: The function is called both when we want to remove a cpu from a cpupool, and during cpu teardown, for suspend or shutdown. If, however, the boot cpu (cpu 0, most of the times) is not present in the default cpupool, during suspend or shutdown, Xen crashes like this: root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0 root@Zhaman:~# shutdown -h now (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] ... (XEN) Xen call trace: (XEN)[82d0801238de] _csched_cpu_pick+0x156/0x61f (XEN)[82d080123db5] csched_cpu_pick+0xe/0x10 (XEN)[82d08012de3c] vcpu_migrate+0x18e/0x321 (XEN)[82d08012e4f8] cpu_disable_scheduler+0x1cf/0x2ac (XEN)[82d08018bb8d] __cpu_disable+0x313/0x36e (XEN)[82d080101424] take_cpu_down+0x34/0x3b (XEN)[82d08013097a] stopmachine_action+0x70/0x99 (XEN)[82d0801325f0] do_tasklet_work+0x78/0xab (XEN)[82d080132926] do_tasklet+0x5e/0x8a (XEN)[82d08016478c] idle_loop+0x56/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 15: (XEN) Assertion 'cpu nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97 (XEN) There also are problems when we try to suspend or shutdown with a cpupool configured with just one cpu (no matter, in this case, whether that is the boot cpu or not): root@Zhaman:~# xl create /etc/xen/test.cfg root@Zhaman:~# xl cpupool-migrate test Pool-1 root@Zhaman:~# xl cpupool-list -c Name CPU list Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15 Pool-1 12 root@Zhaman:~# shutdown -h now (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] (XEN) CPU:12 ... (XEN) Xen call trace: (XEN)[82d08018bb91] __cpu_disable+0x317/0x36e (XEN)[82d080101424] take_cpu_down+0x34/0x3b (XEN)[82d08013097a] stopmachine_action+0x70/0x99 (XEN)[82d0801325f0] do_tasklet_work+0x78/0xab (XEN)[82d080132926] do_tasklet+0x5e/0x8a (XEN)[82d08016478c] idle_loop+0x56/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 12: (XEN) Xen BUG at smpboot.c:895 (XEN) In both cases, the problem is the scheduler not being able to: - move all the vcpus to the boot cpu (as the boot cpu is not in the cpupool), in the former; - move the vcpus away from a cpu at all (as that is the only one cpu in the cpupool), in the latter. Solution is to distinguish, inside cpu_disable_scheduler(), the two cases of cpupool manipulation and teardown. For cpupool manipulation, it is correct to ask the scheduler to take an action, as pathological situation (like there not being any cpu in the pool where to send vcpus) are taken care of (i.e., forbidden!) already. For suspend and shutdown, we don't want the scheduler to be involved at all, as the final goal is pretty simple: send all the vcpus to the boot cpu ASAP, so we just go for it. Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- Changes from v1: * BUG_ON() if, in the suspend/shutdown case, the mask of online pCPUs will ever get empty, as suggested during review; * reorganize and improve comments inside cpu_disable_scheduler() as suggested during review; * make it more clear that vcpu_move_nosched() (name changed, as suggested during review), should only be called from quite contextes, such us, during suspend s/quite/quiet/ or shutdown. Do that via both comments and asserts, as requested during review; * reorganize cpu_disable_scheduler() and vcpu_move_nosched() so that calling to sleep and wakeup functions are only called when necessary (i.e., *not* in case we are suspending/shutting down, as requested during review. --- Cc: George Dunlap george.dun...@eu.citrix.com Cc: Juergen Gross jgr...@suse.com --- xen/common/schedule.c | 102 + 1 file changed, 86 insertions(+), 16 deletions(-) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index df8c1d0..ed0f356 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c ... @@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_setall(v-cpu_hard_affinity); } -if ( v-processor == cpu ) +if ( v-processor != cpu ) { -set_bit(_VPF_migrating, v-pause_flags); +/* This vcpu is not on cpu, so we can move on. */ vcpu_schedule_unlock_irqrestore(lock, flags, v); -vcpu_sleep_nosync(v); -vcpu_migrate(v); +continue; +} + +/* If it is on cpu, we must send it away. */ +if ( unlikely(system_state
Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen tiejun.c...@intel.com wrote: Try to avoid placing PCI BARs over RMRRs: - If mmio_hole_size is not specified, and the existing MMIO range has RMRRs in it, and there is space to expand the hole in lowmem without moving more memory, then make the MMIO hole as large as possible. - When placing RMRRs, find the next RMRR higher than the current base in the lowmem mmio hole. If it overlaps, skip ahead of it and find the next one. This certainly won't work in all cases, but it should work in a significant number of cases. Additionally, users should be able to work around problems by setting mmio_hole_size larger in the guest config. Signed-off-by: George Dunlap george.dun...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v10: * This is from George' draft patch which implements an acceptable solution in current cycle. Here I just implemented check_overlap_all() and some cleanups. Changes: Reviewed-by: George Dunlap george.dun...@eu.citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] blkif.h: document physical-device node
El 07/07/15 a les 17.48, Wei Liu ha escrit: This node is used by toolstack (libxl, hotplug script) and blkback. Signed-off-by: Wei Liu wei.l...@citrix.com --- Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Roger Pau Monne roger@citrix.com Cc: David Vrabel david.vra...@citrix.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: George Dunlap george.dun...@eu.citrix.com I notice this when I was looking at George's patch series to fix hotplug scripts. If anyone has better idea how we can document this feel free to provide alternative. --- xen/include/public/io/blkif.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h index 6baf7fb..8f0f9a6 100644 --- a/xen/include/public/io/blkif.h +++ b/xen/include/public/io/blkif.h @@ -92,6 +92,12 @@ * backend driver to open the backing device. (e.g. the path to the * file or block device representing the backing store.) * + * physical-device + * Values: MAJOR:MINOR + * + * MAJOR and MINOR are the major number and minor number of the + * backing device respectively. + * Acked-by: Roger Pau Monné roger@citrix.com Although I plan to expand this field so it's also used by FreeBSD blkback with a different format. Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Request backport to xen qemu-upstream gits of a fix that solves occasional qemu crash using spice
On Wed, 15 Jul 2015, Fabio Fantoni wrote: On production system using xen 4.5 I had occasional qemu crash using spice, from qemu log: qemu-system-i386: spice-qemu-char.c:173: spice_chr_add_watch: Assertion `cond == G_IO_OUT' failed. It was solved upstream with this patch: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f7a8beb5e6a13dc924895244777d9ef08b23b367 I already applied it on 2 my systems with xen=4.5 without find regressions. It is a small patch that affect only users that use spice solving an occasional qemu using spice, so I think is good backport it to qemu-upstream-unstable.git and qemu-upstream-4.5-testing.git. On qemu-upstream-4.4-testing.git and older is not needed because they have older qemu without patch that introduced the regression (serial: poll the serial console with G_IO_HUP). Done. Thanks for pointing it out! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [Patch V2] xen: release lock occasionally during ballooning
When dom0 is being ballooned balloon_process() will hold the balloon mutex until it is finished. This will block e.g. creation of new domains as the device backends for the new domain need some autoballooned pages for the ring buffers. Avoid this by releasing the balloon mutex from time to time during ballooning. Adjust the comment above balloon_process() regarding multiple instances of balloon_process(). Instead of open coding it, just use cond_resched(). Signed-off-by: Juergen Gross jgr...@suse.com --- drivers/xen/balloon.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index fd93369..bf4a23c 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -472,7 +472,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) } /* - * We avoid multiple worker processes conflicting via the balloon mutex. + * As this is a work item it is guaranteed to run as a single instance only. * We may of course race updates of the target counts (which are protected * by the balloon lock), or with changes to the Xen hard limit, but we will * recover from these in time. @@ -482,9 +482,10 @@ static void balloon_process(struct work_struct *work) enum bp_state state = BP_DONE; long credit; - mutex_lock(balloon_mutex); do { + mutex_lock(balloon_mutex); + credit = current_credit(); if (credit 0) { @@ -499,17 +500,15 @@ static void balloon_process(struct work_struct *work) state = update_schedule(state); -#ifndef CONFIG_PREEMPT - if (need_resched()) - schedule(); -#endif + mutex_unlock(balloon_mutex); + + cond_resched(); + } while (credit state == BP_DONE); /* Schedule more work if there is some still to be done. */ if (state == BP_EAGAIN) schedule_delayed_work(balloon_worker, balloon_stats.schedule_delay * HZ); - - mutex_unlock(balloon_mutex); } /* Resets the Xen limit, sets new target, and kicks off processing. */ -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
On 20.07.15 at 08:16, tiejun.c...@intel.com wrote: +/* If there was no highmem region, just create one. */ +if ( i == memory_map.nr_map ) +{ +memory_map.map[i].addr = ((uint64_t)1 32); +memory_map.map[i].size = add_high_mem; +memory_map.map[i].type = E820_RAM; Don't you need to increment memory_map.nr_map here? +} + +/* A sanity check if high memory is broken. */ +BUG_ON( high_mem_end != +memory_map.map[i].addr + memory_map.map[i].size); +} + +/* Now fulfill e820. */ s/fulfill/fill/. +/* Finally we need to sort all e820 entries. */ +for ( j = 0; j nr-1; j++ ) +{ +for ( i = j+1; i nr; i++ ) Blanks around binary operators please. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()
On Mon, 2015-07-20 at 13:41 +0200, Juergen Gross wrote: On 07/17/2015 03:35 PM, Dario Faggioli wrote: @@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_setall(v-cpu_hard_affinity); } -if ( v-processor == cpu ) +if ( v-processor != cpu ) { -set_bit(_VPF_migrating, v-pause_flags); +/* This vcpu is not on cpu, so we can move on. */ vcpu_schedule_unlock_irqrestore(lock, flags, v); -vcpu_sleep_nosync(v); -vcpu_migrate(v); +continue; +} + +/* If it is on cpu, we must send it away. */ +if ( unlikely(system_state == SYS_STATE_suspend) ) +{ +/* + * If we are doing a shutdown/suspend, it is not necessary to + * ask the scheduler to chime in. In fact: + * * there is no reason for it: the end result we are after + *is just 'all the vcpus on the boot pcpu, and no vcpu + *anywhere else', so let's just go for it; + * * it's wrong, for cpupools with only non-boot pcpus, as + *the scheduler would always fail to send the vcpus away + *from the last online (non boot) pcpu! + * + * Therefore, in the shutdown/suspend case, we just pick up + * one (still) online pcpu. Note that, at this stage, all + * domains (including dom0) have been paused already, so we + * do not expect any vcpu activity at all. + */ +cpumask_andnot(online_affinity, cpu_online_map, + cpumask_of(cpu)); +BUG_ON(cpumask_empty(online_affinity)); +/* + * As boot cpu is, usually, pcpu #0, using cpumask_first() + * will make us converge quicker. + */ +new_cpu = cpumask_first(online_affinity); +vcpu_move_nosched(v, new_cpu); Shouldn't there be a vcpu_schedule_unlock_irqrestore() ? I'm sure I put one there, as I was sure that it was there the last time I inspected the patch before hitting send. But I see that it's not there now, so I must have messed up when formatting the patch, or something like that. :-( It's really really weird, as I forgot it during development, and then the system was hanging, and then I added it, and that's why I'm sure I did have it in place... but perhaps I fat fingered some stgit command which made it disappear. In any case, sorry for this. I will re-test (just to be sure) and re-send (and this time I'll triple check!!) Thanks and Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/6] tools/libx{l, c}: Remove the toolstack_{save, restore} callbacks
Update the libxc spec to indicate more sternly that TOOLSTACK records should no longer be used. Also, trim further toolstack infrastructure which should have gone in c/s 39bf4e9 tools/libxl: Drop all knowledge of toolstack callbacks Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- docs/specs/libxc-migration-stream.pandoc | 14 ++ tools/libxc/include/xenguest.h | 12 tools/libxl/libxl_internal.h |1 - tools/libxl/libxl_save_callout.c |2 -- 4 files changed, 6 insertions(+), 23 deletions(-) diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc index 68fa513..9d8f17b 100644 --- a/docs/specs/libxc-migration-stream.pandoc +++ b/docs/specs/libxc-migration-stream.pandoc @@ -219,7 +219,7 @@ type 0x: END 0x000A: HVM_PARAMS - 0x000B: TOOLSTACK + 0x000B: TOOLSTACK (deprecated) 0x000C: X86_PV_VCPU_MSRS @@ -540,17 +540,15 @@ param value Parameter value. \clearpage -TOOLSTACK -- +TOOLSTACK (deprecated) +-- + + *This record was only present for transitionary purposes during + development. It is should not be used.* An opaque blob provided by and supplied to the higher layers of the toolstack (e.g., libxl) during save and restore. - This is only temporary -- the intention is the toolstack takes care - of this itself. This record is only present for early development - purposes and will be removed before submissions, along with changes - to libxl which cause libxl to handle this data itself. - 0 1 2 3 4 5 6 7 octet +++ | data| diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index e95af54..231ec6e 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -67,14 +67,6 @@ struct save_callbacks { /* Enable qemu-dm logging dirty pages to xen */ int (*switch_qemu_logdirty)(int domid, unsigned enable, void *data); /* HVM only */ -/* Save toolstack specific data - * @param buf the buffer with the data to be saved - * @param len the length of the buffer - * The callee allocates the buffer, the caller frees it (buffer must - * be free'able). - */ -int (*toolstack_save)(uint32_t domid, uint8_t **buf, uint32_t *len, void *data); - /* to be provided as the last argument to each callback function */ void* data; }; @@ -98,10 +90,6 @@ int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_ite /* callbacks provided by xc_domain_restore */ struct restore_callbacks { -/* callback to restore toolstack specific data */ -int (*toolstack_restore)(uint32_t domid, const uint8_t *buf, -uint32_t size, void* data); - /* A checkpoint record has been found in the stream. * returns: */ #define XGR_CHECKPOINT_ERROR0 /* Terminate processing */ diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 2b6b2a0..4ab564a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2744,7 +2744,6 @@ _hidden void libxl__datacopier_prefixdata(libxl__egc*, libxl__datacopier_state*, libxl__ev_fd readable; libxl__ev_child child; const char *stdin_what, *stdout_what; -FILE *toolstack_data_file; libxl__egc *egc; /* valid only for duration of each event callback; * is here in this struct for the benefit of the diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index f2ce868..3af99af 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -70,7 +70,6 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs, shs-completion_callback = libxl__xc_domain_restore_done; shs-caller_state = dcs; shs-need_results = 1; -shs-toolstack_data_file = 0; run_helper(egc, shs, --restore-domain, restore_fd, 0, 0, argnums, ARRAY_SIZE(argnums)); @@ -333,7 +332,6 @@ static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs) libxl__carefd_close(shs-pipes[0]); shs-pipes[0] = 0; libxl__carefd_close(shs-pipes[1]); shs-pipes[1] = 0; assert(!libxl__save_helper_inuse(shs)); -if (shs-toolstack_data_file) fclose(shs-toolstack_data_file); shs-egc = egc; shs-completion_callback(egc, shs-caller_state, -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/6] tools/libx{l, c}: Drop '2' suffixes from xc_domain_{save, restore}2() functions
As there is now only the one implementation. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxc/include/xenguest.h | 14 -- tools/libxc/xc_nomigrate.c | 20 tools/libxc/xc_sr_restore.c | 14 +++--- tools/libxc/xc_sr_save.c|6 +++--- tools/libxl/libxl_save_helper.c |4 ++-- 5 files changed, 12 insertions(+), 46 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 9772576..f837bde 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -83,11 +83,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm); -/* Domain Save v2 */ -int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, -uint32_t max_factor, uint32_t flags, -struct save_callbacks* callbacks, int hvm); - /* callbacks provided by xc_domain_restore */ struct restore_callbacks { /* A checkpoint record has been found in the stream. @@ -127,15 +122,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, int checkpointed_stream, struct restore_callbacks *callbacks); -/* Domain Restore v2 */ -int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, - unsigned int store_evtchn, unsigned long *store_mfn, - domid_t store_domid, unsigned int console_evtchn, - unsigned long *console_mfn, domid_t console_domid, - unsigned int hvm, unsigned int pae, int superpages, - int checkpointed_stream, - struct restore_callbacks *callbacks); - /** * This function will create a domain for a paravirtualized Linux * using file names pointing to kernel and ramdisk diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 426aee6..76978a0 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -29,14 +29,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter return -1; } -int xc_domain_save2(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, -uint32_t max_factor, uint32_t flags, -struct save_callbacks* callbacks, int hvm) -{ -errno = ENOSYS; -return -1; -} - int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, unsigned int store_evtchn, unsigned long *store_mfn, domid_t store_domid, unsigned int console_evtchn, @@ -49,18 +41,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, return -1; } -int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, - unsigned int store_evtchn, unsigned long *store_mfn, - domid_t store_domid, unsigned int console_evtchn, - unsigned long *console_mfn, domid_t console_domid, - unsigned int hvm, unsigned int pae, int superpages, - int checkpointed_stream, - struct restore_callbacks *callbacks) -{ -errno = ENOSYS; -return -1; -} - /* * Local variables: * mode: C diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index bf1ee15..ebc8f2f 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -714,13 +714,13 @@ static int restore(struct xc_sr_context *ctx) return rc; } -int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, - unsigned int store_evtchn, unsigned long *store_mfn, - domid_t store_domid, unsigned int console_evtchn, - unsigned long *console_gfn, domid_t console_domid, - unsigned int hvm, unsigned int pae, int superpages, - int checkpointed_stream, - struct restore_callbacks *callbacks) +int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, + unsigned int store_evtchn, unsigned long *store_mfn, + domid_t store_domid, unsigned int console_evtchn, + unsigned long *console_gfn, domid_t console_domid, + unsigned int hvm, unsigned int pae, int superpages, + int checkpointed_stream, + struct restore_callbacks *callbacks) { struct xc_sr_context ctx = { diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index d63b783..7b38051 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@
[Xen-devel] [PATCH 6/6] tools/libx{l, c}: Fix trivial Coverity defects in migration v2 code
All of these are UNUSED_VALUE defects where a default vaule is unconditionally overwritten. They are not particularly interesting, bug wise, but keeping these defects at bay helps prevent real bugs going unnoticed in the volume. No functional change. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxc/xc_sr_save.c| 10 +- tools/libxl/libxl_stream_read.c |2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index 46700a1..79c389a 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -473,7 +473,7 @@ static int send_memory_live(struct xc_sr_context *ctx) xc_shadow_op_stats_t stats = { 0, ctx-save.p2m_size }; char *progress_str = NULL; unsigned x; -int rc = -1; +int rc; rc = update_progress_string(ctx, progress_str, 0); if ( rc ) @@ -525,7 +525,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx) xc_interface *xch = ctx-xch; xc_shadow_op_stats_t stats = { 0, ctx-save.p2m_size }; char *progress_str = NULL; -int rc = -1; +int rc; DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, ctx-save.dirty_bitmap_hbuf); @@ -572,7 +572,7 @@ static int send_memory_verify(struct xc_sr_context *ctx) { xc_interface *xch = ctx-xch; xc_shadow_op_stats_t stats = { 0, ctx-save.p2m_size }; -int rc = -1; +int rc; struct xc_sr_record rec = { .type = REC_TYPE_VERIFY, @@ -612,7 +612,7 @@ static int send_memory_verify(struct xc_sr_context *ctx) */ static int send_domain_memory_live(struct xc_sr_context *ctx) { -int rc = -1; +int rc; rc = enable_logdirty(ctx); if ( rc ) @@ -652,7 +652,7 @@ static int send_domain_memory_checkpointed(struct xc_sr_context *ctx) static int send_domain_memory_nonlive(struct xc_sr_context *ctx) { xc_interface *xch = ctx-xch; -int rc = -1; +int rc; rc = suspend_domain(ctx); if ( rc ) diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c index 94247b7..213bdc2 100644 --- a/tools/libxl/libxl_stream_read.c +++ b/tools/libxl/libxl_stream_read.c @@ -583,7 +583,7 @@ static void write_emulator_blob(libxl__egc *egc, libxl__sr_emulator_hdr *emu_hdr; STATE_AO_GC(stream-ao); char path[256]; -int rc = 0, writefd = -1; +int rc = 0, writefd; if (rec-hdr.length sizeof(*emu_hdr)) { rc = ERROR_FAIL; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH for 4.6 0/6] Prune legacy migration and move migration v2 out of daft status
This series is some cleanup following the integration of migration v2 into the codebase. It removes the legacy migration implementation (compatability is provided with the python conversion script), and adjusts the migration v2 docs/implementation to no longer be experimental. There are no major changes in this series, but are important changes for the status of migration v2 in Xen 4.6 Andrew Cooper (6): tools/libxc: Remove legacy migration implementation tools/libx{l,c}: Remove the toolstack_{save,restore} callbacks tools/libx{l,c}: Remove XC_DEVICE_MODEL_RESTORE_FILE tools/libx{l,c}: Drop '2' suffixes from xc_domain_{save,restore}2() functions docs: Migration v2 is now no longer draft tools/libx{l,c}: Fix trivial Coverity defects in migration v2 code docs/specs/libxc-migration-stream.pandoc | 19 +- docs/specs/libxl-migration-stream.pandoc | 19 +- tools/libxc/Makefile |1 - tools/libxc/include/xenguest.h | 34 - tools/libxc/xc_domain_restore.c | 2411 -- tools/libxc/xc_domain_save.c | 2198 --- tools/libxc/xc_nomigrate.c | 20 - tools/libxc/xc_offline_page.c| 59 + tools/libxc/xc_sr_restore.c | 15 +- tools/libxc/xc_sr_save.c | 17 +- tools/libxc/xg_save_restore.h| 247 --- tools/libxl/libxl.c |2 +- tools/libxl/libxl_create.c |2 +- tools/libxl/libxl_internal.h |2 +- tools/libxl/libxl_save_callout.c |2 - tools/libxl/libxl_save_helper.c |4 +- tools/libxl/libxl_stream_read.c |4 +- 17 files changed, 107 insertions(+), 4949 deletions(-) delete mode 100644 tools/libxc/xc_domain_restore.c delete mode 100644 tools/libxc/xc_domain_save.c -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 00/16] Fix RMRR
On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen tiejun.c...@intel.com wrote: v10: * Patch #6: hvmloader/pci: Try to avoid placing BARs in RMRRs This is from George' draft patch which implements an acceptable solution in current cycle. Here I just implemented check_overlap_all() and some cleanups. * Patch #7: hvmloader/e820: construct guest e820 table Instead of correcting e820, I'd like to correct memory_map.map[] and then copy them into e820 directly. I think this can make sure hvm_info, memory_map.map[] and e820 are on the same page. One brief request -- if you do end up sending this series again, can you please rebase to staging? This is the second time I've had to manually fix up some patches so that they apply. Thanks. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 3/6] tools/libx{l, c}: Remove XC_DEVICE_MODEL_RESTORE_FILE
All handling of device model files is now at the libxl level. Remove XC_DEVICE_MODEL_RESTORE_FILE and introduce LIBXL_DEVICE_MODEL_RESTORE_FILE in its place. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxc/include/xenguest.h |8 tools/libxl/libxl.c |2 +- tools/libxl/libxl_create.c |2 +- tools/libxl/libxl_internal.h|1 + tools/libxl/libxl_stream_read.c |2 +- 5 files changed, 4 insertions(+), 11 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 231ec6e..9772576 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -135,14 +135,6 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom, unsigned int hvm, unsigned int pae, int superpages, int checkpointed_stream, struct restore_callbacks *callbacks); -/** - * xc_domain_restore writes a file to disk that contains the device - * model saved state. - * The pathname of this file is XC_DEVICE_MODEL_RESTORE_FILE; The domid - * of the new domain is automatically appended to the filename, - * separated by a .. - */ -#define XC_DEVICE_MODEL_RESTORE_FILE /var/lib/xen/qemu-resume /** * This function will create a domain for a paravirtualized Linux diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 3fe1b99..c13aa8b 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1695,7 +1695,7 @@ static void devices_destroy_cb(libxl__egc *egc, rc = libxl__remove_file(gc, libxl__device_model_savefile(gc, domid)); if (rc 0) goto out; rc = libxl__remove_file(gc, - GCSPRINTF(XC_DEVICE_MODEL_RESTORE_FILE.%u, domid)); + GCSPRINTF(LIBXL_DEVICE_MODEL_RESTORE_FILE.%u, domid)); if (rc 0) goto out; rc = libxl__ev_child_fork(gc, dis-destroyer, domain_destroy_domid_cb); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 5b4d333..86d1181 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1083,7 +1083,7 @@ static void domcreate_stream_done(libxl__egc *egc, if (info-type == LIBXL_DOMAIN_TYPE_HVM) { state-saved_state = GCSPRINTF( - XC_DEVICE_MODEL_RESTORE_FILE.%d, domid); + LIBXL_DEVICE_MODEL_RESTORE_FILE.%d, domid); } out: diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 4ab564a..20f4cde 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -90,6 +90,7 @@ /* QEMU may be slow to load and start due to a bug in Linux where the I/O * subsystem sometime produce high latency under load. */ #define LIBXL_DEVICE_MODEL_START_TIMEOUT 60 +#define LIBXL_DEVICE_MODEL_RESTORE_FILE /var/lib/xen/qemu-resume /* .$domid */ #define LIBXL_STUBDOM_START_TIMEOUT 30 #define LIBXL_QEMU_BODGE_TIMEOUT 2 #define LIBXL_XENCONSOLE_LIMIT 1048576 diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c index 0535e7f..94247b7 100644 --- a/tools/libxl/libxl_stream_read.c +++ b/tools/libxl/libxl_stream_read.c @@ -592,7 +592,7 @@ static void write_emulator_blob(libxl__egc *egc, } emu_hdr = rec-body; -sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE.%u, dcs-guest_domid); +sprintf(path, LIBXL_DEVICE_MODEL_RESTORE_FILE.%u, dcs-guest_domid); assert(stream-emu_carefd == NULL); libxl__carefd_begin(); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 5/6] docs: Migration v2 is now no longer draft
Add further instructions to the libxc Future Extensions section, and provide such a section for libxl. In addition, drop the In experimental __func__ IPRINTF()s from the libxc implementations. Finally, a correction to libxl's Not Yet Included section which should have been amended in c/s 7eaec00 when libxl Remus support was introduced into the protocol. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- docs/specs/libxc-migration-stream.pandoc |5 - docs/specs/libxl-migration-stream.pandoc | 19 --- tools/libxc/xc_sr_restore.c |1 - tools/libxc/xc_sr_save.c |1 - 4 files changed, 20 insertions(+), 6 deletions(-) diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc index 9d8f17b..8cd678f 100644 --- a/docs/specs/libxc-migration-stream.pandoc +++ b/docs/specs/libxc-migration-stream.pandoc @@ -1,7 +1,7 @@ % LibXenCtrl Domain Image Format % David Vrabel david.vra...@citrix.com Andrew Cooper andrew.coop...@citrix.com -% Draft G +% Revision 1 Introduction @@ -679,6 +679,9 @@ introduced in Xen 3.0. Future Extensions = +All changes to this specification should bump the revision number in +the title block. + All changes to the image or domain headers require the image version to be increased. diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc index c24a434..cdec168 100644 --- a/docs/specs/libxl-migration-stream.pandoc +++ b/docs/specs/libxl-migration-stream.pandoc @@ -1,6 +1,6 @@ % LibXenLight Domain Image Format % Andrew Cooper andrew.coop...@citrix.com -% Draft B +% Revision 1 Introduction @@ -41,8 +41,6 @@ Not Yet Included The following features are not yet fully specified and will be included in a future draft. -* Remus - * ARM @@ -215,3 +213,18 @@ A checkpoint end record marks the end of a checkpoint in the image. +-+ The end record contains no fields; its body_length is 0. + + +Future Extensions += + +All changes to this specification should bump the revision number in +the title block. + +All changes to the header require the header version to be increased. + +The format may be extended by adding additional record types. + +Extending an existing record type must be done by adding a new record +type. This allows old images with the old record to still be +restored. diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index ebc8f2f..df885b6 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -740,7 +740,6 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, if ( checkpointed_stream ) assert(callbacks-checkpoint); -IPRINTF(In experimental %s, __func__); DPRINTF(fd %d, dom %u, hvm %u, pae %u, superpages %d , checkpointed_stream %d, io_fd, dom, hvm, pae, superpages, checkpointed_stream); diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index 7b38051..46700a1 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -850,7 +850,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, if ( ctx.save.checkpointed ) assert(callbacks-checkpoint callbacks-postcopy); -IPRINTF(In experimental %s, __func__); DPRINTF(fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d, io_fd, dom, max_iters, max_factor, flags, hvm); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
On 20.07.15 at 08:16, tiejun.c...@intel.com wrote: --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0; enum virtual_vga virtual_vga = VGA_none; unsigned long igd_opregion_pgbase = 0; +/* Check if any conflicts with all reserved device memory. */ /* Check if the specified range conflicts with any reserved device memory. */ (and the any could perhaps be left out) +static bool check_overlap_all(uint64_t start, uint64_t size) +{ +unsigned int i; + +for ( i = 0; i memory_map.nr_map; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + check_overlap(start, size, + memory_map.map[i].addr, + memory_map.map[i].size) ) +return true; +} + +return false; +} + +/* Find the lowest RMRR higher than base. */ +static int find_next_rmrr(uint32_t base) +{ +unsigned int i; +int next_rmrr = -1; +uint64_t min_base = (1ull 32); + +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + memory_map.map[i].addr base + memory_map.map[i].addr min_base ) +{ +next_rmrr = i; +min_base = memory_map.map[i].addr; +} +} +return next_rmrr; +} Considering _both_ callers, I think the function should actually return the lowest RMRR higher than or equal to base. Or wait - we actually need to find the lowest RMRR the _end_ of which is higher than base. Also - blank line before final return statement please. @@ -299,6 +337,15 @@ void pci_setup(void) || (((pci_mem_start 1) PAGE_SHIFT) = hvm_info-low_mem_pgend)) ) pci_mem_start = 1; + +/* + * Try to accomodate RMRRs in our MMIO region on a best-effort basis. + * If we have RMRRs in the range, then make pci_mem_start just after + * hvm_info-low_mem_pgend. + */ +if ( pci_mem_start (hvm_info-low_mem_pgend PAGE_SHIFT) + check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) ) +pci_mem_start = ((hvm_info-low_mem_pgend + 1) PAGE_SHIFT); Since it looks like low_mem_pgend is exclusive, is the + 1 here really needed (other than to put an arbitrary one page gap between RAM and MMIO)? @@ -407,6 +456,19 @@ void pci_setup(void) } base = (resource-base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); + +/* If we're using mem_resource, check for RMRR conflicts. */ +while ( resource == mem_resource +next_rmrr 0 = I think. +check_overlap(base, bar_sz, + memory_map.map[next_rmrr].addr, + memory_map.map[next_rmrr].size) ) +{ +base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size; +base = (base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); Pointless cast. And as George said elsewhere - please make sure the whole series applies cleanly to staging; generally committers don't do fixups while applying. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [Patch V2] xen: release lock occasionally during ballooning
When dom0 is being ballooned balloon_process() will hold the balloon mutex until it is finished. This will block e.g. creation of new domains as the device backends for the new domain need some autoballooned pages for the ring buffers. Avoid this by releasing the balloon mutex from time to time during ballooning. Adjust the comment above balloon_process() regarding multiple instances of balloon_process(). Instead of open coding it, just use cond_resched(). Signed-off-by: Juergen Gross jgr...@suse.com --- drivers/xen/balloon.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index fd93369..e6d9eee 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -481,9 +481,16 @@ static void balloon_process(struct work_struct *work) { enum bp_state state = BP_DONE; long credit; + static bool active; mutex_lock(balloon_mutex); + if (active) { + mutex_unlock(balloon_mutex); + return; + } + active = true; + do { credit = current_credit(); @@ -499,12 +506,16 @@ static void balloon_process(struct work_struct *work) state = update_schedule(state); -#ifndef CONFIG_PREEMPT - if (need_resched()) - schedule(); -#endif + mutex_unlock(balloon_mutex); + + cond_resched(); + + mutex_lock(balloon_mutex); + } while (credit state == BP_DONE); + active = false; + /* Schedule more work if there is some still to be done. */ if (state == BP_EAGAIN) schedule_delayed_work(balloon_worker, balloon_stats.schedule_delay * HZ); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Patch V2] xen: release lock occasionally during ballooning
Please ignore, forgot stg refresh... Juergen On 07/20/2015 01:46 PM, Juergen Gross wrote: When dom0 is being ballooned balloon_process() will hold the balloon mutex until it is finished. This will block e.g. creation of new domains as the device backends for the new domain need some autoballooned pages for the ring buffers. Avoid this by releasing the balloon mutex from time to time during ballooning. Adjust the comment above balloon_process() regarding multiple instances of balloon_process(). Instead of open coding it, just use cond_resched(). Signed-off-by: Juergen Gross jgr...@suse.com --- drivers/xen/balloon.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index fd93369..e6d9eee 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -481,9 +481,16 @@ static void balloon_process(struct work_struct *work) { enum bp_state state = BP_DONE; long credit; + static bool active; mutex_lock(balloon_mutex); + if (active) { + mutex_unlock(balloon_mutex); + return; + } + active = true; + do { credit = current_credit(); @@ -499,12 +506,16 @@ static void balloon_process(struct work_struct *work) state = update_schedule(state); -#ifndef CONFIG_PREEMPT - if (need_resched()) - schedule(); -#endif + mutex_unlock(balloon_mutex); + + cond_resched(); + + mutex_lock(balloon_mutex); + } while (credit state == BP_DONE); + active = false; + /* Schedule more work if there is some still to be done. */ if (state == BP_EAGAIN) schedule_delayed_work(balloon_worker, balloon_stats.schedule_delay * HZ); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs
On Sat, 2015-07-18 at 11:13 +0100, Julien Grall wrote: Hi Ian, On 15/07/2015 10:32, Ian Campbell wrote: and save 2 byte if not more with the alignment per irq_desc. If this is a concern then I would say we would either want a separate array of per-pLPI information which we do not want in irq_desc because it is irq specific, or do add a pointer to its_desc which points to an array of per-event information. I noticed that we have a field msi_desc in the structure irq_desc. On ARM, the structure msi_desc is not defined at all but still waste a 8 byte for the pointer in the irq_desc. Given that LPI is an MSI, I'm wondering if we could move all LPI related data in this msi_desc. This would avoid to introduce new field in the irq_desc structure. Yes, I think I suggested something similar at some point, although I seemed to think msi_desc was more similar to its_device. Looking at the content of the struct, alloc_msi_entry and msi_capability_init makes me less sure, it looks like msi_entry on x86 is allocated as an array (sized for events) stored per device, which doesn't directly map onto how ARM is structured. So what you propose may make more sense: The msi_desc would look like: struct msi_desc { its_device *dev; unsigned int eventID; }; Regards, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()
On 07/20/2015 01:59 PM, Dario Faggioli wrote: On Mon, 2015-07-20 at 13:41 +0200, Juergen Gross wrote: On 07/17/2015 03:35 PM, Dario Faggioli wrote: @@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_setall(v-cpu_hard_affinity); } -if ( v-processor == cpu ) +if ( v-processor != cpu ) { -set_bit(_VPF_migrating, v-pause_flags); +/* This vcpu is not on cpu, so we can move on. */ vcpu_schedule_unlock_irqrestore(lock, flags, v); -vcpu_sleep_nosync(v); -vcpu_migrate(v); +continue; +} + +/* If it is on cpu, we must send it away. */ +if ( unlikely(system_state == SYS_STATE_suspend) ) +{ +/* + * If we are doing a shutdown/suspend, it is not necessary to + * ask the scheduler to chime in. In fact: + * * there is no reason for it: the end result we are after + *is just 'all the vcpus on the boot pcpu, and no vcpu + *anywhere else', so let's just go for it; + * * it's wrong, for cpupools with only non-boot pcpus, as + *the scheduler would always fail to send the vcpus away + *from the last online (non boot) pcpu! + * + * Therefore, in the shutdown/suspend case, we just pick up + * one (still) online pcpu. Note that, at this stage, all + * domains (including dom0) have been paused already, so we + * do not expect any vcpu activity at all. + */ +cpumask_andnot(online_affinity, cpu_online_map, + cpumask_of(cpu)); +BUG_ON(cpumask_empty(online_affinity)); +/* + * As boot cpu is, usually, pcpu #0, using cpumask_first() + * will make us converge quicker. + */ +new_cpu = cpumask_first(online_affinity); +vcpu_move_nosched(v, new_cpu); Shouldn't there be a vcpu_schedule_unlock_irqrestore() ? I'm sure I put one there, as I was sure that it was there the last time I inspected the patch before hitting send. But I see that it's not there now, so I must have messed up when formatting the patch, or something like that. :-( It's really really weird, as I forgot it during development, and then the system was hanging, and then I added it, and that's why I'm sure I did have it in place... but perhaps I fat fingered some stgit command which made it disappear. Or you forgot stg refresh? I just managed to do so. :-( Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 17/07/15 20:05, Ting-Wei Lan wrote: When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake devices, It is possible to encounter graphics issues that make screen unreadable or crash the system. It was reported in freedesktop bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90037 As we still cannot find a proper fix for this problem, this patch adds iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for users to manually workaround the problem. Signed-off-by: Ting-Wei Lan lant...@gmail.com Having looked into this issue, the i915 driver has several workarounds in it for systems when the IOMMU is in use. In some cases there are plain errata, while in other cases there are specific hardware features which don't function if the IOMMU is enabled. In all cases this is gated on Linux's idea of whether the IOMMU is enabled. When used under Xen, Linux has no clue that the IOMMU exists, or that Xen has turned it on. Longterm, we should probably see about not clobbering the DMAR table and making dom0 aware that there is an IOMMU. (I still don't actually know why we have this asymmetry between Intel and AMD). However in the short term, having a command line workaround like this seems appropriate. It at least allows the user to tell Xen not to play with the IOMMU behind an unsuspecting dom0. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Patch V6 00/16] xen: support pv-domains larger than 512GB
On 17/07/15 05:51, Juergen Gross wrote: Support 64 bit pv-domains with more than 512GB of memory. Following test have been done: - 64 bit dom0 on 8GB machine - 64 bit dom0 on 1TB machine (resolving p2m/E820-map conflict) - 32 bit dom0 on 8GB machine - 64 bit dom0 on 8GB machine with faked kernel/E820-map conflict - 64 bit dom0 on 8GB machine with faked pgtable/E820-map conflict - 64 bit dom0 on 8GB machine with faked initrd/E820-map conflict - 64 bit dom0 on 8GB machine with faked p2m/E820-map conflict - 64 bit domU (sizes up to 900GB) - 32 bit domU Thanks. #12 needs an MM maintainer ack and it's good to go in I think. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 20.07.15 at 14:12, andrew.coop...@citrix.com wrote: On 17/07/15 20:05, Ting-Wei Lan wrote: When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake devices, It is possible to encounter graphics issues that make screen unreadable or crash the system. It was reported in freedesktop bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90037 As we still cannot find a proper fix for this problem, this patch adds iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for users to manually workaround the problem. Signed-off-by: Ting-Wei Lan lant...@gmail.com Having looked into this issue, the i915 driver has several workarounds in it for systems when the IOMMU is in use. In some cases there are plain errata, while in other cases there are specific hardware features which don't function if the IOMMU is enabled. In all cases this is gated on Linux's idea of whether the IOMMU is enabled. When used under Xen, Linux has no clue that the IOMMU exists, or that Xen has turned it on. Perhaps it should just assume an IOMMU is in use when running under Xen. Having inspected all those code places quite some time ago, I came to the conclusion that making this assumption is better than the current one of there not being an enabled IOMMU (and I adjusted our kernels accordingly). Longterm, we should probably see about not clobbering the DMAR table and making dom0 aware that there is an IOMMU. (I still don't actually know why we have this asymmetry between Intel and AMD). The presence of a DMAR table is not a 1:1 representation of whether Xen has the IOMMU enabled. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 59721: regressions - trouble: broken/fail/pass
On Mon, 2015-07-20 at 09:48 +0100, Ian Campbell wrote: version targeted for testing: xen 21d9b079e53805b68047d60d28cde224d09bbb40 baseline version: xen c40317f11b3f05e7c06a2213560c8471081f2662 Based on feedback from Wei and Jan I've now gone ahead and done this force push. osstest@osstest:~/branches/for-xen-unstable.git$ OSSTEST_CONFIG=production-config ./ap-push xen-unstable 21d9b079e53805b68047d60d28cde224d09bbb40 + branch=xen-unstable + revision=21d9b079e53805b68047d60d28cde224d09bbb40 + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable 21d9b079e53805b68047d60d28cde224d09bbb40 + branch=xen-unstable + revision=21d9b079e53805b68047d60d28cde224d09bbb40 + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' +++ local repos=/home/osstest/repos +++ '[' -z /home/osstest/repos ']' +++ '[' '!' -d /home/osstest/repos ']' +++ echo /home/osstest/repos ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . cri-common ++ . cri-getconfig ++ umask 002 + select_xenbranch + case $branch in + tree=xen + xenbranch=xen-unstable + '[' xxen = xlinux ']' + linuxbranch= + '[' x = x ']' + qemuubranch=qemu-upstream-unstable + : tested/2.6.39.x + . ap-common ++ : osst...@xenbits.xen.org +++ getconfig OsstestUpstream +++ perl -e ' use Osstest; readglobalconfig(); print $c{OsstestUpstream} or die $!; ' ++ : ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/xen.git ++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://libvirt.org/libvirt.git ++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : https://github.com/rumpkernel/rumprun-xen ++ : git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git +++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src '[fetch=try]' +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local 'options=[fetch=try]' getconfig GitCacheProxy perl -e ' use Osstest; readglobalconfig(); print $c{GitCacheProxy} or die $!; ' +++ local cache=git://cache:9419/ +++ '[' xgit://cache:9419/ '!=' x ']' +++ echo 'git://cache:9419/https://github.com/rumpkernel/rumpkernel-netbsd-src%20[fetch=try]' ++ : 'git://cache:9419/https://github.com/rumpkernel/rumpkernel-netbsd-src%20[fetch=try]' ++ : git ++ : git://git.seabios.org/seabios.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git ++ : git://xenbits.xen.org/osstest/seabios.git ++ : https://github.com/tianocore/edk2.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git ++ : git://xenbits.xen.org/osstest/ovmf.git ++ : git://xenbits.xen.org/osstest/linux-firmware.git ++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git ++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git ++ : osst...@xenbits.xen.org:/home/xen/git/linux-pvops.git ++ : git://xenbits.xen.org/linux-pvops.git ++ : tested/linux-3.14 ++ : tested/linux-arm-xen ++ '[' x = x ']' ++ : git://xenbits.xen.org/linux-pvops.git ++ : tested/linux-3.14 ++ '[' x = x ']' ++ : git://xenbits.xen.org/linux-pvops.git ++ : tested/linux-arm-xen ++ : git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git ++ : tested/2.6.39.x ++ : daily-cron.xen-unstable ++ : daily-cron.xen-unstable ++ : daily-cron.xen-unstable ++ : daily-cron.xen-unstable ++ : daily-cron.xen-unstable ++ : daily-cron.xen-unstable ++ : http://hg.uk.xensource.com/carbon/trunk/linux-2.6.27 ++ : git://xenbits.xen.org/staging/qemu-upstream-unstable.git ++ : daily-cron.xen-unstable ++ : git://git.qemu.org/qemu.git ++ : git://xenbits.xen.org/osstest/qemu.git ++ : osst...@xenbits.xen.org:/home/xen/git/osstest/qemu.git ++ : daily-cron.xen-unstable + TREE_LINUX=osst...@xenbits.xen.org:/home/xen/git/linux-pvops.git +
Re: [Xen-devel] [v10][PATCH 00/16] Fix RMRR
One brief request -- if you do end up sending this series again, can you please rebase to staging? This is the second time I've had to manually fix up some patches so that they apply. Sorry for this inconvenience. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 1/2] OSSTEST: introduce a raisin build test
Ian -- various questions for you here. On Mon, 2015-07-20 at 13:56 +0100, Stefano Stabellini wrote: On Thu, 9 Jul 2015, Ian Campbell wrote: On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote: Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com [...] I didn't do a good job with the commit message. I'll try to track down the old discussion and write some more details. Thanks! Changes in v3: - use $raisindir throughout ts-raisin-build - do not specify ENABLED_COMPONENTS so that empty REVISION variables can be used to disable building a raisin component I see we do again in the code below, which I suspect was deliberate and based on some discussion, but it's not mentioned in the Changes in v4+ and since the changelog doesn't explain anything I can't even begin to guess why or how we arrived at this point. [...] +sub build () { +target_cmd_root($ho, END); +cd $raisindir +./raise -y install-builddep If two of these happen to run in parallel (build machines can be shared) then one or the other risks timing out on the underlying dpkg lock waiting for the other, since the wait might be arbitrarily long depending on what is going on. It also risks other builds happening in an environment which differs from one build to the next (if this happened to run in the middle) or even things changing while a build is happening. This is why all build dependencies are installed from ts-xen-build-prep, that step is run once for each build host as it is installed. This does unfortunately mean that I think we can't take advantage of raisin's tracking of necessary build dependencies, but at least it will be checking things for us. It is fine for OSSTest not use install-builddep. I think that the best way for OSSTest to use Raisin would be to call raise in ts-xen-build-prep to list the dependencies and install them from there. Unfortunately ts-xen-build-prep will not have access to the repo, since it won't be cloned. Also when reusing a host (which happens for builds) I think this step is skipped (Ian J: can you confirm or deny that?) But here we are testing Raisin itself so I think that in this context it is actually important to run install-builddep. I don't think it would cause any troubles to other builds happening in parallel because the build dependencies of those builds would be installed by ts-xen-build-prep beforehand, that I was told is run before any build jobs? If I am wrong, then we have a problem. I think there are two problems, one is that install-builddep will take the apt/dpkg lock, which will be shared by any other invocations -- making the timeout hard to decide upon (since it will pipeline things). Second if install-builddep did install something extra due to some change in raisin.git, for example, then some jobs would see new build deps arrive half way through, or subsequent tests of older branches might see something installed which wouldn't have been before. Is there a check-builddep? That's the thing I would suggest we run here, so we get an explicit failure. Otherwise is there a way to guarantee that only one raisin build happen simultaneously? Ian? I suspect we don't want this, Given that the series is already at v7, I would prefer to remove the call to install-builddep, even though I think it is important, rather than make substantial modifications. However, if I do remove the call to install-builddep from ts-raisin-build, I would have to install any missing build dependencies somewhere else. Where would that be? In ts-xen-build-prep? Is that script even called for non-xen build jobs? In ts-xen-build-prep, I think. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
v10: * Instead of correcting e820, I'd like to correct memory_map.map[] and then copy them into e820 directly. I think this can make sure hvm_info, memory_map.map[] and e820 are on the same page. Actually, now that you mention it -- this should probably happen instead when we update hvm_info-{low,high}_mem_pgend. I also considered this point previously but I thought just right now we only update hvm_info-low/high_mem_pgend inside pci_setup(). But you can't guarantee this would be a sole place in the future. Instead, memory_map.map[] would always be copied into e820 when we build e820 table. So I think we'd better do this update once at the last minute. Thanks Tiejun I'm happy to leave this where it is for now, so with Jan's comments addressed (in particular, incrementing nr_map): Reviewed-by: George Dunlap george.dun...@eu.citrix.com But if we have time it might be better to pull the loop in pci_setup() which moves the memory out into a function, and have that function modify the lowmem and highmem map[] entries at the same time we modify low_mem_pgend and high_mem_pgend. Alternately, you could put that on your list of clean-ups to hvmloader for 4.7. Thanks, -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 05/17] xen/arm: ITS: implement hw_irq_controller for LPIs
On Sat, 2015-07-18 at 11:13 +0100, Julien Grall wrote: Hi Ian, On 15/07/2015 10:32, Ian Campbell wrote: and save 2 byte if not more with the alignment per irq_desc. If this is a concern then I would say we would either want a separate array of per-pLPI information which we do not want in irq_desc because it is irq specific, or do add a pointer to its_desc which points to an array of per-event information. I noticed that we have a field msi_desc in the structure irq_desc. On ARM, the structure msi_desc is not defined at all but still waste a 8 byte for the pointer in the irq_desc. Given that LPI is an MSI, I'm wondering if we could move all LPI related data in this msi_desc. This would avoid to introduce new field in the irq_desc structure. Yes, I think I suggested something similar at some point, although I seemed to think msi_desc was more similar to its_device. Looking at the content of the struct, alloc_msi_entry and msi_capability_init makes me less sure, it looks like msi_entry on x86 is allocated as an array (sized for events) stored per device, which doesn't directly map onto how ARM is structured. So I think you suggestion may make more sense. The msi_desc would look like: struct msi_desc { its_device *dev; unsigned int eventID; }; Regards, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] VT-d: add iommu=igfx_off option to workaround graphics issues
On 20/07/15 13:24, Jan Beulich wrote: On 20.07.15 at 14:12, andrew.coop...@citrix.com wrote: On 17/07/15 20:05, Ting-Wei Lan wrote: When using Linux = 3.19 (commit 47591df) as dom0 on some Intel Ironlake devices, It is possible to encounter graphics issues that make screen unreadable or crash the system. It was reported in freedesktop bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90037 As we still cannot find a proper fix for this problem, this patch adds iommu=igfx_off option that is similar to Linux intel_iommu=igfx_off for users to manually workaround the problem. Signed-off-by: Ting-Wei Lan lant...@gmail.com Having looked into this issue, the i915 driver has several workarounds in it for systems when the IOMMU is in use. In some cases there are plain errata, while in other cases there are specific hardware features which don't function if the IOMMU is enabled. In all cases this is gated on Linux's idea of whether the IOMMU is enabled. When used under Xen, Linux has no clue that the IOMMU exists, or that Xen has turned it on. Perhaps it should just assume an IOMMU is in use when running under Xen. Having inspected all those code places quite some time ago, I came to the conclusion that making this assumption is better than the current one of there not being an enabled IOMMU (and I adjusted our kernels accordingly). In at least one case, an errata workaround involves issuing extra IOMMU commands. We cannot safely let even dom0 perform this. Longterm, we should probably see about not clobbering the DMAR table and making dom0 aware that there is an IOMMU. (I still don't actually know why we have this asymmetry between Intel and AMD). The presence of a DMAR table is not a 1:1 representation of whether Xen has the IOMMU enabled. It would (in theory) allow dom0 to get RO mappings of the IOMMU control pages and observe that the IOMMU is enabled. Whether this is useful in practice is a different matter. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen: arm: Avoid reading beyond the last module
Hi Chris, On 17/07/15 21:48, Chris (Christopher) Brand wrote: nr_mods is set in add_boot_module() to the number of module array elements used. This function also ensures that nr_mods never exceeds MAX_MODULES (the size of the array). When looping through the array, the correct maximum index is nr_mods-1, not nr_mods. If the array is full, using the latter will in fact access beyond the end of the array. This was done correctly in boot_module_find_by_kind() and consider_modules() but incorrectly in discard_initial_modules() and next_module(). Signed-off-by: Chris Brand chris.br...@broadcom.com Reviewed-by: Julien Grall julien.gr...@citrix.com Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
On 07/20/2015 12:30 PM, Jan Beulich wrote: On 20.07.15 at 08:16, tiejun.c...@intel.com wrote: --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0; enum virtual_vga virtual_vga = VGA_none; unsigned long igd_opregion_pgbase = 0; +/* Check if any conflicts with all reserved device memory. */ /* Check if the specified range conflicts with any reserved device memory. */ (and the any could perhaps be left out) +static bool check_overlap_all(uint64_t start, uint64_t size) +{ +unsigned int i; + +for ( i = 0; i memory_map.nr_map; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + check_overlap(start, size, + memory_map.map[i].addr, + memory_map.map[i].size) ) +return true; +} + +return false; +} + +/* Find the lowest RMRR higher than base. */ +static int find_next_rmrr(uint32_t base) +{ +unsigned int i; +int next_rmrr = -1; +uint64_t min_base = (1ull 32); + +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + memory_map.map[i].addr base + memory_map.map[i].addr min_base ) +{ +next_rmrr = i; +min_base = memory_map.map[i].addr; +} +} +return next_rmrr; +} Considering _both_ callers, I think the function should actually return the lowest RMRR higher than or equal to base. You mean instead of strictly greater than the base. Or wait - we actually need to find the lowest RMRR the _end_ of which is higher than base. Yes, you're right: there's always a risk that pci_mem_start will *start* in the middle of a range. Looking for the next *end* is more robust. @@ -407,6 +456,19 @@ void pci_setup(void) } base = (resource-base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); + +/* If we're using mem_resource, check for RMRR conflicts. */ +while ( resource == mem_resource +next_rmrr 0 = I think. Yes, is a typo; I certainly meant to type =. +check_overlap(base, bar_sz, + memory_map.map[next_rmrr].addr, + memory_map.map[next_rmrr].size) ) +{ +base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size; +base = (base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); Pointless cast. This was copy-and-pasted from the line above. I had assumed that bar_sz was uint32_t (perhaps it was at some point), but I guess now it's pointless there too. :-) -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
On Mon, Jul 20, 2015 at 7:16 AM, Tiejun Chen tiejun.c...@intel.com wrote: Now use the hypervisor-supplied memory map to build our final e820 table: * Add regions for BIOS ranges and other special mappings not in the hypervisor map * Add in the hypervisor supplied regions * Adjust the lowmem and highmem regions if we've had to relocate memory (adding a highmem region if necessary) * Sort all the ranges so that they appear in memory order. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v10: * Instead of correcting e820, I'd like to correct memory_map.map[] and then copy them into e820 directly. I think this can make sure hvm_info, memory_map.map[] and e820 are on the same page. Actually, now that you mention it -- this should probably happen instead when we update hvm_info-{low,high}_mem_pgend. I'm happy to leave this where it is for now, so with Jan's comments addressed (in particular, incrementing nr_map): Reviewed-by: George Dunlap george.dun...@eu.citrix.com But if we have time it might be better to pull the loop in pci_setup() which moves the memory out into a function, and have that function modify the lowmem and highmem map[] entries at the same time we modify low_mem_pgend and high_mem_pgend. Alternately, you could put that on your list of clean-ups to hvmloader for 4.7. Thanks, -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 12/17] xen/arm: ITS: Initialize LPI irq descriptors and route
Hi Ian, On Fri, Jul 10, 2015 at 9:00 PM, Ian Campbell ian.campb...@citrix.com wrote: On Fri, 2015-07-10 at 13:12 +0530, vijay.kil...@gmail.com wrote: diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 3ebadcf..92d2be9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -68,11 +68,18 @@ enum gic_version gic_hw_version(void) return gic_hw_ops-info-hw_version; } +/* Only validates PPIs/SGIs/SPIs supported */ unsigned int gic_number_lines(void) { return gic_hw_ops-info-nr_lines; } +/* Validates PPIs/SGIs/SPIs/LPIs supported */ +bool_t gic_is_valid_irq(unsigned int irq) +{ +return ((irq gic_hw_ops-info-nr_lines) is_lpi(irq)); +} + unsigned int gic_nr_id_bits(void) { return gic_hw_ops-info-nr_id_bits; @@ -148,7 +155,7 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const cpumask_t *cpu_mask, { ASSERT(priority = 0xff); /* Only 8 bits of priority */ /* Can't route interrupts that don't exist */ -ASSERT(desc-irq gic_number_lines() || is_lpi(desc-irq)); +ASSERT(gic_is_valid_irq(desc-irq)); The fact that I commented on this earlier is another artefact in the way this series presents functionality in an essentially arbitrary order. I notice that you appear to have reintroduced the logic but when you moved this code from here into gic_is_valid_irq. AFAICT that function can never return true, but then I'm rather confused about how this series can have been tested. ASSERT(test_bit(_IRQ_DISABLED, desc-status)); ASSERT(spin_is_locked(desc-lock)); @@ -160,6 +167,17 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const cpumask_t *cpu_mask, /* Program the GIC to route an interrupt to a guest * - desc.lock must be held */ +int gic_route_lpi_to_guest(struct domain *d, unsigned int virq, + struct irq_desc *desc, unsigned int priority) +{ +ASSERT(spin_is_locked(desc-lock)); ASSERT(virq = SOME_DEFINE_FOR_MIN_LPI); diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index 3806d98..c8ea627 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -62,12 +62,21 @@ hw_irq_controller no_irq_type = { }; static irq_desc_t irq_desc[NR_IRQS]; +#ifdef CONFIG_ARM_64 +static irq_desc_t irq_desc_lpi[NR_GIC_LPI]; http://xenbits.xen.org/people/ianc/vits/draftG.html#irq-descriptors contains: Therefore a second dynamically allocated array will be added to cover the range 8192..nr_lpis IOW this should be dynamically allocated and therefore NR_GIC_LPI (which I think I already commented on earlier) should go away, since the limit should come from the h/w. @@ -267,9 +285,14 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq) set_bit(_IRQ_INPROGRESS, desc-status); desc-arch.eoi_cpu = smp_processor_id(); +#ifdef CONFIG_ARM_64 +if ( is_lpi(irq) ) +vgic_vcpu_inject_lpi(info-d, irq); +else +#endif /* the irq cannot be a PPI, we only support delivery of SPIs to * guests */ -vgic_vcpu_inject_spi(info-d, info-virq); +vgic_vcpu_inject_spi(info-d, info-virq); Comment should be reindented too. +if ( !is_lpi(irq) ) +{ +rank = vgic_rank_irq(v, irq); + +ASSERT(spin_is_locked(rank-lock)); +priority = vgic_byte_read(rank-ipriority[REG_RANK_INDEX(8, irq, DABT_WORD)], 0, irq 0x3); +} +if ( is_lpi(irq) gic_lpi_supported() ) Should be else if ( gic_lpi_supported() ) +vdev = vits_find_device(d-arch.vits-dev_root, devid); +if ( !vdev ) +{ +dprintk(XENLOG_WARNING, +LPI %d received for dev 0x%x not assigned..dropping\n, +irq, devid); +return; +} This isn't used again and the whole R-B tree should go away. +p = irq_to_pending(d-vcpu[0], vitt_entry.vlpi); Perhaps given that we know this is a vlpi, we could skipping going via vcpu[0] and just go right to the entry in the d-pending_lpi array (via a suitable accessor of course) +p-desc = desc; This should have happened during routing, not now. Oh. I see what is going on, your vgic_vcpu_inject_lpi appears to be taking an IRQ, and not a vpli, or at least to be confused about what it should do with the number which it calls irq. Therefore you find yourself needing to lookup the irqdesc, and look in the vitt etc, none of which belongs here. For interrupts coming from a plpi all of that lookup should happen based on the its_device which is contained in the irq_guest and other info from there, before calling this function. You can get the irq_guest which you can get because you have a PLPI in your hand and can look up the irq_guest via the irq_desc which you get using the plpi number. AUI, you are refering to http://xenbits.xen.org/people/ianc/vits/draftG.html#virtual-lpi-injection If we want to extract,
Re: [Xen-devel] [PATCH v4 12/17] xen/arm: ITS: Initialize LPI irq descriptors and route
On 20/07/15 14:07, Vijay Kilari wrote: AUI, you are refering to http://xenbits.xen.org/people/ianc/vits/draftG.html#virtual-lpi-injection If we want to extract, its_device, and VITT information before calling vgic_vcpu_inject_lpi(), it cannot be done in do_IRQ() as irq.c cannot have vITS related code. IHMO it can't be done in vgic.c too... So one option is to call vits_foo() function which will validated plpi and get its_device / VITT info and call vgic_vcpu_inject_lpi(). So vITS will call back vgic function. OR Introduce new vgic function for doing this and call vgic_vcpu_inject_lpi(). I don't understand why you need to validate the plpi. It's already done when we receive the IRQ. Furthermore, you will need to retrieve again the VITT entry in vgic_vcpu_inject_lpi in order to get the collection associated to the vLPI. Given that the irq_desc as the vDevID and the eventID and the INT command will have the same parameters in hand it looks like to me that the prototype of vgic_vcpu_inject_lpi should be vgic_vcpu_inject_lpi(struct domain *d, uint32_t vdevid, uint32_t eventid) And the vgic_vcpu_inject_lpi will take care of retrieving the vLPI and the collection associated. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 1/2] OSSTEST: introduce a raisin build test
On Thu, 9 Jul 2015, Ian Campbell wrote: On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote: Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com In general I'm very suspicious of a 200 line patch with _no_ commit message at all. What about the details of the stuff around overriding the versions iff tree and revision are set etc and why that matters and how the two projects therefore interact wrt selecting what to build? I remember discussing this at length either IRL or in older versions of the thread and we both know it wasn't trivial to arrive at how this should be done, but I _already_ can't remember how we reached this point or why and that is information which really needs to be recorded so it can be referred to later when we wonder why it does this. The scope is also missing, i.e. this is currently just a straight raisin build test, and the results are not consumed by anything. This is important because of all the stuff about splitting up the dist dir and incremental builds which we discussed means that as it stands this test step is not producing build artefacts suitable as an input for actual tests. I expect there's also stuff in the intra-version changelog which actually belongs in the main changelog, such as why you are refactoring certain things into BuildSupport.pm, which deserves some sort of brief mention IMHO. [...] I didn't do a good job with the commit message. I'll try to track down the old discussion and write some more details. Changes in v3: - use $raisindir throughout ts-raisin-build - do not specify ENABLED_COMPONENTS so that empty REVISION variables can be used to disable building a raisin component I see we do again in the code below, which I suspect was deliberate and based on some discussion, but it's not mentioned in the Changes in v4+ and since the changelog doesn't explain anything I can't even begin to guess why or how we arrived at this point. [...] +sub build () { +target_cmd_root($ho, END); +cd $raisindir +./raise -y install-builddep If two of these happen to run in parallel (build machines can be shared) then one or the other risks timing out on the underlying dpkg lock waiting for the other, since the wait might be arbitrarily long depending on what is going on. It also risks other builds happening in an environment which differs from one build to the next (if this happened to run in the middle) or even things changing while a build is happening. This is why all build dependencies are installed from ts-xen-build-prep, that step is run once for each build host as it is installed. This does unfortunately mean that I think we can't take advantage of raisin's tracking of necessary build dependencies, but at least it will be checking things for us. It is fine for OSSTest not use install-builddep. I think that the best way for OSSTest to use Raisin would be to call raise in ts-xen-build-prep to list the dependencies and install them from there. But here we are testing Raisin itself so I think that in this context it is actually important to run install-builddep. I don't think it would cause any troubles to other builds happening in parallel because the build dependencies of those builds would be installed by ts-xen-build-prep beforehand, that I was told is run before any build jobs? If I am wrong, then we have a problem. Otherwise is there a way to guarantee that only one raisin build happen simultaneously? Given that the series is already at v7, I would prefer to remove the call to install-builddep, even though I think it is important, rather than make substantial modifications. However, if I do remove the call to install-builddep from ts-raisin-build, I would have to install any missing build dependencies somewhere else. Where would that be? In ts-xen-build-prep? Is that script even called for non-xen build jobs? ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Patch V2] xen: release lock occasionally during ballooning
On 20/07/15 12:49, Juergen Gross wrote: When dom0 is being ballooned balloon_process() will hold the balloon mutex until it is finished. This will block e.g. creation of new domains as the device backends for the new domain need some autoballooned pages for the ring buffers. Avoid this by releasing the balloon mutex from time to time during ballooning. Adjust the comment above balloon_process() regarding multiple instances of balloon_process(). Instead of open coding it, just use cond_resched(). Applied to for-linus-4.2, thanks. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen: fix non-ANSI function declaration of function xen_has_pv_devices
On 16/07/15 20:34, Colin King wrote: From: Colin Ian King colin.k...@canonical.com xen_has_pv_devices has no parameters, so use the normal void parameter convention to make it match the prototype in the header file include/xen/platform_pci.h Applied to for-linus-4.3, thanks. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 1/7] x86/PCI: add config space abstract write intercept logic
This is to be used by MSI code, and later to also be hooked up to MMCFG accesses by Dom0. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1108,6 +1108,12 @@ void pci_cleanup_msi(struct pci_dev *pde msi_free_irqs(pdev); } +int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg, + unsigned int size, uint32_t *data) +{ +return 0; +} + int pci_restore_msi_state(struct pci_dev *pdev) { unsigned long flags; --- a/xen/arch/x86/pci.c +++ b/xen/arch/x86/pci.c @@ -67,3 +67,28 @@ void pci_conf_write(uint32_t cf8, uint8_ spin_unlock_irqrestore(pci_config_lock, flags); } + +int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, + unsigned int reg, unsigned int size, + uint32_t *data) +{ +struct pci_dev *pdev; +int rc = 0; + +/* + * Avoid expensive operations when no hook is going to do anything + * for the access anyway. + */ +if ( reg 64 || reg = 256 ) +return 0; + +spin_lock(pcidevs_lock); + +pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf)); +if ( pdev ) +rc = pci_msi_conf_write_intercept(pdev, reg, size, data); + +spin_unlock(pcidevs_lock); + +return rc; +} --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1777,8 +1777,8 @@ static bool_t admin_io_okay(unsigned int return ioports_access_permitted(d, port, port + bytes - 1); } -static bool_t pci_cfg_ok(struct domain *currd, bool_t write, - unsigned int start, unsigned int size) +static bool_t pci_cfg_ok(struct domain *currd, unsigned int start, + unsigned int size, uint32_t *write) { uint32_t machine_bdf; @@ -1810,8 +1810,12 @@ static bool_t pci_cfg_ok(struct domain * start |= CF8_ADDR_HI(currd-arch.pci_cf8); } -return !xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf, - start, start + size - 1, write); +if ( xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf, + start, start + size - 1, !!write) != 0 ) + return 0; + +return !write || + pci_conf_write_intercept(0, machine_bdf, start, size, write) = 0; } uint32_t guest_io_read(unsigned int port, unsigned int bytes, @@ -1863,7 +1867,7 @@ uint32_t guest_io_read(unsigned int port size = min(bytes, 4 - (port 3)); if ( size == 3 ) size = 2; -if ( pci_cfg_ok(currd, 0, port 3, size) ) +if ( pci_cfg_ok(currd, port 3, size, NULL) ) sub_data = pci_conf_read(currd-arch.pci_cf8, port 3, size); } @@ -1934,7 +1938,7 @@ void guest_io_write(unsigned int port, u size = min(bytes, 4 - (port 3)); if ( size == 3 ) size = 2; -if ( pci_cfg_ok(currd, 1, port 3, size) ) +if ( pci_cfg_ok(currd, port 3, size, data) ) pci_conf_write(currd-arch.pci_cf8, port 3, size, data); } --- a/xen/include/asm-x86/pci.h +++ b/xen/include/asm-x86/pci.h @@ -15,4 +15,10 @@ struct arch_pci_dev { vmask_t used_vectors; }; +int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, + unsigned int reg, unsigned int size, + uint32_t *data); +int pci_msi_conf_write_intercept(struct pci_dev *, unsigned int reg, + unsigned int size, uint32_t *data); + #endif /* __X86_PCI_H__ */ x86/PCI: add config space abstract write intercept logic This is to be used by MSI code, and later to also be hooked up to MMCFG accesses by Dom0. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1108,6 +1108,12 @@ void pci_cleanup_msi(struct pci_dev *pde msi_free_irqs(pdev); } +int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg, + unsigned int size, uint32_t *data) +{ +return 0; +} + int pci_restore_msi_state(struct pci_dev *pdev) { unsigned long flags; --- a/xen/arch/x86/pci.c +++ b/xen/arch/x86/pci.c @@ -67,3 +67,28 @@ void pci_conf_write(uint32_t cf8, uint8_ spin_unlock_irqrestore(pci_config_lock, flags); } + +int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, + unsigned int reg, unsigned int size, + uint32_t *data) +{ +struct pci_dev *pdev; +int rc = 0; + +/* + * Avoid expensive operations when no hook is going to do anything + * for the access anyway. + */ +if ( reg 64 || reg = 256 ) +return 0; + +spin_lock(pcidevs_lock); + +pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
Re: [Xen-devel] [PATCH v3] xen/blkfront: convert to blk-mq APIs
On 13/07/15 10:55, Bob Liu wrote: Note: This patch is based on original work of Arianna's internship for GNOME's Outreach Program for Women. Only one hardware queue is used now, so there is no performance change. The legacy non-mq code is deleted completely which is the same as other drivers like virtio, mtip, and nvme. Also dropped one unnecessary holding of info-io_lock when calling blk_mq_stop_hw_queues(). Applied to for-linus-4.3, thanks. Changes in v2: - Reorganized blk_mq_queue_rq() - Restored most io_locks in place Change in v3: - Rename blk_mq_queue_rq to blkif_queue_rq Next time, please put changes after a --- marker so they're automatically dropped when applied. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 2/7] x86/MSI-X: track host and guest mask-all requests separately
Host uses of the bits will be added subsequently, and must not be overridden by guests (including Dom0, namely when acting on behalf of a guest). Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -843,6 +843,12 @@ static int msix_capability_init(struct p if ( !msix-used_entries ) { +msix-host_maskall = 0; +if ( !msix-guest_maskall ) +control = ~PCI_MSIX_FLAGS_MASKALL; +else +control |= PCI_MSIX_FLAGS_MASKALL; + if ( rangeset_add_range(mmio_ro_ranges, msix-table.first, msix-table.last) ) WARN(); @@ -,6 +1117,34 @@ void pci_cleanup_msi(struct pci_dev *pde int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg, unsigned int size, uint32_t *data) { +u16 seg = pdev-seg; +u8 bus = pdev-bus; +u8 slot = PCI_SLOT(pdev-devfn); +u8 func = PCI_FUNC(pdev-devfn); +struct msi_desc *entry; +unsigned int pos; + +if ( pdev-msix ) +{ +entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX); +pos = entry ? entry-msi_attrib.pos +: pci_find_cap_offset(seg, bus, slot, func, + PCI_CAP_ID_MSIX); +ASSERT(pos); + +if ( reg pos || reg = msix_pba_offset_reg(pos) + 4 ) +return 0; + +if ( reg != msix_control_reg(pos) || size != 2 ) +return -EACCES; + +pdev-msix-guest_maskall = !!(*data PCI_MSIX_FLAGS_MASKALL); +if ( pdev-msix-host_maskall ) +*data |= PCI_MSIX_FLAGS_MASKALL; + +return 1; +} + return 0; } --- a/xen/include/asm-x86/msi.h +++ b/xen/include/asm-x86/msi.h @@ -233,6 +233,7 @@ struct arch_msix { int table_refcnt[MAX_MSIX_TABLE_PAGES]; int table_idx[MAX_MSIX_TABLE_PAGES]; spinlock_t table_lock; +bool_t host_maskall, guest_maskall; domid_t warned; }; x86/MSI-X: track host and guest mask-all requests separately Host uses of the bits will be added subsequently, and must not be overridden by guests (including Dom0, namely when acting on behalf of a guest). Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -843,6 +843,12 @@ static int msix_capability_init(struct p if ( !msix-used_entries ) { +msix-host_maskall = 0; +if ( !msix-guest_maskall ) +control = ~PCI_MSIX_FLAGS_MASKALL; +else +control |= PCI_MSIX_FLAGS_MASKALL; + if ( rangeset_add_range(mmio_ro_ranges, msix-table.first, msix-table.last) ) WARN(); @@ -,6 +1117,34 @@ void pci_cleanup_msi(struct pci_dev *pde int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg, unsigned int size, uint32_t *data) { +u16 seg = pdev-seg; +u8 bus = pdev-bus; +u8 slot = PCI_SLOT(pdev-devfn); +u8 func = PCI_FUNC(pdev-devfn); +struct msi_desc *entry; +unsigned int pos; + +if ( pdev-msix ) +{ +entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX); +pos = entry ? entry-msi_attrib.pos +: pci_find_cap_offset(seg, bus, slot, func, + PCI_CAP_ID_MSIX); +ASSERT(pos); + +if ( reg pos || reg = msix_pba_offset_reg(pos) + 4 ) +return 0; + +if ( reg != msix_control_reg(pos) || size != 2 ) +return -EACCES; + +pdev-msix-guest_maskall = !!(*data PCI_MSIX_FLAGS_MASKALL); +if ( pdev-msix-host_maskall ) +*data |= PCI_MSIX_FLAGS_MASKALL; + +return 1; +} + return 0; } --- a/xen/include/asm-x86/msi.h +++ b/xen/include/asm-x86/msi.h @@ -233,6 +233,7 @@ struct arch_msix { int table_refcnt[MAX_MSIX_TABLE_PAGES]; int table_idx[MAX_MSIX_TABLE_PAGES]; spinlock_t table_lock; +bool_t host_maskall, guest_maskall; domid_t warned; }; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 3/7] x86/MSI-X: be more careful during teardown
When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. At this time it continues to be unclear whether this is fixing an actual bug or is rather just working around bogus (but apparently common) system behavior. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- Backporting note (largely to myself): Depends on (not yet backported to 4.4 and earlier) commit 061eebe0e x86/MSI: drop workaround for insecure Dom0 kernels (due to re-use of struct arch_msix's warned field). --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -217,9 +217,9 @@ void destroy_irq(unsigned int irq) } spin_lock_irqsave(desc-lock, flags); -desc-status |= IRQ_DISABLED; desc-status = ~IRQ_GUEST; desc-handler-shutdown(desc); +desc-status |= IRQ_DISABLED; action = desc-action; desc-action = NULL; desc-msi_desc = NULL; @@ -995,8 +995,8 @@ void __init release_irq(unsigned int irq spin_lock_irqsave(desc-lock,flags); action = desc-action; desc-action = NULL; -desc-status |= IRQ_DISABLED; desc-handler-shutdown(desc); +desc-status |= IRQ_DISABLED; spin_unlock_irqrestore(desc-lock,flags); /* Wait to make sure it's not being used on another CPU */ @@ -1732,8 +1732,8 @@ static irq_guest_action_t *__pirq_guest_ BUG_ON(action-in_flight != 0); /* Disabling IRQ before releasing the desc_lock avoids an IRQ storm. */ -desc-status |= IRQ_DISABLED; desc-handler-disable(desc); +desc-status |= IRQ_DISABLED; /* * Mark any remaining pending EOIs as ready to flush. --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -123,6 +123,27 @@ static void msix_put_fixmap(struct arch_ spin_unlock(msix-table_lock); } +static bool_t memory_decoded(const struct pci_dev *dev) +{ +u8 bus, slot, func; + +if ( !dev-info.is_virtfn ) +{ +bus = dev-bus; +slot = PCI_SLOT(dev-devfn); +func = PCI_FUNC(dev-devfn); +} +else +{ +bus = dev-info.physfn.bus; +slot = PCI_SLOT(dev-info.physfn.devfn); +func = PCI_FUNC(dev-info.physfn.devfn); +} + +return !!(pci_conf_read16(dev-seg, bus, slot, func, PCI_COMMAND) + PCI_COMMAND_MEMORY); +} + /* * MSI message composition */ @@ -166,7 +187,7 @@ void msi_compose_msg(unsigned vector, co } } -static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) +static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) { switch ( entry-msi_attrib.type ) { @@ -201,6 +222,8 @@ static void read_msi_msg(struct msi_desc { void __iomem *base = entry-mask_base; +if ( unlikely(!memory_decoded(entry-dev)) ) +return 0; msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); msg-data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET); @@ -212,6 +235,8 @@ static void read_msi_msg(struct msi_desc if ( iommu_intremap ) iommu_read_msi_from_ire(entry, msg); + +return 1; } static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) @@ -262,6 +287,8 @@ static int write_msi_msg(struct msi_desc { void __iomem *base = entry-mask_base; +if ( unlikely(!memory_decoded(entry-dev)) ) +return -ENXIO; writel(msg-address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); writel(msg-address_hi, @@ -289,7 +316,8 @@ void set_msi_affinity(struct irq_desc *d ASSERT(spin_is_locked(desc-lock)); memset(msg, 0, sizeof(msg)); -read_msi_msg(msi_desc, msg); +if ( !read_msi_msg(msi_desc, msg) ) +return; msg.data = ~MSI_DATA_VECTOR_MASK; msg.data |= MSI_DATA_VECTOR(desc-arch.vector); @@ -349,23 +377,27 @@ int msi_maskable_irq(const struct msi_de || entry-msi_attrib.maskbit; } -static void msi_set_mask_bit(struct irq_desc *desc, bool_t host, bool_t guest) +static bool_t msi_set_mask_bit(struct irq_desc *desc, bool_t host, bool_t guest) { struct msi_desc *entry = desc-msi_desc; +struct pci_dev *pdev; +u16 seg; +u8 bus, slot, func; bool_t flag = host || guest; ASSERT(spin_is_locked(desc-lock)); BUG_ON(!entry
[Xen-devel] [PATCH v5 4/7] x86/MSI-X: access MSI-X table only after having enabled MSI-X
As done in Linux by f598282f51 (PCI: Fix the NIU MSI-X problem in a better way) and its broken predecessor, make sure we don't access the MSI-X table without having enabled MSI-X first, using the mask-all flag instead to prevent interrupts from occurring. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -144,6 +144,17 @@ static bool_t memory_decoded(const struc PCI_COMMAND_MEMORY); } +static bool_t msix_memory_decoded(const struct pci_dev *dev, unsigned int pos) +{ +u16 control = pci_conf_read16(dev-seg, dev-bus, PCI_SLOT(dev-devfn), + PCI_FUNC(dev-devfn), msix_control_reg(pos)); + +if ( !(control PCI_MSIX_FLAGS_ENABLE) ) +return 0; + +return memory_decoded(dev); +} + /* * MSI message composition */ @@ -222,7 +233,8 @@ static bool_t read_msi_msg(struct msi_de { void __iomem *base = entry-mask_base; -if ( unlikely(!memory_decoded(entry-dev)) ) +if ( unlikely(!msix_memory_decoded(entry-dev, + entry-msi_attrib.pos)) ) return 0; msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); @@ -287,7 +299,8 @@ static int write_msi_msg(struct msi_desc { void __iomem *base = entry-mask_base; -if ( unlikely(!memory_decoded(entry-dev)) ) +if ( unlikely(!msix_memory_decoded(entry-dev, + entry-msi_attrib.pos)) ) return -ENXIO; writel(msg-address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); @@ -381,9 +394,9 @@ static bool_t msi_set_mask_bit(struct ir { struct msi_desc *entry = desc-msi_desc; struct pci_dev *pdev; -u16 seg; +u16 seg, control; u8 bus, slot, func; -bool_t flag = host || guest; +bool_t flag = host || guest, maskall; ASSERT(spin_is_locked(desc-lock)); BUG_ON(!entry || !entry-dev); @@ -406,36 +419,45 @@ static bool_t msi_set_mask_bit(struct ir } break; case PCI_CAP_ID_MSIX: +maskall = pdev-msix-host_maskall; +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( unlikely(!(control PCI_MSIX_FLAGS_ENABLE)) ) +{ +pdev-msix-host_maskall = 1; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | (PCI_MSIX_FLAGS_ENABLE | +PCI_MSIX_FLAGS_MASKALL)); +} if ( likely(memory_decoded(pdev)) ) { writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); -break; +if ( likely(control PCI_MSIX_FLAGS_ENABLE) ) +break; +flag = 1; } -if ( flag ) +else if ( flag !(control PCI_MSIX_FLAGS_MASKALL) ) { -u16 control; domid_t domid = pdev-domain-domain_id; -pdev-msix-host_maskall = 1; -control = pci_conf_read16(seg, bus, slot, func, - msix_control_reg(entry-msi_attrib.pos)); -if ( control PCI_MSIX_FLAGS_MASKALL ) -break; -pci_conf_write16(seg, bus, slot, func, - msix_control_reg(entry-msi_attrib.pos), - control | PCI_MSIX_FLAGS_MASKALL); +maskall = 1; if ( pdev-msix-warned != domid ) { pdev-msix-warned = domid; printk(XENLOG_G_WARNING - cannot mask IRQ %d: masked MSI-X on Dom%d's %04x:%02x:%02x.%u\n, + cannot mask IRQ %d: masking MSI-X on Dom%d's %04x:%02x:%02x.%u\n, desc-irq, domid, pdev-seg, pdev-bus, PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn)); } -break; } -/* fall through */ +pdev-msix-host_maskall = maskall; +if ( maskall || pdev-msix-guest_maskall ) +control |= PCI_MSIX_FLAGS_MASKALL; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), control); +return flag; default: return 0; } @@ -461,7 +483,8 @@ static int msi_get_mask_bit(const struct entry-msi.mpos) entry-msi_attrib.entry_nr) 1; case PCI_CAP_ID_MSIX: -if ( unlikely(!memory_decoded(entry-dev)) ) +if ( unlikely(!msix_memory_decoded(entry-dev, + entry-msi_attrib.pos))
[Xen-devel] [PATCH v5 6/7] x86/MSI: properly track guest masking requests
... by monitoring writes to the mask register. This allows reverting the main effect of the XSA-129 patches in qemu. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1303,6 +1303,37 @@ int pci_msi_conf_write_intercept(struct return 1; } +entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI); +if ( entry entry-msi_attrib.maskbit ) +{ +uint16_t cntl; +uint32_t unused; + +pos = entry-msi_attrib.pos; +if ( reg pos || reg = entry-msi.mpos + 8 ) +return 0; + +if ( reg == msi_control_reg(pos) ) +return size == 2 ? 1 : -EACCES; +if ( reg entry-msi.mpos || reg = entry-msi.mpos + 4 || size != 4 ) +return -EACCES; + +cntl = pci_conf_read16(seg, bus, slot, func, msi_control_reg(pos)); +unused = ~(uint32_t)0 (32 - multi_msi_capable(cntl)); +for ( pos = 0; pos entry-msi.nvec; ++pos, ++entry ) +{ +entry-msi_attrib.guest_masked = +*data entry-msi_attrib.entry_nr; +if ( entry-msi_attrib.host_masked ) +*data |= 1 pos; +unused = ~(1 pos); +} + +*data |= unused; + +return 1; +} + return 0; } x86/MSI: properly track guest masking requests ... by monitoring writes to the mask register. This allows reverting the main effect of the XSA-129 patches in qemu. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1303,6 +1303,37 @@ int pci_msi_conf_write_intercept(struct return 1; } +entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI); +if ( entry entry-msi_attrib.maskbit ) +{ +uint16_t cntl; +uint32_t unused; + +pos = entry-msi_attrib.pos; +if ( reg pos || reg = entry-msi.mpos + 8 ) +return 0; + +if ( reg == msi_control_reg(pos) ) +return size == 2 ? 1 : -EACCES; +if ( reg entry-msi.mpos || reg = entry-msi.mpos + 4 || size != 4 ) +return -EACCES; + +cntl = pci_conf_read16(seg, bus, slot, func, msi_control_reg(pos)); +unused = ~(uint32_t)0 (32 - multi_msi_capable(cntl)); +for ( pos = 0; pos entry-msi.nvec; ++pos, ++entry ) +{ +entry-msi_attrib.guest_masked = +*data entry-msi_attrib.entry_nr; +if ( entry-msi_attrib.host_masked ) +*data |= 1 pos; +unused = ~(1 pos); +} + +*data |= unused; + +return 1; +} + return 0; } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 5/7] x86/MSI-X: reduce fiddling with control register during restore
Rather than disabling and enabling MSI-X once per vector, do it just once per device. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1314,6 +1314,9 @@ int pci_restore_msi_state(struct pci_dev struct msi_desc *entry, *tmp; struct irq_desc *desc; struct msi_msg msg; +u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn); +unsigned int type = 0, pos = 0; +u16 control = 0; ASSERT(spin_is_locked(pcidevs_lock)); @@ -1332,8 +1335,6 @@ int pci_restore_msi_state(struct pci_dev list_for_each_entry_safe( entry, tmp, pdev-msi_list, list ) { unsigned int i = 0, nr = 1; -u16 control = 0; -u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn); irq = entry-irq; desc = irq_desc[irq]; @@ -1350,31 +1351,38 @@ int pci_restore_msi_state(struct pci_dev pdev-seg, pdev-bus, PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn), i); spin_unlock_irqrestore(desc-lock, flags); +if ( type == PCI_CAP_ID_MSIX ) +pci_conf_write16(pdev-seg, pdev-bus, slot, func, + msix_control_reg(pos), + control ~PCI_MSIX_FLAGS_ENABLE); return -EINVAL; } +ASSERT(!type || type == entry-msi_attrib.type); +pos = entry-msi_attrib.pos; if ( entry-msi_attrib.type == PCI_CAP_ID_MSI ) { msi_set_enable(pdev, 0); nr = entry-msi.nvec; } -else if ( entry-msi_attrib.type == PCI_CAP_ID_MSIX ) +else if ( !type entry-msi_attrib.type == PCI_CAP_ID_MSIX ) { control = pci_conf_read16(pdev-seg, pdev-bus, slot, func, - msix_control_reg(entry-msi_attrib.pos)); + msix_control_reg(pos)); pci_conf_write16(pdev-seg, pdev-bus, slot, func, - msix_control_reg(entry-msi_attrib.pos), + msix_control_reg(pos), control | (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL)); if ( unlikely(!memory_decoded(pdev)) ) { spin_unlock_irqrestore(desc-lock, flags); pci_conf_write16(pdev-seg, pdev-bus, slot, func, - msix_control_reg(entry-msi_attrib.pos), + msix_control_reg(pos), control ~PCI_MSIX_FLAGS_ENABLE); return -ENXIO; } } +type = entry-msi_attrib.type; msg = entry-msg; write_msi_msg(entry, msg); @@ -1398,9 +1406,9 @@ int pci_restore_msi_state(struct pci_dev spin_unlock_irqrestore(desc-lock, flags); -if ( entry-msi_attrib.type == PCI_CAP_ID_MSI ) +if ( type == PCI_CAP_ID_MSI ) { -unsigned int cpos = msi_control_reg(entry-msi_attrib.pos); +unsigned int cpos = msi_control_reg(pos); control = pci_conf_read16(pdev-seg, pdev-bus, slot, func, cpos) ~PCI_MSI_FLAGS_QSIZE; @@ -1410,12 +1418,13 @@ int pci_restore_msi_state(struct pci_dev msi_set_enable(pdev, 1); } -else if ( entry-msi_attrib.type == PCI_CAP_ID_MSIX ) -pci_conf_write16(pdev-seg, pdev-bus, slot, func, - msix_control_reg(entry-msi_attrib.pos), - control | PCI_MSIX_FLAGS_ENABLE); } +if ( type == PCI_CAP_ID_MSIX ) +pci_conf_write16(pdev-seg, pdev-bus, slot, func, + msix_control_reg(pos), + control | PCI_MSIX_FLAGS_ENABLE); + return 0; } x86/MSI-X: reduce fiddling with control register during restore Rather than disabling and enabling MSI-X once per vector, do it just once per device. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1314,6 +1314,9 @@ int pci_restore_msi_state(struct pci_dev struct msi_desc *entry, *tmp; struct irq_desc *desc; struct msi_msg msg; +u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn); +unsigned int type = 0, pos = 0; +u16 control = 0; ASSERT(spin_is_locked(pcidevs_lock)); @@ -1332,8 +1335,6 @@ int pci_restore_msi_state(struct pci_dev list_for_each_entry_safe( entry, tmp, pdev-msi_list, list ) { unsigned int i = 0, nr = 1; -u16 control = 0; -u8 slot = PCI_SLOT(pdev-devfn), func = PCI_FUNC(pdev-devfn); irq = entry-irq; desc = irq_desc[irq]; @@ -1350,31 +1351,38 @@ int pci_restore_msi_state(struct pci_dev pdev-seg,
Re: [Xen-devel] [PATCH for-4.6 v2] tools/hotplug: Add an initscript to start xl devd in a driver domain
On Mon, 2015-07-20 at 16:16 +0200, Roger Pau Monné wrote: El 16/07/15 a les 18.58, Ian Campbell ha escrit: The removal of the udev rules highlighted that although it has been replaced by xl devd there isn't an initscript to replace it. To enable this add a --pidfile option to xl devd. Tested on Linux by running the script in dom0 and checking the daemon was started/stopped, but not in an actual driver domain environment since I don't have one conveniently available. I also checked that running without the --pidfile option still works. Scripts mainly cribbed from the xencommons for each platform. Signed-off-by: Ian Campbell ian.campb...@citrix.com Cc: Roger Pau Monné roger@citrix.com Cc: Konrad Rzeszutek Wilk kon...@darnok.org Cc: George Dunlap george.dun...@eu.citrix.com --- For 4.6: I think having removed the udev rules we ought to include this, it's basically zero risk to normal operation. Nothing seems to start xl devd for domain 0 in *BSD, nor is xenbackend started. Is that correct? v2: Check in Linux script that we don't start on dom0. I don't know the equivalent for *BSD. At least on FreeBSD there's no easy way to know if we are running on Dom0 from userspace (unless we start using hypercalls with privcmd). I could add a dummy device (let's say /dev/xen/control) that's only present if we are running on Dom0, does that sound reasonable? FWIW on Linux there is a capabilities device (actually procfs thing, close enough) which contains control_d in dom0. Not sure if that is helpful to follow? Another option would be to add a small C program that can use hypercalls and privcmd to detect if we are running on Dom0 or not, but that seems over engineering it. Agreed. [...] diff --git a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in new file mode 100644 index 000..25e3edd --- /dev/null +++ b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in @@ -0,0 +1,48 @@ +#!/bin/sh +# +# PROVIDE: xendriverdomain +# REQUIRE: DAEMON +# +# Should be run in a driver domain, but not in domain 0. + +. /etc/rc.subr + +. /etc/xen/scripts/hotplugpath.sh This should have been: . @XEN_SCRIPT_DIR@/hotplugpath.sh I will send a patch ASAP to fix it. Sorry, and please do, thanks. [...] +if ( close(fd) 0 ) { pedantic The line above doesn't follow libxl coding style. /pedantic Damn, sorry. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 01/23] x86/boot: remove unneeded instruction
Signed-off-by: Daniel Kiper daniel.ki...@oracle.com --- xen/arch/x86/boot/head.S |1 - 1 file changed, 1 deletion(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index cfd59dc..f63b349 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -169,7 +169,6 @@ __start: /* Apply relocations to bootstrap trampoline. */ mov sym_phys(trampoline_phys),%edx mov $sym_phys(__trampoline_rel_start),%edi -mov %edx,sym_phys(trampoline_phys) 1: mov (%edi),%eax add %edx,(%edi,%eax) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 00/23] x86: multiboot2 protocol support
Hi, I am sending, long awaited, second version of multiboot2 protocol support for legacy BIOS and EFI platforms. It fixes all major issues discovered until now. There are still some minor problems which should be fixed in one way or another. I will address them in next releases. This series, in general, is not targeted to Xen 4.6. However, there are some fixes at the beginning of it which are worth considering, I think. The final goal is xen.efi binary file which could be loaded by EFI loader, multiboot (v1) protocol (only on legacy BIOS platforms) and multiboot2 protocol. This way we will have: - smaller Xen code base, - one code base for xen.gz and xen.efi, - one build method for xen.gz and xen.efi; xen.efi will be extracted from xen file using objcopy; PE header will be contained in ELF file and will precede Xen code, - xen.efi build will not so strongly depend on a given GCC and binutils version. ARM guys should check at least patches #9 - #18 and #20. In general earlier mentioned patches touches common EFI code but they are not change functionality significantly. GRUB2 patch series will follow this patch series. GRUB2 guys should check patches #20 and #23 but I am sending to you all Xen related patches just in case. If you are not interested in this patch series at all please drop me a line and I will remove you from distribution list. Daniel .gitignore|5 +- xen/arch/x86/Makefile | 21 ++-- xen/arch/x86/Rules.mk |4 + xen/arch/x86/boot/Makefile| 10 +- xen/arch/x86/boot/build32.mk |4 +- xen/arch/x86/boot/cmdline.S | 367 --- xen/arch/x86/boot/cmdline.c | 396 xen/arch/x86/boot/edd.S |3 - xen/arch/x86/boot/head.S | 474 -- xen/arch/x86/boot/reloc.c | 242 +++--- xen/arch/x86/boot/trampoline.S| 25 - xen/arch/x86/boot/video.S |6 -- xen/arch/x86/boot/wakeup.S|6 +- xen/arch/x86/boot/x86_64.S| 34 +++--- xen/arch/x86/dmi_scan.c |4 +- xen/arch/x86/domain_page.c|2 +- xen/arch/x86/efi/Makefile | 16 +-- xen/arch/x86/efi/efi-boot.h | 68 ++-- xen/arch/x86/efi/stub.c | 16 ++- xen/arch/x86/mm.c |3 +- xen/arch/x86/mpparse.c|4 +- xen/arch/x86/setup.c | 50 - xen/arch/x86/shutdown.c |2 +- xen/arch/x86/time.c |2 +- xen/arch/x86/x86_64/asm-offsets.c | 10 ++ xen/arch/x86/x86_64/mm.c |2 +- xen/arch/x86/xen.lds.S|6 +- xen/common/efi/boot.c | 461 ++- xen/common/efi/runtime.c | 23 ++-- xen/drivers/acpi/osl.c|2 +- xen/include/asm-x86/config.h |3 + xen/include/asm-x86/page.h|2 +- xen/include/xen/efi.h | 17 ++- xen/include/xen/multiboot2.h | 182 34 files changed, 1709 insertions(+), 763 deletions(-) Daniel Kiper (23): x86/boot: remove unneeded instruction x86/boot: copy only text section from *.lnk file to *.bin file x86: zero BSS using stosl instead of stosb x86/boot: call reloc() using cdecl calling convention x86/boot/reloc: create generic alloc and copy functions x86/boot: use %ecx instead of %eax x86/boot/reloc: Rename some variables and rearrange code a bit x86: add multiboot2 protocol support efi: create efi_enabled() efi: build xen.gz with EFI code efi: split out efi_init() efi: split out efi_console_set_mode() efi: split out efi_get_gop() efi: split out efi_find_gop_mode() efi: split out efi_tables() efi: split out efi_variables() efi: split out efi_set_gop_mode() efi: split out efi_exit_boot() x86/efi: create new early memory allocator x86: add multiboot2 protocol support for EFI platforms x86/boot: implement early command line parser in C x86: make Xen early boot code relocatable x86: add multiboot2 protocol support for relocatable images ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 04/23] x86/boot: call reloc() using cdecl calling convention
Suggested-by: Jan Beulich jbeul...@suse.com Signed-off-by: Daniel Kiper daniel.ki...@oracle.com --- xen/arch/x86/boot/head.S |4 +++- xen/arch/x86/boot/reloc.c | 20 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index ed42782..3cbb2e6 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -119,8 +119,10 @@ __start: /* Save the Multiboot info struct (after relocation) for later use. */ mov $sym_phys(cpu0_stack)+1024,%esp -push%ebx +push%ebx/* Multiboot information address. */ +push%eax/* Boot trampoline address. */ callreloc +add $8,%esp /* Remove reloc() args from stack. */ mov %eax,sym_phys(multiboot_ptr) /* Initialize BSS (no nasty surprises!). */ diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c index 63045c0..708898f 100644 --- a/xen/arch/x86/boot/reloc.c +++ b/xen/arch/x86/boot/reloc.c @@ -10,15 +10,27 @@ *Keir Fraser k...@xen.org */ -/* entered with %eax = BOOT_TRAMPOLINE */ +/* + * This entry point is entered from xen/arch/x86/boot/head.S with: + * - 0x4(%esp) = BOOT_TRAMPOLINE_ADDRESS, + * - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS. + */ asm ( .text \n .globl _start \n _start: \n +push %ebp \n +mov %esp,%ebp\n call 1f \n -1: pop %ebx \n -mov %eax,alloc-1b(%ebx) \n -jmp reloc\n +1: pop %ecx \n +mov 0x8(%ebp),%eax \n +mov %eax,alloc-1b(%ecx) \n +mov 0xc(%ebp),%eax \n +push %eax \n +call reloc\n +add $4,%esp \n +pop %ebp \n +ret \n ); /* -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb
Additionally, align relevant comment to coding style. Suggested-by: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Daniel Kiper daniel.ki...@oracle.com --- xen/arch/x86/boot/head.S |5 +++-- xen/arch/x86/xen.lds.S |2 ++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index f63b349..ed42782 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -123,12 +123,13 @@ __start: callreloc mov %eax,sym_phys(multiboot_ptr) -/* Initialize BSS (no nasty surprises!) */ +/* Initialize BSS (no nasty surprises!). */ mov $sym_phys(__bss_start),%edi mov $sym_phys(__bss_end),%ecx sub %edi,%ecx +shr $2,%ecx xor %eax,%eax -rep stosb +rep stosl /* Interrogate CPU extended features via CPUID. */ mov $0x8000,%eax diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S index 6553cff..3e1f2af 100644 --- a/xen/arch/x86/xen.lds.S +++ b/xen/arch/x86/xen.lds.S @@ -162,6 +162,7 @@ SECTIONS . = ALIGN(STACK_SIZE); __init_end = .; + . = ALIGN(4); .bss : { /* BSS */ __bss_start = .; *(.bss.stack_aligned) @@ -175,6 +176,7 @@ SECTIONS *(.bss.percpu.read_mostly) . = ALIGN(SMP_CACHE_BYTES); __per_cpu_data_end = .; + . = ALIGN(4); __bss_end = .; } :text _end = . ; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel